This document explains how the WURFL data is organized internally.
WURFL contains tens of thousands of devices, each with hundreds of properties. Collectively, WURFL represents a large virtual matrix of devices and capabilities. Each device often comes in several subversions (often corresponding to different versions of the firmware) that are hard to model.
This is the area where WURFL is smarter than other solutions:
Exploiting these assumptions allows WURFL to:
WURFL is based on the concept of device families. All devices are
descendants of a generic device, but they may also descend from more
specialized families. A device that identifies itself with the user agent
Mozilla/5.0 (Android 11; Mobile; rv:82.0) Gecko/82.0 Firefox/82.0 runs
Mozilla's Firefox browser and is, of course, also a descendant of the
generic Android 11 family. As a consequence, as soon as such a device is released
(or, we should say, as soon as ScientiaMobile detects its user agent
hitting a site), we can safely add it to WURFL and state that it is
a descendant of the "Firefox" family.
This lets the phone inherit all of the capabilities of the Firefox
browser family even before the device is actually tested.
This mechanism, called 'fall_back', lets programmers derive the capabilities of a given phone by looking at the capabilities of its family, unless a certain feature is specifically different for that phone.

To further clarify, here is a concrete example. Samsung shipped several subversions of the Galaxy S20 5G (SM-G981U, SM-G981U1, SM-G981W etc.). WURFL models this knowledge elegantly thanks to the fall_back mechanism. First, the generic device specifies a capability called "model_name":
    <device fall_back="root" id="generic" user_agent="">
      <group id="ui">
        :
        <capability name="table_support" value="true" />
      </group>
      :
    </device>
You can read this as "Generic devices do not have a model name". By default in WURFL, Android phones use their Android version as the model name. This is modeled here:
    <device user_agent="DO_NOT_MATCH_GENERIC_ANDROID_10_0" fall_back="generic_android_ver9_0" id="generic_android_ver10_0">
      <group id="product_info">
        <capability name="model_name" value="Android 10.0" />
      </group>
    </device>
For the Galaxy S20 5G, model_name must be overridden in the device profiles rather than inherited from the generic Android ID:
    <device user_agent="Mozilla/5.0 (Linux; Android 10; SM-G981U) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.119 Mobile Safari/537.36" fall_back="generic_android_ver10_0" id="samsung_sm_g981u_ver1">
      :
      <group id="product_info">
        :
        <capability name="model_name" value="SM-G981U" />
        <capability name="marketing_name" value="Galaxy S20 5G" />
      </group>
    </device>
    <device user_agent="Mozilla/5.0 (Linux; Android 10; SM-G981U1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Mobile Safari/537.36" fall_back="samsung_sm_g981u_ver1" id="samsung_sm_g981u_ver1_subuau1">
      :
      <group id="product_info">
        :
        <capability name="model_name" value="SM-G981U1" />
      </group>
    </device>
    <device user_agent="Mozilla/5.0 (Linux; Android 10; SM-G981W) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Mobile Safari/537.36" fall_back="samsung_sm_g981u_ver1" id="samsung_sm_g981u_ver1_subuaw">
      :
      <group id="product_info">
        :
        <capability name="model_name" value="SM-G981W" />
      </group>
    </device>
With respect to the model name, this can be read as "there is a family of phones, a subset of the Galaxy S20 5G, whose model_name differs from device to device". All known Galaxy S20 5G devices "fall back" on a single device and inherit its properties, with only model_name differing among the device profiles.
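The lookup described above can be sketched in a few lines of Python. This is only an illustration, not the WURFL API: the `DEVICES` table and the `get_capability` helper are hypothetical names, and the table holds just the profiles quoted in this document. The walk stops when it reaches an id that is not in the table (here `generic_android_ver9_0`).

```python
# Hypothetical in-memory model of the profiles quoted in this document:
# device id -> (fall_back id, capabilities set directly on that profile).
DEVICES = {
    "generic_android_ver10_0": (
        "generic_android_ver9_0",
        {"model_name": "Android 10.0"},
    ),
    "samsung_sm_g981u_ver1": (
        "generic_android_ver10_0",
        {"model_name": "SM-G981U", "marketing_name": "Galaxy S20 5G"},
    ),
    "samsung_sm_g981u_ver1_subuau1": (
        "samsung_sm_g981u_ver1",
        {"model_name": "SM-G981U1"},
    ),
    "samsung_sm_g981u_ver1_subuaw": (
        "samsung_sm_g981u_ver1",
        {"model_name": "SM-G981W"},
    ),
}


def get_capability(device_id, name):
    """Return the first value found for `name` while walking up the fall_back chain."""
    current = device_id
    seen = set()  # guards against accidental cycles in the table
    while current in DEVICES and current not in seen:
        seen.add(current)
        parent, capabilities = DEVICES[current]
        if name in capabilities:
            return capabilities[name]
        current = parent
    return None  # not set anywhere in the part of the chain we know about


# The subversion overrides model_name but inherits marketing_name.
print(get_capability("samsung_sm_g981u_ver1_subuaw", "model_name"))
print(get_capability("samsung_sm_g981u_ver1_subuaw", "marketing_name"))
```

The subversion profile only stores the value that differs; everything else is resolved from the profiles it falls back on.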
If you are looking into the
wurfl.xml file, the number one concept
you should be familiar with is the fall-back hierarchy. The
hierarchy allows new devices to inherit their capability values from
similar devices by the same manufacturer. The fall-back mechanism is
powerful and has great advantages, since a correct choice of fall-back
yields a high chance that capability values are inferred
correctly. ScientiaMobile makes sure that values specific to each device are overridden
in the profile for that specific device. This mechanism allows WURFL
to identify sensible defaults even for unlisted devices
(i.e. devices that don't have a specific profile in WURFL yet)
for which browser and OS version can still be determined.
In order to better explain the concepts and functions introduced
with the new system, a basic introduction to the structure of the WURFL
XML is provided here. The WURFL XML file is basically a flat list of
<device> elements, although the fall_back mechanism allows WURFL users to
regard it as a logical tree, in which elements have different types, as
illustrated below (see also the diagram):
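As a rough illustration of how a flat list of <device> elements becomes a logical tree, the sketch below groups devices by their fall_back attribute using Python's standard `xml.etree.ElementTree`. The `SAMPLE` fragment reuses the ids quoted in this document; the `<devices>` wrapper, the omission of user_agent attributes and capability groups, and the helper names `build_children` and `subtree` are assumptions made to keep the example short.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abbreviated fragment: ids and fall_back values are the ones quoted in this
# document; user_agent attributes and capability groups are left out.
SAMPLE = """
<devices>
  <device id="generic" fall_back="root"/>
  <device id="generic_android_ver10_0" fall_back="generic_android_ver9_0"/>
  <device id="samsung_sm_g981u_ver1" fall_back="generic_android_ver10_0"/>
  <device id="samsung_sm_g981u_ver1_subuau1" fall_back="samsung_sm_g981u_ver1"/>
  <device id="samsung_sm_g981u_ver1_subuaw" fall_back="samsung_sm_g981u_ver1"/>
</devices>
"""


def build_children(xml_text):
    """Map each fall_back id to the ids of the devices that fall back on it."""
    children = defaultdict(list)
    for device in ET.fromstring(xml_text).iter("device"):
        children[device.get("fall_back")].append(device.get("id"))
    return dict(children)


def subtree(children, node_id, depth=0):
    """Return the indented lines of the logical tree rooted at node_id."""
    lines = ["  " * depth + node_id]
    for child in children.get(node_id, []):
        lines.extend(subtree(children, child, depth + 1))
    return lines


children = build_children(SAMPLE)
print("\n".join(subtree(children, "generic_android_ver10_0")))
```

The file itself stays flat; the tree exists only in the parent-to-children map derived from the fall_back attributes.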
root (also known as "the generic element") represents the capabilities of unrecognized HTTP clients. The generic element has some special properties: it contains all WURFL capabilities, always set to very conservative values (it is not wise to make assumptions about unrecognized HTTP clients). This element can be overridden to set values for unrecognized HTTP requests (for example, some may want WURFL to assume that an unrecognized request comes from a web browser rather than a mobile device).
A family is a
<device> element that does not represent any
specific device, yet it is useful for collecting the
capability values that are common to the devices (or
sub-families) falling back into the family. Nokia Series 40 is a
great example of that.
A device marked as 'actual device root' represents an actual device that has been elected as the representative of the (possibly few, possibly many) devices that share the same name but may have slightly different sets of features. An example of this might be the Galaxy S20 5G, a popular device that comes in many very similar variations (the version made for Verizon may come with different pre-loaded apps, but is essentially the same phone).
Finally, a device may represent a device subversion, i.e. a device which is in principle very similar to some existing "actual device" (see above) and which has been inserted either to capture its differences from the actual device, or simply to help the UA-string matching heuristics reach the right device when an HTTP request comes in.
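The relationship between a subversion and its actual device root can be sketched as another walk up the fall_back chain. This is a hedged illustration only: it assumes each profile carries a boolean `actual_device_root` flag, that `samsung_sm_g981u_ver1` is the profile carrying it, and the `PROFILES` table and `nearest_actual_device_root` helper are hypothetical names.

```python
# Hypothetical table: which profile is flagged as an actual device root is an
# assumption made for this example, not something taken from the WURFL file.
PROFILES = {
    "generic_android_ver10_0": {
        "fall_back": "generic_android_ver9_0",
        "actual_device_root": False,  # a family, not a concrete device
    },
    "samsung_sm_g981u_ver1": {
        "fall_back": "generic_android_ver10_0",
        "actual_device_root": True,  # elected representative of the model
    },
    "samsung_sm_g981u_ver1_subuau1": {
        "fall_back": "samsung_sm_g981u_ver1",
        "actual_device_root": False,  # subversion of the device above
    },
}


def nearest_actual_device_root(device_id):
    """Walk up the fall_back chain until a profile flagged as actual device root is found."""
    current = device_id
    seen = set()  # guards against accidental cycles in the table
    while current in PROFILES and current not in seen:
        seen.add(current)
        if PROFILES[current]["actual_device_root"]:
            return current
        current = PROFILES[current]["fall_back"]
    return None  # families and the generic element have no actual device root


print(nearest_actual_device_root("samsung_sm_g981u_ver1_subuau1"))
```

A subversion resolves to the representative device it falls back on, while a family such as `generic_android_ver10_0` resolves to nothing, since no concrete device sits above it.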
Diagram: The fall-back hierarchy