CML

CML (CrowdFlower Markup Language) is a set of helper tags that makes defining forms to collect information from our labor pools quick and painless.

The interactive form builder automatically generates most of these helper tags. If you need more control over your forms, or you simply prefer interacting with CrowdFlower through the API, CML is for you.

Why CML?

CML has 4 main advantages over raw HTML:

  1. CML automatically namespaces form elements. Because we display multiple forms in a single page, all form elements must be properly namespaced. CML takes care of that for you.
  2. CML lets you write less markup. There is no need to wrap your form elements in containers or add labels; CML writes all the extra markup for you.
  3. CML stores meta information specific to the CrowdFlower platform. Gold specification and directives for how you want your data aggregated are specified directly on the form elements.
  4. CML makes input validation simple. For instance, add validates="required numeric" to any CML tag, and you can be sure you’ll get only numbers back in your form data for that tag.

Basic tags


cml:text

Renders a single line text field. Accepts all common attributes.

<cml:text label="Sample text field:" />

Sample cml:text

Additional attributes

default

If supplied, the value of this attribute will be pre-filled in the text box when the page is loaded. It will not be submitted and will fail the required validator until the worker enters text into it.

<cml:text label="Sample text field:" default="Enter text here" />

Sample cml:text


cml:textarea

Renders a multi-line text area. Accepts all common attributes.

<cml:textarea label="Sample text area:" />

Sample cml:textarea

Additional attributes

default
If supplied, the value of this attribute will be pre-filled in the text area when the page is loaded. It will not be submitted and will fail the required validator until the worker enters text into it.

cml:checkbox

Renders a single checkbox. Accepts all common attributes.

<cml:checkbox label="A single checkbox with a default value" />

Sample cml:checkbox

Additional attributes

default
If supplied, the value of this attribute will be submitted if the worker does not check the checkbox. Otherwise, when the worker checks the checkbox, the checkbox’s value attribute will be submitted (or, in the absence of the value attribute, the label attribute.)
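The submission rules above can be summarized in a few lines of JavaScript. This is only a sketch of the described behavior, not platform code; the helper name submittedCheckboxValue and the attrs object are hypothetical.

```javascript
// Hypothetical helper mirroring the rules described above (not platform code).
// attrs holds the checkbox's label, value and default attributes.
function submittedCheckboxValue(attrs, checked) {
  if (checked) {
    // Checked: the value attribute wins; the label is the fallback.
    return attrs.value !== undefined ? attrs.value : attrs.label;
  }
  // Unchecked: only the default attribute (if supplied) is submitted.
  return attrs.default;
}

const box = { label: "Is this a restaurant?", value: "yes", default: "no" };
console.log(submittedCheckboxValue(box, true));  // "yes"
console.log(submittedCheckboxValue(box, false)); // "no"
```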

cml:checkboxes

Renders a group of checkboxes. Accepts all common attributes.

<cml:checkboxes validates="required" label="Sample checkboxes:">
  <cml:checkbox label="Checkbox 1" />
  <cml:checkbox label="Checkbox 2" />
  <cml:checkbox label="Checkbox 3" />
</cml:checkboxes>

Sample cml:checkboxes

The child <cml:checkbox /> elements accept the following attributes:

default

If this attribute is “true”, the checkbox element will be pre-checked on page-load.

<cml:checkboxes validates="required" label="Sample checkboxes:">
  <cml:checkbox label="Checkbox 1" default="true" />
  <cml:checkbox label="Checkbox 2" />
  <cml:checkbox label="Checkbox 3" />
</cml:checkboxes>

Sample cml:checkboxes with default attribute

label

The visible label for this checkbox. This is different from the <cml:checkboxes> element’s “label” attribute, which is the overall field’s label / question.

value

The value that gets submitted. If this isn’t present, the value of the “label” attribute is submitted.


cml:radios

Renders a group of radio buttons. Accepts all common attributes.

<cml:radios label="Sample radio buttons:">
  <cml:radio label="Radio 1" />
  <cml:radio label="Radio 2" />
  <cml:radio label="Radio 3" />
</cml:radios>

Sample cml:radios

The child <cml:radio /> elements accept all of the same attributes as the child elements of the cml:checkboxes element.


cml:select

Renders a drop-down menu. Accepts all common attributes.

<cml:select label="Sample select box:">
  <cml:option label="Option 1" />
  <cml:option label="Option 2" />
  <cml:option label="Option 3" />
</cml:select>

Sample cml:select

The child <cml:option /> elements accept the following attributes:

label
The visible label for this option.
value
The value that gets submitted if this option is selected. If this isn’t present, the value of the “label” attribute is submitted.

cml:ratings

Renders a set of radio buttons in a single line for performing ratings. Accepts all common attributes.

<cml:ratings label="Rate me" points="4" />

Sample basic cml:ratings

Advanced tags

cml:hidden

Renders a hidden form input. Accepts all basic tag common attributes.

<cml:hidden name="secret_sauce" value="true" />

This is best used for submitting additional hidden information, such as the worker’s browser information with the user_agent validator:

<cml:hidden label="workers_browser" validates="user_agent" />

cml:hours

Renders a widget for allowing input of hours of operation for every day of the week. Accepts all basic tag common attributes except for value and default.

<cml:hours label="When is this business open?" allowunlisted="true" />

Sample cml:hours

Note that cml:hours is always required. If you do not want the field to be required, use CML Logic to hide the field if the worker does not need to input anything.

Additional attributes

allowunlisted

If set to “true”, the hours dropdowns will include a “Not listed” option that will allow workers to specify that the hours of operation were not listed for that given day. Defaults to “false”. (optional)

Sample cml:hours with allowunlisted

Returned columns

cml:hours returns the following columns for each day of the week (mon, tue, wed, thu, fri, sat, sun):

  • {field_name}_{day}_openallday – “TRUE” if the worker selected “Open all day”, “FALSE” otherwise.
  • {field_name}_{day}_closedallday – “TRUE” if the worker selected “Closed all day”, “FALSE” otherwise.
  • {field_name}_{day}_open_1 – The first open time. One of several values:

    • “” – Nothing is submitted if the worker chose “Open all day” or “Closed all day”
    • “not_listed” – If “allowunlisted” is set to “true”, workers can select this option when they cannot find the time in question, for example on the business’s website.
    • “##:##” – Otherwise, a time is submitted in 24-hour format. For example: “17:30” (5:30pm)
  • {field_name}_{day}_close_1 – The first close time. Same format as `open_1`
  • {field_name}_{day}_open_2 – The second open time, if the worker clicked “Add extra time range”. Same format as `open_1`
  • {field_name}_{day}_close_2 – The second close time, if the worker clicked “Add extra time range”. Same format as `open_1`
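When post-processing a report, it can help to generate the full list of expected column names up front. The sketch below follows the naming pattern listed above; the function name hoursColumns and the field name "hours" are illustrative assumptions.

```javascript
// Day and suffix lists are taken from the column description above.
const DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"];
const SUFFIXES = ["openallday", "closedallday", "open_1", "close_1", "open_2", "close_2"];

// Hypothetical helper: lists every column a cml:hours field is described
// as returning, in day-major order.
function hoursColumns(fieldName) {
  const columns = [];
  for (const day of DAYS) {
    for (const suffix of SUFFIXES) {
      columns.push(`${fieldName}_${day}_${suffix}`);
    }
  }
  return columns;
}

console.log(hoursColumns("hours").slice(0, 3));
// [ 'hours_mon_openallday', 'hours_mon_closedallday', 'hours_mon_open_1' ]
```

With seven days and six columns per day, a single cml:hours field yields 42 columns.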

cml:taxonomy

Renders a widget that allows workers to search and browse through a hierarchical list of items (a taxonomy) and select an item to be submitted. Taxonomy data must be formatted according to the Taxonomy Data Format section below.

Sample cml:taxonomy

Additional Attributes

src

The taxonomy data source; can be either a URL or the name of a JavaScript variable defined in the custom JavaScript portion of the CML editor. JavaScript variables containing taxonomy data must be in the global scope, i.e., they should be assigned without using the var keyword.

// In your job's cml
<cml:taxonomy src="myTaxonomy" />

// In your job's javascript
myTaxonomy = {
  // Taxonomy data (see example data below)...
};

// WRONG!!! Don't use the 'var' keyword
var myTaxonomy = {
  // Taxonomy data (see example data below)...
};

URLs must accept the parameter callback and return a JSONP response wrapped in the function specified by the callback parameter. For example, if the URL that serves taxonomy data is http://www.myserver.com/taxonomy, issuing a GET to http://www.myserver.com/taxonomy?callback=myCallback should return the JSONP response:

myCallback(
   // My taxonomy data...
)
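On the server side, producing such a response amounts to serializing the taxonomy data and wrapping it in the requested callback. The following is a minimal sketch, assuming you build the response body yourself; the helper name jsonpBody and the identifier check are illustrative choices, not requirements stated by the platform.

```javascript
// Hypothetical helper: builds the body of a JSONP response.
function jsonpBody(callbackName, data) {
  // Accept only plain identifiers so the callback parameter cannot be
  // used to inject arbitrary script into the response.
  if (!/^[A-Za-z_$][A-Za-z0-9_$]*$/.test(callbackName)) {
    throw new Error("Invalid callback name: " + callbackName);
  }
  return callbackName + "(" + JSON.stringify(data) + ")";
}

const taxonomy = { topLevel: [1], taxonomyItems: { "1": { title: "Books" } } };
console.log(jsonpBody("myCallback", taxonomy));
// myCallback({"topLevel":[1],"taxonomyItems":{"1":{"title":"Books"}}})
```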

root-select

If set to "true", the taxonomy tool will only allow workers to select top-level items, while still letting them search and browse the full taxonomy. Useful when only a general category is desired.

selectable

Marks additional taxonomy items as selectable in the tool (normally only taxonomy endpoints are selectable). Accepts a comma-separated list of taxonomy item ids surrounded by square brackets, e.g. "[1,2,3,4]"

nonselectable

A comma-separated list of taxonomy items to remove as selectable in the tool.

top-level

A comma-separated list of taxonomy ids to show as top-level items in the widget. Overrides the topLevel items set in the taxonomy JSON.

search-src

A url that accepts the parameters q and callback and provides taxonomy search results for the query specified by the q parameter as JSONP wrapped in the function specified by the callback parameter. When this attribute is provided, a search box will be rendered below the taxonomy tool, allowing the worker to search through the taxonomy.

For example, if the search url is http://www.myserver.com/taxonomy_search, issuing a GET to http://www.myserver.com/taxonomy_search?callback=myCallback&q=dogs should return a JSONP response like:

myCallback(
  [
    {'name': "Pet Supplies > Dog Dishes", "id": "1234"},
    {'name': "Books > Doggies", "id": "5678"}
  ]
)

Search results should be an array of javascript objects (as shown above) with the following attributes:

  • name (String, required): The name of the taxonomy item to display when search results are rendered.
  • id (String, required): The taxonomy item id of the search result.

log-search

If set to "true", a field containing a comma-separated list of search queries will be returned with the worker’s judgment. Useful to tune search results. Ignored unless search-src is set.

Taxonomy Data Format

Taxonomy data should be in the JSON format shown below. Each taxonomy item should be associated with a category id.

{
  'topLevel': [1,2],
  'taxonomyItems': {
    '1': {
      'title': 'Books',
      'summary': 'Stuff you\'d find in a library',
      'children': ['3','4']
    },

    '2': {
      'title': 'Shoes',
      'summary': 'Stuff you\'d find in a shoebox',
      'children': ['5','6']
    },

    '3': {
      'title': 'Good books',
      'summary': 'Books you want to read, not {$4}',
      'parent': '1'
    },

    '4': {
      'title': 'Bad books',
      'summary': 'Books that suck',
      'parent': '1'
    },

    '5': {
      'title': 'Women\'s shoes',
      'summary': 'Shoes for dudettes',
      'parent': '2'
    },

    '6': {
      'title': 'Men\'s shoes',
      'summary': 'Shoes for dudes',
      'parent': '2'
    }
  }
}

topLevel
An array of root-level item ids for your taxonomy. The items with these ids will be the first ones shown to the worker when they see the taxonomy tool.
taxonomyItems
A JavaScript object containing the actual taxonomy items, keyed by item id. Each taxonomy item can have the following attributes:

  • title (String, required): The name of the taxonomy item that will show up in the tool.
  • summary (String, optional): A brief description of the taxonomy item. Links to other items can be included by enclosing the item id, preceded by a dollar sign, in curly braces, e.g. {$4}
  • parent (String, required): The item id of the item’s parent. Required on items that are children of other items.
  • children (Array, required): An array of taxonomy ids of the item’s children. Required on items that are parents of other items.
  • related (Array, optional): An array of taxonomy ids of items related to this item. Related items will be listed beneath the item description in the taxonomy tool.
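Because parent and children references must agree with each other, it can be worth checking taxonomy data before attaching it to a job. The function below is a sketch of such a check based only on the rules listed above; checkTaxonomy is a hypothetical helper, not part of CML.

```javascript
// Hypothetical consistency check for taxonomy data in the format above.
// Returns a list of human-readable problems (empty when none are found).
function checkTaxonomy(data) {
  const items = data.taxonomyItems;
  const problems = [];
  for (const id of data.topLevel) {
    if (!items[String(id)]) problems.push(`topLevel id ${id} is not defined`);
  }
  for (const [id, item] of Object.entries(items)) {
    if (typeof item.title !== "string") problems.push(`item ${id} has no title`);
    for (const childId of item.children || []) {
      const child = items[String(childId)];
      if (!child) {
        problems.push(`item ${id} lists missing child ${childId}`);
      } else if (String(child.parent) !== id) {
        problems.push(`child ${childId} does not name ${id} as its parent`);
      }
    }
    if (item.parent !== undefined && !items[String(item.parent)]) {
      problems.push(`item ${id} names missing parent ${item.parent}`);
    }
  }
  return problems;
}

const sample = {
  topLevel: [1],
  taxonomyItems: {
    "1": { title: "Books", children: ["3"] },
    "3": { title: "Good books", parent: "1" }
  }
};
console.log(checkTaxonomy(sample)); // []
```

Ids are compared as strings because object keys are always strings in JavaScript, even when topLevel lists them as numbers.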

CML Taxonomy and CML Logic

The cml:taxonomy widget does not presently work as a cml only-if logic dependency. E.g., with the following cml,

<cml:taxonomy name="taxonomyResponse" src="myTaxonomy" />

<cml:text name="textResponse" only-if="taxonomyResponse" />

the cml:text field would appear regardless of whether or not the cml:taxonomy field contained a value. This will be remedied in a future update to the taxonomy widget.

Meta tags

  • cml:instructions – Specify instructions for a field
  • cml:gold – Define the gold specification for a field
  • cml:group – Group related fields together
  • cml:meta – Define gold or aggregation behavior for non-CML fields

cml:instructions

Can only be a child of a regular field tag. The contents of this tag specify the instructions for a field. If both an instructions attribute and a <cml:instructions> tag are specified, only the value of the attribute on the field tag will be used.

<cml:radios label="Sample radio buttons:">
  <cml:instructions>Check one of these.</cml:instructions>
  <cml:radio label="Radio 1" />
  <cml:radio label="Radio 2" />
  <cml:radio label="Radio 3" />
</cml:radios>

Sample cml:instructions


cml:gold

The various attributes of this tag define the behavior of gold on a field. This can only be a child of a regular field tag.

<cml:checkboxes label="Sample checkboxes:">
  <cml:gold src="gold_column_name" strict="true" />
  <cml:checkbox label="Checkbox 1" />
  <cml:checkbox label="Checkbox 2" />
  <cml:checkbox label="Checkbox 3" />
</cml:checkboxes>
<cml:text label="Sample text field:">
  <cml:gold src="source_column_name" regex="\\d+\." flags="i" no_escape="true" />
</cml:text>

Attributes: src, exact, strict, & matcher.

cml:gold and regular expressions

Regular expressions can be used to create flexible criteria for gold on text and textarea fields. Suppose you want to allow any response that contains the phrase ‘foo bar’ to be submitted in a text field.

Here’s how you would go about setting up the regular expression match for the text field:

1. Inside your CML text field tag, add a cml:gold tag and its associated attributes.
<cml:text label="What words are sometimes used as placeholder names?" name="field_name"><cml:gold regex="{seed}" flags="i"></cml:gold></cml:text>

You can set the following regular expression attributes in your cml:gold tag:

regex: The value of this attribute is the regular expression that a correct response should satisfy. The token {seed} will be replaced with the value of the 'response_gold' field in your data when the regex attempts to match the response. For example, if you’re trying to match ‘foo bar’, or another simple phrase, set the regex to {seed} (regex="{seed}") in your cml:gold tag, and set ‘response_gold’ to ‘foo bar’.

flags: In this attribute, you can set modifiers that will change the behavior of the regular expression. Currently, there are two values that are accepted.

  • flags="i" – Makes the regex insensitive to case. E.g., If you set the ‘response_gold’ field to ‘foo bar’ but also want to accept ‘Foo bar’, the capital F will not cause the match to fail.
  • flags="m" – If you are expecting to receive multiple line responses, the ‘m’ flag will ensure that the entire response is evaluated. This is essential for copy/paste jobs.

If you would like to use both flags simultaneously, add them in alphabetical order within one value (flags="im"). No delimiter is needed.

no_escape: When the ‘response_gold’ for your unit is substituted for {seed} in the regular expression, any special or illegal regular expression characters are escaped. E.g., “|” is considered a special character in regular expressions. By default, it will be escaped and lose its special meaning unless ‘no_escape’ contains the value “true”.

Example: If you wanted to match either ‘first base’, ‘second base’ or ‘third base’, you might try setting ‘response_gold’ to '(first|second|third) base'. However, this would not work by default, since the special characters would be escaped and only the literal string '(first|second|third) base' would be matched, parentheses, pipe and all. Setting no_escape="true" would have the desired effect, allowing any of the three options as a correct response.

2. Add the gold data column to your input spreadsheet or in the gold digging interface.

Let’s say you name your regex text field ‘response’. In your input data, you will need to add a 'response_gold' field and populate it with the phrase or pattern to match for each unit. Your spreadsheet might look like this:

query                                                  response_gold
Most popular sports in the United States?              baseball|football|basketball
The largest political parties in the United States?    Republican|Democratic
Pet names for your children?                           sweetie|toots|sugarpie

Keep in mind that if you’re using special regex characters like “|”, you’ll have to set no_escape="true" in your CML.

Digging gold can be done as usual, except you have to be conscious of using regular expression special characters and escaping characters if needed.

3. Set the field to gold in the graphical form interface

If you’ve predefined your gold in an uploaded spreadsheet, you’ll have to use the graphical form builder to make the ‘response’ field gold and use the ‘response_gold’ column for gold data. See the FAQ for more details.
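The effect of no_escape described in step 1 can be reproduced with plain JavaScript regular expressions. This is a sketch under the assumption that {seed} substitution works as described above; matchesGold and its options object are hypothetical stand-ins, not the platform's implementation.

```javascript
// Escape every character that has a special meaning in a JavaScript regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical stand-in for the gold check: substitutes response_gold for
// {seed}, escaping it unless noEscape is set, then tests the response.
function matchesGold(response, responseGold, options) {
  const seed = options.noEscape ? responseGold : escapeRegExp(responseGold);
  // A function replacement avoids special handling of "$" in the seed.
  const pattern = options.regex.replace("{seed}", () => seed);
  return new RegExp(pattern, options.flags || "").test(response);
}

const gold = "(first|second|third) base";
console.log(matchesGold("second base", gold, { regex: "{seed}", flags: "i" }));
// false: the parentheses and pipes were escaped and matched literally
console.log(matchesGold("second base", gold, { regex: "{seed}", flags: "i", noEscape: true }));
// true: the alternation keeps its special meaning
```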


cml:group

Groups related fields together so that they can easily be hidden/shown with CML logic without writing only-if attributes for every field.

<cml:checkbox label="Show extra fields?" name="show_extras" />
<cml:group only-if="!show_extras:unchecked">
  <cml:text label="Sample text field:" instructions="Enter text above" />
  <cml:checkboxes validates="required" label="Sample checkboxes:">
    <cml:checkbox label="Checkbox 1" />
    <cml:checkbox label="Checkbox 2" />
    <cml:checkbox label="Checkbox 3" />
  </cml:checkboxes>
</cml:group>

In the above example, both the cml:text and cml:checkboxes fields are shown only while the show_extras checkbox is checked: the only-if expression negates (!) the unchecked validator on the show_extras field.


cml:meta

Defines gold and aggregation behavior for non-CML fields. This is useful when you want to use custom non-CML fields for gold and have worker responses aggregated.

<cml:meta name="my_field" gold="true" aggregation="agg" />

Non-input tags

  • cml:thumb – Display a cached thumbnail for a given image URL

cml:thumb

Renders a cached thumbnail of a publicly accessible image url. This thumbnail is cached on CrowdFlower’s servers for speed and to prevent issues with hotlinking images on high-volume jobs.

<cml:thumb src="{image_url}" width="100" />

src
The publicly accessible url of the image to thumbnail.
width
The width of the thumbnail (optional)
height
The height of the thumbnail (optional)
method
crop or resize. By default images are cropped. (optional)

If neither height nor width is specified, a full-sized cached version of the original image will be served. This is useful for images that might be hot-linked or could disappear mid-job.

Logic

CML allows simple logic statements to attach the visibility of one or more form elements to the value of other fields via the only-if attribute. only-if logic is run when the fields that a given field depends on are validated (on element blur and form submit).

Basic only-if selectors

The easiest way to use an only-if is to specify a different field’s name attribute:

<cml:text label="Enter an integer" validates="required integer" />
<cml:text label="Describe the integer" only-if="enter_an_integer" validates="required" />

In this example, the “Describe the integer” field will display only if the “Enter an integer” field passes all of its validators (required and integer).
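Note that neither field above sets a name attribute, yet the only-if refers to enter_an_integer. The examples in this section consistently use a lowercased, underscore-separated version of the label as the field name. The exact derivation rule is not stated here, so the sketch below is only a guess that reproduces the names used in these examples; guessFieldName is a hypothetical helper, and setting an explicit name attribute is the safer option when in doubt.

```javascript
// Hypothetical guess at how a field name relates to its label, based only
// on the examples in this section. Treatment of punctuation is an assumption.
function guessFieldName(label) {
  return label
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_") // runs of other characters become "_"
    .replace(/^_+|_+$/g, "");    // no leading or trailing underscores
}

console.log(guessFieldName("Enter an integer")); // "enter_an_integer"
console.log(guessFieldName("Pick a number"));    // "pick_a_number"
```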

Overriding validators

You can override a field’s validators by using the following format: only-if="field_name:validatorName". Example:

<cml:text label="Enter an integer" validates="required integer" />
<cml:text label="Why did you enter letters?" only-if="enter_an_integer:alpha" validates="required" />

In this case, the “Why did you enter letters?” field will display only when the “Enter an integer” field passes the alpha validator.

Indexed only-if selectors

One of the most common uses of only-if is to make the display of a field dependent on the checked status of a specific radio button or checkbox or the selectedIndex value of a select. You can do so using this format: only-if="field_name:[index_number]" where “index_number” is a valid index integer.

<cml:select label="Pick a number">
  <cml:option>One</cml:option>
  <cml:option>Two</cml:option>
</cml:select>
<cml:text label="Why did you pick the first one?" only-if="pick_a_number:[0]" validates="required" />

The above CML will require and display the “Why did you pick the first one?” text field only when the first option is selected in the “Pick a number” drop-down menu. The index value is 0-based. (i.e. The first element is “0”, the second is “1”, and so on.) Additionally, the “Why did you pick the first one?” field will be validated only when it is visible. Hidden fields do not run their validators on form submit.

You can also use the input’s value instead of an index number:

<cml:checkboxes label="Symptoms">
  <cml:checkbox value="breath" label="Shortness of breath" />
  <cml:checkbox value="hair" label="Hair loss" />
</cml:checkboxes>
<cml:text label="Your hair is falling out?" only-if="symptoms:[hair]" validates="required" />

“Or”

“Or” logic can be used in indexed only-if selectors. Use the double-pipe operator “||” to specify an “or” relationship between values:

<cml:select label="Pick a number">
  <cml:option value="uno">One</cml:option>
  <cml:option value="dos">Two</cml:option>
  <cml:option value="tres">Three</cml:option>
  <cml:option value="quattro">Four</cml:option>
</cml:select>
<cml:text label="Why did you pick an odd number?" only-if="pick_a_number:[0]||pick_a_number:[tres]" validates="required" />

As shown above, you can mix index-based and value-based selectors if desired.

Complex only-if selectors

CML only-if logic supports the following operators:

Operator  Meaning  Associates
++        and      Left-to-right
||        or       Left-to-right
!         not      Right-to-left

For example:

<cml:checkbox label="x" value="ex" ></cml:checkbox>
<cml:checkbox label="y" value="why" ></cml:checkbox>
<cml:checkbox label="z" value="zee" ></cml:checkbox>
<cml:text label="One of them is on." only-if="x:[ex]||y:[why]||z:[zee]" ></cml:text>
<cml:text label="All on!" only-if="x:[ex]++y:[why]++z:[zee]" ></cml:text>
<cml:text label="'ex' isn't selected." only-if="!x:[ex]" ></cml:text>
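To reason about which fields will be visible for a given set of answers, the three expressions above can be evaluated by hand or with a small script. The sketch below handles only value selectors of the form field:[value] and assumes that ++ binds more tightly than ||; the table above gives associativity but not precedence, so that ordering is an assumption, and evalOnlyIf is a hypothetical helper rather than the platform's parser.

```javascript
// state maps each field name to the list of values currently selected.
function evalSelector(selector, state) {
  const negate = selector.startsWith("!");
  const body = negate ? selector.slice(1) : selector;
  const match = /^([^:]+):\[([^\]]+)\]$/.exec(body);
  if (!match) throw new Error("Unsupported selector: " + selector);
  const selected = state[match[1]] || [];
  const result = selected.includes(match[2]);
  return negate ? !result : result;
}

// Assumed precedence: "!" first, then "++" (and), then "||" (or).
function evalOnlyIf(expression, state) {
  return expression.split("||").some((group) =>
    group.split("++").every((selector) => evalSelector(selector.trim(), state))
  );
}

const state = { x: ["ex"], y: [], z: [] }; // only the "x" checkbox is checked
console.log(evalOnlyIf("x:[ex]||y:[why]||z:[zee]", state)); // true
console.log(evalOnlyIf("x:[ex]++y:[why]++z:[zee]", state)); // false
console.log(evalOnlyIf("!x:[ex]", state));                  // false
```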

Validations

CML supports a number of pre-made validation methods to ensure data integrity. Some validators normalize the input, making it possible to create gold for complex data like phone numbers, addresses, and URLs. You can add validators by specifying a validates attribute on your CML form element:

<cml:checkboxes label="Pick one" validates="required">
  <cml:checkbox label="This one is the best" />
  <cml:checkbox label="The first one is lying" />
</cml:checkboxes>
<cml:text label="My text box" validates="required usAddress"/>

Validators are run from left to right, stopping on the first failure. For example, the above cml:text field would validate required first and then, if that validator passes, move on to usAddress and run that validation.
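The left-to-right, stop-on-first-failure behavior can be pictured as a simple loop. The sketch below uses simplified stand-ins for three validators; their exact rules on the platform may differ, and firstFailure is a hypothetical helper.

```javascript
// Simplified stand-ins for a few validators (assumptions, not platform code).
const validators = {
  required: (value) => value.trim().length > 0,
  integer: (value) => /^-?\d+$/.test(value.trim()),
  alpha: (value) => /^[A-Za-z]+$/.test(value)
};

// Runs the validators named in a validates attribute from left to right
// and returns the name of the first one that fails, or null if all pass.
function firstFailure(value, validates) {
  for (const name of validates.trim().split(/\s+/)) {
    const check = validators[name];
    if (!check) throw new Error("Unknown validator: " + name);
    if (!check(value)) return name;
  }
  return null;
}

console.log(firstFailure("", "required integer"));    // "required"
console.log(firstFailure("abc", "required integer")); // "integer"
console.log(firstFailure("42", "required integer"));  // null
```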

All validators run on blur and submit events. The required validator, however, runs only on submit. Failed validators block submission of the form.

Hidden fields (both those of type hidden and those contained in elements with display: none; styling) are not validated.

General Validators

required
On a free text input, at least one non-whitespace character must be present. On a multiple choice or drop-down field, at least one item must be selected.

Number Validators

integer
Requires an integer value, e.g., 10, -5
positiveInteger
Requires a positive integer value, e.g., 10, 5
numeric
Requires an integer or floating-point value, e.g., 10, 1.5, -2.4
digits
Allows only numbers and punctuation, e.g., “10:10-99”, “100(2)”
integerRange
Ensures that the worker inputs an integer within a given range. Note: The values are inclusive.
<cml:text label="A number between 1 and 100, inclusive" validates="integerRange:{min:1,max:100}"/>

Text Validators

alpha
Requires only letters, e.g., “ABCabc”
alphanum
Allows only numbers and letters, e.g., “ahfd723nd”
date
Requires a date in MM/DD/YYYY format, e.g., “01/21/2010”
minLength
Ensures that the user’s input is at least a certain number of characters long.
<cml:text label="4 or more characters" validates="minLength:4"/>
maxLength
Ensures that the user’s input is, at most, a given number of characters long.
<cml:text label="32 or fewer characters" validates="maxLength:32"/>
rangeLength
Ensures that the user’s input is within a given length range. Note: The values are inclusive.
<cml:text label="5 to 32 characters long" validates="rangeLength:{min:5,max:32}"/>
currencyDollar
Allows only a dollar amount, e.g., “$153.40”
currency
Allows only monetary amounts with a valid currency symbol or currency code, e.g., “£1.500,57” or “1,200 DKK”. Both commas and periods are accepted as valid digit group delimiters, and most international currency codes and symbols are recognized. Currency amounts are normalized to “¥1,200” or “1,200.00 JPY”-style formatting. If a list of currency codes and/or symbols is provided, valid currency types are restricted to those provided.
<cml:text label="Valid currency amount" validates="currency"/>
<cml:text label="Valid USD or GBP amount" validates="currency:['$','£','USD','GBP']"/>
usPhone
Requires a valid US phone number, e.g., “555-123-4567”. This validator normalizes the worker input, removing the US country code and formatting it as ###-###-####. It also allows the user to enter an extension number as a separate field.
regex
Allows you to validate worker input against any arbitrary regular expression allowed in JavaScript. Because the regular expression is embedded in an HTML attribute, the following characters need to be escaped as HTML entities:
  • < → &lt;
  • > → &gt;
  • " → &quot;
The regex validator makes use of attributes to specify the regular expression, its flags, and its behavior:
  • data-validates-regex – The actual regular expression. Remember that the above three special characters need to be escaped as HTML entities.
  • data-validates-regex-flags – Any combination of the standard three regular expression flags. In the context of a validator, only “i” (case-insensitive matching) and “m” (multiline matching of ^ and $) are useful. “g” will have no real effect.
  • data-validates-regex-message – A custom “validation failed message”. The default message is “This value didn’t match the expected rules.”
Examples:
<!-- Case-sensitive regular expression. Passes: "-123" -->
<cml:text label="Negative number" validates="regex" data-validates-regex="^-\d+" />
<!-- Case-insensitive regular expression. Passes: "<a>", "<FONT>" -->
<cml:text label="HTML tag" validates="regex" data-validates-regex="&lt;[a-z]+&gt;"
    data-validates-regex-flags="i" />
<!-- Custom failure message.-->
<cml:text label="Vowel" validates="regex" data-validates-regex="^[aeiouy]$" 
    data-validates-regex-message="Please enter a vowel." />
<!-- Pass only if the regex *doesn't* match the worker input.-->
<cml:text label="Consonant" validates="regex:'nomatch'" 
    data-validates-regex="^[aeiouy]$" />
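The way these attributes combine can be sketched with a plain JavaScript RegExp. The function below is an illustration of the described behavior, not the platform's code: regexValidator is a hypothetical name, the nomatch option stands in for validates="regex:'nomatch'", and patterns are written after HTML entity decoding.

```javascript
// Hypothetical model of the regex validator. Returns null when the input
// passes, or the failure message when it does not.
function regexValidator(input, attrs) {
  const pattern = attrs["data-validates-regex"];
  const flags = attrs["data-validates-regex-flags"] || "";
  const matched = new RegExp(pattern, flags).test(input);
  // With nomatch set, the input passes only if the regex does NOT match.
  const passed = attrs.nomatch ? !matched : matched;
  if (passed) return null;
  return attrs["data-validates-regex-message"] ||
    "This value didn’t match the expected rules.";
}

console.log(regexValidator("-123", { "data-validates-regex": "^-\\d+" })); // null
console.log(regexValidator("b", {
  "data-validates-regex": "^[aeiouy]$",
  "data-validates-regex-message": "Please enter a vowel."
})); // "Please enter a vowel."
```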

Web Validators

email
Requires a valid-looking email address, e.g., “bob@example.com”
url
Requires a valid-looking URL, e.g., “http://crowdflower.com”. This validator performs some normalization of the entered URL to make it more useful for gold-digging but also submits the user’s original input. The normalization includes removing “www.”, changing “https://” to “http://”, removing index pages such as “index.html” and “home.html”, and adding a trailing slash when needed. The url validator accepts the following optional restrictions:
  • google – Requires a google.___ domain
  • non-google – Forbids a google.___ domain.
  • non-search – Forbids most major search domains (Google, Bing, Yahoo, Yelp, etc.)
  • domainonly – Strips the entry of everything except the domain.
Examples:

<cml:text label="Non-Google Site" validates="required url:['non-google']" />
<cml:text label="Domain Only" validates="required url:{domainonly:true}" />
The url validator submits the following additional data:

  • {fieldname}_worker_input – the raw worker input.
urlImage
Requires a URL for a valid image. This validator executes an asynchronous request to validate that the URL points to a valid image. This validator does no “cleaning” of the worker’s input.
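When writing gold for a field that uses the url validator, it helps to know roughly what the normalized value will look like. The sketch below approximates the normalization steps listed for the url validator above; the platform's actual rules are likely more extensive, normalizeUrl is a hypothetical helper, and "adding a trailing slash when needed" is interpreted here as applying to bare domains only.

```javascript
// Rough approximation of the described url normalization (an assumption,
// not the platform's implementation).
function normalizeUrl(input) {
  let url = input.trim();
  url = url.replace(/^https:\/\//i, "http://");       // https:// becomes http://
  url = url.replace(/^(http:\/\/)www\./i, "$1");      // drop a leading "www."
  url = url.replace(/\/(index|home)\.html?$/i, "/");  // drop index pages
  if (/^http:\/\/[^\/]+$/.test(url)) url += "/";      // bare domain gets a slash
  return url;
}

console.log(normalizeUrl("https://www.crowdflower.com/index.html"));
// http://crowdflower.com/
console.log(normalizeUrl("http://crowdflower.com"));
// http://crowdflower.com/
```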

Address Validators

stateAbbr
Requires a valid US state abbreviation, e.g., “CA”, “NY”
zipcode
Requires a valid US zip code, e.g., “94103”
frAddress
Requires a valid address and returns French address components. Unlike the US address validator, the French address validator does not enforce a level of specificity in the address. (However, this can still be enforced via Gold). This validator uses the Google Maps API to return a valid, cleaned address in the French format: “123 Rue de Bercy, Paris, 75012, France”.
Although this address validator is specifically geared towards addresses in France, it does not prevent workers from entering addresses from other countries. Use Gold to ensure that workers don’t give you non-French addresses.
In addition to the full normalized address, this validator submits the following components as separate fields (these appear as extra columns in your CSV reports):
  • {fieldname}_workerInput – raw worker input (before normalization)
  • {fieldname}_street – street address (eg. “123 Rue de Bercy”)
  • {fieldname}_buildingLabel – building/unit label (eg. “Suite”, “Unit”)
  • {fieldname}_buildingNumnber – building/unit number (eg. “A”, “1C”)
  • {fieldname}_city – town / city (eg. “Paris”)
  • {fieldname}_postcode – postal code (eg. “75012”)
  • {fieldname}_country – country (eg. “France”)
usAddress
The usAddress validator has been deprecated in favor of usAddress2 and will be replaced in the near future.
usAddress2
Requires a valid, specific address and returns US address components. This validator requires the worker to enter an address that is specific to the building level. It uses the Google Maps API to return a valid, cleaned address in the US format: “455 Valencia St, San Francisco, CA, 94103, USA”.
Although this address validator is specifically geared towards US addresses, it does not prevent workers from entering addresses from other countries. Use Gold to ensure that workers don’t give you non-US addresses.
In addition to the full normalized address, this validator submits the following components as separate fields (these appear as extra columns in your CSV reports):
  • {fieldname}_workerInput – raw worker input (before normalization)
  • {fieldname}_street – street address (eg. “455 Valencia St”)
  • {fieldname}_buildingLabel – building/unit label (eg. “Suite”, “Unit”)
  • {fieldname}_buildingNumnber – building/unit number (eg. “A”, “1C”)
  • {fieldname}_city – town / city (eg. “San Francisco”)
  • {fieldname}_state – two-letter state abbreviation (eg. “CA”)
  • {fieldname}_zip – postal code (eg. “94103”)
  • {fieldname}_country – country (eg. “USA”)
ukAddress
Requires a valid address and returns UK address components. Unlike the US address validator, the UK address validator does not enforce a level of specificity in the address. (However, this can still be enforced via Gold). This validator uses the Google Maps API to return a valid, cleaned address in the UK format: “123 Brookdale, Enfield, Greater London, N11 1, United Kingdom”.
Although this address validator is specifically geared towards UK addresses, it does not prevent workers from entering addresses from other countries. Use Gold to ensure that workers don’t give you non-UK addresses.
In addition to the full normalized address, this validator submits the following components as separate fields (these appear as extra columns in your CSV reports):
  • {fieldname}_workerInput – raw worker input (before normalization)
  • {fieldname}_route – street address (e.g., “123 Brookdale”)
  • {fieldname}_buildingLabel – building/unit label (e.g., “Suite”, “Unit”)
  • {fieldname}_buildingNumnber – building/unit number (e.g., “A”, “1C”)
  • {fieldname}_post_town – town / city (e.g., “London”)
  • {fieldname}_postcode – postal code (e.g., “N11 1”)
  • {fieldname}_country – country (e.g., “United Kingdom”)
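
As a rough illustration, a text field using the ukAddress validator might look like the sketch below. The label and name are made up for this example. With name="office_address", the extra report columns would be expected to follow the {fieldname}_ pattern listed above (for instance, office_address_postcode).

```xml
<!-- Hypothetical field: the label and name are illustrative only -->
<cml:text label="Office address:" name="office_address" validates="required ukAddress" />
```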

Special Validators

clean
Cleans the worker’s input. The worker can never “fail” this validator because it does not actually validate anything; it simply cleans their input. You can use any combination of the following cleaners:
  • trim – Removes leading and trailing whitespace.
  • titlecase – Capitalizes each word, except words that are already all uppercase and most conjunctions.
  • uppercase – Replaces all lowercase letters with uppercase letters.
  • lowercase – Replaces all uppercase letters with lowercase letters.
  • whitespace – Removes all whitespace from the input.
  • multipleWhitespace – Replaces all consecutive whitespace with a single space. (For example, multiple spaces and newlines in a row will get replaced with a single space.)
Cleaners are processed in the order that they are defined.
Examples:

<cml:text label="Titlecase" validates="required clean:['titlecase']" />
<cml:text label="Trimmed and Lowercased" validates="required clean:['trim','lowercase']" />
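
Because cleaners run in the order they are listed, swapping two cleaners can change the result. The sketch below uses illustrative labels, and the outcomes in the comments are inferred from the cleaner descriptions above rather than taken from platform output.

```xml
<!-- Lowercase first, then titlecase: "ACME corp" would be expected to become "Acme Corp" -->
<cml:text label="Company name" validates="required clean:['lowercase','titlecase']" />

<!-- Reversed order: titlecase skips the all-uppercase "ACME", then everything is
     lowercased, so the expected result is "acme corp" -->
<cml:text label="Company name (lowercase)" validates="required clean:['titlecase','lowercase']" />
```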
user_agent
This simply sets the input’s value to the worker’s user-agent string. This is useful for debugging. For example, if you’re asking users to evaluate your site, you can use this validator to help diagnose complaints about broken pages.
This should be used only in cml:hidden tags.
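
A minimal sketch of how this might be written is shown here. The name worker_user_agent is an arbitrary choice for this example, not a required value.

```xml
<!-- Hypothetical name; the validator fills in the worker's user-agent string -->
<cml:hidden name="worker_user_agent" validates="user_agent" />
```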

Aggregation

CML allows you to specify how the CrowdFlower platform will summarize the result for each CML field in your form. The consolidation of one or more worker responses into a summarized result is referred to here as ‘aggregation’. The output from this process can be found in your job’s aggregated report or in the results hash of a unit’s JSON payload. All parent CML elements accept the aggregation attribute, which determines the aggregation method used to create your results. Below is an overview of the types of aggregation we offer and the CML tags that use each type by default.

agg (aggregation="agg")
Default aggregation for: cml:checkbox, cml:radios, cml:taxonomy, cml:checkboxes and cml:select
Returns a single “top” result: the contributor response with the highest confidence (agreement weighted by contributor trust) for the given field. All other responses are ignored. A numerical confidence value between 0 and 1 is also returned.
all (aggregation="all")
Default aggregation for: cml:text and cml:textarea
Returns all unique responses submitted by all trusted workers for the given field. The result will be a newline-delimited (‘\n’) list in the Aggregated report.
avg (aggregation="avg")
Default aggregation for: cml:rating
Returns a numeric average calculated based on all responses. Variance will also be returned.
agg_{x} (aggregation="agg_x")
Returns the top x responses (based on confidence) for the given field. This is useful when there are multiple correct responses to a question (e.g., ‘Select all that apply’ question formats). If you want the top three responses ranked by confidence, specify agg_3, and so on. Note that this setting returns up to x responses whenever that many unique responses are available, even if the last response has very low confidence or no agreement. The result will be a newline-delimited (‘\n’) list in the Aggregated report.
cagg_{x} (aggregation="cagg_x")
Returns all responses with a confidence greater than the specified value of x for the given field. x can be any floating-point number between 0 and 1. Always include the leading 0 for decimals; for example, cagg_0.4 will return all responses with a confidence score greater than 0.4. The result will be a newline-delimited (‘\n’) list in the Aggregated report. The aggregate result will be empty if no response has a confidence score higher than the specified value.
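
The following sketch shows how the aggregation attribute might be set to override a tag’s default. All labels and options are invented for illustration; only the aggregation values come from the descriptions above.

```xml
<!-- Keep up to the three highest-confidence selections -->
<cml:checkboxes label="Which topics does the article cover?" aggregation="agg_3">
  <cml:checkbox label="Politics" />
  <cml:checkbox label="Sports" />
  <cml:checkbox label="Technology" />
  <cml:checkbox label="Business" />
</cml:checkboxes>

<!-- Return only the single top answer instead of every unique response -->
<cml:text label="Company name" aggregation="agg" />

<!-- Return every answer whose confidence is greater than 0.4 -->
<cml:select label="Sentiment" aggregation="cagg_0.4">
  <cml:option label="Positive" />
  <cml:option label="Neutral" />
  <cml:option label="Negative" />
</cml:select>
```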

Attributes

Common attributes

label
Every CML base tag (along with the cml:checkbox and cml:radio child tags) can have a label attribute. This will be displayed next to the generated form element. If no name attribute is specified on a base tag, the label will be converted into a name by removing non-alphanumeric characters and replacing spaces with underscores.
name
Every CML base tag (except cml:group), along with the cml:checkbox and cml:radio child tags, can have a name attribute. This must be unique across all base tags. The name should not contain capital letters, spaces, or non-alphanumeric characters. This will become a column header in your generated CSV.
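
As a small illustration, setting name explicitly avoids relying on the label conversion. The label and name below are made up for this example.

```xml
<!-- Explicit name: the CSV column header is expected to be "company_name" -->
<cml:text label="Company name:" name="company_name" />
```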
value
The primary use for this attribute is setting cml:hidden tag values. In the case of cml:option, cml:checkbox and cml:radio, label will become the value if none is specified. If you want to have a default value for cml:text or cml:textarea, we suggest using the default attribute as this provides a better experience for the user.
default
When specified on cml:select, it must be the label of the option that will be selected by default. When used on cml:textarea or cml:text, this provides an example input for the user and disappears once the user selects the form element. Default values will not be submitted.
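
A sketch of default on cml:select, with an invented label and options, might look like this. The default value matches the label of one of the options.

```xml
<!-- "Email" is pre-selected because it matches an option label -->
<cml:select label="Preferred contact method" default="Email">
  <cml:option label="Phone" />
  <cml:option label="Email" />
  <cml:option label="Post" />
</cml:select>
```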
instructions
Every CML base tag can have an instructions attribute. This will be displayed next to the generated form element to help clarify the desired input. If both an instructions attribute and a <cml:instructions /> tag are specified, only the value of the attribute will be used.
only-if
Every CML base tag can have an only-if attribute. The value of this attribute should be the name of the field a user must complete before this field or group of fields will be displayed. See logic for more details.
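
For example, a follow-up question could be hidden until an earlier field is completed. The field names and labels in this sketch are hypothetical.

```xml
<cml:text label="Company website" name="company_website" />

<!-- Displayed only after the worker completes the company_website field -->
<cml:textarea label="Describe what the company sells" only-if="company_website" />
```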
validates
Every CML base tag can have a validates attribute. This attribute enforces the specified validations to occur on this form element. See validations for more details.
aggregation
Every CML base tag can have an aggregation attribute. This attribute tells us how to aggregate your data once it has been collected. The value of this attribute can be agg, all, avg, agg_x, or cagg_x. See aggregation for more details.

Gold attributes

gold  

Every CML base tag except cml:group can have a gold attribute. If set to "true", this will link the form element with the data in the form_element_name_gold column of your uploaded spreadsheet. If set to any other value, it will link the form element with the column named by that value. See our gold documentation for more details.
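
The two forms of the gold attribute might be written as in the sketch below. The field names and the expected_category column are hypothetical.

```xml
<!-- gold="true": linked to the "sentiment_gold" column of the uploaded spreadsheet -->
<cml:radios label="Sentiment" name="sentiment" gold="true">
  <cml:radio label="Positive" />
  <cml:radio label="Negative" />
</cml:radios>

<!-- gold set to a column name: linked to the "expected_category" column -->
<cml:text label="Category" name="category" gold="expected_category" />
```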

exact  

The exact attribute requires that the set of worker responses and the set of gold standard responses for a CML field are identical. This only applies to cml:text and cml:textarea fields with multiple="true" and to cml:checkboxes fields. A worker’s responses will not pass gold if they include any item not found in the set of gold responses or omit any item that is (e.g., if a, b, and c are set in your gold data, then a worker will only be correct if they submit a, b, and c). Order does not matter. Best used with cml:gold.

strict  

The strict attribute requires that every response submitted is included in the set of gold standard responses. Unlike exact, however, strict allows workers to omit responses that are part of the gold set (e.g., if a, b, and c are set in your gold data, a worker will be considered correct if they submit only a and b). Best used with cml:gold.
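
The sketch below shows one way strict might be combined with cml:gold on a cml:checkboxes field. It assumes that cml:gold is written as a child of the field and that the attribute is enabled with "true"; neither detail is confirmed in this section, and the names and column are hypothetical. Swapping strict for exact would, per the descriptions above, require the worker to select every gold item as well.

```xml
<cml:checkboxes label="Which colors appear in the image?" name="colors">
  <cml:checkbox label="Red" />
  <cml:checkbox label="Green" />
  <cml:checkbox label="Blue" />
  <!-- Assumed syntax: child cml:gold tag reading gold from the "colors_gold" column -->
  <cml:gold src="colors_gold" strict="true" />
</cml:checkboxes>
```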

src  

The src attribute specifies which column in the uploaded data the gold information will be retrieved from. Only necessary when creating Gold from a spreadsheet. Best used with cml:gold.

matcher  

The matcher attribute allows you to redefine the method by which a worker’s response is evaluated against Gold Standard data. In the absence of the matcher attribute, a worker’s response and the Gold Standard response must match exactly. The matcher attribute allows for more flexibility. The following property values can be used to affect gold behavior:

Property values:

  • not – Allows gold creation on what should not be submitted. This works with any input in your form. Example: If ‘a’ is set as a gold value on a field with matcher set to “not”, and a worker submits ‘a’ as a response, the worker will miss the gold unit. Responses are saved in the same way as normal gold responses. See our gold documentation for more details.
  • range – Specifies a numerical range within which a worker’s response must fall. You must define a minimum and maximum value, which can be set in the gold digging interface or with a spreadsheet (the headers should be formatted as column_name_min and column_name_max). A number validator is recommended.
  • address (usAddress2, ukAddress, frAddress, internationalAddress) – Allows you to create gold on individual components (e.g., the state) of an address. When a worker misses the gold, they will see a message that clearly explains which components they missed. See validations for more details.

Best used with cml:gold.
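
As a hedged sketch, a range matcher on a numeric field might look like the following. The placement of cml:gold as a child tag is an assumption, and the field name is hypothetical. With a spreadsheet, the minimum and maximum would be expected in columns following the column_name_min and column_name_max pattern described above.

```xml
<cml:text label="How many people are in the photo?" name="people_count" validates="required numeric">
  <!-- Assumed syntax: the worker's number must fall within the gold min/max range -->
  <cml:gold matcher="range" />
</cml:text>
```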
