
How to run successful experiments and get the most out of Amazon's Mechanical Turk

Friday, December 8, 2017

Best recruitment practices: working with issues of non-naivete on MTurk

It is important to consider how many highly experienced workers there are on Mechanical Turk. As discussed in previous posts, the pool of active workers numbers in the thousands, but it is far from inexhaustible. A small group of workers completes a very large share of the HITs posted to MTurk, and these workers are very experienced and have seen the measures commonly used in the social and behavioral sciences. Research has shown that repeated exposure to the same measures can have negative effects on data collection: it can change the way workers perform, create treatment effects, give participants insight into the purpose of some studies, and in some cases alter the effect sizes of experimental manipulations. This issue is referred to as non-naivete (Chandler, 2014; Chandler, 2016).

The current standard approaches to recruitment on MTurk actually compound this problem. When recruiting workers on Mechanical Turk, requesters can selectively recruit workers based on specific criteria such as the number of HITs previously approved and the worker’s approval rating, that is, the percentage of the worker’s completed HITs that were approved. A commonly used standard is to select workers who have approval ratings above 95% (see Peer et al., 2014). This is not quite enough on its own, however, because MTurk’s system assigns a 100% approval rating to all workers who have completed between 1 and 100 HITs, regardless of how many were actually approved. Once workers complete 100 HITs, their approval rating accurately reflects the share of HITs they were approved for. It is therefore recommended, and common practice, to recruit only workers who have approval ratings above 95% and who have completed at least 100 HITs. To address this issue, whenever researchers use the approval rating as one of the qualifications for a study, the TurkPrime system by default adds the qualification that workers must have previously completed at least 100 HITs (researchers can, of course, change this manually).
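For requesters who post HITs through the MTurk API rather than through the TurkPrime interface, the two criteria above can be sketched roughly as follows. This is a minimal illustration, not TurkPrime's implementation: the function name `build_requirements` is hypothetical, the list is shaped like the `QualificationRequirements` argument that boto3's `create_hit` accepts, and the two system qualification IDs should be checked against Amazon's current documentation before use. Note that Amazon's built-in count is of approved HITs.

```python
# System qualification type IDs as documented by Amazon; verify before use.
APPROVAL_RATE_QUAL_ID = "000000000000000000L0"  # percent of assignments approved
HITS_APPROVED_QUAL_ID = "00000000000000000040"  # number of HITs approved


def build_requirements(min_approval_pct=95, min_hits=100):
    """Hypothetical helper: approval rating above `min_approval_pct`
    and at least `min_hits` approved HITs."""
    return [
        {
            "QualificationTypeId": APPROVAL_RATE_QUAL_ID,
            "Comparator": "GreaterThan",
            "IntegerValues": [min_approval_pct],
        },
        {
            "QualificationTypeId": HITS_APPROVED_QUAL_ID,
            "Comparator": "GreaterThanOrEqualTo",
            "IntegerValues": [min_hits],
        },
    ]


requirements = build_requirements()
```

The second requirement is what keeps the approval rating meaningful, since workers below the 100-HIT threshold all show a 100% rating.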

By selectively recruiting workers with a high approval rating and a high number of previously completed HITs, a requester can have increased confidence that workers in the sample can be trusted to follow instructions and pay attention to tasks. Indeed, many researchers choose to recruit this way. The approval rating system is unique in that it is a constant motivating factor that makes workers pay attention to each task, which helps researchers collect high-quality data. However, it excludes workers who have completed few HITs, even though they may be good providers of data who haven’t yet had the chance to “prove” it. Recruiting only workers with high approval ratings and many completed HITs also has a negative side effect: it is a selection criterion that favors more experienced workers, who are therefore less naive to measures used on the MTurk platform, bringing the issue of non-naivete to the fore.

TurkPrime is introducing a new tool that allows requesters to exclude workers who are extremely active, making it possible to selectively recruit workers who are not overly active and are more naive to commonly used measures. We believe this will have a great positive impact on data collection if researchers choose to use it. Another option for researchers is to use Prime Panels, whose participants are more naive to commonly used measures because of the size of the platform and its primary use for marketing research surveys, which typically have very different data collection goals and use different tools than those used on MTurk.


Peer, E., Vosgerau, J., & Acquisti, A. (2014). Reputation as a sufficient condition for data quality on Amazon Mechanical Turk. Behavior Research Methods, 46(4), 1023-1031.

TurkPrime Optimization

TurkPrime has been Optimized for Speed and Performance

Over the past few weeks, we have applied significant resources to improving the user experience for the research community and for the MTurk and Prime Panels workers who use our site. Many of the operations and web pages are now faster and more secure, so that creating, editing, and launching studies is more than 10 times faster than before. In addition, the dashboard where researchers view their studies has been optimized and now loads and updates very quickly.

We are continuing to improve the responsiveness of both the system and customer support. Even more exciting, we are rolling out new features that will further enhance the researcher and worker experience.

As always, our development is guided by our users in the research and worker communities. We at TurkPrime value your feedback and would love to hear, via our Suggestion Box, how we can improve our services, which features you need, and anything else we can do to be of service to you.

Thanks for using TurkPrime!

Friday, December 1, 2017

Are MTurk workers who they say they are?

The internet has the reputation of being a place where people can hide in anonymity and present themselves as very different from who they actually are. Is this a problem on Mechanical Turk? Is the self-reported information provided by Mechanical Turk workers reliable? These are important questions that have been addressed with several different methods. Researchers have examined a) the consistency of responses to the same questions over time and across studies, and b) the validity of responses, or the degree to which the items capture responses that represent the truth from participants. It turns out that there are certain situations in which MTurk workers are likely to lie, but they are who they say they are in almost all cases.

Consistency over time/Reliability:
One way of measuring the truthfulness of responses is to examine how individual workers respond to the same questions at different times. In one study, data from over 80,000 MTurk workers were used to examine the reliability of reported demographic information over time. This large study found that participants were overwhelmingly consistent when reporting demographic variables across different studies: gender identification was 98.9% consistent, race 98.2% consistent, and birth year 96.2% consistent. The slightly lower figure for birth year was largely due to technical issues rather than to Turkers not being truthful (Rosenzweig, Robinson, & Litman, 2017).
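To make the idea of a consistency percentage concrete, here is a minimal sketch of one way such a figure could be computed. It assumes a simple definition (a worker counts as consistent if every repeated answer to a question is identical); the scoring actually used in the study is not described in this post, and the function name and sample records are hypothetical.

```python
from collections import defaultdict


def consistency_rate(responses):
    """Share of repeat responders whose answers to one question never changed.

    `responses` is an iterable of (worker_id, answer) pairs pooled across
    studies. Workers who answered only once are skipped, since a single
    answer cannot be inconsistent.
    """
    answers = defaultdict(list)
    for worker_id, answer in responses:
        answers[worker_id].append(answer)

    repeat = [a for a in answers.values() if len(a) > 1]
    if not repeat:
        return None  # nothing to compare
    consistent = sum(1 for a in repeat if len(set(a)) == 1)
    return consistent / len(repeat)


# Hypothetical records, for illustration only.
sample = [
    ("w1", "female"), ("w1", "female"),
    ("w2", "male"), ("w2", "female"),
    ("w3", "male"), ("w3", "male"),
    ("w4", "male"),
]
rate = consistency_rate(sample)  # two of three repeat responders are consistent
```

A stricter or looser definition (for example, tolerating a one-year difference in birth year) would give different numbers, so the definition should be reported alongside the rate.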

Various forms of validity have been examined in data collected through MTurk, with results showing that the data are by and large valid. We will focus on convergent validity, which refers to how a measure correlates with other measures of known related constructs. Convergent validity of self-reported information at the group level can be established by examining whether workers provide logically consistent information. Data collected on TurkPrime show that associations between variables are consistent with what is found in the general population. For example, older Mechanical Turk workers tend to be more religious and more conservative, a pattern that is consistent with the general US population. The reported number of children correlates strongly with age and family status, as do divorce rates. Self-reported time of day preference is correlated with the time of day that workers are actually active, which is also correlated with a cluster of clinical, personality, and behavioral variables that have been previously reported in the literature in studies of the general population (Unpublished Data).

Similar consistent patterns have been observed in health information collected from Mechanical Turk workers. TurkPrime profiled over 10,000 Mechanical Turk workers on over 50 questions relating to physical health, and a factor analysis revealed that symptoms clustered around underlying conditions in the expected way. For example, hypertension, high cholesterol, and diabetes formed a single factor. This factor, interpreted to be metabolic syndrome, correlated with other variables such as age and gender in the expected way: the rate of metabolic syndrome increased with age and was higher among men. BMI also correlated with self-reported exercise (see also Litman et al., 2015). In addition, rates of chronic illness are significantly higher among smokers than among non-smokers and are strongly associated with BMI, with both higher and lower than average BMI being predictive of chronic illness.

Video tools currently in beta testing at TurkPrime are starting to be used to verify participants’ reported demographic characteristics, such as gender and race, and the presence of a second person for dyadic research. Initial results are promising and indicate that participants are highly truthful.

When Participants are Likely to Lie
Research has also examined the reliability of data collected when selection criteria were listed as a prerequisite for entering a study (e.g., “only open to males”). The data show that when participants have an incentive to be untruthful, such as when they can take a lucrative study only if they identify as a member of a particular demographic group, they lie (Chandler & Paolacci, 2017; Rosenzweig et al., 2017). For example, in a study whose HIT title said it was open “for men only”, 44% of participants who entered had previously consistently reported their gender as “female”.

Best Practices
When researchers want to selectively recruit participants on MTurk, they have several options. Some researchers recruit for a specific demographic group by stating the selection criteria in a study that is open to all workers, and rely on workers to tell the truth when opting in. Based on the data, this is a mistake that will lead untruthful participants to opt in. There are, however, several ways to selectively recruit participants who are who they say they are. One option is to use the qualifications system, either with characteristics already verified by MTurk or through TurkPrime’s qualification system. Another option is to run a study open to all workers, ask a series of initial demographic questions, and have only the participants who match the desired demographic criteria proceed to the next round of the study, while still paying those who did not match for their time. You can create worker groups on TurkPrime based on such pre-screenings, which can help you track and subsequently recruit participants who match your criteria of interest.
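The pre-screening option can be sketched as a simple split of screener respondents. This is an illustrative sketch only: the function, field names, and sample data are hypothetical, and it says nothing about how TurkPrime builds worker groups internally. The key points it reflects are that the criteria are applied after the answers are collected, rather than advertised in the HIT, and that everyone who completed the screener is paid.

```python
def screen_participants(screener_responses, criteria):
    """Split screener respondents by whether they match every criterion.

    `screener_responses` maps worker_id -> dict of screener answers.
    `criteria` maps a field name -> set of acceptable answers.
    All respondents, matching or not, are listed in `to_pay`.
    """
    eligible, ineligible = [], []
    for worker_id, answers in screener_responses.items():
        matches = all(
            answers.get(field) in allowed for field, allowed in criteria.items()
        )
        (eligible if matches else ineligible).append(worker_id)
    return {
        "eligible": eligible,      # invite to the next round / add to a worker group
        "ineligible": ineligible,  # thank and pay, but do not invite
        "to_pay": eligible + ineligible,
    }


# Hypothetical screener data, for illustration only.
responses = {
    "w1": {"gender": "male", "age_group": "18-29"},
    "w2": {"gender": "female", "age_group": "30-44"},
    "w3": {"gender": "male", "age_group": "30-44"},
}
result = screen_participants(responses, {"gender": {"male"}})
```

Because workers cannot tell from the screener which answers lead to the follow-up study, they have no incentive to misreport.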

Rosenzweig, C., Robinson, J., & Litman, L. (2017, January). Are They Who They Say They Are?: Reliability and Validity of Web-Based Participants’ Self-Reported Demographic Information. Poster presented at the 18th Society for Personality and Social Psychology Annual Convention, San Antonio, TX.