In a recent research article (https://psyarxiv.com/jq589/), we reported on the size of the MTurk worker pool. Using observed participation rates from the TurkPrime database, we found that in each of the last three years approximately 80,000 to 85,000 US workers were active on MTurk (Figure 1). What these numbers do not show is how often new workers join the platform, a number that matters to researchers who want to sample naive participants.
- We collected high quality data on MTurk when using TurkPrime’s IP address and Geocode-restricting tools.
- Using a novel format for our anchoring manipulation, we found that Turkers are highly attentive, even under taxing conditions.
- After querying the TurkPrime database, we found that farmer activity has significantly decreased over the last month.
- When MTurk is used the right way, researchers can be confident they are collecting quality data.
- We are continuously monitoring and maintaining data quality on MTurk.
- Starting this month, we will be conducting monthly surveys of data quality on Mechanical Turk.
Last week, the research community was struck with concern that “bots” were contaminating data collection on Amazon’s Mechanical Turk (MTurk). We wrote about the issue and conducted our own preliminary investigation into the problem using the TurkPrime database. In this blog post, we introduce two new tools that TurkPrime is launching to help researchers combat suspicious activity on MTurk, and we reiterate some of the important takeaways from this conversation so far.
Data quality on online platforms
When researchers collect data online, it’s natural to be concerned about data quality. Participants aren’t in the lab, so researchers can’t see who is taking their survey, what those participants are doing while answering questions, or whether participants are who they say they are. Not knowing is unsettling.
TurkPrime is announcing a change in our pricing for the MicroBatch feature. MicroBatch is now included as a Pro feature, with a fee of 2 cents + 5% per complete. This fee also gives users access to all other Pro features at no additional charge. The change is necessary so that we can continue to provide the high-quality service and tools that our users expect.
Some workers on MTurk are extremely active, and take the majority of posted HITs. This can lead to many issues, some of which are outlined in our previous post. Although MTurk has over 100,000 workers who take surveys each year, and around 25,000 who take surveys each month, you are much more likely to recruit highly active workers who take a majority of HITs. About 1,000 workers (1% of workers) take 21% of the HITs. About 10,000 workers (10% of workers) take 74% of all HITs.
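To make the idea of concentration concrete, here is a minimal sketch of how the share of HITs taken by the most active workers could be computed from per-worker HIT counts. The function name `top_share` and the toy counts are hypothetical and purely illustrative; they are not drawn from the TurkPrime database.

```python
from typing import Sequence


def top_share(hits_per_worker: Sequence[int], top_fraction: float) -> float:
    """Return the percentage of all HITs taken by the most active workers.

    hits_per_worker: number of HITs completed by each worker (hypothetical input).
    top_fraction: fraction of workers to treat as "most active", e.g. 0.01 for 1%.
    """
    if not hits_per_worker:
        raise ValueError("hits_per_worker must not be empty")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")

    # Sort workers from most to least active.
    counts = sorted(hits_per_worker, reverse=True)
    total = sum(counts)
    if total == 0:
        raise ValueError("no HITs recorded")

    # Number of workers in the top group (at least one).
    k = max(1, round(len(counts) * top_fraction))
    return 100.0 * sum(counts[:k]) / total


# Toy data: ten workers who together took 100 HITs.
toy_counts = [50, 20, 10, 5, 5, 4, 3, 1, 1, 1]
print(top_share(toy_counts, 0.1))  # 50.0 (the single most active worker)
print(top_share(toy_counts, 0.2))  # 70.0 (the two most active workers)
```

Applied to real participation data, the same calculation would yield figures of the kind quoted above, such as the share of HITs taken by the top 1% or top 10% of workers.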
It is important to consider how many highly experienced workers there are on Mechanical Turk. As discussed in previous posts, the pool of active workers numbers in the thousands, but it is far from inexhaustible. A small group of workers take a very large number of the HITs posted to MTurk, and these workers are very experienced and have seen measures commonly used in the social and behavioral sciences. Research has shown that repeated exposure to the same measures can have negative effects on data collection: changing the way workers perform, creating treatment effects, giving participants insight into the purpose of some studies, and in some cases altering the effect sizes of experimental manipulations. This issue is referred to as non-naivete (Chandler, 2014; Chandler, 2016).
The internet has a reputation as a place where people can hide in anonymity and present themselves as very different from who they actually are. Is this a problem on Mechanical Turk? Is the self-reported information provided by Mechanical Turk workers reliable? These are important questions, and they have been addressed with several different methods. Researchers have examined a) the consistency of responses to the same questions over time and across studies, and b) the validity of responses, or the degree to which the items capture truthful responses from participants. It turns out that there are certain situations in which MTurk workers are likely to lie, but in almost all cases they are who they say they are.
Hundreds of academic papers are published each year using data collected through Mechanical Turk. Researchers have gravitated to Mechanical Turk primarily because it provides high quality data quickly and affordably. However, Mechanical Turk has strengths and weaknesses as a platform for data collection. While Mechanical Turk has revolutionized data collection, it is by no means a perfect platform. Some of the major strengths and limitations of MTurk are summarized below.
What is the completion rate and dropout rate?
Dropout rate is defined as the percentage of participants who start taking a study but do not complete it. Dropout rate is sometimes referred to as attrition rate, and is the complement of completion rate (dropout rate = 100 – completion rate). On MTurk, completion rate is defined as the number of Workers who submit a HIT divided by the number of Workers who accept the HIT, expressed as a percentage. Note that, for the definition of completion rate used here, Rejected Workers are counted as completes.
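As a rough illustration of these definitions, the calculation can be sketched in a few lines of Python. The function and parameter names are our own for this example, and the counts are made up; the only assumptions taken from the definitions above are that submitted HITs include both approved and rejected submissions, and that both rates are expressed as percentages.

```python
def completion_rate(accepted: int, approved: int, rejected: int) -> float:
    """Percentage of Workers who accepted the HIT and went on to submit it.

    Rejected submissions still count as completes, as in the definition above.
    """
    if accepted <= 0:
        raise ValueError("accepted must be positive")
    submitted = approved + rejected
    if submitted < 0 or submitted > accepted:
        raise ValueError("submitted must be between 0 and accepted")
    return 100.0 * submitted / accepted


def dropout_rate(accepted: int, approved: int, rejected: int) -> float:
    """Dropout rate = 100 - completion rate."""
    return 100.0 - completion_rate(accepted, approved, rejected)


# Hypothetical HIT: 200 Workers accepted, 150 were approved, 10 were rejected.
print(completion_rate(200, 150, 10))  # 80.0
print(dropout_rate(200, 150, 10))     # 20.0
```

In this made-up example, 160 of the 200 Workers who accepted the HIT submitted it, so the completion rate is 80% and the dropout rate is 20%.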