Last week, the research community was struck with concern that “bots” were contaminating data collection on Amazon’s Mechanical Turk (MTurk). We wrote about the issue and conducted our own preliminary investigation into the problem using the TurkPrime database. In this blog, we introduce two new tools that TurkPrime is launching to help researchers combat suspicious activity on MTurk, and reiterate some of the important takeaways from this conversation so far.
Data quality on online platforms
When researchers collect data online, it’s natural to be concerned about data quality. Participants aren’t in the lab, so researchers can’t see who is taking their survey, what those participants are doing while answering questions, or whether participants are who they say they are. Not knowing is unsettling.
Some workers on MTurk are extremely active and take the majority of posted HITs. This can lead to many issues, some of which are outlined in our previous post. Although MTurk has over 100,000 workers who take surveys each year, and around 25,000 who take surveys each month, you are much more likely to recruit highly active workers who complete a disproportionate share of HITs. About 1,000 workers (1% of workers) take 21% of all HITs, and about 10,000 workers (10% of workers) take 74% of all HITs.
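To make the concentration figures above concrete, here is a minimal sketch of how one might compute the share of HITs completed by the most active slice of workers. The per-worker counts below are made up for illustration; real numbers would come from a platform's completion records.

```python
# Hypothetical illustration: how concentrated is HIT activity among workers?
def top_share(hit_counts, top_fraction):
    """Fraction of all HITs completed by the most active `top_fraction` of workers."""
    counts = sorted(hit_counts, reverse=True)
    n_top = max(1, int(len(counts) * top_fraction))
    return sum(counts[:n_top]) / sum(counts)

# Toy data: 10 workers with heavily skewed activity (not real MTurk figures).
counts = [500, 120, 60, 40, 30, 20, 10, 10, 5, 5]
print(top_share(counts, 0.10))  # share of all HITs taken by the top 10% of workers
```

With skewed counts like these, a small minority of workers accounts for most completions, which mirrors the pattern described above.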
It is important to consider how many highly experienced workers there are on Mechanical Turk. As discussed in previous posts, the pool of active workers numbers in the thousands, but it is far from inexhaustible. A small group of workers takes a very large number of the HITs posted to MTurk; these workers are very experienced and have seen measures commonly used in the social and behavioral sciences. Research has shown that repeatedly exposing participants to the same measures can have negative effects on data collection: it changes the way workers perform, creates treatment effects, gives participants insight into the purpose of some studies, and in some cases affects the effect sizes of experimental manipulations. This issue is referred to as non-naivete (Chandler, 2014; Chandler, 2016).
Hundreds of academic papers are published each year using data collected through Mechanical Turk. Researchers have gravitated to the platform primarily because it provides high-quality data quickly and affordably. But while Mechanical Turk has revolutionized data collection, it is by no means a perfect platform; some of its major strengths and limitations are summarized below.
Studies with Panels for just $0.15–$0.75 per complete
Now you can run Mechanical Turk studies using your own Requester account and specify over two dozen demographic traits! The traits include gender, ethnicity, age, marital status, and sexual orientation. But it does not stop there: the available options also include occupation, medical and health history, cell phone use, and much more.
Google Forms can be used to deliver a study with TurkPrime in a similar manner to other survey platforms (like Qualtrics and SurveyMonkey).
Ever wonder whether workers are being honest with you when they answer a survey? Or, if you specify that your study should be taken only by women, whether some workers take the study even though they are not women?
We recently launched a ground-breaking feature that helps protect Mechanical Turk workers’ identities. It has been reported in the literature that a Worker ID can be used to identify the worker behind it, because Amazon uses the same identifier for a worker’s account on Mechanical Turk and on other Amazon properties, such as Amazon.com product reviews.
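One common way to reduce this risk when sharing or storing data is to replace raw Worker IDs with salted hashes before the data leave the collection pipeline. The sketch below illustrates the idea; the salt value and Worker ID are made up, and this is not a description of TurkPrime's own implementation.

```python
# Sketch: pseudonymize Worker IDs with a salted hash (illustrative only).
import hashlib

SALT = "project-specific-secret"  # hypothetical salt; keep it out of shared datasets

def anonymize_worker_id(worker_id: str) -> str:
    """Deterministic pseudonym: the same worker maps to the same token within a project."""
    return hashlib.sha256((SALT + worker_id).encode("utf-8")).hexdigest()[:16]

token = anonymize_worker_id("A1B2C3EXAMPLE")  # hypothetical Worker ID
print(token)  # a 16-character hex pseudonym that cannot be looked up on Amazon.com
```

Because the salt is project-specific, the same worker gets different pseudonyms across projects, which prevents linking a worker's data between unrelated datasets.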
Block Duplicate IP Addresses
Our most popular feature request has been to allow the blocking of multiple responses from the same IP address.
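The core idea behind blocking duplicate IP addresses can be sketched in a few lines: keep only the first submission seen from each IP. The field names and data below are hypothetical, not TurkPrime's actual data model.

```python
# Sketch: keep only the first response from each IP address (illustrative only).
def drop_duplicate_ips(responses):
    """Return responses with only the first submission per IP, preserving order."""
    seen = set()
    kept = []
    for r in responses:  # responses assumed ordered by submission time
        if r["ip"] not in seen:
            seen.add(r["ip"])
            kept.append(r)
    return kept

responses = [
    {"worker_id": "W1", "ip": "203.0.113.5"},
    {"worker_id": "W2", "ip": "198.51.100.7"},
    {"worker_id": "W3", "ip": "203.0.113.5"},  # duplicate IP, dropped
]
print([r["worker_id"] for r in drop_duplicate_ips(responses)])  # ['W1', 'W2']
```

Note that blocking by IP is a heuristic: legitimate different workers can share an IP (e.g., on a university network), so in practice this filter is usually offered as an option rather than applied unconditionally.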