Last week, the research community became concerned that “bots” were contaminating data collected on Amazon’s Mechanical Turk (MTurk). We wrote about the issue and conducted our own preliminary investigation using the TurkPrime database. In this post, we introduce two new tools that TurkPrime is launching to help researchers combat suspicious activity on MTurk, and we reiterate some of the key takeaways from the conversation so far.
TurkPrime’s Tools to Deal with Suspicious Activity
As we announced last week, we’ve created two new tools to help researchers fight fraud in their data collection:
- Block Suspicious Geolocations
- Block Duplicate Geolocations
The Block Suspicious Geolocations tool is a Free Feature that allows researchers to block submissions from a list of suspicious geolocations. In last week’s investigation, we identified several geolocations that were responsible for the majority of duplicate submissions. The Block Suspicious Geolocations tool prevents any MTurk Worker from submitting a HIT from these locations. As mentioned in last week’s blog, once we removed these locations from our analyses, the rate of duplicate submissions from the same geolocation across studies this summer fell to 1.7%, well within the range we have identified as normal across the life of our platform. The screenshot below shows the new Block Suspicious Geolocations tool, found in Tab 6, “Worker Requirements,” when you design a study.
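TurkPrime applies this block at submission time, and how it does so internally isn't described here. As a rough post hoc analogue, the Python sketch below filters exported submissions against a blocklist of locations and computes a duplicate-geolocation rate. The `Submission` record, the `geokey` rounding rule, and the definition of the rate (the share of submissions whose location appears more than once) are all assumptions for illustration, not TurkPrime's method or the exact metric behind the 1.7% figure.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """One HIT submission as it might appear in an exported results file (hypothetical schema)."""
    worker_id: str
    latitude: float
    longitude: float


def geokey(sub: Submission, precision: int = 4) -> tuple[float, float]:
    """Reduce a submission's coordinates to a hashable location key.

    Rounding to `precision` decimal places is an assumption: coordinates that
    differ only beyond that precision are treated as the same location.
    """
    return (round(sub.latitude, precision), round(sub.longitude, precision))


def drop_blocked(subs: list[Submission], blocked: set[tuple[float, float]]) -> list[Submission]:
    """Keep only submissions whose location key is not on the blocklist."""
    return [s for s in subs if geokey(s) not in blocked]


def duplicate_geolocation_rate(subs: list[Submission]) -> float:
    """Fraction of submissions whose location key is shared with at least one other submission."""
    if not subs:
        return 0.0
    counts = Counter(geokey(s) for s in subs)
    shared = sum(n for n in counts.values() if n > 1)
    return shared / len(subs)
```

For example, with four submissions of which exactly two share coordinates, `duplicate_geolocation_rate` returns 0.5; putting that shared location on the blocklist and filtering with `drop_blocked` brings the rate for the remaining submissions to 0.0.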
Our second tool, the Block Duplicate Geolocations tool, is a Pro Feature that allows researchers to block multiple submissions from any geolocation. It casts a much wider net than the Block Suspicious Geolocations tool and should ensure that the responses collected in any one survey come from a more distributed set of locations. By restricting the number of submissions from each geolocation, researchers can be more confident that their responses come from unique participants. With this tool, data collection may be somewhat slower, especially if the target sample is concentrated in a small geographic area (e.g., one particular state). The screenshot below shows the new Block Duplicate Geolocations tool, found in Tab 8, “Pro Features,” when you design a study.
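The idea of restricting submissions per geolocation can be sketched as a first-come cap over exported data. This is a minimal post hoc illustration, not how TurkPrime enforces the limit during collection; the `Submission` record, the coordinate rounding used to decide what counts as the "same" location, and the hypothetical `cap_per_geolocation` helper are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """Hypothetical exported record: worker ID plus reported coordinates."""
    worker_id: str
    latitude: float
    longitude: float


def cap_per_geolocation(
    subs: list[Submission], max_per_location: int = 1, precision: int = 4
) -> list[Submission]:
    """Keep at most `max_per_location` submissions per location key, in arrival order.

    Later submissions from a location that has already hit the cap are dropped.
    Rounding coordinates to `precision` decimals is an assumed notion of
    "same location".
    """
    accepted: defaultdict[tuple[float, float], int] = defaultdict(int)
    kept: list[Submission] = []
    for s in subs:
        key = (round(s.latitude, precision), round(s.longitude, precision))
        if accepted[key] < max_per_location:
            accepted[key] += 1
            kept.append(s)
    return kept
```

Dropping later submissions outright is only one design choice. Because distinct people (housemates, for instance) can share a location, a researcher might prefer to flag them for review instead.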
Understanding what caused the recent increase in low-quality responses on MTurk, and the corresponding increase in submissions from the same geolocation, is a matter of ongoing research. As we learn more, we will share the details with the research community and continue to develop tools that ensure the highest quality of research data.
More immediately, we have identified a list of worker IDs that have repeatedly been associated with suspicious geolocations. In addition to the tools described above, over the next several days we will create an internal exclusion list based on the worker IDs of these suspicious accounts. This exclusion list will add another layer of protection by blocking worker accounts that have a high likelihood of being involved in fraud. We will write another blog post with more detail on this issue in the coming days. In the meantime, researchers already have two powerful tools for eliminating fraud in their data collection, and these tools should increase their confidence that they are obtaining genuine responses from unique workers.