Link To TurkPrime

How to run successful experiments and get the most out of Amazon's Mechanical Turk

Sunday, December 24, 2017

We are Hiring: Project manager/research assistant position (full time)

TurkPrime
Queens, NY
Project Manager/Research Assistant


Background: TurkPrime is a web-based platform for online participant recruitment for the social and behavioral sciences, market research, and medical research. TurkPrime launched in May 2015 and currently serves over 6,000 subscribers from universities and corporate institutions around the world. Over 3,000 studies are conducted on TurkPrime each month.


The TurkPrime Toolkit is a set of cutting-edge research tools allowing for flexible online study management and data collection using multiple platforms, including Mechanical Turk. TurkPrime manages a panel of over 150,000 Mechanical Turk respondents, and partners with multiple additional sample providers through API integration to achieve a global reach of over 20 million respondents. The combination of robust research tools and an extensively profiled participant pool allows TurkPrime clients to conduct high quality research flexibly and effectively.


TurkPrime is also actively engaged in academic research focusing on, but not limited to, online data collection methodology (see Representative Recent Publications 1-5 below). The research team at TurkPrime works closely with our software development team to make sure that TurkPrime’s system and research practices are grounded in solid empirical research, and that the features we provide are beneficial to a wide range of researchers. TurkPrime’s research team has a track record of peer-reviewed publications that address issues such as the contribution of its software to research design and data quality, assessment and improvement of data quality on Mechanical Turk, and selective recruitment from Mechanical Turk and other platforms.


We are looking for a full time project manager/research assistant with extremely strong communication, organizational and writing skills to manage projects, help clients with technical questions, collect data for online projects, and help write peer-reviewed papers, white papers, and blog entries.  


Responsibilities: The key responsibilities for the position include: client management, including responding to TurkPrime users’ questions; managing complex client projects (e.g., intensive longitudinal, diary, dyadic, and video interview studies); writing a weekly blog; study design and data collection; supporting the writing of peer-reviewed and white papers; maintaining project documentation; managing project data and materials; and quality control.


Skills: Exceptionally strong writing and communication skills; exceptional organization and attention to detail; ability to use web communication and documentation software effectively; team-oriented; very strong work ethic; multi-tasking; self-starter and industrious; adaptability to rapidly changing demands in a high-performance workplace; background in scientific methodology (a B.A. or higher is required for the position). Experience conducting online research and knowledge of online research software (Qualtrics, Millisecond, MTurk, etc.) is a plus. Data analysis skills are a plus. Interest in publishing peer-reviewed papers is a plus.


Notes: TurkPrime is based in Kew Gardens Hills, NY. Initially, the project manager would be expected to be in the Kew Gardens Hills office 4 days per week. Over time, a more flexible schedule will be considered. TurkPrime is an equal opportunity employer and strongly encourages applications from members of groups underrepresented in science and technology industries.  


Applying: Please send a resume and a letter of interest to leib.litman@turkprime.com.   


Representative Recent Publications
1.   Litman, L., Robinson, J. Online Research on Mechanical Turk and Other Platforms. Sage Publications. Innovations in Methodology Series (367 pages). Scheduled to be published in 2018.
2.   Litman, L., Robinson, J., & Rosenzweig, C. (2015). The relationship between motivation, monetary compensation, and data quality among US and India-based workers on Mechanical Turk. Behavior Research Methods, 47(2), 519-528.
3.   Litman, L., Robinson, J., & Abberbock, T. (2016). TurkPrime.com: A versatile crowdsourcing data acquisition platform for the behavioral sciences. Behavior Research Methods, 1-10.
4.   Litman, L., Williams, M. T., Rosen, Z., Weinberger-Litman, S. L., & Robinson, J. (2017). Racial Disparities in Cleanliness Attitudes Mediate Consumer Attitudes toward Cleaning Products: A Serial Mediation Model. Journal of Racial and Ethnic Health Disparities, 1-9. (Developed methods relating to selective recruitment on Mechanical Turk).
5.   Litman, L., Robinson, J., Weinberger-Litman, S. L., & Finkelstein, R. (2017). Both Intrinsic and Extrinsic Religious Orientation are Positively Associated with Attitudes Toward Cleanliness: Exploring Multiple Routes from Godliness to Cleanliness. Journal of Religion and Health, 1-12. (Developed methods relating to selective recruitment on Mechanical Turk).

Friday, December 22, 2017

Conducting research: How online research tools can help

When researchers learn about conducting research online, it can be difficult to see how all the available tools actually apply to a project. This post is about how specific research ideas can be carried out using the kinds of features available for online research. Online research tools can make it much simpler to recruit balanced samples of individuals who are hard to find and selectively sample using more traditional methods.

On TurkPrime, you can selectively recruit participants based on many variables that may be of interest to you, ranging from age, gender and race, to political orientation, religion, employment, physical health symptoms, and personality (see them all here). This feature can be immensely useful in targeting specific populations. If you don’t want to use this feature or want to screen for participant characteristics of a kind that isn’t on our full list, you can still find specific populations by running your own screening studies to find the kind of participants that you want to target in subsequent studies (to learn more about how to do this see “best practices” section here, as well as this post here).  

As an example of how this feature may be useful, imagine a religion researcher who is interested in how religiosity and religious orientation (extrinsic vs. intrinsic) predict behaviors and decisions in life. To carry out studies comparing different levels of religiosity, she needs to find a sample that is not religiously homogeneous. She may even want to recruit a balanced sample, with a third of participants being highly religious, a third somewhat religious, and a third not religious at all. Once such a sample is recruited, all sorts of interesting questions about how religion impacts behavior can be tested.

A study just like this was recently published by members of the TurkPrime staff in the Journal of Religion and Health (Litman, Robinson, Weinberger-Litman, & Finkelstein, 2017). In this study, TurkPrime’s Panel feature was used to selectively recruit three groups of approximately 150 participants with different relationships to religion: some very religious, others less religious, and a third group not religious at all. The researchers found that attitudes toward cleanliness were significantly predicted by religiosity and religious orientation, even when most covariates of attitudes toward cleanliness were included in a regression model. This research has important implications for the relationship between religion and health.

Imagine for a moment how difficult it would have been to conduct this research without online sampling methods. Perhaps researchers would have tried to reach students on their college campus, either through a college database or by recruiting people in person, hoping that by reaching out to great numbers of students they would fill these three categories of religiosity. If they couldn’t find enough religious students this way, they would need to seek out religious groups on campus and contact them directly. Additional time might be spent sitting with each student while they completed a survey. This would require multiple research assistants and a lot of time and perseverance. Still, it would yield data from a convenience sample highly skewed toward the young, more liberal students found on most campuses. If this research had been conducted on MTurk without TurkPrime tools, it also would have been very hard to get a sample that includes religious individuals, as MTurk has a much higher rate of atheism (around 40%) than the U.S. population.

Features available in online research have significant impact on the kinds of studies that can be carried out, enhancing methods available to researchers and making one aspect of their job a whole lot easier. There are many more examples of researchers using TurkPrime to carry out complex projects, some of which can be seen in the references linked below.

References

Litman, L., Robinson, J., Weinberger-Litman, S. L., & Finkelstein, R. (2017). Both Intrinsic and Extrinsic Religious Orientation are Positively Associated with Attitudes Toward Cleanliness: Exploring Multiple Routes from Godliness to Cleanliness. Journal of Religion and Health, 1-12.

Friday, December 15, 2017

New Feature: Exclude Highly Active Workers

Some workers on MTurk are extremely active and take the majority of posted HITs. This can lead to many issues, some of which are outlined in our previous post. Although MTurk has over 100,000 workers who take surveys each year, and around 25,000 who take surveys each month, you are much more likely to recruit the highly active workers who take the majority of HITs: about 1,000 workers (1% of workers) take 21% of the HITs, and about 10,000 workers (10%) take 74% of all HITs.
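Concentration figures like these are easy to check against any activity log. A minimal sketch (the HIT counts below are synthetic, for illustration only, not TurkPrime data): given per-worker HIT counts, compute what share of all HITs the top p% most active workers account for.

```python
# Sketch: share of all HITs taken by the top p% most active workers.
def top_share(hit_counts, p):
    """Fraction of all HITs completed by the most active share p (0-1) of workers."""
    counts = sorted(hit_counts, reverse=True)
    k = max(1, int(len(counts) * p))  # number of workers in the top p%
    return sum(counts[:k]) / sum(counts)

# One very active worker among ten: the top 10% take 100 of 109 HITs (~92%).
print(round(top_share([100, 1, 1, 1, 1, 1, 1, 1, 1, 1], 0.10), 2))  # 0.92
```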

TurkPrime now has a feature that allows researchers to exclude the most active workers, so that you can collect data from less experienced workers who are less likely to have previously taken part in research similar to your own. Below is a screenshot of the “Naivete (Exclude most active Workers)” feature. You can select what percentage of workers to exclude from the dropdown menu shown below.

Friday, December 8, 2017

Best recruitment practices: working with issues of non-naivete on MTurk

It is important to consider how many highly experienced workers there are on Mechanical Turk. As discussed in previous posts, the pool of active workers numbers in the tens of thousands, but it is far from inexhaustible. A small group of workers takes a very large share of the HITs posted to MTurk; these workers are very experienced and have seen the measures commonly used in the social and behavioral sciences. Research has shown that repeatedly exposing participants to the same measures can have negative effects on data collection: changing the way workers perform, creating treatment effects, giving participants insight into the purpose of some studies, and in some cases impacting the effect sizes of experimental manipulations. This issue is referred to as non-naivete (Chandler, 2014; Chandler, 2016).

The current standard approaches to recruitment on MTurk actually compound this problem. When recruiting workers on Mechanical Turk, requesters have the ability to selectively recruit workers based on specific criteria such as the number of HITs previously approved, and the worker’s approval rating, i.e., the percentage of completed HITs that were approved. A commonly used standard is to select workers who have approval ratings of >95% (see Peer, Vosgerau, & Acquisti, 2014). This is not quite enough on its own, however, because MTurk’s system assigns a 100% approval rating to all workers who have completed between 1 and 100 HITs, regardless of how many were actually approved. Only once workers complete 100 HITs does their approval rating accurately reflect the proportion of HITs that were approved. It is therefore recommended, and common practice, to only recruit workers who have approval ratings of >95% and who have completed at least 100 HITs. Once researchers use the approval rating as part of their qualifications for a study, the TurkPrime system by default adds the qualification that workers must have previously completed at least 100 HITs in order to address this issue (researchers of course retain manual control of this).
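For requesters posting HITs through the MTurk API directly, the two criteria above combine into a single requirement list. A minimal sketch: the two IDs are Amazon's documented system qualification types (PercentAssignmentsApproved and NumberHITsApproved); the threshold values follow the conventions described above, and the boto3 usage shown in comments assumes configured AWS credentials.

```python
# Standard MTurk reputation qualifications: approval rating > 95% AND
# at least 100 approved HITs (guards against the default-100% rating quirk).
PERCENT_APPROVED_ID = "000000000000000000L0"   # PercentAssignmentsApproved
NUMBER_APPROVED_ID = "00000000000000000040"    # NumberHITsApproved

def reputation_requirements(min_percent=95, min_hits=100):
    """Build a QualificationRequirements list enforcing both criteria."""
    return [
        {
            "QualificationTypeId": PERCENT_APPROVED_ID,
            "Comparator": "GreaterThan",
            "IntegerValues": [min_percent],
        },
        {
            "QualificationTypeId": NUMBER_APPROVED_ID,
            "Comparator": "GreaterThanOrEqualTo",
            "IntegerValues": [min_hits],
        },
    ]

# With boto3 (illustrative; requires AWS credentials):
# client = boto3.client("mturk")
# client.create_hit(..., QualificationRequirements=reputation_requirements())
```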

By selectively recruiting workers with a high approval rating and a high number of previously completed HITs, a requester can have increased confidence that workers in their sample will follow instructions and pay attention to tasks. Indeed, many researchers choose to recruit participants who have high approval ratings and have completed a high number of previous studies. The approval rating system is a constant motivating factor that pushes workers to pay attention to each task, and it helps researchers collect high quality data. However, it also excludes workers who have completed few HITs, even those who would provide good data but have not yet had the chance to “prove” it. Recruiting only workers with high approval ratings has a further drawback: it is a selection criterion that favors more experienced workers, who are therefore less naive to measures used on the MTurk platform, bringing the issue of non-naivete to the fore.

Solutions
TurkPrime is introducing a new tool which allows requesters to exclude workers who are extremely active, thus making it possible to selectively recruit workers who are less active and more naive to commonly used measures. We believe this can have a great positive impact on data collection if researchers choose to utilize it. Another option is to use Prime Panels, whose workers are more naive to commonly used measures due to the size of the platform and its primary use for marketing research surveys, which typically have very different data collection goals and use different tools than those common on MTurk.

References

Peer, E., Vosgerau, J., & Acquisti, A. (2014). Reputation as a sufficient condition for data quality on Amazon Mechanical Turk. Behavior research methods, 46(4), 1023-1031.

TurkPrime Optimization

TurkPrime has been Optimized for Speed and Performance

Over the past few weeks, we have applied significant resources to improving the user experience for the research community and for the MTurk and Prime Panels workers who use our site. Many operations and web pages now have increased speed and security, so that creating, editing and launching studies are more than 10 times faster than they were previously. In addition, the dashboard where researchers view their studies has been optimized and now loads and updates very quickly.

We are continuing to improve responsiveness in the system and customer support. Even more exciting, we are rolling out new features which will further create a more enhanced researcher and worker experience.

As always, our development is guided by our users in the research and worker communities. We at TurkPrime value your feedback and would love to hear how we can improve our services, which features you need, and anything else we can do to be of service; let us know via our Suggestion Box.

Thanks for using TurkPrime!

Friday, December 1, 2017

Are MTurk workers who they say they are?

The internet has the reputation of being a place where people can hide in anonymity and present as very different people than who they actually are. Is this a problem on Mechanical Turk? Is the self-reported information provided by Mechanical Turk workers reliable? These are important questions which have been addressed with several different methods. Researchers have examined (a) the consistency of responses to the same questions over time and across studies, and (b) the validity of responses, or the degree to which items capture responses that represent the truth. It turns out that there are certain situations in which MTurk workers are likely to lie, but they are who they say they are in almost all cases.


Consistency over time/Reliability:
One way of measuring the truthfulness of responses is to examine how individual workers respond at different times to the same questions. One study examined data collected from over 80,000 MTurk workers for the reliability of reported demographic information over time. This large study found that participants were overwhelmingly consistent when reporting demographic variables across different studies over time: gender identification was 98.9% consistent, race 98.2% consistent, and birth year 96.2% consistent, with this slightly lower score largely due to technical issues rather than Turkers not being truthful (Rosenzweig, Robinson, & Litman, 2017).
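Test-retest consistency of this kind is straightforward to compute from repeated observations. A minimal sketch (the worker IDs and answers are hypothetical): pair each worker's answers across two waves of data collection and take the share that match exactly.

```python
def consistency_rate(wave1, wave2):
    """Share of workers (present in both waves) who gave identical answers.

    wave1/wave2 map worker id -> reported value (e.g., gender or birth year).
    """
    shared = wave1.keys() & wave2.keys()
    if not shared:
        raise ValueError("no overlapping workers between waves")
    same = sum(wave1[w] == wave2[w] for w in shared)
    return same / len(shared)

# Hypothetical example: 3 of 4 workers report the same gender both times.
w1 = {"A1": "F", "A2": "M", "A3": "F", "A4": "M"}
w2 = {"A1": "F", "A2": "M", "A3": "M", "A4": "M"}
print(consistency_rate(w1, w2))  # 0.75
```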


Validity
Various forms of validity have been examined in data collected through MTurk, with results showing that data are by and large valid. We will focus on convergent validity, which refers to how a measure correlates with other measures of known related constructs. Convergent validity of self-reported information at a group level can be established by examining whether workers provide logically consistent information. Data collected on TurkPrime show that associations between variables are consistent with what is found in the general population. For example, older Mechanical Turk workers tend to be more religious and more conservative, a pattern consistent with the general US population. The reported number of children correlates strongly with age and family status, as do divorce rates. Self-reported time-of-day preference is correlated with the time of day that workers are actually active, which is in turn correlated with a cluster of clinical, personality, and behavioral variables previously reported in studies of the general population (unpublished data).

Similar consistent patterns have been observed in health information collected from Mechanical Turk workers. TurkPrime profiled over 10,000 Mechanical Turk workers on over 50 questions relating to physical health, with a factor analysis revealing that symptoms clustered around underlying conditions in the expected way. For example, hypertension, high cholesterol, and diabetes formed a single factor. This factor, interpreted as metabolic syndrome, correlated with other variables such as age and gender in the expected way: the rate of metabolic syndrome increased with age and was higher among men. BMI also correlated with self-reported exercise (see also Litman et al., 2015). Other examples include the fact that rates of chronic illness are significantly higher among smokers than non-smokers, and strongly associated with BMI, with both higher and lower than average BMI being predictive of chronic illness.


Video tools that are currently in beta testing at TurkPrime are starting to be used to verify participants’ reported demographic characteristics such as gender, and race, and the presence of a second person for dyadic research, with promising initial results indicating that participants are highly truthful.


When Participants are Likely to Lie
Research has additionally examined the reliability of data collected when selection criteria were listed as a prerequisite to enter a study (e.g., “only open to males”). Data show that when participants are incentivized not to be truthful, such as when they can only take a lucrative study if they identify as a particular demographic group, they lie (Chandler & Paolacci, 2017; Rosenzweig et al., 2017). For example, in a study with a HIT title saying it was open “for men only,” 44% of participants who entered had consistently reported their gender as “female” in previous studies.


Best Practices
When researchers want to selectively recruit participants on MTurk, they have several options. Some researchers recruit a specific demographic group by listing selection criteria in a study open to all workers and relying on workers to tell the truth when opting in. Based on the data, this is a mistake that will lead untruthful participants to opt in. There are, however, several ways to selectively recruit participants who are who they say they are. One option is to use the qualifications system with characteristics already verified by MTurk, or to use TurkPrime’s qualification system. Another option is to run a study open to all workers, ask a series of initial demographic questions, and have only participants who match the desired demographic criteria proceed to the rest of the study, paying even those of the wrong demographic for their time. You can create worker groups on TurkPrime based on such pre-screenings, which can help you track and subsequently recruit participants who match your criteria of interest.
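The screen-then-branch approach amounts to simple routing logic in whatever survey backend you use. A minimal sketch (the function, field names, and criteria are hypothetical, for illustration only): every respondent is paid for the screener, and only those matching the target criteria continue to the full study.

```python
def route_participant(answers, criteria):
    """Route a respondent after the initial demographic questions.

    answers:  dict of question -> this participant's response
    criteria: dict of question -> set of acceptable responses
    Everyone is compensated for the screener; only matches continue.
    """
    matches = all(answers.get(q) in ok for q, ok in criteria.items())
    return "continue_to_study" if matches else "pay_and_exit"

# Hypothetical screen: recruiting women aged 30+ for the full study.
criteria = {"gender": {"female"}, "age_band": {"30-44", "45-59", "60+"}}
print(route_participant({"gender": "female", "age_band": "30-44"}, criteria))  # continue_to_study
print(route_participant({"gender": "male", "age_band": "45-59"}, criteria))    # pay_and_exit
```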


References:
Rosenzweig, C., Robinson, J., & Litman, L. (2017, January). Are They Who They Say They Are?: Reliability and Validity of Web-Based Participants’ Self-Reported Demographic Information. Poster presented at the 18th Society for Personality and Social Psychology Annual Convention, San Antonio, TX.

Monday, November 20, 2017

Strengths and Limitations of Mechanical Turk


Hundreds of academic papers are published each year using data collected through Mechanical Turk. Researchers have gravitated to it primarily because it provides high quality data quickly and affordably. But while Mechanical Turk has revolutionized data collection, it is by no means a perfect platform. Some of its major strengths and limitations are summarized below.
Strengths
A source of quick and affordable data
Thousands of participants are looking for tasks on Mechanical Turk throughout the day, and can take your task with the click of a button. You can run a 10-minute survey with 100 participants for $1 each and have all your data within the hour.
Data is reliable
Researchers have examined data quality on MTurk and have found that by and large, data are reliable, with participants performing on tasks in ways similar to more traditional samples. There is a useful reputation mechanism on MTurk, in which researchers can approve or reject the performance of workers on a given study. The reputation of each worker is based on the number of times their work was approved or rejected. Many researchers use a standard practice that relies on only using data from workers who have a 95% approval rating, thereby further ensuring high-quality data collection.
Participant pool is more representative compared to traditional subject pools
Traditional subject pools used in social science research are often samples that are convenient for researchers to obtain, such as undergraduates at a local university. Mechanical Turk has been shown to be more diverse, with participants who are closer to the U.S. population in terms of gender, age, race, education, and employment.
Limitations
There are two kinds of potential limitations on MTurk: technical limitations, and more fundamental limitations of the platform. Many of the technical limitations have been resolved through scripts written by researchers or through platforms such as TurkPrime, which help researchers do things they were not previously able to do on MTurk, including:
  • Exclude participants from a study based on participation in a previous study
  • Conduct longitudinal research
  • Make sure larger studies do not stall out after the first 500 to 1000 Workers
  • Communicate with many Workers at a time.
There are, however, several more fundamental limitations to data collection on MTurk:
Small population
There are about 100,000 Mechanical Turk workers who participate in academic studies each year. In any one month, about 25,000 unique Mechanical Turk workers participate in online studies, completing close to 600,000 monthly assignments. The more active workers complete hundreds of studies each month. The natural consequence of a small worker population is that participants are continuously recycled across research labs. This creates a problem of ‘non-naivete’: most participants on Mechanical Turk have been exposed to common experimental manipulations, and this can affect their performance. Although the effects of this exposure have not been fully examined, recent research indicates that it may be impacting the effect sizes of experimental manipulations, compromising data quality and the effectiveness of experimental manipulations.

Diversity

Although Mechanical Turk workers are significantly more diverse than the undergraduate subject pool, the Mechanical Turk population is significantly less diverse than the general US population: it is less politically diverse, more highly educated, younger, and less religious. This can limit the extent to which findings can be generalized to the population level.

Limited selective recruitment

Mechanical Turk has basic mechanisms to selectively recruit workers who have already been profiled. To accomplish this, Mechanical Turk runs profiling HITs that are continuously available to workers. However, Mechanical Turk is structured in such a way that it is much more difficult to recruit people based on characteristics that have not been profiled. For this reason, while rudimentary selective recruitment mechanisms exist, there are significant limitations on the ability to recruit specific segments of workers.


Solutions
TurkPrime offers researchers more specific selective recruitment opportunities, and has features in development to help researchers target participants who are less active and therefore more naive to common experimental manipulations and survey measures. TurkPrime also offers access to Prime Panels, which provides access to over 10 million participants who can be selectively recruited and are more diverse.


References:


Peer, E., Vosgerau, J., & Acquisti, A. (2014). Reputation as a sufficient condition for data quality on Amazon Mechanical Turk. Behavior research methods, 46(4), 1023-1031.

Friday, November 17, 2017

Upcoming New Content on the Blog

Greetings Reader,


We would like to inform you of upcoming new content on the blog! We have been posting sporadically, but plan to have weekly content for you going forward. Posts will cover a host of topics relating to conducting research online, aiming to be useful to both novice and more experienced readers. Content will explain features of MTurk, TurkPrime, and Prime Panels that people may want to understand better. Posts will often offer suggestions for best practices based on our knowledge of how to get the most out of online research on MTurk and beyond. We will also discuss hot topics as they arise, and we are happy to take requests from readers for future posts as well.

The TurkPrime Team hopes that you will find these blogs informative. We are committed to providing information to our users that can enhance their use of TurkPrime, and their knowledge of issues in online research.

Tuesday, October 31, 2017

New Feature: Select MTurk Workers by Big Five Personality Types

Run Studies Targeting Specific Big Five Personality Types!


TurkPrime introduces a new Big Five personality types qualification: Now social science researchers can run studies targeted to the Big Five: 
  • Extraversion
  • Agreeableness
  • Conscientiousness
  • Openness
  • Neuroticism

Each personality trait can be specified by selecting a range of values from 0 (low levels of extraversion, agreeableness, etc.) to 6 (high levels of the trait), as shown below. Note that the scales take reverse scoring into account.



Each worker is classified on a particular trait based on their answers to the Ten-Item Personality Inventory (TIPI; Gosling et al., 2003). A worker's personality score (0.0-6.0) is the average of the responses they provided for that trait across multiple occasions.
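The scoring rule can be sketched as follows: each TIPI trait has one standard item and one reverse-keyed item, and the trait score averages the two with the reversed item flipped. The item pairings follow the published TIPI (Gosling et al., 2003); the 0-6 reversal formula is an assumption matching the 0-6 range this feature uses (the published TIPI uses 1-7), and note that TIPI keys the fifth trait as Emotional Stability rather than Neuroticism.

```python
# TIPI trait scoring on a 0-6 scale: each trait averages one standard item
# with one reverse-keyed item (reverse = 6 - response on this scale).
# Pairings (standard_item, reversed_item) follow Gosling et al. (2003).
TIPI_PAIRS = {
    "Extraversion": (1, 6),
    "Agreeableness": (7, 2),
    "Conscientiousness": (3, 8),
    "Emotional Stability": (9, 4),
    "Openness": (5, 10),
}

def trait_score(responses, trait, scale_max=6):
    """Average the standard item with the reverse-scored item for one trait.

    responses: dict mapping TIPI item number (1-10) -> rating (0..scale_max)
    """
    standard_item, reversed_item = TIPI_PAIRS[trait]
    return (responses[standard_item] + (scale_max - responses[reversed_item])) / 2

# A worker rating item 1 at 6 and item 6 at 0 scores 6.0 on Extraversion.
resp = {i: 3 for i in range(1, 11)}
resp[1], resp[6] = 6, 0
print(trait_score(resp, "Extraversion"))  # 6.0
```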

Coming soon: Non-Naivete scores to target your studies to workers who have not been exposed to many social science studies. 

Reminder: To reach a large audience of fresh faces with relatively little exposure to studies in the social and behavioral sciences, use TurkPrime's Prime Panels. Prime Panels reaches over 20 million participants in the US, and more around the world.

Tuesday, February 28, 2017

New Feature! Restarts now available when Hyperbatching

Now available: "Restart HIT" using HyperBatch!

The TurkPrime "Restart" feature (which restarts HITs that have become sluggish; see our blog post on Restarts for all the benefits of this feature) can now be used in conjunction with HyperBatch. This was a highly requested feature from our users, and we are happy to announce that it is now LIVE. The only difference between using the Restart feature with and without HyperBatch is that with HyperBatch you will only be able to restart the HIT once per day. This helps ensure that our system will not be overburdened.

As always, when workers view HITs, they will find your restarted HIT at the top of the MTurk HIT dashboard. HITs that have aged and become hard for MTurk workers to see regain the visibility of a brand-new HIT, because TurkPrime creates a new HIT with the same specifications as your original HIT.

Additionally, when using Restart, workers who have completed your original HIT are automatically excluded from the newly created HIT: TurkPrime creates a Qualification Type that prevents them from accepting it.

There is nothing that a Requester needs to set up to enable the Restart feature. It appears in the TurkPrime Dashboard as soon as your HIT goes Live.

For more on the benefits of using HyperBatch, check out our blog post.