Amazon Mechanical Turk: A Web-Based Tool for Facilitating Experimental Research in ANLP

Amber Chauncey Strain, Lucille M. Booker
DOI: 10.4018/978-1-61350-447-5.ch007

Abstract

One of the major challenges of ANLP research is the constant balancing act between the need for large samples and the substantial time and monetary resources needed to acquire those samples. Amazon’s Mechanical Turk (MTurk) is a web-based data collection tool that has become a premier resource for researchers who are interested in optimizing their sample sizes and minimizing costs. Because of its supportive infrastructure, diverse participant pool, data quality, and time and cost efficiency, MTurk seems particularly suitable for ANLP researchers who are interested in gathering large, high-quality corpora in relatively short time frames. In this chapter, the authors first provide a broad description of the MTurk interface. Next, they describe the steps for acquiring IRB approval of MTurk experiments, designing experiments using the MTurk dashboard, and managing data. Finally, the chapter concludes by discussing the potential benefits and limitations of using MTurk for ANLP experimentation.

Introduction

Linguists, cognitive scientists, psychologists, and researchers in related fields use a wide range of techniques to understand human thoughts, feelings, and behaviors. These techniques range from informal surveys to highly controlled laboratory experiments. Selecting which technique to use in any particular study often involves a tradeoff between gathering a sufficiently large sample and the time and money needed to acquire that sample.

Traditionally, ANLP researchers have tried to balance this tradeoff by amassing existing corpora (Duran, McCarthy, Graesser, & McNamara, 2007; McCarthy & Jarvis, 2007) or by plucking readily available texts off the web (Lightman, McCarthy, Dufty, & McNamara, 2007) in order to have sufficiently large sample sizes. These methods are not optimal, but they have been accepted because the alternative was to invite hundreds of participants into the laboratory and ask them to generate numerous texts, which is neither time- nor cost-efficient.

In their quest to find better methods for acquiring large datasets while still keeping costs low, many ANLP researchers have turned to web-based services. These services expedite data collection while remaining well within even modest budgets. One such web-based service is Amazon’s Mechanical Turk (MTurk). Although other similar services (e.g., Survey Monkey, Sona Systems, Google Forms) are popular and undoubtedly beneficial to research, it appears at this time that MTurk has emerged as a leader in this industry, especially in relation to ANLP. This chapter therefore focuses on MTurk.

MTurk has a wide range of applications in ANLP research, which demonstrates the massive potential of this tool. For instance, MTurk has been used in ANLP studies on automated categorization (Mihalcea & Strapparava, 2009), contextual predictability (Schnoebelen & Kuperman, 2010), data extraction (Higgins, McGrath, & Moretto, 2010), semantic transparency (Munro et al., 2010), speech recognition (Lambert, Singh, & Raj, 2010), and textual entailment (Snow, O’Connor, Jurafsky, & Ng, 2008). Indeed, it seems that any ANLP experiment that can be concisely explained and successfully completed in an online environment is appropriate for MTurk.

The purpose of this chapter is to provide brief background on the utility of Mechanical Turk as a data collection tool, to describe the procedure for using MTurk for ANLP research, and to discuss possible contexts in which MTurk is a useful ANLP research tool.
