The chapter reviews traditional sampling techniques and suggests adaptations relevant to big data studies of text downloaded from online media such as email messages, online gaming, blogs, micro-blogs (e.g., Twitter), and social networking websites (e.g., Facebook). The authors review methods of probability, purposeful, and adaptive sampling of online data. They illustrate the use of these sampling techniques via published studies that report analysis of online text.
Introduction
Studying social media often involves downloading publicly available textual data. Based on studies of email messages, Facebook, blogs, gaming websites, and Twitter, this chapter describes sampling techniques for selecting online data for specific research projects. As previously noted (Webb & Wang, 2013; Wiles, Crow, & Pain, 2011), research methodologies for studying online text tend to follow or adapt existing research methodologies, including sampling techniques. The sampling techniques discussed in this chapter follow well-established sampling practices, resulting in representative and/or purposeful samples; however, the established techniques have been modified to apply to sampling online text, where unusually large populations of messages are available for sampling and the population of messages is in a state of constant growth. The sampling techniques discussed in this chapter can be used for both qualitative and quantitative research.
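Because the population of online messages keeps growing while data are being collected, a researcher cannot always know its final size before drawing a probability sample. One established way to handle this situation is reservoir sampling; the sketch below is offered only as a minimal illustration of that general idea, not as a technique prescribed by this chapter, and the `load_downloaded_messages` routine and the sample size of 500 are hypothetical placeholders for the researcher's own data source and design.

```python
import random

def reservoir_sample(message_stream, sample_size, seed=2024):
    """Draw a uniform random sample of `sample_size` items from a stream
    whose total length is unknown in advance (classic reservoir sampling)."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    reservoir = []
    for index, message in enumerate(message_stream):
        if index < sample_size:
            reservoir.append(message)  # fill the reservoir first
        else:
            # Replace a stored message with decreasing probability so that
            # every message seen so far has an equal chance of being retained.
            slot = rng.randint(0, index)
            if slot < sample_size:
                reservoir[slot] = message
    return reservoir

# Hypothetical usage: iterate over posts the researcher has already downloaded
# (e.g., tweets or blog comments) and keep a representative sample of 500.
# sample = reservoir_sample(load_downloaded_messages(), sample_size=500)
```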
Rapidly advancing internet technologies have altered daily life as well as the academic landscape. Researchers across disciplines are interested in examining the large volumes of data generated on internet platforms, such as social networking sites and mobile devices. Compared to data collected and analyzed through traditional means, big data generated around the clock on the internet can help researchers identify latent patterns of human behavior and perceptions that were previously unknown. The richness of the data brings economic benefits to diverse data-intensive industries such as marketing, insurance, and healthcare. Repeated observations of internet data across time amplify the size of already large data sets; data gathered across time have long interested academics. Such vast data sets, typically called “big data,” share at least four traits: the data are unstructured, growing at an exponential rate, transformational, and highly complicated.
As more big data sets become available to researchers through the convenience of internet technologies, the ability to analyze those data sets can weaken. Many factors can contribute to a deficiency in analysis. One major obstacle can be the capability of the analytical systems. Although software developers have introduced multiple analytical tools for scholars to employ with big data (e.g., Hadoop, Storm), the transformational nature of big data requires frequent software updates as well as increases in relevant knowledge. In other words, analyzing big data requires specialized knowledge. Another challenge is selecting an appropriate data-mining process. As Badke (2012, p. 47) argued, seeking “specific results for specific queries” without employing the proper mining process can further complicate the project instead of helping manage it. Additionally, multi-petabyte data sets that include millions of files from heterogeneous operating systems might be too large to back up through conventional computing methods. In such a case, the choice of the data-mining tool becomes critical in determining the feasibility, efficiency, and accuracy of the research project.
Many concerns raised regarding big data collection and analysis duplicate concerns surrounding conventional online data collection:
- Credibility of Online Resources: Authors of online text often post anonymously. Their responses, comments, or articles are therefore susceptible to credibility critiques;
- Privacy Issues: Internet researchers do not necessarily have the permission of the users who originally generated the text. Users are particularly uncomfortable when data generated from personal information, such as Facebook posts or text messages on mobile devices, are examined without their explicit permission. No comprehensive legal system currently exists that draws a clear distinction between publicly available data and personal domains;
- Security Issues: While successful online posters, such as bloggers, enjoy the free publicity of the internet, they can also be victimized by co-option of their original work and thus by violation of their intellectual property rights. It is difficult for researchers to identify the source of a popular Twitter post that is re-tweeted thousands of times, often without acknowledgment of the original author. Therefore, data collected from open-access online sources might infringe authors’ copyrights.