Twitter Data Acquisition and Analysis: Methodology and Best Practice

Stephen Dann (Australian National University, Australia)
DOI: 10.4018/978-1-4666-8408-9.ch012


Social media data collection is often treated as tacit knowledge with the collation of tweets reduced to a single sentence without explanation as to means, mechanisms or relative merit of the approach. This chapter describes methods and techniques for the capture of Twitter timeline data, inclusive of first person and third party methods for data capture from personal accounts, public accounts, and keyword searches. The chapter takes a practical approach to acquiring Twitter data with a focus on individual timelines, and small to medium scale search sets. The emphasis is on being able to obtain, examine, and convert Twitter data into knowledge quickly, and with limited requirement for technical skills. This type of data collection assumes no prior programming knowledge. The chapter explains how to retrieve Twitter data from three sources: personally controlled timelines, third party timelines and ongoing search results. Finally, the chapter describes preliminary analysis that can be performed to ascertain content creation patterns, without recourse to analysis of individual tweets.
Chapter Preview

Capturing Twitter Data: Timeline

This chapter features an extended examination of four tweet capture mechanisms to articulate the approaches for collecting data for both industry and academia. For academics, a documented method that details how the tweets were acquired can be sourced, referenced, and used as a baseline against which variations in collection method are acknowledged. For practitioners, data collection best practice can inform the internal metrics of the organization, or form the basis for a customized protocol to acquire competitor information or analyze the company’s performance.

The four methods outlined involve internal Twitter account archives, externally mediated timeline capture through Kwitty, keyword search via Hootsuite, and web capture using NCapture as part of NVivo analysis. Each method is discussed in terms of the variables captured in the data, and the steps needed to perform the capture. All methods should be considered equal in their value to a content classification process. Selection and use of a method should be determined by its relative value to an individual project. It may be that Kwitty’s minimalistic four-item data set is more valuable for capturing a personally controlled timeline, as it will benchmark against subsequent external timeline captures. Alternatively, NCapture’s depth could suit a project requiring greater nuance and pre-prepared coding than is present in the Twitter Archive data. Table 1 outlines a brief comparison of the four methods.

Table 1.
Comparisons of the four collection methods

              Internal Timeline Archive   Kwitty        Hootsuite           NCapture
Cost          Free                        Free          Subscription fees   Set up costs
Data Points   10                          4             13                  18
Timelines     Personal                    Third party   Search results      Third party / Search
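
The chapter itself assumes no programming knowledge, so the following is offered only as an optional illustration for readers who prefer a script to a spreadsheet. It is a minimal sketch of the kind of preliminary analysis described in the abstract: counting how many tweets were posted in each hour of the day using the timestamp alone, without reading any individual tweet. The function name tweets_per_hour, the column name "timestamp", and the date format are assumptions made for the example, not details taken from the chapter or from any of the four tools; an actual export may use different column names and formats.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def tweets_per_hour(csv_file, timestamp_column="timestamp",
                    fmt="%Y-%m-%d %H:%M:%S"):
    """Count rows per hour of day (0-23) using only the timestamp column.

    The column name and format are assumptions; adjust both to match
    the file produced by the capture method in use.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        posted = datetime.strptime(row[timestamp_column], fmt)
        counts[posted.hour] += 1
    return counts


# Made-up sample rows standing in for an exported timeline file.
sample = io.StringIO(
    "timestamp,text\n"
    "2014-03-01 09:15:00,first\n"
    "2014-03-01 09:45:00,second\n"
    "2014-03-02 21:05:00,third\n"
)

hourly = tweets_per_hour(sample)
for hour in sorted(hourly):
    print(f"{hour:02d}:00  {hourly[hour]}")
```

To run the same count on a real export, the sample would be replaced with an opened file, for example open("tweets.csv", newline="", encoding="utf-8"), where the file name is again hypothetical. Because only the timestamp is read, the same approach applies to any of the four data sets in Table 1 that include a posting time.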
