Text Mining and Pre-Processing Methods for Social Media Data Extraction and Processing

Santoshi Kumari
ISBN13: 9781799895947 | ISBN10: 1799895947 | ISBN13 Softcover: 9781799895954 | EISBN13: 9781799895961
DOI: 10.4018/978-1-7998-9594-7.ch002
Cite Chapter

MLA

Kumari, Santoshi. "Text Mining and Pre-Processing Methods for Social Media Data Extraction and Processing." Handbook of Research on Opinion Mining and Text Analytics on Literary Works and Social Media, edited by Pantea Keikhosrokiani and Moussa Pourya Asl, IGI Global, 2022, pp. 22-53. https://doi.org/10.4018/978-1-7998-9594-7.ch002

APA

Kumari, S. (2022). Text Mining and Pre-Processing Methods for Social Media Data Extraction and Processing. In P. Keikhosrokiani & M. Pourya Asl (Eds.), Handbook of Research on Opinion Mining and Text Analytics on Literary Works and Social Media (pp. 22-53). IGI Global. https://doi.org/10.4018/978-1-7998-9594-7.ch002

Chicago

Kumari, Santoshi. "Text Mining and Pre-Processing Methods for Social Media Data Extraction and Processing." In Handbook of Research on Opinion Mining and Text Analytics on Literary Works and Social Media, edited by Pantea Keikhosrokiani and Moussa Pourya Asl, 22-53. Hershey, PA: IGI Global, 2022. https://doi.org/10.4018/978-1-7998-9594-7.ch002

Abstract

Social media platforms such as Twitter generate a huge amount of unstructured data. The volume of tweets and the velocity with which they are generated on various topics pose substantial challenges for data analytics and processing techniques. The linguistic flexibility with which tweets are written poses further challenges for preprocessing and natural language processing (NLP) tasks. To address these challenges, this chapter selects, modifies, and applies information retrieval and preprocessing steps for retrieving, storing, organizing, and cleaning real-time, large-scale, unstructured Twitter data. The work reviews previous research and applies suitable preprocessing methods to improve data quality by removing unessential data. It is also observed that Twitter APIs and access tokens provide easy access to real-time tweets. Preprocessing methods are fundamental steps in text analytics and NLP tasks for processing unstructured data. Suitable preprocessing methods, such as tokenization, stop-word removal, stemming, and lemmatization, are applied to normalize the extracted Twitter data.
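
The preprocessing steps named in the abstract (tokenization, stop-word removal, stemming, and lemmatization) can be illustrated with a minimal, dependency-free Python sketch. It is not the chapter's implementation. The stop-word list, the lemma lookup table, the naive suffix-stripping stemmer (much cruder than an established algorithm such as Porter's), and all function names are assumptions made for illustration only.

```python
import re

# Illustrative stop-word list (an assumption; real lists are far longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "to", "of",
              "and", "in", "on", "for", "this", "that", "it", "at", "rt"}

# Tiny hypothetical lemma lookup; a real lemmatizer uses a full lexicon.
LEMMAS = {"mice": "mouse", "better": "good", "ran": "run", "studies": "study"}

URL_RE = re.compile(r"https?://\S+")     # links
MENTION_RE = re.compile(r"@\w+")         # @user mentions
TOKEN_RE = re.compile(r"[a-z0-9']+")     # runs of letters, digits, apostrophes


def tokenize(tweet):
    """Lowercase the tweet, drop URLs and mentions, and split into tokens."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")        # keep the hashtag word, drop the '#'
    return TOKEN_RE.findall(text)


def remove_stop_words(tokens):
    """Discard tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Naive suffix stripping: remove one common suffix if at least 3 characters remain."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token


def lemmatize(token):
    """Dictionary-based lemmatization: map known forms to their lemma."""
    return LEMMAS.get(token, token)


def preprocess(tweet, normalizer=stem):
    """Tokenize, remove stop words, then normalize each remaining token."""
    return [normalizer(t) for t in remove_stop_words(tokenize(tweet))]


if __name__ == "__main__":
    tweet = "RT @user: Loving the new phones! https://t.co/abc #Tech"
    print(preprocess(tweet))
    # ['lov', 'new', 'phon', 'tech']
    print(preprocess("The mice were running", normalizer=lemmatize))
    # ['mouse', 'running']
```

The two outputs show the usual trade-off. The stemmer is fast but can produce non-words such as "lov" and "phon". The lookup-based lemmatizer returns dictionary forms but leaves any word missing from its table unchanged.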
