Text Preprocessing: A Tool of Information Visualization and Digital Humanities

Piotr Malak
ISBN13: 9781522549901 | ISBN10: 1522549900 | EISBN13: 9781522549918
DOI: 10.4018/978-1-5225-4990-1.ch006
Cite Chapter

MLA

Malak, Piotr. "Text Preprocessing: A Tool of Information Visualization and Digital Humanities." Information Visualization Techniques in the Social Sciences and Humanities, edited by Veslava Osinska and Grzegorz Osinski, IGI Global, 2018, pp. 86-104. https://doi.org/10.4018/978-1-5225-4990-1.ch006

APA

Malak, P. (2018). Text Preprocessing: A Tool of Information Visualization and Digital Humanities. In V. Osinska & G. Osinski (Eds.), Information Visualization Techniques in the Social Sciences and Humanities (pp. 86-104). IGI Global. https://doi.org/10.4018/978-1-5225-4990-1.ch006

Chicago

Malak, Piotr. "Text Preprocessing: A Tool of Information Visualization and Digital Humanities." In Information Visualization Techniques in the Social Sciences and Humanities, edited by Veslava Osinska and Grzegorz Osinski, 86-104. Hershey, PA: IGI Global, 2018. https://doi.org/10.4018/978-1-5225-4990-1.ch006

Abstract

Digital humanities and information visualization rely on huge sets of digital data, most of which are delivered in text form. Although computational linguistics provides many valuable tools for text processing, the initial phase (text preprocessing) is very involved and time-consuming. The problems stem from the human factor: they are not always errors, since inconsistency in forms also affects data quality. In this chapter, the author describes and discusses the main issues that arise during the preprocessing phase of textual data gathering for InfoVis. Selected examples of InfoVis applications are presented. Besides the problems with raw, original data, solutions to them are also discussed. Canonical approaches used in text preprocessing, common issues affecting the process, and ways to prevent them are presented as well. The quality of data from different sources is also discussed. The content of this chapter is the result of a few years of practical experience in natural language processing gained while carrying out various projects and evaluation campaigns.
