Text Preprocessing: A Tool of Information Visualization and Digital Humanities

Piotr Malak

Source Title: Information Visualization Techniques in the Social Sciences and Humanities

ISBN13: 9781522549901|ISBN10: 1522549900|EISBN13: 9781522549918

DOI: 10.4018/978-1-5225-4990-1.ch006

MLA

Malak, Piotr. "Text Preprocessing: A Tool of Information Visualization and Digital Humanities." Information Visualization Techniques in the Social Sciences and Humanities, edited by Veslava Osinska and Grzegorz Osinski, IGI Global, 2018, pp. 86-104. https://doi.org/10.4018/978-1-5225-4990-1.ch006

APA

Malak, P. (2018). Text Preprocessing: A Tool of Information Visualization and Digital Humanities. In V. Osinska & G. Osinski (Eds.), Information Visualization Techniques in the Social Sciences and Humanities (pp. 86-104). IGI Global. https://doi.org/10.4018/978-1-5225-4990-1.ch006

Chicago

Malak, Piotr. "Text Preprocessing: A Tool of Information Visualization and Digital Humanities." In Information Visualization Techniques in the Social Sciences and Humanities, edited by Veslava Osinska and Grzegorz Osinski, 86-104. Hershey, PA: IGI Global, 2018. https://doi.org/10.4018/978-1-5225-4990-1.ch006

Abstract

Digital humanities and information visualization rely on large sets of digital data, most of which are delivered as text. Although computational linguistics provides many valuable tools for text processing, the initial phase (text preprocessing) is involved and time-consuming. The problems stem from the human factor: they are not always outright errors, but also inconsistencies in form that affect data quality. In this chapter, the author describes and discusses the main issues that arise during the preprocessing phase of textual data gathering for InfoVis. Selected examples of InfoVis applications are presented. In addition to the problems with raw, original data, solutions are also discussed. Canonical approaches to text preprocessing, common issues affecting the process, and ways to prevent them are presented as well. The quality of data from different sources is also discussed. The content of this chapter results from several years of practical experience in natural language processing, gained while carrying out various projects and evaluation campaigns.
