Revealing Groups of Semantically Close Textual Documents by Clustering: Problems and Possibilities

Source Title: Artificial Intelligence: Concepts, Methodologies, Tools, and Applications

ISBN13: 9781522517597 | ISBN10: 1522517596 | EISBN13: 9781522517603

DOI: 10.4018/978-1-5225-1759-7.ch081

MLA

Dařena, František, and Jan Žižka. "Revealing Groups of Semantically Close Textual Documents by Clustering: Problems and Possibilities." Artificial Intelligence: Concepts, Methodologies, Tools, and Applications, edited by Information Resources Management Association, IGI Global, 2017, pp. 1981-2020. https://doi.org/10.4018/978-1-5225-1759-7.ch081

APA

Dařena, F., & Žižka, J. (2017). Revealing Groups of Semantically Close Textual Documents by Clustering: Problems and Possibilities. In Information Resources Management Association (Ed.), Artificial Intelligence: Concepts, Methodologies, Tools, and Applications (pp. 1981-2020). IGI Global. https://doi.org/10.4018/978-1-5225-1759-7.ch081

Chicago

Dařena, František, and Jan Žižka. "Revealing Groups of Semantically Close Textual Documents by Clustering: Problems and Possibilities." In Artificial Intelligence: Concepts, Methodologies, Tools, and Applications, edited by Information Resources Management Association, 1981-2020. Hershey, PA: IGI Global, 2017. https://doi.org/10.4018/978-1-5225-1759-7.ch081

Abstract

The chapter introduces clustering as a family of algorithms that can be used successfully to organize text documents into groups without prior knowledge of those groups. It also demonstrates how unsupervised clustering can group large amounts of unlabeled textual data (customer reviews written informally in five natural languages) so that the data can later be used for further analysis. Attention is paid to selecting the clustering algorithms and their parameters, to methods of data preprocessing, and to methods by which a human expert, with the assistance of computers, can evaluate the results. The feasibility of the approach has been demonstrated by a number of experiments, using both external evaluation against known labels and computer-assisted expert validation. The experiments show that the same procedures, including clustering, cluster validation, and the detection of topics and significant words, can be applied to different natural languages with satisfactory results.
