Revealing Groups of Semantically Close Textual Documents by Clustering: Problems and Possibilities

František Dařena (Mendel University in Brno, Czech Republic) and Jan Žižka (Mendel University in Brno, Czech Republic)

Source Title: Modern Computational Models of Semantic Discovery in Natural Language

DOI: 10.4018/978-1-4666-8690-8.ch004

Abstract

The chapter introduces clustering as a family of algorithms that can be successfully used to organize text documents into groups without prior knowledge of these groups. The chapter also demonstrates the use of unsupervised clustering to group a large amount of unlabeled textual data (customer reviews written informally in five natural languages) so that it can be used for further analysis. Attention is paid to the selection of clustering algorithms and their parameters, to methods of data preprocessing, and to methods by which a human expert, assisted by computers, evaluates the results. The feasibility of the approach has been demonstrated by a number of experiments with external evaluation using known labels and with computer-assisted expert validation. It has been found that the same procedures, including clustering, cluster validation, and detection of topics and significant words, can be applied to different natural languages with satisfactory results.
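To make the workflow described in the abstract more concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes lowercase word tokenization as the only preprocessing, TF-IDF term weighting, spherical k-means (k-means that uses cosine similarity) as the clustering algorithm, and purity as the external measure that compares clusters with known labels. The six English reviews, their topic labels, and the function names (`tfidf_vectors`, `spherical_kmeans`, `purity`) are hypothetical.

```python
import math
import re
from collections import Counter


def normalize(vec):
    """Scale a sparse {term: weight} vector to unit (L2) length."""
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def tfidf_vectors(docs):
    """Preprocess raw texts and turn them into unit-length TF-IDF vectors."""
    tokenized = [re.findall(r"\w+", d.lower()) for d in docs]  # lowercase word tokens
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequencies
    return [normalize({t: tf * math.log(n / df[t]) for t, tf in Counter(toks).items()})
            for toks in tokenized]


def dot(a, b):
    """Dot product of sparse vectors; for unit vectors this is cosine similarity."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def spherical_kmeans(vectors, k, max_iter=20):
    """k-means that measures closeness by cosine similarity of unit vectors."""
    # Deterministic farthest-first start: the first document, then repeatedly the
    # document least similar to all centroids chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        far = min(range(len(vectors)),
                  key=lambda i: max(dot(vectors[i], c) for c in centroids))
        centroids.append(vectors[far])
    labels = None
    for _ in range(max_iter):
        # Assignment step: every document joins its most similar centroid.
        new_labels = [max(range(k), key=lambda j: dot(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # no document changed cluster: converged
        labels = new_labels
        # Update step: each centroid becomes the normalized sum of its members.
        centroids = []
        for j in range(k):
            total = Counter()
            for v, lab in zip(vectors, labels):
                if lab == j:
                    total.update(v)
            centroids.append(normalize(total))
    return labels


def purity(labels, truth):
    """External evaluation: fraction of documents matching their cluster's majority label."""
    per_cluster = {}
    for lab, t in zip(labels, truth):
        per_cluster.setdefault(lab, Counter())[t] += 1
    return sum(c.most_common(1)[0][1] for c in per_cluster.values()) / len(truth)


reviews = [
    "the room was clean and the bed was comfortable",
    "clean room, comfortable bed, quiet room",
    "the bed in our room was comfortable and clean",
    "breakfast food was tasty and the coffee was fresh",
    "tasty breakfast, fresh coffee, good food",
    "the food at breakfast was fresh and tasty",
]
truth = ["room"] * 3 + ["food"] * 3  # known labels, used only for evaluation
labels = spherical_kmeans(tfidf_vectors(reviews), k=2)
print(labels, purity(labels, truth))  # [0, 0, 0, 1, 1, 1] 1.0
```

On these toy reviews the three room reviews and the three breakfast reviews end up in separate clusters, giving a purity of 1.0. The farthest-first initialization only keeps the run deterministic; on real, informally written reviews the outcome depends strongly on preprocessing, the number of clusters, and initialization, which is why the choice of algorithms and parameters deserves attention.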

Introduction

People and companies have many opportunities to express their opinions on a wide variety of topics. The media used for such communication include personal web pages and blogs, social networks, discussion boards, e-mail, instant messages, and others. Various parties can benefit from this high availability of information, which, however, also demands greater involvement, knowledge, and information-processing and decision-making skills. Because huge volumes of data are often freely available to many different parties, there is a need for approaches that enable the data to be used for decision making. Since most of the data is available in unstructured textual form, disciplines focusing on this type of data have gained in significance during the last few years (Miner et al., 2012).

Because of the disproportionate time and effort that would be needed to reveal the knowledge hidden in the data, the processing often cannot be done manually by humans. Instead, the application of computer-based automated methods is a more desirable choice. This is made possible by the increased computational speed and memory sizes of ordinary computers, as well as by the development of new algorithms that are able to address various needs and problems. Instead of a traditional methodology that employs human operators to read the documents, statistical analysis, and data mining techniques based on the non-linguistic structure of the documents (Dini & Mazzini, 2010), intelligent computer-based analysis known as text mining might arrive at new and unforeseen results.

Text mining is a branch of computer science that uses techniques from data mining, information retrieval, machine learning, statistics, natural language processing, and knowledge management (Berry & Kogan, 2010). The greatest potential of text mining applications lies in areas where large quantities of textual data are generated and collected. These areas include, among others, categorization of newspaper articles or web pages, e-mail filtering, organization of a library, handling of customer complaints (or feedback), marketing focus group programs, competitive intelligence, market prediction, extraction of topic trends in text streams, discovery of semantic relations between events, and customer satisfaction analysis (Cao et al., 2014; Koteswara Rao & Dey, 2011; Miner et al., 2012; Nassirtoussi, 2014; Weiss et al., 2010). Text mining involves tasks such as text categorization, term extraction, single- or multi-document summarization, clustering, association rule mining, and sentiment analysis (Feldman & Sanger, 2007).

At the end of the last century, machine learning gained in popularity and became a dominant approach to text mining (Sebastiani, 2002). Machine learning is a discipline that focuses on modifying or adapting computer behavior based on past experience (the data, in this case) so that the behavior improves in the future. How such an adaptation proceeds depends on whether the right behavior is specified. If it is, a set of examples with correct answers (actions) is provided, and we talk about supervised learning. During the learning process, a computer tries to generalize the knowledge so that it can react correctly to all inputs, even previously unseen ones. When the correct responses are not provided, a computer tries to find patterns based on similarities between the inputs. This approach is known as unsupervised learning (Marsland, 2009). The common goal of both approaches is to achieve accuracy comparable to that achieved by human experts.
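The contrast between the two learning modes can be illustrated with a short Python sketch. It assumes bag-of-words vectors and uses two deliberately simple, hypothetical stand-ins: a nearest-centroid classifier for supervised learning (trained on texts paired with correct labels) and single-pass leader clustering for unsupervised learning (which sees only the texts and groups them by cosine similarity). The example texts, the 0.3 similarity threshold, and all function names are illustrative assumptions, not methods taken from the chapter.

```python
import math
from collections import Counter


def unit(vec):
    """Scale a sparse {term: weight} vector to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def bow(text):
    """Bag-of-words vector of raw term counts, scaled to unit length."""
    return unit(Counter(text.lower().split()))


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


# Supervised learning: every training text comes with the correct answer (a label).
def train_nearest_centroid(examples):
    sums = {}
    for text, label in examples:
        sums.setdefault(label, Counter()).update(bow(text))
    return {label: unit(total) for label, total in sums.items()}


def predict(model, text):
    """Generalize to an unseen text: pick the label whose centroid is most similar."""
    v = bow(text)
    return max(model, key=lambda label: cosine(v, model[label]))


# Unsupervised learning: no labels, only similarities between the inputs.
def leader_clustering(texts, threshold=0.3):
    leaders, groups = [], []
    for text in texts:
        v = bow(text)
        for i, leader in enumerate(leaders):
            if cosine(v, leader) >= threshold:
                groups[i].append(text)  # join the first sufficiently similar group
                break
        else:
            leaders.append(v)  # nothing similar enough: start a new group
            groups.append([text])
    return groups


labelled = [
    ("clean room comfortable bed", "room"),
    ("quiet room soft bed", "room"),
    ("tasty breakfast fresh coffee", "food"),
    ("fresh food good breakfast", "food"),
]
model = train_nearest_centroid(labelled)
print(predict(model, "the bed was soft and the room clean"))  # room

unlabelled = [text for text, _ in labelled]
print(leader_clustering(unlabelled))
```

The classifier assigns a label to a sentence it has never seen, while the clustering recovers the same two groups (room texts and breakfast texts) without ever being told the labels; naming the resulting groups is left to a human.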