Improving the K-Means Clustering Algorithm Oriented to Big Data Environments

Joaquín Pérez Ortega, Nelva Nely Almanza Ortega, Andrea Vega Villalobos, Marco A. Aguirre L., Crispín Zavala Díaz, Javier Ortiz Hernandez, Antonio Hernández Gómez

Source Title: Handbook of Research on Natural Language Processing and Smart Service Systems

ISBN13: 9781799847304 | ISBN10: 1799847306 | EISBN13: 9781799847311

DOI: 10.4018/978-1-7998-4730-4.ch013

Cite Chapter

MLA

Pérez Ortega, Joaquín, et al. "Improving the K-Means Clustering Algorithm Oriented to Big Data Environments." Handbook of Research on Natural Language Processing and Smart Service Systems, edited by Rodolfo Abraham Pazos-Rangel, et al., IGI Global, 2021, pp. 289-308. https://doi.org/10.4018/978-1-7998-4730-4.ch013

APA

Pérez Ortega, J., Almanza Ortega, N. N., Vega Villalobos, A., Aguirre L., M. A., Zavala Díaz, C., Ortiz Hernandez, J., & Hernández Gómez, A. (2021). Improving the K-Means Clustering Algorithm Oriented to Big Data Environments. In R. Pazos-Rangel, R. Florencia-Juarez, M. Paredes-Valverde, & G. Rivera (Eds.), Handbook of Research on Natural Language Processing and Smart Service Systems (pp. 289-308). IGI Global. https://doi.org/10.4018/978-1-7998-4730-4.ch013

Chicago

Pérez Ortega, Joaquín, et al. "Improving the K-Means Clustering Algorithm Oriented to Big Data Environments." In Handbook of Research on Natural Language Processing and Smart Service Systems, edited by Rodolfo Abraham Pazos-Rangel, et al., 289-308. Hershey, PA: IGI Global, 2021. https://doi.org/10.4018/978-1-7998-4730-4.ch013

Abstract

In recent years, the volume of natural-language text available in digital form has grown dramatically. Extracting useful information from such large volumes of data requires specialized techniques and efficient algorithms. Text mining consists of extracting meaningful patterns from texts, and clustering is one of its basic approaches. The most widely used clustering algorithm is k-means. This chapter proposes an improvement to the convergence step of k-means: the process stops whenever the number of objects that change their assigned cluster in the current iteration is larger than the number that changed in the previous iteration. Experimental results showed reductions in execution time of up to 93%. Notably, better results are generally obtained as the volume of text increases, particularly for texts in big data environments.
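
The stopping rule described in the abstract can be illustrated with a minimal sketch. This is not the chapter's implementation, since the full text is not reproduced here. It is a plain Lloyd-style k-means in Python with one extra exit condition: stop when more objects change cluster in the current iteration than in the previous one. The function name `kmeans_early_stop`, its parameters, the random initialization, and the choice to return the centroids used for the last assignment are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_early_stop(points, k, max_iter=100, seed=0):
    """Illustrative k-means with the early-stop rule from the abstract.

    Assumption: besides the usual "no object changed cluster" test, the
    loop also ends when the number of reassigned objects in the current
    iteration exceeds the number reassigned in the previous iteration.
    Returns (labels, centroids, iterations).
    """
    rng = random.Random(seed)
    # Hypothetical initialization: k distinct input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    prev_changes = None
    iterations = 0

    for _ in range(max_iter):
        iterations += 1

        # Assignment step: each object goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        changes = sum(1 for old, new in zip(labels, new_labels) if old != new)
        labels = new_labels

        # Classical convergence: no object changed its cluster.
        if changes == 0:
            break
        # Early stop: more objects changed now than in the previous iteration.
        if prev_changes is not None and changes > prev_changes:
            break
        prev_changes = changes

        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it was
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]

    return labels, centroids, iterations
```

Calling `kmeans_early_stop(points, k=2)` on a list of numeric tuples returns the cluster label of each object, the centroids, and the number of iterations performed. On text data the points would typically be document vectors, which this sketch does not construct.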
