Improving the K-Means Clustering Algorithm Oriented to Big Data Environments

Joaquín Pérez Ortega, Nelva Nely Almanza Ortega, Andrea Vega Villalobos, Marco A. Aguirre L., Crispín Zavala Díaz, Javier Ortiz Hernandez, Antonio Hernández Gómez
ISBN13: 9781799847304 | ISBN10: 1799847306 | EISBN13: 9781799847311
DOI: 10.4018/978-1-7998-4730-4.ch013
Cite Chapter

MLA

Pérez Ortega, Joaquín, et al. "Improving the K-Means Clustering Algorithm Oriented to Big Data Environments." Handbook of Research on Natural Language Processing and Smart Service Systems, edited by Rodolfo Abraham Pazos-Rangel, et al., IGI Global, 2021, pp. 289-308. https://doi.org/10.4018/978-1-7998-4730-4.ch013

APA

Pérez Ortega, J., Almanza Ortega, N. N., Vega Villalobos, A., Aguirre L., M. A., Zavala Díaz, C., Ortiz Hernandez, J., & Hernández Gómez, A. (2021). Improving the K-Means Clustering Algorithm Oriented to Big Data Environments. In R. Pazos-Rangel, R. Florencia-Juarez, M. Paredes-Valverde, & G. Rivera (Eds.), Handbook of Research on Natural Language Processing and Smart Service Systems (pp. 289-308). IGI Global. https://doi.org/10.4018/978-1-7998-4730-4.ch013

Chicago

Pérez Ortega, Joaquín, et al. "Improving the K-Means Clustering Algorithm Oriented to Big Data Environments." In Handbook of Research on Natural Language Processing and Smart Service Systems, edited by Rodolfo Abraham Pazos-Rangel, et al., 289-308. Hershey, PA: IGI Global, 2021. https://doi.org/10.4018/978-1-7998-4730-4.ch013

Abstract

In recent years, the volume of natural-language text available in digital format has grown dramatically. Obtaining useful information from such large volumes of data requires new specialized techniques and efficient algorithms. Text mining consists of extracting meaningful patterns from texts, and one of its basic approaches is clustering. The most widely used clustering algorithm is k-means. This chapter proposes an improvement to the convergence step of k-means: the process stops whenever the number of objects that change their assigned cluster in the current iteration is greater than the number that changed in the previous iteration. Experimental results showed a reduction in execution time of up to 93%. Notably, better results are generally obtained as the volume of text increases, particularly for texts in big data environments.
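
The stopping rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, only one plausible reading of the rule under stated assumptions: squared Euclidean distance is used, initial centroids are supplied by the caller, and the classical criterion (no object changes cluster) is kept alongside the proposed one. The function name kmeans_early_stop and its parameters are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_early_stop(points, initial_centroids, max_iter=100):
    """K-means with an additional early-stop rule (illustrative sketch).

    Assumption: the loop ends when no object changes cluster (classical
    criterion) or when more objects change cluster in the current
    iteration than in the previous one (the rule stated in the abstract).
    Returns (labels, centroids, iterations_run).
    """
    centroids = [list(c) for c in initial_centroids]
    k = len(centroids)
    labels = [-1] * len(points)   # -1 means "not assigned yet"
    prev_changes = None           # no previous iteration to compare with
    iterations = 0

    for iterations in range(1, max_iter + 1):
        # Assignment step: each object goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Count the objects whose assigned cluster changed.
        changes = sum(1 for old, new in zip(labels, new_labels) if old != new)
        labels = new_labels

        # Update step: recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]

        # Classical convergence criterion.
        if changes == 0:
            break
        # Early-stop rule: changes grew relative to the previous iteration.
        if prev_changes is not None and changes > prev_changes:
            break
        prev_changes = changes

    return labels, centroids, iterations
```

For example, calling kmeans_early_stop with six two-dimensional points forming two well-separated groups and one initial centroid taken from each group assigns every point in the first iteration and finds no changes in the second, so the loop ends through the classical criterion. The early-stop rule only takes effect on data where the number of reassignments rises from one iteration to the next.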