Unsupervised Clustering for Optimal Locality Detection: A Data Science Approach

Praneet Amul Akash Cherukuri (CMR Institute of Technology, Hyderabad, India), Bala Sai Allagadda (Malla Reddy College of Engineering and Technology, Hyderabad, India) and Anil Kumar Reddy Konda (CMR Institute of Technology, Hyderabad, India)
DOI: 10.4018/IJHIoT.2021070106


Data science is among the most sought-after domains today, known for supporting accurate decision making and for delivering recommendations that maximize profit. Demand for this kind of analysis is driven by advancing technology and a growing population, which open a new dimension of demands and lead to crises in every sector. Clustering helps make these decisions more accurate and has continued to evolve over time. The impact of neighborhoods and localities on a business depends on many factors. To understand these factors and put them in proper perspective, the authors performed data cleaning, wrangling, and visualization, and then clustered the factors to support a better-informed decision-making process.

As development progresses in each sector, cutting-edge technologies are being adopted in every business. The large volumes of data collected over many years, together with ongoing data collection, have made the work of data science much easier. In this research, the authors focus on using the most prominent techniques and concepts of data science to identify the best location for establishing a particular business, and on outlining some of the features that could support the growth of that business. Since most of the research involves data analysis and extensive data manipulation, the authors used the Python library pandas. Pandas provides a solid foundation on which a powerful data analysis environment can be built (McKinney, 2011). Another Python library, NumPy, which provides mathematical functions for data, is also used to analyze the different data frames and in the process of merging them. A NumPy array is a multidimensional, uniform collection of elements. An array is characterized by the type of elements it contains and by its shape. For instance, a matrix may be represented as an array of shape (M×N) that contains numbers, e.g., floating-point or complex numbers. Unlike matrices, NumPy arrays can have any dimensionality (S. van der Walt, 2011). The Folium and Geopy libraries were used for better exploration of the data and for visualization of the locations and venues. K-means clustering groups similar data points into the same cluster and separates them from the data points in other clusters. Cluster analysis is based on the differences among various kinds of objects and uses distance-function rules to build a classification model (Li Youguo, Wu Haiyan, 2012). The process starts with understanding the methodology and acquiring the data, then proceeds to data exploration, after which the final clustering takes place.
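The description of a NumPy array above (an element type plus a shape, with any number of dimensions) can be illustrated with a short sketch. The variable names and the helper `describe` are hypothetical and introduced only for this illustration; they do not come from the article.

```python
import numpy as np

# A 2-D array of shape (M, N) = (2, 3) holding floating-point numbers.
matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

# The same kind of container can hold complex numbers.
complex_vector = np.array([1 + 2j, 3 - 1j])

# Unlike a matrix, an array may have any number of dimensions.
cube = np.zeros((2, 3, 4))


def describe(array):
    """Return the element type, shape, and number of dimensions (hypothetical helper)."""
    return str(array.dtype), array.shape, array.ndim
```

Calling `describe(matrix)` reports the two properties the text names, the element type and the shape, plus the dimensionality that distinguishes a general array from a matrix.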
Throughout the process, we uncover some of the important ways in which a business is evaluated, along with the factors that could help it grow or bring about a change in it. The analysis is carried out with the pandas and NumPy libraries, and the resulting insights are visualized with the matplotlib library so that the scenario can be clearly understood. In the later stages, we use the K-means clustering algorithm to uncover similarities among neighborhoods in order to identify the best locality for the business. Throughout the analysis, the main aim of the research is to show how data science techniques can be combined to find insights that support the real growth of a business and suggest optimal solutions for a particular business in its own scenario.
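As a rough illustration of the K-means step mentioned above, the following is a minimal NumPy sketch of the standard Lloyd iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points). It is not the authors' implementation; the function name `k_means`, its parameters, and the toy points are assumptions made for this example. In practice a library implementation such as scikit-learn's `KMeans` would likely be used instead.

```python
import numpy as np


def k_means(points, k, n_iterations=100, seed=0):
    """Cluster the rows of `points` into k groups (illustrative sketch)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]

    def assign(current_centroids):
        # Euclidean distance from every point to every centroid.
        diffs = points[:, None, :] - current_centroids[None, :, :]
        return np.linalg.norm(diffs, axis=2).argmin(axis=1)

    for _ in range(n_iterations):
        labels = assign(centroids)
        # Move each centroid to the mean of the points assigned to it;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign(centroids), centroids


# Toy input (made-up points, not data from the study): two well-separated groups.
toy_points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                       [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
toy_labels, toy_centroids = k_means(toy_points, k=2)
```

With the toy input, the first three points end up in one cluster and the last three in the other. Which cluster receives label 0 depends on the random initialization.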


The methodology lays out the process followed in this research by dividing it into the sections below. It describes how the data are preprocessed and how the analysis then proceeds. Finding the best locality for a particular business can involve multiple criteria and dimensions that the authors need to examine. Hence in the scenario of gyms, after a thorough analysis of the important factors that could affect the business of gyms or fitness centers in particular

Data Acquisition

Data acquisition is a very important step, because the problem statement of this research is closely tied to statistics about the people living in a particular city or neighborhood. The prospects of a particular business, and the takeaways for it, are linked to the metrics that matter for improving that business. After analyzing the different metrics that play a large part in the development of fitness centers, three main factors were identified: the average household income in a particular neighborhood, the population of that city aged between 15 and 55, and the number of gyms already present in a given neighborhood, which could act as competition. The purpose of acquiring all of this data is to understand in depth the factors that influence the business, in order to suggest the best outcomes for that particular sector.
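The three factors above could be assembled into a single feature table before clustering. The sketch below assumes each factor arrives as a separate pandas data frame keyed by neighborhood. All column names, neighborhood labels, and numbers are hypothetical placeholders, not data from the study, and standardizing the columns is an assumption of this sketch rather than a step the article describes.

```python
import pandas as pd

# Hypothetical placeholder values, NOT data from the study.
income = pd.DataFrame({
    "neighborhood": ["A", "B", "C"],
    "avg_household_income": [50000.0, 65000.0, 40000.0],
})
population = pd.DataFrame({
    "neighborhood": ["A", "B", "C"],
    "population_15_55": [12000, 8000, 15000],
})
gyms = pd.DataFrame({
    "neighborhood": ["A", "C"],
    "existing_gyms": [4, 1],
})

# Merge the three tables on the neighborhood name. A neighborhood missing
# from the gym table is assumed here to have no gyms.
features = (
    income.merge(population, on="neighborhood", how="inner")
          .merge(gyms, on="neighborhood", how="left")
          .fillna({"existing_gyms": 0})
          .set_index("neighborhood")
)

# Standardize each column (zero mean, unit variance) so that no single
# factor dominates a distance-based clustering step.
standardized = (features - features.mean()) / features.std(ddof=0)
```

Standardization is worth considering because income is measured in far larger units than a gym count, so a distance-based method applied to the raw columns would be driven almost entirely by income.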
