Introduction
The volume of data generated and shared by public administrations, businesses, scientific research, and numerous industrial and non-profit sectors has grown enormously. These data range from textual content (unstructured, semi-structured, and structured) (Hashimi et al., 2015) to multimedia content (e.g., audio, images, videos) produced on a variety of platforms (e.g., sensor networks, system-to-system communications, cyber-physical systems, social media websites, and the Internet of Things) (Witten et al., 2016). Because of this incessant growth, new and efficient techniques are needed for accessing the data, discovering the knowledge hidden in it, and sharing that knowledge across domains (Larose et al., 2014). Manual knowledge extraction from data at this scale is tedious, and the results obtained have been found to be no longer accurate. Classical algorithms are likewise inaccurate at interpreting such data and extracting hidden knowledge. New and advanced technologies are therefore needed that automate the knowledge extraction process and summarize the meaningful information according to application requirements (Thuraisingham, 2014), which makes the design of intelligent and efficient techniques for analyzing these massive data a necessity.
Since the 1990s, when data mining techniques first appeared in the database field, they have been widely used to extract hidden knowledge and patterns from enormous data sets (Han, 2011). This extraction relies on two kinds of techniques, supervised and unsupervised (Brownlee, 2016). Clustering is the most widely used unsupervised data analysis technique in data mining; it extracts the hidden knowledge in data by partitioning the data into clusters or groups. The ultimate purpose of clustering is to generate clusters of similar data objects by grouping the unlabeled input data.
By doing this, the similarity between the objects within each cluster is maximized, while the similarity between objects in different clusters is minimized. The numerous clustering algorithms that have been developed fall into two primary categories, hierarchical and partitional (Jain, 2010). Algorithms in the first category build a tree structure of clusters without prior knowledge of the number of clusters. Algorithms in the second category start from an assigned set of initial cluster centroids. The k-means partitional clustering technique is the most prevalent and widely used algorithm, and it groups extensive datasets effectively with a favorable runtime. Although k-means is quicker than numerous other algorithms, it suffers from two noteworthy issues: high sensitivity to the initialization phase, and a tendency to become trapped in local optima with a low convergence rate (Jain, 2010; Kantardzic, 2011). The literature (Alam et al., 2014; Esmin et al., 2015; José-García & Gómez-Flores, 2016; Nanda & Panda, 2014; Saidala & Devarakonda, 2018a) reports that combining nature-inspired optimization algorithms with standard data clustering techniques yields accurate solutions and helps overcome the drawbacks of the standard techniques.
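To make the initialization sensitivity of k-means concrete, the following is a minimal sketch of the standard k-means iteration (Lloyd's algorithm) in plain Python. The function names `squared_distance`, `assign`, and `kmeans`, the four-point toy dataset, and the two sets of starting centroids are illustrative assumptions and are not taken from the cited works. The quality measure used here, the sum of squared distances from each point to its assigned centroid (SSE), is the usual k-means objective.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: List[Point], centroids: List[Point]) -> List[int]:
    """Assignment step: index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def kmeans(points: List[Point], init_centroids: List[Point], max_iter: int = 100):
    """Run Lloyd's algorithm from the given initial centroids.

    Returns (centroids, labels, sse), where sse is the sum of squared
    distances from each point to its assigned centroid.
    """
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Update step: centroid becomes the mean of its members.
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # No centroid moved: the algorithm has converged.
        centroids = new_centroids
    labels = assign(points, centroids)
    sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


# Toy data (assumed for illustration): corners of a 4-by-1 rectangle.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Start A separates left from right; start B separates bottom from top.
_, labels_a, sse_a = kmeans(points, [(0.0, 0.0), (4.0, 0.0)])
_, labels_b, sse_b = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])

print(labels_a, sse_a)  # [0, 0, 1, 1] 1.0
print(labels_b, sse_b)  # [0, 1, 0, 1] 16.0
```

Both runs converge, but the second stops at a partition whose SSE is sixteen times larger than the first. This is the kind of local optimum that a poor starting point can produce, and it is the drawback that hybrid approaches with a global search component aim to mitigate.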
Table 1. List of unimodal and multimodal benchmark functions
Function Description | Range | fmin |
| | 0 |
| | 0 |
| | 0 |
| | 0 |
| | 0 |
| | 0 |
| | 0 |
| | - |
| | 0 |
| | 0 |
| | 0 |
| | 0 |
| | 0 |