A New MapReduce Approach with Dynamic Fuzzy Inference for Big Data Classification Problems

Shangzhu Jin (College of Electrical and Information Engineering, Chongqing University of Science and Technology, Chongqing, China), Jun Peng (College of Electrical and Information Engineering, Chongqing University of Science and Technology, Chongqing, China) and Dong Xie (College of Electrical and Information Engineering, Chongqing University of Science and Technology, Chongqing, China)
DOI: 10.4018/IJCINI.2018070103

Abstract

Big data and its applications have become an emergent research topic. In practice, the MapReduce framework and its various extensions are among the most popular approaches to big data processing, and fuzzy-system-based models stand out in many applications. However, when a given observation has no overlap with the antecedent values of any rule, classical fuzzy inference cannot invoke a rule and no consequence can be derived; this situation also arises in big data environments. Fortunately, fuzzy rule interpolation techniques can support inference in such cases. Combining traditional fuzzy reasoning with fuzzy interpolation may improve the accuracy of the inferred conclusions. Therefore, this article reports an initial investigation into a MapReduce framework with dynamic fuzzy inference/interpolation for big data applications (BigData-DFRI). The results of an experimental evaluation of this method are presented, demonstrating the potential and efficacy of the proposed approach.
1. Introduction

Big data is a term for data sets that are so large or complex that traditional data processing applications struggle to handle them. It often refers to the use of predictive analytics, user behavior analytics, or other advanced data analytics methods that extract value from data, rather than to a particular size of data set (Madden, 2012; Luo et al., 2015; Wibig, 2010). The architecture of big data systems has been rebuilt at the storage, processing, and database levels, so that data can be distributed among different computers and accessed in parallel from different disks to increase throughput. With more data available, the analysis and knowledge extraction process should benefit, and more accurate and precise information should be obtained (Wang & Peng, 2017). The frameworks typically used to handle big data involve some form of parallelization so that the data can be processed and analysed efficiently. One of the most popular platforms for big data purposes nowadays, MapReduce (Dean & Ghemawat, 2004), is a programming model and an associated implementation for processing and generating large data sets. A MapReduce program is composed of two key operations: a map function that acts over a subset of the data, and a reduce function that integrates the results produced by the map function.
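To make the map/reduce data flow described above concrete, the following self-contained Python sketch simulates the two operations sequentially on a toy labelled data set. The record format, function names, and the class-counting task are illustrative assumptions, not details from the paper; a real MapReduce job would distribute the map and reduce work across a cluster.

```python
from collections import defaultdict

def map_fn(record):
    # Per-record work of a map task: here, emit (class_label, 1)
    # for each training record as a toy stand-in.
    label, *_features = record
    yield (label, 1)

def reduce_fn(key, values):
    # Aggregate the partial results produced by all map tasks.
    return (key, sum(values))

def map_reduce(records):
    # Sequential simulation of the MapReduce data flow:
    # map -> shuffle (group by key) -> reduce.
    groups = defaultdict(list)
    for record in records:
        for k, v in map_fn(record):
            groups[k].append(v)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

data = [("pos", 0.3), ("neg", 0.7), ("pos", 0.5)]
print(map_reduce(data))  # {'pos': 2, 'neg': 1}
```

The shuffle step (grouping map outputs by key) is what the framework performs between the two user-supplied functions; the fuzzy classification systems discussed below plug their own per-partition learning into `map_fn` and their model aggregation into `reduce_fn`.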

In order to manage the uncertainty caused by the variety and veracity of big data, a few works have addressed this topic from the perspective of fuzzy modelling. So far, most existing approaches adopt the Hadoop MapReduce implementation. The greatest effort has gone into clustering algorithms, especially a scalable fuzzy c-means approach (Wasikowski & Chen, 2009; Garg & Trivedi, 2014); the purity achieved by this model was comparable to that of other hard and fuzzy clustering techniques (Xu et al., 2015). A parallel implementation of fuzzy clustering has also been applied to the organization of text documents (Goswami & Shishodia, 2013). For classification tasks, a fuzzy rule-based classification system adapted to the MapReduce scheme was proposed (Ro et al., 2015; Baciu et al., 2016). An extension of this implementation was developed in (López et al., 2015), in which a cost-sensitive approach was derived from the original in order to address classification with imbalanced datasets (López et al., 2013). However, lack of data in the training partitions (Wasikowski & Chen, 2010), also known as the rare-cases problem, may cause low density in the problem domain. In such cases, the existing fuzzy big data models are not directly applicable to sparse rule-based big data systems, because they assume dense rule bases. Depending on the nature of the rule base, either fuzzy inference such as the compositional rule of inference (CRI) or fuzzy rule interpolation (FRI) may be employed to draw a conclusion. CRI methods rely on a dense rule base in which any observation finds at least one completely or partially matching rule; in many real-world problems, obtaining such a complete rule base is costly or even impractical. Interpolation is more robust when working with sparse rule bases.
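The sparse-rule-base case can be illustrated with a deliberately simplified linear rule interpolation in the spirit of the classical Kóczy-Hirota (KH) method, operating on crisp representative points of the fuzzy sets. The function, rule values, and the reduction to single representatives are assumptions for illustration only, not the paper's actual FRI technique.

```python
def interpolate(rule1, rule2, observation):
    # Linear rule interpolation on representative points.
    # rule = (antecedent_rep, consequent_rep); each value is a crisp
    # representative of the underlying fuzzy set -- a simplification
    # of full fuzzy-set interpolation.
    a1, b1 = rule1
    a2, b2 = rule2
    lam = (observation - a1) / (a2 - a1)  # relative position of the observation
    return (1 - lam) * b1 + lam * b2      # interpolated consequent

# Sparse rule base: no rule fires for x = 5, but the two flanking
# rules (x=2 -> y=10) and (x=8 -> y=40) still support a conclusion.
print(interpolate((2, 10), (8, 40), 5))  # 25.0
```

An observation falling in the gap between the two rules still yields a consequent, which is exactly the situation in which CRI-style inference produces no result at all.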
On the other hand, interpolated conclusions may not be as accurate as their inferred counterparts when a partial match between a given observation and the rule base can be established. To compensate for the drawbacks of each technique, this paper proposes an integrated reasoning system, termed dynamic fuzzy inference/interpolation for big data (BigData-DFRI), and presents an initial investigation into the feasibility of a dynamic fuzzy inference/interpolation-based classification system adapted to the MapReduce scheme. The method is also applicable to calculating crucial missing variables and intermediate variables by means of backward fuzzy interpolation (Jin et al., 2014). In so doing, the overall BigData-DFRI model and its reasoning become more transparent and interpretable.
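The dynamic switch that such an integrated system is built around can be sketched as follows: use CRI-style weighted inference when at least one rule partially fires, and fall back to interpolation between the flanking rules otherwise. The triangular membership functions, weighted-average defuzzification, and all numeric values below are illustrative assumptions, not the BigData-DFRI method itself.

```python
def firing_strength(antecedent, x):
    # Triangular membership function (left, peak, right);
    # returns the degree of match in [0, 1].
    l, p, r = antecedent
    if x <= l or x >= r:
        return 0.0
    return (x - l) / (p - l) if x <= p else (r - x) / (r - p)

def dynamic_infer(rules, x):
    # rules: list of ((left, peak, right), consequent_rep).
    fired = []
    for antecedent, consequent in rules:
        w = firing_strength(antecedent, x)
        if w > 0:
            fired.append((w, consequent))
    if fired:
        # CRI-style path: weighted average of the fired consequents.
        total = sum(w for w, _ in fired)
        return sum(w * b for w, b in fired) / total
    # Interpolation fallback: linear interpolation between the
    # nearest rules whose peaks flank the observation.
    below = max((r for r in rules if r[0][1] < x), key=lambda r: r[0][1])
    above = min((r for r in rules if r[0][1] > x), key=lambda r: r[0][1])
    lam = (x - below[0][1]) / (above[0][1] - below[0][1])
    return (1 - lam) * below[1] + lam * above[1]

rules = [((0, 1, 2), 10.0), ((7, 8, 9), 40.0)]
print(dynamic_infer(rules, 1.5))  # 10.0  (CRI path: one rule fires)
print(dynamic_infer(rules, 5.0))  # ~27.14 (interpolation path: no rule fires)
```

In a MapReduce setting, each map task would apply this per-observation decision against its local rule base, with the reduce stage aggregating the per-partition predictions.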
