A Novel Approach With Dynamic Fuzzy Inference for Big Data Classification Problems

Shangzhu Jin (School of Electronic Information Engineering, Chongqing University of Science and Technology, China) and Jun Peng (School of Electronic Information Engineering, Chongqing University of Science and Technology, China)
DOI: 10.4018/978-1-7998-3038-2.ch003

Abstract

Big data and its applications have become prominent research topics. To deal with the uncertainty in data sets, fuzzy system-based models have been explored and stand out in many applications. However, in a big data environment, a given observation may have no overlap with any antecedent values, so that no rule can be invoked, or the invoked rules may involve missing values; in either case, classical fuzzy inference cannot derive a consequence. Fortunately, fuzzy rule interpolation techniques can support inference in such cases. Combining traditional fuzzy reasoning with fuzzy interpolation may improve the accuracy of the inferred conclusions. Therefore, this chapter reports an initial investigation into a MapReduce framework with dynamic fuzzy inference/interpolation for big data applications (BigData-DFRI). The results of an experimental investigation of this method are presented, demonstrating the potential and efficacy of the proposed approach.

Introduction

Big data is a term for data sets that are so large or complex that traditional data processing applications struggle to deal with them. It often refers simply to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set (Madden, 2012; Luo et al., 2015; Wibig, 2010). The architecture of big data systems has been rebuilt at the storage, processing, and database levels. This allows data to be distributed among different computers and accessed in parallel from different disks, which increases processing speed. With more data available, the analysis and knowledge extraction process should benefit, and more accurate and precise information should be obtained (Wang & Peng, 2017). The frameworks typically used to handle big data involve some form of parallelization so that the data can be processed and analysed efficiently. One of the most popular platforms for big data purposes nowadays, MapReduce (Dean & Ghemawat, 2004), is a programming model and an associated implementation for processing and generating large data sets. A MapReduce program is composed of two key operations: a map function that acts over a subset of the data, and a reduce function that integrates the results obtained by the map function.
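The map and reduce operations described above can be illustrated with a minimal single-machine sketch in Python. It is not the Hadoop implementation and not the chapter's method: the function names (`map_partition`, `reduce_counts`, `class_distribution`) and the toy records are assumptions made purely for illustration, and the map calls run sequentially here, whereas a real framework would distribute them across machines.

```python
from collections import Counter
from functools import reduce

def map_partition(records):
    """Map step: count the class labels found in one subset of the data."""
    return Counter(label for _features, label in records)

def reduce_counts(left, right):
    """Reduce step: merge the partial counts produced by two map calls."""
    return left + right

def class_distribution(partitions):
    # Each partition is mapped independently; a real MapReduce framework
    # would run these calls in parallel on different nodes.
    partial_counts = [map_partition(p) for p in partitions]
    # The reduce step integrates all partial results into one answer.
    return reduce(reduce_counts, partial_counts, Counter())

# Hypothetical toy data: (feature vector, class label) pairs in two partitions.
partitions = [
    [((0.1, 0.7), "a"), ((0.4, 0.2), "b")],
    [((0.9, 0.3), "a"), ((0.5, 0.5), "a")],
]
totals = class_distribution(partitions)
print(dict(totals))
```

The same split applies to any computation whose partial results can be merged associatively, which is what lets the map calls run on separate subsets without coordination.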

In order to manage the uncertainty caused by the variety and veracity of big data, a few works have addressed this topic from the perspective of fuzzy modelling. So far, most of the existing approaches adopt the Hadoop MapReduce implementation. Most of the effort has been devoted to clustering algorithms, especially a scalable fuzzy c-means approach (Wasikowski & Chen, 2009; Garg & Trivedi, 2014). The results in terms of purity shown by this model were comparable to other hard and fuzzy clustering techniques (Xu et al., 2015). A parallel implementation of fuzzy clustering has also been applied to the organization of text documents (Goswami & Shishodia, 2013). To deal with classification tasks, a fuzzy rule-based classification system adapted to the MapReduce scheme was proposed (Río et al., 2015; Baciu et al., 2016). An extension of this implementation was developed in (López et al., 2015). In that work, a cost-sensitive approach was derived from the original approach in order to address classification with imbalanced datasets (López et al., 2013). However, a lack of data in the training partitions (Wasikowski & Chen, 2010), also known as the rare cases problem, may cause a low density in the problem domain. In these cases, the existing fuzzy big data models are not directly applicable to sparse rule-based big data systems because they assume dense rule bases. Depending on the nature of the rule base, either fuzzy inference, such as the compositional rule of inference (CRI), or fuzzy rule interpolation (FRI) may be employed to draw the conclusion. CRI methods rely on a dense rule base in which any observation can find at least one completely or partially matching rule. In many real-world problems, obtaining such a complete rule base is costly or even impractical. Interpolation is more robust when working on sparse rule bases.
On the other hand, the resulting interpolated conclusions may not be as accurate as their inferred counterparts if partial matching between a given observation and the rule base can be established. To compensate for the drawbacks of these two techniques, this chapter proposes an integrated reasoning system called dynamic fuzzy inference/interpolation for big data (BigData-DFRI) and presents an initial investigation into the feasibility of a dynamic fuzzy inference/interpolation based classification system adapted to the MapReduce scheme. The method is also applicable to calculating crucial missing variables and intermediate variables by using backward fuzzy interpolation (Jin et al., 2014). In so doing, the overall BigData-DFRI model and its reasoning become more transparent and interpretable.
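The idea of switching between inference and interpolation can be sketched as follows. This is only an illustrative toy, not the BigData-DFRI algorithm: it assumes single-antecedent rules over triangular fuzzy sets, treats a rule as matched whenever its antecedent support overlaps the observation, returns the matched consequents without aggregating them (a stand-in for a full CRI step), and uses plain linear interpolation of the three characteristic points between the two nearest rules. All names (`Tri`, `overlaps`, `interpolate`, `infer`) and the example rule base are hypothetical.

```python
from typing import List, NamedTuple, Tuple

class Tri(NamedTuple):
    """Triangular fuzzy set with support (a, c) and peak b."""
    a: float
    b: float
    c: float

Rule = Tuple[Tri, Tri]  # (antecedent, consequent) for one input variable

def overlaps(x: Tri, y: Tri) -> bool:
    """True when the two supports intersect, so the rule can be invoked."""
    return max(x.a, y.a) < min(x.c, y.c)

def interpolate(obs: Tri, lo: Rule, hi: Rule) -> Tri:
    """Linearly interpolate each characteristic point between two rules."""
    def point(o, a1, a2, b1, b2):
        lam = (o - a1) / (a2 - a1)  # relative position of the observation
        return (1 - lam) * b1 + lam * b2
    (a1, b1), (a2, b2) = lo, hi
    return Tri(*(point(o, p, q, r, s)
                 for o, p, q, r, s in zip(obs, a1, a2, b1, b2)))

def infer(obs: Tri, rules: List[Rule]) -> Tuple[str, List[Tri]]:
    matched = [r for r in rules if overlaps(obs, r[0])]
    if matched:
        # Dense region: at least one rule fires, so use ordinary inference.
        return "inference", [cons for _ante, cons in matched]
    # Sparse region: no rule fires, so interpolate between the nearest
    # rules on either side (assumes the observation lies between two rules).
    left = max((r for r in rules if r[0].b < obs.b), key=lambda r: r[0].b)
    right = min((r for r in rules if r[0].b > obs.b), key=lambda r: r[0].b)
    return "interpolation", [interpolate(obs, left, right)]

# Hypothetical sparse rule base with a gap between the two antecedents.
rules = [
    (Tri(0, 1, 2), Tri(0, 2, 4)),
    (Tri(8, 9, 10), Tri(10, 12, 14)),
]
print(infer(Tri(1, 2, 3), rules))  # overlaps the first rule
print(infer(Tri(4, 5, 6), rules))  # falls in the gap
```

In this sketch the second observation lies exactly halfway between the two antecedents, so each point of the interpolated consequent is the midpoint of the corresponding points of the two rule consequents.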
