Big Data and Web Intelligence: Improving the Efficiency on Decision Making Process via BDD

Alberto Pliego (Escuela Técnica Superior de Ingenieros Industriales, Spain) and Fausto Pedro García Márquez (Escuela Técnica Superior de Ingenieros Industriales, Spain)
DOI: 10.4018/978-1-4666-8505-5.ch010

The growing amount of available data creates complex problems when the data need to be processed. These data usually come from different sources and describe different issues; however, on many occasions they can be interrelated to extract strategic information that is useful for Decision Making processes in many kinds of business. For a qualitative and quantitative analysis of a complex Decision Making process, it is critical to employ a suitable method because of the large number of operations required. For this purpose, this chapter presents an approach that applies Binary Decision Diagrams to the Logical Decision Tree. It allows a Main Problem to be addressed by establishing its different causes, called Basic Causes, and their interrelations. Cases with a large number of Basic Causes incur significant computational costs because the problem is NP-hard. Moreover, this chapter presents a new approach for analyzing large Logical Decision Trees. The size of the Logical Decision Tree is not the only factor that affects the computational cost; the resolution procedure (ordering of Basic Causes, number of AND/OR gates, etc.) can also change this cost considerably. A new approach to reduce the complexity of the problem is presented here. It makes use of data derived from simpler problems that require less computational cost to obtain a good solution. This method does not provide an exact solution, but the approximations achieved deviate little from the exact one.
Chapter Preview


Information and communication technologies (ICT) have grown at an unprecedented rate, and all aspects of human life have been transformed by this new scenario. All industrial sectors have rapidly incorporated the new technologies, and some of them, such as supervisory control and data acquisition (SCADA) systems, have become de facto standards. Huge amounts of data started to be created, processed and saved, allowing automatic control of complex industrial systems. In spite of this progress, some challenges have not yet been well addressed. Among them are: the analysis of very large volumes of data, as well as continuous data streams; the integration of data in different formats coming from different sources; making sense of data to support decision making; and obtaining results in short periods of time. These are all characteristics of a problem that should be addressed through a big data approach.

Even though Big Data has become one of the most popular buzzwords, the industry has converged on a definition of the term based on three dimensions: volume, variety and velocity (Zikopoulos and Eaton, 2011).

Data volume is normally measured by the quantity of raw transactions, events or amount of history that makes up the data. Typically, data analysis algorithms have used smaller data sets, called training sets, to create predictive models. Most of the time, businesses rely on predictive insights that are very coarse, because the data volume has been purposely reduced to meet storage and computational processing constraints. By removing the data volume constraint and using larger data sets, it is possible to discover subtle patterns that can lead to targeted, actionable decisions, or that enable further analysis that increases the accuracy of the predictive models.

Data variety emerged over the past couple of decades, as data became increasingly unstructured and the sources of data proliferated beyond operational applications. In industrial applications, such variety arose from the proliferation of multiple types of sensors, which enable the tracking of multiple variables in almost every domain. The main technical factors include the sampling rate of the data and their relative range of values.

Data velocity is about the speed at which data is created, accumulated, ingested, and processed. An increasing number of applications are required to process information in real-time or with near real-time responses. This may imply that data is processed on the fly, as it is ingested, to make real-time decisions, or schedule the appropriate tasks.

However, as other authors point out, Big Data can also be classified according to other dimensions such as veracity, validity and volatility.

Data veracity is about the certainty of what the data mean. This feature expresses whether or not the data properly reflect reality. It depends on the way in which data are collected and is strongly linked to the credibility of the sources. For example, the veracity of data collected from sensors depends on the calibration of the sensors. Data collected from surveys can be trusted if the survey samples are large enough to provide a sufficient basis for analysis. In summary, the massive amounts of data collected for Big Data purposes can lead to statistical errors and misinterpretation of the collected information. Purity of the information is critical for value (Ohlhorst, 1964).

Data validity is about the accuracy of data. Big Data sources must be accurate if the results are to be used for decision making or any other reasonable purpose (Hurwitz et al., 2013).

Data volatility is about how long the data need to be stored. Difficulties may arise from limited storage capacity: if storage is limited, it must be decided which data need to be kept and for how long. With some Big Data sources, it may be necessary to gather the data for a quick analysis (Hurwitz et al., 2013).

These data are often used for decision making (DM). DM processes are carried out continuously by any firm in order to maximize profits, reliability, etc., or to minimize costs, risks, etc. Software exists to facilitate this task, but the main problem is the capability to provide a quantitative solution when the case study has a large number of Basic Causes (BCs). The DM problem is considered a cyclic process in which the decision maker can evaluate the consequences of a previous decision. Figure 1 shows the normal process for solving a DM problem.

Key Terms in this Chapter

Non-Deterministic Polynomial-Time Hard Problem (NP-Hard): NP-hard problems are a class of problems that are at least as hard as the hardest problems in NP. If there were a polynomial-time algorithm for any single NP-hard problem, it could be used to solve every problem in NP in polynomial time.

Binary Decision Diagram (BDD): A BDD is a directed acyclic graph (DAG) that represents a logical function. The main advantage of BDDs is the possibility of evaluating the top event using implicit formulas.

Top Event: This is the event placed at the highest level of the LDT. It represents the main cause, or the event that is intended to be studied.

Cut-Sets (CSs): The CSs of a BDD are the paths from the root node to the terminal nodes with value 1. They represent the series of events that have to occur for the top event to occur. The size of a BDD can be expressed as the number of CSs it contains. Because these paths are mutually exclusive, the probability of the top event is the sum of the probabilities of the CSs.
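
As an illustration only, the following Python sketch builds a small reduced BDD for a hypothetical top event, TOP = (e1 AND e2) OR e3, enumerates its paths to terminal 1, and sums their probabilities. The structure function, the probability values, the assumption that basic events are independent, and all function names are choices made for this example and are not taken from the chapter. The construction enumerates every state, so it is only suitable for very small trees.

```python
def top_event(x):
    # Hypothetical structure function: TOP = (e1 AND e2) OR e3
    return (x["e1"] and x["e2"]) or x["e3"]


def build_bdd(f, order, assignment=()):
    """Build a reduced BDD by Shannon expansion along `order`.

    A node is either a bool (terminal) or a tuple (variable, low, high).
    Brute force: every assignment is visited, so keep `order` short.
    """
    if len(assignment) == len(order):
        return bool(f(dict(zip(order, assignment))))
    low = build_bdd(f, order, assignment + (False,))
    high = build_bdd(f, order, assignment + (True,))
    if low == high:  # the variable has no influence here: skip the node
        return low
    return (order[len(assignment)], low, high)


def paths_to_one(node, prefix=()):
    """Yield every root-to-terminal-1 path as a tuple of (variable, value)."""
    if node is True:
        yield prefix
    elif node is not False:
        var, low, high = node
        yield from paths_to_one(high, prefix + ((var, True),))
        yield from paths_to_one(low, prefix + ((var, False),))


def path_probability(path, p):
    """Probability of one path, assuming independent basic events."""
    result = 1.0
    for var, value in path:
        result *= p[var] if value else 1.0 - p[var]
    return result


# Hypothetical occurrence probabilities of the basic events
p = {"e1": 0.1, "e2": 0.2, "e3": 0.05}

bdd = build_bdd(top_event, ["e1", "e2", "e3"])
cut_sets = list(paths_to_one(bdd))
top_probability = sum(path_probability(path, p) for path in cut_sets)

print(len(cut_sets))              # 3
print(round(top_probability, 6))  # 0.069
```

The three paths are e1·e2, e1·(not e2)·e3 and (not e1)·e3. No two of them can hold at the same time, which is why their probabilities can simply be added; the result agrees with the direct calculation 1 - (1 - 0.1·0.2)(1 - 0.05) = 0.069.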

Ranking Methods: The efficiency of the conversion from LDT to BDD depends strongly on the ordering of the basic events of the LDT. For this purpose, there are heuristic algorithms that try to order these events. No single method provides the best ordering in all cases, so different methods need to be considered when a conversion is required.
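
The effect of the ordering can be seen on a small, hypothetical structure function. The sketch below, which is an assumption-laden illustration and not one of the ranking heuristics discussed in the chapter, builds a reduced BDD for (a1 AND b1) OR (a2 AND b2) OR (a3 AND b3) under two orderings and counts the distinct non-terminal nodes. The function names and the example tree are invented for this purpose.

```python
def structure(x):
    # Hypothetical top event: three AND gates joined by one OR gate
    return ((x["a1"] and x["b1"])
            or (x["a2"] and x["b2"])
            or (x["a3"] and x["b3"]))


def build_bdd(f, order, assignment=()):
    """Reduced BDD by brute-force Shannon expansion (small inputs only).

    Nodes are nested tuples (variable, low, high); equal sub-diagrams
    compare equal, so they count as one shared node.
    """
    if len(assignment) == len(order):
        return bool(f(dict(zip(order, assignment))))
    low = build_bdd(f, order, assignment + (False,))
    high = build_bdd(f, order, assignment + (True,))
    if low == high:
        return low
    return (order[len(assignment)], low, high)


def count_nodes(node, seen=None):
    """Count distinct non-terminal nodes reachable from `node`."""
    seen = set() if seen is None else seen
    if isinstance(node, bool) or node in seen:
        return 0
    seen.add(node)
    _, low, high = node
    return 1 + count_nodes(low, seen) + count_nodes(high, seen)


# Ordering 1: events that feed the same AND gate are kept together
interleaved = ["a1", "b1", "a2", "b2", "a3", "b3"]
# Ordering 2: events that feed the same AND gate are kept apart
grouped = ["a1", "a2", "a3", "b1", "b2", "b3"]

size_interleaved = count_nodes(build_bdd(structure, interleaved))
size_grouped = count_nodes(build_bdd(structure, grouped))

print(size_interleaved)  # 6
print(size_grouped)      # 14
```

The same logical function needs 6 nodes under the first ordering and 14 under the second, and the gap widens as more AND gates are added. This is the reason an ordering step is applied before the LDT is converted.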

Logical Decision Tree (LDT): A graphical representation of a structure function. An LDT consists of a root node (top event) that is broken down into various nodes located below it, where the nodes can be events, logical gates and branches. It represents the interrelations between the basic events that form a more complex event.

Basic Events: The basic events are logical variables that take one of two possible states: 1 if the basic event occurs and 0 if it does not. They can be associated with a component of a system, an event, the cause of a problem, etc.
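
To make the LDT and basic-event definitions concrete, the following minimal Python sketch represents a tree as nested AND/OR gates over named basic events and evaluates the top event for a given state of those events. The tuple-based representation, the example tree and the function name are assumptions made for illustration, not the chapter's data structure.

```python
def evaluate(node, state):
    """Return 1 if the event at `node` occurs for the given state, else 0.

    A node is either the name of a basic event (a string) or a tuple
    (gate, children) where gate is "AND" or "OR".
    """
    if isinstance(node, str):
        return int(state[node])
    gate, children = node
    results = [evaluate(child, state) for child in children]
    if gate == "AND":
        return int(all(results))
    if gate == "OR":
        return int(any(results))
    raise ValueError(f"unknown gate: {gate!r}")


# Hypothetical LDT: the top event occurs if e1 and e2 both occur, or if e3 occurs
ldt = ("OR", [("AND", ["e1", "e2"]), "e3"])

print(evaluate(ldt, {"e1": 1, "e2": 1, "e3": 0}))  # 1
print(evaluate(ldt, {"e1": 1, "e2": 0, "e3": 0}))  # 0
print(evaluate(ldt, {"e1": 0, "e2": 0, "e3": 1}))  # 1
```

Each call answers the question for a single state only. With n basic events there are 2^n states, which is why a compact representation such as the BDD is used for quantitative analysis of larger trees.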
