Data Preparation for Big Data Analytics: Methods and Experiences


Andreas Schmidt, Martin Atzmueller, Martin Hollender
Copyright: © 2016 | Pages: 14
DOI: 10.4018/978-1-5225-0293-7.ch010


This chapter provides an overview of methods for preprocessing structured and unstructured data in the context of Big Data. Specifically, it summarizes the relevant methods using a real-world dataset from a petrochemical production setting and describes state-of-the-art methods for data preparation for Big Data Analytics. The chapter also discusses experiences and first insights from a specific project, based on a real-world case study. Finally, interesting directions for future research are outlined.
Chapter Preview


Know-how about the production process is crucial, especially when the production facility enters an unexpected operating mode such as a critical situation. When the facility is about to reach a critical state, the amount of information (the so-called shower of alarms) can overwhelm the facility operator, eventually leading to loss of control, production outages, and defects in the production facility. This is not only expensive for the manufacturer but can also pose a threat to humans and the environment. Therefore, it is important to support the facility operator in a critical situation with an assistant system that uses real-time analytics and ad hoc decision support.

The objective of the BMBF-funded research project “Frühzeitige Erkennung und Entscheidungsunterstützung für kritische Situationen im Produktionsumfeld” (“Early detection and decision support for critical situations in production environments”, FEE for short) is to detect critical situations in production environments as early as possible and to support the facility operator with a warning or even a recommendation on how to handle the particular situation. This enables the operator to act proactively, i.e., before an alarm is raised, instead of merely reacting to alarms.

The consortium of the FEE project consists of several partners, including application partners from the chemical industry. These partners provide use cases for the project and background knowledge about the production process, which is important for designing analytical methods. The available data was collected in a petrochemical plant over many years and comes from different sources: sensor data, alarm logs, engineering and asset data, data from the process information management system, and unstructured data extracted from operation journals and operation instructions (see Figure 1). The dataset thus comprises a variety of document types. Unstructured textual data is contained in the operation instructions and operation journals. Knowledge about process dependencies is provided in cause-effect tables. Information about the production facility is available in the form of flow process charts. In addition, alarm logs and sensor values come directly from the processing line.

Figure 1.

In the FEE project, various data sources from a petrochemical plant are preprocessed and consolidated in a big data analytics platform in order to proactively support the operator with an assistant system for automatic early warning.
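
To make the idea of consolidating heterogeneous sources more concrete, the following is a minimal sketch of one possible preparation step: attaching to each alarm log entry the most recent sensor reading at or before the alarm time. It is not the FEE project's implementation. The timestamps, values, alarm tags, and the function names `latest_reading_before` and `consolidate` are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical sensor readings (timestamp, value), sorted by timestamp.
sensor_readings = [
    (datetime(2015, 3, 1, 8, 0, 0), 71.2),
    (datetime(2015, 3, 1, 8, 0, 10), 74.8),
    (datetime(2015, 3, 1, 8, 0, 20), 80.1),
]

# Hypothetical alarm log entries (timestamp, alarm tag), in arbitrary order.
alarm_log = [
    (datetime(2015, 3, 1, 8, 0, 12), "TEMP_HIGH"),
    (datetime(2015, 3, 1, 8, 0, 25), "TEMP_HIGH_HIGH"),
    (datetime(2015, 3, 1, 7, 59, 0), "PUMP_TRIP"),
]


def latest_reading_before(readings, when):
    """Return the most recent (timestamp, value) at or before `when`, or None."""
    times = [t for t, _ in readings]
    idx = bisect_right(times, when)  # number of readings with timestamp <= when
    return readings[idx - 1] if idx > 0 else None


def consolidate(alarms, readings):
    """Join each alarm with the last sensor value known at the alarm time."""
    rows = []
    for when, tag in sorted(alarms):  # process alarms in chronological order
        reading = latest_reading_before(readings, when)
        rows.append({
            "alarm_time": when,
            "alarm": tag,
            # None marks alarms that precede the first available reading.
            "sensor_value": reading[1] if reading else None,
        })
    return rows


consolidated = consolidate(alarm_log, sensor_readings)
for row in consolidated:
    print(row)
```

In this sketch an alarm that occurs before the first sensor reading is kept with a missing value rather than dropped, so that gaps in the sensor data stay visible to later analysis steps.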

