Advances in sensor and database technologies have led to the collection of huge amounts of process data from chemical plants. Process quantities such as temperature, pressure, flow rate, level, composition, and pH can be measured easily. Chemical processes are dynamic systems equipped with hundreds or thousands of sensors that generate readings at regular intervals (typically of the order of seconds). In addition, derived quantities that are functions of the sensor measurements, as well as alerts and alarms, are generated regularly. Several commercial data warehouses, referred to as plant historians in chemical plants, are in common use around the world today, among them the DeltaV Continuous Historian (from Emerson Process Management), InfoPlus.21™ (from AspenTech), Uniformance® PHD (from Honeywell), and Industrial SQL (from Wonderware). These historians store large amounts (weeks) of historical process operation data at the original resolution and an almost limitless amount (years) in compressed form. This data is available for mining, analysis, and decision support, both in real time and offline.
Process measurements can be classified by their nature as binary (on/off) or continuous, although both are stored in discrete form in the historians. Measurements can also be classified by their role during operation as controlled, manipulated, or non-control-related variables. Controlled variables are directly or indirectly related to the plant’s quality, production, or safety objectives and are maintained at specified setpoints, even in the face of disturbances, by analog or digital controllers. This regulation is achieved by altering manipulated variables such as flow rates. Chemical plants are typically well integrated, so a change in one variable propagates to many others. Non-control-related variables play no role in plant control but inform plant personnel about the state of the process.
In general, a plant can operate in a number of states, which can be broadly classified into steady states and transitions (Srinivasan et al., 2005b). Large-scale plants such as refineries typically run for long periods in steady states but undergo a transition when the feedstock or product grade changes. Transitions also result from large process disturbances, maintenance activities, and abnormal events. During steady states, the process variables vary within a narrow range. In contrast, transitions correspond to large changes or discontinuities in plant operations, such as changes of setpoints, turning on or idling of equipment, and valve manipulations. Plant personnel must make a number of decisions to keep the plant running safely and efficiently during steady states as well as transitions. Data mining and analysis tools that help humans uncover information, knowledge, patterns, trends, and relationships in the historical data are therefore crucial.
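The distinction between steady states and transitions can be illustrated with a minimal sketch. It is not the method of Srinivasan et al. (2005b); it simply assumes that a sample belongs to a steady state when the variable has stayed within a narrow band over a short trailing window. The function name `label_states`, the window length, the tolerance, and the temperature trace are all hypothetical choices made for illustration.

```python
from typing import List


def label_states(values: List[float], window: int, tol: float) -> List[str]:
    """Label each sample 'steady' or 'transition'.

    Assumption for illustration: a sample is 'steady' when the trailing
    window ending at it spans a range (max - min) of at most `tol`.
    The first window - 1 samples use whatever history is available.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    labels = []
    for i in range(len(values)):
        recent = values[max(0, i - window + 1): i + 1]
        spread = max(recent) - min(recent)
        labels.append("steady" if spread <= tol else "transition")
    return labels


# Synthetic (made-up) temperature trace: steady near 100, a ramp, steady near 120.
trace = [100.0, 100.2, 99.9, 100.1, 104.0, 110.0,
         116.0, 120.0, 120.1, 119.9, 120.0, 120.1]
labels = label_states(trace, window=3, tol=0.5)
for value, label in zip(trace, labels):
    print(f"{value:6.1f}  {label}")
```

With a trailing window, the label returns to 'steady' only once the whole window lies inside the new operating band, so the detected end of a transition lags the true one by up to `window - 1` samples. In practice the tolerance would have to be chosen per variable, since noise levels and operating ranges differ.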
Numerous challenges bedevil the mining of data generated by chemical processes. These arise from the following general characteristics of the data:
Temporal: Since the chemical process is a dynamic system, all measurements vary with time.
Noisy: The sensors and therefore the resulting measurements can be significantly noisy.
Non-stationarity: Process dynamics can change significantly, especially across states because of structural changes to the process. Statistical properties of the data such as mean and variance can therefore change significantly between states.
Multiple time-scales: Many processes display multiple time scales with some variables varying quickly (order of seconds) while others respond over hours.
Multi-rate sampling: Different measurements are often sampled at different rates. For instance, online measurements are often sampled frequently (typically seconds) while lab measurements are sampled at a much lower frequency (a few times a day).
Nonlinearity: The data from chemical processes often display significant nonlinearity.
Discontinuity: Discontinuous behaviors occur typically during transitions when variables change status – for instance from inactive to active or no flow to flow.
Run-to-run variations: Multiple instances of the same action or operation, carried out by different operators and at different times, do not match exactly. Signals from two instances can therefore differ significantly because of variations in initial conditions, impurity profiles, and exogenous environmental or process factors. This can cause deviations in final product quality, especially in batch operations (such as pharmaceutical manufacturing).
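Two of the characteristics listed above, multi-rate sampling and non-stationarity, lend themselves to a short sketch. The first function aligns infrequent lab samples with frequent sensor timestamps by holding the last available lab value, which is only one of several possible alignment choices. The second computes the mean and variance over consecutive blocks of samples to show how these statistics shift between states. The function names, timestamps, and values are hypothetical and serve only as an illustration.

```python
from bisect import bisect_right
from statistics import mean, pvariance
from typing import List, Optional, Tuple


def hold_last_value(fast_times: List[float],
                    slow_samples: List[Tuple[float, float]]) -> List[Optional[float]]:
    """Align slow (time, value) samples to fast timestamps by zero-order hold.

    `slow_samples` must be sorted by time. Timestamps that precede the first
    slow sample get None, because no lab value is available yet.
    """
    slow_times = [t for t, _ in slow_samples]
    aligned = []
    for t in fast_times:
        k = bisect_right(slow_times, t)  # count of slow samples taken at or before t
        aligned.append(slow_samples[k - 1][1] if k > 0 else None)
    return aligned


def block_stats(values: List[float], block: int) -> List[Tuple[float, float]]:
    """Mean and population variance of consecutive non-overlapping blocks."""
    return [(mean(values[i:i + block]), pvariance(values[i:i + block]))
            for i in range(0, len(values) - block + 1, block)]


# Made-up data: sensor timestamps in seconds and two lab composition results.
sensor_times = [0, 1, 2, 3, 4, 5]
lab_samples = [(2, 7.1), (4, 7.4)]
aligned = hold_last_value(sensor_times, lab_samples)
print(aligned)

# Made-up flow signal whose mean and variance both change between two states.
flow = [10.0, 10.0, 10.0, 10.0, 20.0, 22.0, 18.0, 20.0]
stats = block_stats(flow, block=4)
print(stats)
```

Holding the last value avoids using lab results before they exist, at the cost of a stepwise signal. Interpolating between lab samples would be smoother but relies on future values, so it suits offline analysis only.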