Multiscale Filtering and Applications to Chemical and Biological Systems


Mohamed N. Nounou, Hazem N. Nounou, Muddu Madakyaru
DOI: 10.4018/978-1-4666-4450-2.ch025

Abstract

Measured process data are a valuable source of information about the processes they are collected from. Unfortunately, measurements are usually contaminated with errors that mask the important features in the data and degrade the quality of any related operation. Wavelet-based multiscale filtering is known to provide effective noise-feature separation. Here, the effectiveness of multiscale filtering over conventional low-pass filters is illustrated through their application to chemical and biological systems. For biological systems, various online and batch multiscale filtering techniques are used to enhance the quality of metabolic and copy number data. Dynamic metabolic data are usually used to develop genetic regulatory network models that can describe the interactions among different genes inside the cell in order to design intervention techniques to cure or manage certain diseases. Copy number data, on the other hand, are usually used in the diagnosis of diseases by determining the locations and extent of variations in DNA sequences. Two case studies are presented, one involving simulated metabolic data and the other using real copy number data. For chemical processes, it is shown that multiscale filtering can greatly enhance the prediction accuracy of inferential models, which are commonly used to estimate key process variables that are hard to measure. In this chapter, we present a multiscale inferential modeling technique that integrates the advantages of latent variable regression methods with the advantages of multiscale filtering, called Integrated Multiscale Latent Variable Regression (IMSLVR). IMSLVR performance is illustrated via a case study using synthetic data and another using simulated distillation column data.
Chapter Preview

Introduction

With the advancements in computing and sensing technologies, large amounts of data are continuously collected from various engineering systems or processes. These data are a rich source of information about the systems they are collected from. Unfortunately, real data are usually contaminated with errors (or noise) that mask the important features in the data and affect their usefulness in practice. Therefore, measured process data need to be filtered to enhance their quality and usefulness. For example, in biological systems, measured genomic data are used to construct genetic regulatory network models that describe the interactions among different genes within the cells (Jong, 2002; Chou et al., 2006; Gonzalez et al., 2007; Kutalik et al., 2007; Wang et al., 2010; Meskin et al., 2011b). These models are used not only to understand and predict the behavior of the biological system, but also to design intervention techniques that can be ultimately used to manage and cure major phenotypes (Ervadi-Radhakrishnan & Voit, 2005; Meskin et al., 2011a). The presence of measurement noise in the data, however, degrades the accuracy of estimated genetic regulatory network models and the effectiveness of any intervention technique in which these models are used (Kutalik et al., 2007; Wang et al., 2010). Also, Copy Number (CN) data are experimental biological data that are usually used in the diagnosis of diseases by determining the locations and extent of aberrations in DNA sequences. CN data are usually very noisy, which makes it difficult to define the abnormal regions in the DNA (Alqallaf & Tewfik, 2007). Thus, it is important to filter biological data to improve their accuracy and the effectiveness of the applications in which they are used.

In chemical processes, on the other hand, measured process data are usually used to develop empirical models, especially when fundamental models are difficult to obtain. An important example is inferential models, which are used to estimate key process variables that are difficult to measure online from other variables that are easier to measure (Frank & Friedman, 1993; Stone & Brooks, 1990; Kano et al., 2000; Wold, 1982). Unfortunately, the measured data used in estimating empirical models are usually contaminated with errors that degrade the quality of the models and their ability to predict the process behavior (Bakshi, 1999; Palavajjhala et al., 1996; Nounou & Nounou, 2005). Filtering these data will not only enhance the accuracy of estimated models, but also improve any operation (e.g., control, monitoring, etc.) in which these models are used.

Key Terms in this Chapter

Wavelet-Based Multiscale Filtering: Wavelet-based multiscale filtering is a model-free filtering technique that utilizes multiscale representation of data. Multiscale representation is a mathematical representation of a data set as a weighted sum of basis functions called wavelets and scaling functions. The advantage of this multiscale representation is that it separates features occurring at different frequencies, which allows effective removal of noise from the important features in the data. Wavelet-based multiscale filtering is a three-step procedure: decompose the data at multiple scales, threshold (suppress) the wavelet coefficients that are smaller than a threshold value, and finally reconstruct the thresholded coefficients back to the time domain.
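As a rough sketch of these three steps, the following Python example uses the PyWavelets library (not referenced in the chapter) with an assumed wavelet ("db4"), number of decomposition levels, soft thresholding, and the universal threshold rule; these are illustrative choices, not the chapter's specific settings.

import numpy as np
import pywt

def multiscale_filter(signal, wavelet="db4", levels=4):
    # Step 1: decompose the data at multiple scales.
    coeffs = pywt.wavedec(signal, wavelet, level=levels)

    # Step 2: threshold the wavelet (detail) coefficients. The noise
    # standard deviation is estimated from the finest-scale coefficients,
    # and the universal threshold sigma*sqrt(2*log n) is applied
    # (an assumed, commonly used rule).
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    threshold = sigma * np.sqrt(2.0 * np.log(len(signal)))
    coeffs[1:] = [pywt.threshold(c, threshold, mode="soft") for c in coeffs[1:]]

    # Step 3: reconstruct the thresholded coefficients back to the time domain.
    return pywt.waverec(coeffs, wavelet)[: len(signal)]

# Usage: filter a noisy step signal.
t = np.linspace(0, 1, 512)
noisy = np.where(t < 0.5, 0.0, 1.0) + 0.2 * np.random.randn(t.size)
filtered = multiscale_filter(noisy)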

Copy Number Data: Copy number (CN) data are one type of genomic data that can be used in the diagnosis of certain diseases that are caused by alterations in the DNA sequences inside the cell of the biological system. A data sample in a CN data set compares a certain DNA sequence (at some location) to a control DNA sequence, which has a known structure. Therefore, CN data can be used to determine the locations and extent of aberrations (deletions or amplifications) in DNA sequences, which can help diagnose or classify certain types of diseases.
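As a toy illustration only (not the chapter's procedure), the sketch below flags candidate aberrations by thresholding the log2 ratio of test-to-reference probe intensities; the function name and the 0.3 cutoff are arbitrary assumptions.

import numpy as np

def flag_aberrations(test_intensity, reference_intensity, cutoff=0.3):
    # The log2 ratio is near 0 for normal copy number, negative for
    # deletions, and positive for amplifications (cutoff is assumed).
    log2_ratio = np.log2(test_intensity / reference_intensity)
    status = np.full(log2_ratio.shape, "normal", dtype=object)
    status[log2_ratio > cutoff] = "amplification"
    status[log2_ratio < -cutoff] = "deletion"
    return log2_ratio, status

Because raw CN measurements are noisy, such a probe-by-probe rule is typically applied only after the data have been filtered, e.g., by the multiscale techniques discussed in this chapter.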

Latent Variable Regression: Latent variable regression (LVR) is a framework for dealing with collinearity (or redundancy) when constructing inferential models. LVR relies on transforming the process data so that most of the variations in the data are captured in a small number of variables. Then, the transformed variables (rather than the original data) are used to construct the inferential model. There are several LVR model estimation techniques, including principal component regression (PCR), partial least squares (PLS), and regularized canonical correlation analysis (RCCA).
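The sketch below shows the simplest member of this family, principal component regression, built directly on the singular value decomposition; the function names and the number of retained components are illustrative assumptions rather than the chapter's implementation.

import numpy as np

def pcr_fit(X, y, n_components=2):
    # Transform the (possibly collinear) inputs into a few latent
    # variables (principal component scores), then regress y on them.
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T          # principal directions
    scores = Xc @ loadings                  # latent variables
    coef = np.linalg.lstsq(scores, y - y_mean, rcond=None)[0]
    return {"x_mean": x_mean, "y_mean": y_mean, "loadings": loadings, "coef": coef}

def pcr_predict(model, X_new):
    scores = (X_new - model["x_mean"]) @ model["loadings"]
    return scores @ model["coef"] + model["y_mean"]

PLS and RCCA follow the same idea but select the latent variables using criteria that also involve the response variable.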

Genomic Data: Genomic data represent one type of biological data that quantify the expression levels of certain genes inside the cell. For example, they can physically be the concentrations of certain metabolites or proteins in the cell. Such data are very useful as they provide information about the interactions among different genes inside the biological system. Modeling genomic data can help interpret the behavior of biological systems, and can lead to designing intervention techniques that can be used to cure or manage certain diseases.

Data Filtering: Data filtering is the task of reducing the noise (or error) content of measured process data. It is an important task because measurement noise masks the important features in the data and limits their usefulness in practice. Various techniques have been developed to filter process data, including model-free techniques, model-based techniques, and techniques based on empirical models.
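For contrast with the multiscale approach, the following sketch shows two conventional single-scale low-pass filters of the kind the chapter compares against: a moving-average (mean) filter and an exponential filter; the window length and smoothing factor are arbitrary defaults.

import numpy as np

def moving_average(signal, window=5):
    # Model-free batch filter: each point is replaced by the mean of a
    # sliding window centered on it.
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")

def exponential_filter(signal, alpha=0.2):
    # Model-free online (causal) filter: exponentially weighted average
    # of the current measurement and the previous filtered value.
    out = np.empty(len(signal), dtype=float)
    out[0] = signal[0]
    for k in range(1, len(signal)):
        out[k] = alpha * signal[k] + (1 - alpha) * out[k - 1]
    return out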

Distillation Columns: Distillation columns are chemical processing units that are used to separate mixtures into their constituent components based on the volatilities of these components. For example, distillation columns are widely used in refinery plants to separate crude oil into more useful light components (such as natural gas) and heavy components (such as asphalt). There are various types of distillation columns, including tray columns and packed columns.
