Visualization Tools for Big Data Analytics in Quantitative Chemical Analysis: A Tutorial in Chemometrics


Gerard G. Dumancas (Louisiana State University – Alexandria, USA), Ghalib A. Bello (Icahn School of Medicine at Mount Sinai, USA), Jeff Hughes (RMIT University, Australia), Renita Murimi (Oklahoma Baptist University, USA), Lakshmi Chockalingam Kasi Viswanath (Oklahoma Baptist University, USA), Casey O'Neal Orndorff (Louisiana State University – Alexandria, USA), Glenda Fe Dumancas (Louisiana State University – Alexandria, USA) and Jacy D. O'Dell (Oklahoma Baptist University, USA)
DOI: 10.4018/978-1-5225-3142-5.ch030

Abstract

Modern instruments have the capacity to generate and store enormous volumes of data, and the challenges involved in processing, analyzing, and visualizing this data are well recognized. The field of Chemometrics (a subspecialty of Analytical Chemistry) grew out of efforts to develop a toolbox of statistical and computer applications for data processing and analysis. This chapter discusses key concepts of Big Data Analytics within the context of Analytical Chemistry, with particular emphasis on preprocessing techniques, statistical and Machine Learning methodology for data mining and analysis, tools for big data visualization, and state-of-the-art applications for data storage. Various statistical techniques used for the analysis of Big Data in Chemometrics are introduced, along with an overview of computational tools for Big Data Analytics in Analytical Chemistry. The chapter concludes with a discussion of the latest platforms and programming tools for Big Data storage, such as Hadoop, Apache Hive, Spark, Google Bigtable, and more.
Chapter Preview

Applications Of Chemometrics

Chemometrics is a fast-growing field with applications to both descriptive and predictive problems in the experimental life sciences, especially Chemistry. It is a highly interdisciplinary field that draws on Multivariate Statistics, Computer Science, and Applied Mathematics, applying core data-analytic methods with the ultimate goal of addressing problems in Biochemistry, Medicine, Chemistry, Chemical Engineering, and Biology (Khanmohammadi, 2014).

The biological and medical applications of Chemometrics span a wide range of expertise. Support Vector Machines (SVMs) and Partial Least Squares Discriminant Analysis (PLS-DA) are widely used for classification tasks such as identifying microorganisms, medical diagnosis using spectroscopy, and metabolomics based on Coupled Chromatography and Nuclear Magnetic Resonance Spectrometry (Brereton, 2007).
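To make the classification setting concrete, the sketch below fits a PLS1 model with the NIPALS algorithm and uses it for PLS-DA by coding two classes as +1/-1 and classifying by the sign of the fitted response. This is a minimal illustration on synthetic "spectra", not the implementation referenced in the chapter; the data-generation parameters are arbitrary.

```python
import numpy as np

def pls_nipals(X, y, n_components=2):
    """PLS1 via the NIPALS algorithm; X and y must be mean-centered."""
    X, y = X.astype(float).copy(), y.astype(float).copy()
    T, q = [], []
    for _ in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)        # weight vector
        t = X @ w                     # scores for this component
        p = X.T @ t / (t @ t)         # X loadings
        c = (y @ t) / (t @ t)         # y loading
        X -= np.outer(t, p)           # deflate X and y before the next component
        y -= c * t
        T.append(t)
        q.append(c)
    return np.array(T).T, np.array(q)

# synthetic two-class "spectra": classes coded as +1 / -1 (PLS-DA)
rng = np.random.default_rng(0)
n, p = 40, 50
labels = np.repeat([1.0, -1.0], n // 2)
X = rng.normal(size=(n, p)) + np.outer(labels, np.linspace(0.0, 1.0, p))
T, q = pls_nipals(X - X.mean(axis=0), labels - labels.mean(), n_components=2)
pred = np.sign(T @ q + labels.mean())   # classify by the sign of the fitted value
accuracy = float((pred == labels).mean())
```

On this well-separated toy problem the two-component model recovers the class labels almost perfectly; real spectroscopic data would normally also require preprocessing and cross-validated choice of the number of components.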

Another widely used application of Chemometrics is in food science. In particular, Near-Infrared (NIR) Spectroscopy is used for calibration, classification, and exploratory analysis. A common goal is sensory analysis, which links product composition to sensory-panel ratings using techniques such as PCA (Brereton, 2007).
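The exploratory use of PCA on spectra can be sketched as follows. The example below simulates NIR-like spectra in which a single Gaussian band scales with analyte concentration, then computes principal components via the SVD; the band shape and noise level are invented for illustration.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via SVD of the mean-centered data matrix."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample scores
    explained = (s ** 2) / np.sum(s ** 2)             # explained-variance ratios
    return scores, Vt[:n_components], explained[:n_components]

# toy "NIR spectra": one Gaussian band whose height tracks concentration
rng = np.random.default_rng(1)
wavelengths = np.arange(100)
band = np.exp(-0.5 * ((wavelengths - 50) / 8.0) ** 2)
concentration = rng.uniform(0.0, 1.0, size=30)
X = np.outer(concentration, band) + rng.normal(0.0, 0.01, size=(30, 100))
scores, loadings, explained = pca(X, n_components=2)
```

Because concentration is the only systematic source of variation here, the first component dominates the explained variance and its scores track concentration; in exploratory work one would plot the first two score columns to look for groupings among samples.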

Key Terms in this Chapter

Process Analytical Technology: Defined by the United States Food and Drug Administration as a mechanism to design, analyze, and control pharmaceutical manufacturing processes through the measurement of Critical Process Parameters that affect Critical Quality Attributes.

Hierarchical Cluster Analysis: A method of cluster analysis that seeks to build a hierarchy of clusters.
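A minimal sketch of the agglomerative (bottom-up) variant: start with every observation in its own cluster and repeatedly merge the two closest clusters. Single linkage (distance between the closest members) is used here; the six one-dimensional "measurements" are made up for illustration.

```python
import numpy as np

def single_linkage(points, n_clusters):
    """Naive agglomerative clustering with single linkage: repeatedly
    merge the two closest clusters until n_clusters remain."""
    pts = np.asarray(points, dtype=float)
    clusters = [[i] for i in range(len(pts))]
    while len(clusters) > n_clusters:
        best = (0, 1, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: distance between the closest pair of members
                d = min(np.linalg.norm(pts[i] - pts[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        clusters[a] += clusters.pop(b)   # merge cluster b into cluster a
    return clusters

# two well-separated groups of 1-D measurements
data = [[0.0], [0.2], [0.1], [5.0], [5.3], [5.1]]
groups = single_linkage(data, n_clusters=2)
```

In practice the full merge sequence (not just one cut) is kept and drawn as a dendrogram, and library implementations use much faster algorithms than this O(n^3) loop.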

Chemometrics: A branch of Analytical Chemistry that applies multivariate statistical techniques to extract meaningful information from chemical data.

Continuous Wavelet Transform: A transform that uses inner products to measure the similarity between a signal and an analyzing function (the wavelet) across a range of scales and positions.
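The "inner product" view can be shown directly: at a fixed scale, each CWT coefficient is the inner product of the signal with a copy of the wavelet centered at one position. The sketch below uses the Ricker ("Mexican hat") wavelet and omits the usual 1/sqrt(scale) normalization for brevity; the test signal is an invented Gaussian peak.

```python
import numpy as np

def ricker(t, scale):
    """Ricker ('Mexican hat') wavelet, a common analyzing function."""
    x = t / scale
    return (1.0 - x ** 2) * np.exp(-x ** 2 / 2.0)

def cwt_at_scale(signal, scale):
    """CWT coefficients at one scale: the inner product of the signal
    with a wavelet copy centered at every position b."""
    idx = np.arange(len(signal))
    return np.array([signal @ ricker(idx - b, scale) for b in idx])

# a signal with a single peak at position 40; coefficients peak where
# the wavelet best matches the signal
x = np.exp(-0.5 * ((np.arange(100) - 40) / 5.0) ** 2)
coeffs = cwt_at_scale(x, scale=5.0)
```

Scanning several scales and stacking the coefficient rows yields the familiar scalogram, which is how the CWT is typically used to locate peaks of different widths in analytical signals.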

Big Data Approach: An approach that involves managing Big Data from different sources or databases.

Discrete Wavelet Transform: An implementation of the Wavelet Transform using a discrete set of wavelet scales and translations that obey certain rules.
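The simplest concrete instance is one level of the Haar DWT: pairwise scaled sums give a coarse approximation and pairwise differences give the detail, and the original signal is perfectly reconstructible from the two. This is a minimal sketch for an even-length signal, not a full multilevel transform.

```python
import numpy as np

def haar_dwt(signal):
    """One level of the DWT with the Haar wavelet: scaled pairwise sums
    give the approximation coefficients, differences the detail."""
    x = np.asarray(signal, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of one Haar level: reconstructs the signal exactly."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

x = np.array([4.0, 2.0, 8.0, 6.0])
approx, detail = haar_dwt(x)
reconstructed = haar_idwt(approx, detail)
```

In chemometric preprocessing, small detail coefficients are often thresholded to zero before inverting, which denoises or compresses a spectrum while preserving its main features.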

Factor Analysis: A process in which the values of observed data are expressed as functions of a number of possible causes in order to find which are the most important.

Principal Component Analysis: A statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.

Principal Component Regression: Constructs new predictor variables, known as components, as linear combinations of the original predictors; the components are chosen to explain the observed variability in the predictors, without considering the response variable at all.
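The two-step definition translates directly into code: compute principal components of the centered predictors, regress the centered response on the leading component scores, and map the coefficients back to the original variables. The collinear example data below are synthetic, driven by two latent variables.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """PCR: PCA on the centered predictors, then least squares of the
    centered response on the leading component scores."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T           # loadings, shape (p, k)
    T = Xc @ V                        # component scores, shape (n, k)
    gamma, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    beta = V @ gamma                  # map back to the original predictors
    return beta, float(y_mean - x_mean @ beta)

# ten collinear predictors driven by two latent variables
rng = np.random.default_rng(2)
latent = rng.normal(size=(60, 2))
X = latent @ rng.normal(size=(2, 10)) + rng.normal(0.0, 0.01, size=(60, 10))
y = latent @ np.array([1.5, -2.0]) + rng.normal(0.0, 0.05, size=60)
beta, intercept = pcr_fit(X, y, n_components=2)
y_hat = X @ beta + intercept
```

Because the components are chosen without looking at y, PCR can discard a direction that matters for prediction; PLS addresses exactly that weakness, which is one reason both appear in the chemometric toolbox.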

Generalized Linear Model: A flexible generalization of ordinary Linear Regression that allows response variables with error distributions other than the normal distribution.
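As a concrete GLM, the sketch below fits logistic regression (binomial family, logit link) by iteratively reweighted least squares, the standard GLM fitting algorithm. The simulated binary outcomes and true coefficients (intercept 0.5, slope 2.0) are invented for illustration.

```python
import numpy as np

def logistic_irls(X, y, n_iter=25):
    """Fit a logistic-regression GLM (binomial family, logit link)
    by iteratively reweighted least squares (Fisher scoring)."""
    Xd = np.column_stack([np.ones(len(X)), X])      # add an intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        eta = Xd @ beta                             # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))             # mean via the inverse link
        w = np.maximum(mu * (1.0 - mu), 1e-10)      # variance-function weights
        z = eta + (y - mu) / w                      # working response
        beta = np.linalg.solve(Xd.T @ (Xd * w[:, None]), Xd.T @ (w * z))
    return beta

# binary outcomes whose success probability follows a logistic curve
rng = np.random.default_rng(3)
x = np.linspace(-3.0, 3.0, 200)
true_p = 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * x)))
y = rng.binomial(1, true_p).astype(float)
beta = logistic_irls(x[:, None], y)
```

Swapping the inverse-link, its derivative, and the variance function yields the other common families (e.g. Poisson with a log link), which is what makes the GLM framework so flexible.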
