Principal Component Analysis of Hydrological Data

Petr Praus (VSB - Technical University of Ostrava, Czech Republic)
DOI: 10.4018/978-1-61520-907-1.ch018


In this chapter, the principles and applications of principal component analysis (PCA) applied to hydrological data are presented. Four case studies show how PCA can be used to obtain information about a wastewater treatment process and about drinking water quality in a city network, and to find similarities within data sets of ground water quality results and of water-related images. In the first case study, the composition of raw and cleaned wastewater was characterised and its temporal changes were displayed. In the second case study, drinking water samples were divided into clusters consistent with their sampling localities. In case study III, similar ground water samples were recognised by calculating the cosine similarity and the Euclidean and Manhattan distances. In case study IV, 32 water-related images were transformed into a large image matrix whose dimensionality was reduced by PCA, and the images were then clustered using PCA scatter plots.
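Case study III compares ground water samples with three measures. As a minimal sketch in plain Python (standard library only), the functions below compute the cosine similarity, the Euclidean distance and the Manhattan distance between two sample vectors. The sample names and values are hypothetical, and whether the vectors hold raw parameters or principal component scores is an assumption, not the chapter's stated procedure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sample vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two sample vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest(samples, query):
    """Name of the sample most similar to `query` under each measure."""
    return {
        # Higher cosine similarity means more similar.
        "cosine": max(samples, key=lambda k: cosine_similarity(samples[k], query)),
        # Smaller distance means more similar.
        "euclidean": min(samples, key=lambda k: euclidean_distance(samples[k], query)),
        "manhattan": min(samples, key=lambda k: manhattan_distance(samples[k], query)),
    }


# Hypothetical ground water samples (e.g. standardised parameters or PC scores).
samples = {
    "well_A": [1.0, 0.5, -0.2],
    "well_B": [0.9, 0.6, -0.1],
    "well_C": [-1.2, 0.3, 0.8],
}
query = [1.0, 0.55, -0.15]
print(nearest(samples, query))
```

Cosine similarity ignores vector length and compares only direction (the relative composition of a sample), whereas the two distances are also sensitive to absolute differences, so the measures can disagree on real data.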
Chapter Preview

Motto: Variation is information



Principal component analysis is a basic multivariate statistical method. It was first introduced by Karl Pearson (1901) and subsequently developed by Hotelling (1933a, 1933b). Until the 1950s, the method had limited applications because of the lack of computational equipment.

The main objective of PCA is to find new latent (hidden) variables for n samples that are mutually uncorrelated. Each latent variable $t_i$ (principal component) is a linear combination of p variables and describes a different source of the total variation:

$$
\begin{aligned}
t_1 &= w_{1,1}x_{1,1} + w_{1,2}x_{1,2} + \dots + w_{1,p}x_{1,p}\\
t_2 &= w_{2,1}x_{2,1} + w_{2,2}x_{2,2} + \dots + w_{2,p}x_{2,p}\\
&\;\;\vdots\\
t_n &= w_{n,1}x_{n,1} + w_{n,2}x_{n,2} + \dots + w_{n,p}x_{n,p}
\end{aligned}
\tag{1}
$$

where $w_{i,j}$ and $x_{i,j}$ ($1 \le i \le n$, $1 \le j \le p$) are the component weight (loading) and the original variable (parameter), respectively. The component loadings measure the contribution of a particular variable to the principal components. It also holds that

$$
w_{i,1}^2 + w_{i,2}^2 + \dots + w_{i,p}^2 = 1 \tag{2}
$$
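Equations (1) and (2) can be illustrated with a short NumPy sketch. It assumes the common formulation in which each component score is a weighted sum of the p mean-centred variables and the loading vectors are the eigenvectors of the covariance matrix; the function name `pca_eig` and the synthetic data are hypothetical. Autoscaling the variables first (dividing by their standard deviations, i.e. working with the correlation matrix) is a frequent choice for hydrological parameters measured in different units, but it is not shown here.

```python
import numpy as np


def pca_eig(X):
    """PCA via eigen-decomposition of the covariance matrix.

    X is an n x p matrix: rows are samples, columns are variables.
    Returns loadings W (row i = w_{i,1..p}), scores T (n x p) and the
    eigenvalues (component variances) in decreasing order.
    """
    Xc = X - X.mean(axis=0)               # centre every variable
    C = np.cov(Xc, rowvar=False)          # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]     # largest variance first
    eigvals = eigvals[order]
    W = eigvecs[:, order].T               # eigenvectors as rows = loading vectors
    T = Xc @ W.T                          # T[s, i] = sum_j w_{i,j} * xc_{s,j}
    return W, T, eigvals


# Synthetic data: 20 hypothetical samples x 4 parameters.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
X[:, 1] += 0.8 * X[:, 0]                  # make parameters 1 and 2 correlated

W, T, eigvals = pca_eig(X)
print(np.round(eigvals, 3))
```

Both properties stated in the text can be checked directly: each row of `W` has a sum of squared loadings equal to 1, as in Equation (2), and the covariance matrix of the scores `T` is diagonal, so the components are mutually uncorrelated and their variances equal the eigenvalues.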

Key Terms in this Chapter

Clustering: A partition of observations into groups called clusters whose members have similar properties.

Scree Plot: A graph of eigenvalues or singular values that shows the portion of the total variance represented by each principal component.

Singular Value Decomposition: A method of linear algebra that decomposes an m × n data matrix of rank r into three matrices: an m × r matrix, an r × r diagonal matrix of singular values, and an r × n matrix.

Scatter Plot: A two- or three-dimensional coordinate system showing observations as points characterised by their principal component scores.

Principal Component: A linear combination of variables that describes a distinct source of variation in the observations.

Water Quality: A complex of chemical, physical, microbiological, and biological parameters defining the composition of water for a specific use, for example, drinking.

Component Loading (Loading): The contribution of a particular variable to a principal component.
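The singular value decomposition and the scree plot defined above are closely linked to PCA. The NumPy sketch below assumes a mean-centred data matrix and uses synthetic, hypothetical data; it obtains the loadings and scores from the reduced SVD and prints a simple text scree plot of the fraction of total variance carried by each component. `pca_svd` is a hypothetical helper name, and a graphical scree plot would normally be drawn with a plotting library instead.

```python
import numpy as np


def pca_svd(X):
    """PCA via the reduced SVD of the centred n x p data matrix.

    Xc = U @ diag(s) @ Vt, where the rows of Vt are the loading vectors.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # s sorted descending
    scores = U * s                        # equals Xc @ Vt.T
    eigvals = s ** 2 / (X.shape[0] - 1)   # component variances (covariance eigenvalues)
    return Vt, scores, eigvals


# Synthetic data: 20 hypothetical samples x 4 parameters.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
X[:, 1] += 0.8 * X[:, 0]

Vt, scores, eigvals = pca_svd(X)
explained = eigvals / eigvals.sum()       # fraction of total variance per component

# Text-only scree plot: one bar per principal component.
for i, frac in enumerate(explained, start=1):
    print(f"PC{i}: {frac:6.1%} " + "#" * round(40 * frac))
```

The variances obtained this way equal the eigenvalues of the covariance matrix, so the SVD and eigen-decomposition routes give the same components (up to the sign of each loading vector). Because the SVD works on the data matrix directly, it avoids forming the p × p covariance matrix, which can matter for very wide matrices such as one whose columns are image pixels.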
