Data Mining for Visualizing Polluted Gases

Yas A. Alsultanny
Copyright: © 2023 | Pages: 17
DOI: 10.4018/978-1-7998-9220-5.ch077


Knowledge discovery from big data is an important issue. Big data mining involves many steps, which must be implemented carefully to obtain accurate results. Visualization is one of the 10 Vs characteristics of big data, and it is the final step, summarizing the numerical results visually. This article aims to mine the big data recorded by environmental stations. These stations record the concentrations of polluting gases and meteorological parameters. 2D and 3D data visualizations are used to evaluate the capability of visualization in determining the effect of meteorological parameters on some of the gases that cause pollution. The results showed that visualization is a very important tool that can be used in mining big data by simply showing decision makers the concentrations of polluting gases graphically. This article recommends using big data visualization periodically, together with IoT, as an alarm tool for monitoring the concentration levels of polluting gases.
Chapter Preview


Big Data Mining (BDM) and Data Visualization (DV) are very important topics in the field of knowledge extraction. Big data requires considerable data processing and storage capacity. Big data can be visualized and analyzed to extract knowledge. Big data can be used as a useful tool to enhance decision making (Shumway, 2014). Visual analytics tools have steadily improved in recent years to handle big data. In the data age, where data grows exponentially, extracting knowledge is a significant struggle (Zhwan & Zeebaree, 2021). Visual analytics, through appropriate visual means, enables exploration of how air quality is influenced under various traffic scenarios (Bachechi, Po, & Rollo, 2022).

Big data is a term used to describe some current directions in information technology, as a concept that takes data analysis into consideration. The amount of data in the world is huge: in 2020, every person generated 1.7 megabytes of data per second (Petrov, 2021). It is important to note that most big data is unstructured, meaning that it is not organized and does not fit conventional databases (Smallcombe, 2020).
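To put the per-second figure in perspective, it can be converted to a daily volume per person. This conversion assumes the 1.7 MB/s rate holds around the clock and uses decimal units (1 GB = 1000 MB):

```latex
1.7\,\tfrac{\text{MB}}{\text{s}} \times 86\,400\,\tfrac{\text{s}}{\text{day}}
  = 146\,880\,\tfrac{\text{MB}}{\text{day}}
  \approx 146.9\,\tfrac{\text{GB}}{\text{day}}
```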

Data mining is the technique of extracting useful knowledge from databases; it requires pre-processing and an analytic approach to find the value in the data. Data mining involves many operations, such as data integration, data selection, and so on (Han, Kamber, & Jian, 2012). Selecting a suitable data mining method is the best way to extract knowledge and forecast the future (Alsultanny, 2011).
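The operations named above (data integration, data selection, and pre-processing) can be sketched for monitoring-station records as follows. This is a minimal illustration, not the RapidMiner workflow used in the chapter: it assumes hypothetical hourly records from two sources at one station, with invented field names (`time`, `NO2`, `O3`, `Temp`, `WS`) and toy values.

```python
from datetime import datetime

# Hypothetical hourly records from one station: gas concentrations and
# meteorological readings arrive as two separate sources keyed by timestamp.
gas_readings = [
    {"time": "2021-01-01 00:00", "NO2": 41.0, "O3": 12.5},
    {"time": "2021-01-01 01:00", "NO2": None, "O3": 14.0},  # missing NO2 value
    {"time": "2021-01-01 02:00", "NO2": 37.5, "O3": 15.5},
]
met_readings = [
    {"time": "2021-01-01 00:00", "Temp": 8.2, "WS": 1.4},
    {"time": "2021-01-01 01:00", "Temp": 7.9, "WS": 1.1},
    {"time": "2021-01-01 02:00", "Temp": 7.5, "WS": 2.3},
]


def integrate(gas, met):
    """Data integration: join the two sources on their shared timestamp."""
    met_by_time = {row["time"]: row for row in met}
    merged = []
    for row in gas:
        other = met_by_time.get(row["time"])
        if other is not None:
            merged.append({**row, **other})
    return merged


def select(rows, columns):
    """Data selection: keep the timestamp plus the attributes under study."""
    return [{col: row[col] for col in ["time", *columns]} for row in rows]


def clean(rows):
    """Pre-processing: drop records with any missing value, parse timestamps."""
    cleaned = []
    for row in rows:
        if all(value is not None for value in row.values()):
            parsed = datetime.strptime(row["time"], "%Y-%m-%d %H:%M")
            cleaned.append({**row, "time": parsed})
    return cleaned


dataset = clean(select(integrate(gas_readings, met_readings), ["NO2", "Temp", "WS"]))
for record in dataset:
    print(record["time"], record["NO2"], record["Temp"], record["WS"])
```

Dropping incomplete rows is the simplest cleaning choice; real station data might instead call for interpolation, depending on how many readings are missing.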

Visual analytics was first defined by Thomas and Cook in 2005 as “the science of analytical reasoning facilitated by interactive visual interfaces.” Murray (2013) described data visualization as follows: “Fortunately, we humans are intensely visual creatures. Few of us can detect patterns among rows of numbers, but even young children can interpret bar charts, extracting meaning from those numbers’ visual representations. Visualizing data is the fastest way to communicate it to others.” Data visualizations are valuable for presenting data in graphical form (Thanuj, Vinitha, & Sumathi, 2021).
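Murray's point about bar charts can be illustrated without any plotting library. The sketch below assumes hypothetical (wind direction in degrees, NO2 concentration) pairs and an 8-sector compass binning (each sector spanning ±22.5° around a compass point), neither of which comes from the chapter. It averages the concentration per wind sector and prints a horizontal text bar chart; an actual analysis would use a charting tool such as the RapidMiner visualizations the chapter relies on.

```python
from collections import defaultdict

SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def sector(wind_direction_deg):
    """Map a wind direction in degrees (0 = north) to one of 8 compass sectors."""
    return SECTORS[int(((wind_direction_deg + 22.5) % 360) // 45)]


def mean_by_sector(records):
    """Average the gas concentration observed under each wind sector."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for wd, conc in records:
        key = sector(wd)
        totals[key] += conc
        counts[key] += 1
    # Keep compass order and skip sectors with no observations.
    return {key: totals[key] / counts[key] for key in SECTORS if counts[key]}


def bar_chart(means, width=40):
    """Render a horizontal text bar chart scaled to the largest mean."""
    peak = max(means.values())
    lines = []
    for key, value in means.items():
        bar = "#" * round(width * value / peak)
        lines.append(f"{key:>2} | {bar} {value:.1f}")
    return "\n".join(lines)


# Hypothetical (wind direction, NO2) pairs.
samples = [(10, 30.0), (350, 34.0), (90, 18.0), (100, 22.0), (200, 45.0), (225, 51.0)]
means = mean_by_sector(samples)
print(bar_chart(means))
```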

Air pollution raises the risk of diseases such as heart disease, stroke, chronic obstructive pulmonary disease, cancer, and pneumonia; every year, 4.2 million deaths result from exposure to ambient (outdoor) air pollution (World Health Organization, 2021a). Air pollution is an important issue in our lives; most pollutants in the air result from emissions from cars, trucks, buses, factories, refineries, and other sources.

The objective of this chapter is to highlight aspects of Big Data Mining for visualizing air pollution concentrations and their relationship to meteorological parameters. The data for this chapter were collected from stations that monitor polluting gases. These stations usually record hourly readings of the concentrations of gases such as ozone (O3), nitrogen dioxide (NO2), sulfur dioxide (SO2), carbon monoxide (CO), carbon dioxide (CO2), and particulate matter (PM10 and PM2.5). They also record hourly readings of meteorological parameters such as Temperature (Temp), Humidity (Hu), Wind Speed (WS), Wind Direction (WD), and Air Pressure (AP). RapidMiner was used in this chapter to visualize the distribution of polluting gases.
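The chapter assesses the effect of meteorological parameters on gas concentrations visually. As a complementary numeric check, which is a swapped-in technique and not the chapter's method, the Pearson correlation between a meteorological parameter and a gas can be computed from paired hourly readings. The values for Temp, WS, and O3 below are hypothetical; a correlation indicates association only, not cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances without the 1/n factor (it cancels in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical hourly readings at one station.
hourly = {
    "Temp": [12.0, 15.0, 19.0, 24.0, 27.0, 25.0],
    "WS": [3.1, 2.8, 2.2, 1.9, 1.5, 2.0],
    "O3": [20.0, 28.0, 39.0, 52.0, 61.0, 55.0],
}

for parameter in ("Temp", "WS"):
    r = pearson(hourly[parameter], hourly["O3"])
    print(f"r({parameter}, O3) = {r:+.2f}")
```

A value near +1 or -1 suggests a strong linear association worth inspecting in a 2D or 3D plot; a value near 0 does not rule out a nonlinear relationship.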

Key Terms in this Chapter

Air Pollution: The single largest environmental health risk. It results from the misuse of environmental resources, has harmful effects on humans, animals, and plants, and contributes to climate change. The major pollutants are ozone, nitrogen dioxide, carbon monoxide, carbon dioxide, sulfur dioxide, and particulate matter.

Data Mining: The process of turning raw data into useful information by finding relationships between variables, especially in big data.

Data Analytics: Techniques for analyzing raw data and using it in meaningful ways to extract valuable insights and draw conclusions.

Air Quality: The degree to which the air is free of harmful substances; good air quality means the air is clean enough for humans, animals, and plants to live healthily.

Climate Change: A global phenomenon driven by the burning of fossil fuels, which causes global warming and damages the environment through polluting gases.

Monitoring Stations: Stations installed in each country to monitor polluting gases, operating 24/7 to record gas concentrations every minute.

Data Visualization: A way of representing data graphically with charts, graphs, and maps, highlighting patterns, outliers, and trends so that the reader can quickly understand relationships between variables.

Big Data: In this chapter, the collection of data from polluting-gas monitoring stations in each country. This data is huge in volume and grows exponentially worldwide over time, and its complexity requires special methods for analysis and management.
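The Monitoring Stations entry above says stations record concentrations every minute, while the data analyzed in this chapter are hourly readings. One common way to bridge the two is to average the minute records of each clock hour. The sketch below assumes that averaging rule (the chapter does not state how hourly values are derived) and skips missing minute readings rather than treating them as zero; the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hourly_means(minute_records):
    """Average per-minute concentrations into one value per clock hour.

    minute_records: iterable of (datetime, concentration) pairs; a
    concentration of None marks a missing reading and is skipped.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in minute_records:
        if value is None:
            continue
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        sums[hour] += value
        counts[hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sorted(sums)}


# Hypothetical minute-level NO2 readings, including a sensor dropout.
records = [
    (datetime(2021, 1, 1, 8, 0), 40.0),
    (datetime(2021, 1, 1, 8, 1), 42.0),
    (datetime(2021, 1, 1, 8, 2), None),
    (datetime(2021, 1, 1, 9, 0), 50.0),
]
for hour, mean in hourly_means(records).items():
    print(hour.strftime("%Y-%m-%d %H:00"), round(mean, 1))
```

An hour whose minute readings are all missing simply does not appear in the result, which keeps gaps visible instead of filling them with misleading values.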
