1. Introduction
Sensor networks have grown rapidly across a variety of real-time applications, including information extraction and data integration (Luo et al., 2014). As a consequence, considerable research has been conducted on processing the uncertain data streams that these applications produce. Uncertain data streams are streams that contain incomplete, incorrect, or even misleading information from a range of sources. Research into acquiring uncertain data has grown in large-scale applications such as computational astrophysics and severe weather monitoring. The information gathered in these settings is likely to be noisy as well as ambiguous (Wang et al., 2019).
Accounting for uncertainty from data input through to query output has long been a crucial task in scientific application domains such as RFID networks, radar sensor networks, GPS systems, and camera sensor networks. When such uncertain data streams are fed into current stream processing systems that support monitoring applications, the results are unsatisfactory, even though result quality is essential for tracking applications. Furthermore, because of the size of these data streams, real-world applications demand high performance, which has required sophisticated data processing in an offline mode (Chen & Chen, 2020). Data mining emerged as a response to the problem of uncertain result quality, and it has gained prominence in research because it can address a wide range of problems. Mining has attracted particular interest in the study of uncertain data streams because it evaluates uncertain data through the data's own characteristics (Li et al., 2018). This interest also reflects the wide variety of data collection strategies in common use.
Many datasets remain noisy when uncertain data is mined, which poses a variety of problems for processing, cleansing, and mining them. Prediction requires a streaming approach that both evaluates and reconstructs the data while performing adaptive cleaning on the uncertain stream (Wahab et al., 2021). This estimation produces error variances as well as uncertain probability density distributions. In general, online techniques are needed both to handle the available data and to recover the missing data, which correspond to the previously estimated errors. In comparable circumstances, this leads to higher error values for the given data, and these values increase with each estimation. As a result, the mining approach must perform cleansing and estimation while processing the data in order to minimize error values (Han et al., 2018).
The results of data mining algorithms are typically affected by flaws in the raw data. For example, an attribute with a high error rate is much less reliable during mining than one with a moderate error rate. Different attributes and records may therefore have to be handled differently in mining operations; otherwise, the overall quality of the mining results may suffer (Liang et al., 2019). A flexible uncertainty model may assume a standard error for each unobserved underlying data element. This also implies that the entire probability distribution function of that data is known, which results in a system that is less resistant to noise. Because only a limited standard error may be available for uncertain data in real-time applications, resolving these uncertainty values has been shown to be inefficient, since it requires online techniques for data prediction and collection (Liu et al., 2012).