1. Introduction
Huge amounts of data are now generated in various domains. In particular, for building the Internet of Things on embedded and real-time communication systems (Shukla, A. K., et al. (2018), Kumar, H., & Tyagi, I. (2019)), there is a significant need to extract knowledge from these data through data processing and analysis. Data mining (DM) is the practice of analyzing large existing datasets to generate new information, also known as the process of knowledge discovery in databases (Joseph, S. I. T., & Thanakumar, I. (2019)). A considerable number of DM algorithms have been developed to date (Meigal, A. Y., et al. (2019)). How efficiently these algorithms perform in practice depends on the context, including data characteristics, task requirements, and available resources, so different contexts call for different algorithms. Selecting DM algorithms for data processing and analysis requires the knowledge of DM experts, which leads to an unjustified consumption of considerable human resources and to time delays.
To formalize data processing and analysis with DM algorithms, a number of standards for constructing DM processes have been developed. Today there are three main standards for building DM processes: CRISP-DM (Chapman, P., et al. (2000)), SEMMA (Matignon, Randall. (2007)), and KDD (Fayyad et al. (1996)). According to these standards, a DM process consists of several stages and hundreds of activities. The stages include data preparation, modeling, and evaluation. Each stage requires a choice of operators or algorithms, so obtaining an effective solution takes considerable effort and time.
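To make the cost of these per-stage choices concrete, the following minimal sketch enumerates candidate workflows for the three stages named above. The operator names are hypothetical placeholders chosen for illustration; they are not taken from CRISP-DM, SEMMA, or KDD, and real processes involve far more activities.

```python
from itertools import product

# Hypothetical candidate operators per stage (illustrative names only,
# not prescribed by any of the standards cited in the text).
STAGES = {
    "data_preparation": ["drop_missing", "impute_mean", "normalize"],
    "modeling": ["decision_tree", "naive_bayes", "k_means"],
    "evaluation": ["holdout", "cross_validation"],
}


def enumerate_workflows(stages):
    """Return every workflow as a tuple with one operator per stage, in stage order."""
    return list(product(*stages.values()))


workflows = enumerate_workflows(STAGES)
print(len(workflows))  # 3 * 3 * 2 = 18 candidate workflows
```

Even with two or three options per stage, the number of candidate workflows is the product of the per-stage counts, which is why manual selection quickly becomes expensive.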
To address this issue, a number of systems that support DM processes and can be used for DM workflow generation have been proposed:
• RapidMiner (Hofmann, M., & Klinkenberg, R. (Eds.). (2016))
• OpenML (Vanschoren, J., et al. (2014))
• Google: Cloud AutoML (Bisong, E. (2019)), Google’s Prediction API (Ujwal, U. J., et al. (2018))
• Microsoft: Custom Vision (Salvaris, M., et al. (2018))
• Amazon: Amazon Machine Learning (Herbrich, R. (2017))
• Others: BigML.com, Wise.io, SkyTree.com, Dato.com, Prediction.io, DataRobot.com
These systems are implemented based on the following techniques:
• Meta learning (Vilalta, Ricardo, et al. (2004)), or learning to learn, is the application of machine learning (ML) techniques to meta-data about past ML experiments, with the goal of modifying some aspects of the learning process to improve the performance of the resulting model.
• AutoML (He, Xin, et al. (2021)) is the process of automating the tasks involved in applying ML to real-world problems. AutoML consists of meta learning and hyperparameter optimization.
The architectures based on meta learning and AutoML focus on the data preparation and modeling stages of DM processes, rather than on building a workflow for the entire DM process.
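The two techniques can be illustrated with a minimal sketch. It assumes that meta learning is reduced to a nearest-neighbor lookup over meta-features of past experiments and that hyperparameter optimization is reduced to random search. The meta-features, the algorithm names, and `toy_objective` are all hypothetical; this is not how any of the systems listed above is implemented.

```python
import math
import random

# Hypothetical meta-data about past experiments: (meta-features, best algorithm).
# Meta-features are assumed to be (log10 of row count, feature count, class balance).
# They are left unscaled here for brevity; a real system would normalize them.
PAST_EXPERIMENTS = [
    ((2.0, 4.0, 0.5), "naive_bayes"),
    ((5.0, 20.0, 0.5), "decision_tree"),
    ((4.0, 300.0, 0.1), "linear_model"),
]


def recommend_algorithm(meta_features, experiments=PAST_EXPERIMENTS):
    """Meta learning step: reuse the best algorithm of the most similar past dataset."""
    nearest = min(experiments, key=lambda e: math.dist(e[0], meta_features))
    return nearest[1]


def random_search(objective, low, high, n_trials=50, seed=0):
    """Hyperparameter optimization step: sample uniformly and keep the best score."""
    rng = random.Random(seed)
    best_value, best_score = None, float("-inf")
    for _ in range(n_trials):
        value = rng.uniform(low, high)
        score = objective(value)
        if score > best_score:
            best_value, best_score = value, score
    return best_value, best_score


def toy_objective(depth):
    """Stand-in for a validation score; peaks at a hypothetical depth of 6."""
    return -(depth - 6.0) ** 2


algorithm = recommend_algorithm((4.8, 18.0, 0.45))
best_depth, best_score = random_search(toy_objective, 1.0, 12.0)
print(algorithm, round(best_depth, 2))
```

Both steps act only on the modeling stage: one picks an algorithm and the other tunes it. This matches the limitation noted above, since neither step assembles a workflow covering the entire DM process.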