Scaling of Streaming Data Using Machine Learning Algorithms


Önder Aykurt, Zeynep Orman
DOI: 10.4018/978-1-6684-6015-3.ch008

Abstract

Today, data is generated continuously by millions of sources that send in records simultaneously, in sizes ranging from small to large. The rapid growth of data in velocity, volume, value, variety, and veracity has presented big challenges for businesses of all types. This type of data is called streaming data. Streaming data comes from a wide variety of sources, such as mobile application notifications, e-commerce purchases, sensors in transportation vehicles, information from social applications, and IoT sensors. This data must be processed sequentially and incrementally, record by record, and is used for a wide variety of analytics, including correlations, filtering, and sampling. Information derived from such analysis gives visibility into many aspects of a business, such as customer activity, website clicks, and the geo-location of devices. There has been great interest in developing systems for processing continuous data streams. This chapter aims to design a scalable system that can instantly analyze such data using machine learning algorithms.

Introduction

The volume of data currently produced by various activities has never been so large, and it is generated at an ever-increasing speed. As technology develops day by day, its place and importance in our lives grow. Developing technology has improved the interaction of many devices with each other and with people, and a large amount of data emerges from this interaction. Data generated in real time is valuable as soon as it arrives and supports decision-making. Such data, which is sequential by nature and arrives in different sizes and at irregular intervals, is defined as streaming data. Streaming data may lose value, or be lost entirely, if it is not processed immediately. Therefore, it is crucial to develop scalable systems that continuously receive and analyze unstructured data. Streaming datasets have different properties from static data. For a streaming algorithm, processing time is more critical than it is for an algorithm that processes static data, because streaming data is valuable as soon as it arrives in the system and must be processed and evaluated quickly. For example, data arriving in a financial application should be evaluated at that very moment for transaction security. A model designed over static data is permanent, whereas a model over streaming data can update itself according to the data it receives. Since the future data size cannot be predicted, the model must adapt to a time-varying data flow. The time available for processing and evaluating streaming data is more limited than for static data, and shortening the evaluation period is critical to the value of the data in the application where streaming data is used.
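The record-by-record model updating described above can be sketched with a simple online learner. The perceptron below is an illustrative, minimal example (not the chapter's actual system): each record is seen once, used to update the model, and then discarded, so the model adapts to a time-varying stream without revisiting old data.

```python
# Minimal sketch of incremental (record-by-record) learning on a stream,
# assuming a simple online perceptron; class and method names are illustrative.

class OnlinePerceptron:
    """Binary classifier updated one record at a time as streaming data arrives."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features  # weights, adapted as the stream evolves
        self.b = 0.0                 # bias term
        self.lr = lr                 # learning rate

    def predict(self, x):
        s = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1 if s >= 0 else -1

    def partial_fit(self, x, y):
        # Update only when the current record is misclassified: the model
        # adjusts to the time-varying flow without storing past records.
        if self.predict(x) != y:
            for i, xi in enumerate(x):
                self.w[i] += self.lr * y * xi
            self.b += self.lr * y

# Each (features, label) record arrives, is processed once, and is discarded.
model = OnlinePerceptron(n_features=2)
stream = [([2.0, 1.0], 1), ([-1.5, -0.5], -1), ([1.0, 2.0], 1), ([-2.0, -1.0], -1)]
for x, y in stream:
    model.partial_fit(x, y)
```

The same pattern underlies the `partial_fit` interface of incremental learners in libraries such as scikit-learn, which could replace the hand-written perceptron in a real system.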

The setup and continuity of a system that can receive and process streaming data is essential for resource usage. For a good design that is easy to operate, fixed data-analysis hardware alone is insufficient; additional resources must be acquired on demand as the state of the system changes, which also has financial implications (Nittel, 2015). Rather than storing the data and working on stored records, applications built on streaming data put the inferences drawn from the stream model in the foreground. For such applications, the velocity of the data is vital even when individual records are small. Volume refers to the fact that the total size of the data is unknown and potentially unbounded; scanning large volumes of streaming data across the entire storage or over long time intervals has a negative impact on system performance. Velocity refers to the rate at which data arrives in a given period, and often each record can be processed only once. Since streaming data may change over time, if the algorithm runs more than once, the model needs to be updated. Veracity refers to the reliability of the data and whether it needs review. Streaming data is often heterogeneous, and many different types of data are processed together; the concept of variety describes this feature of streaming data.
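The single-pass, bounded-memory constraint described above (unknown total volume, each record seen once) is classically handled by reservoir sampling. The sketch below is an illustrative example, not part of the chapter's system: it keeps a uniform random sample of fixed size `k` from a stream of unknown length while touching each record exactly once.

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Keep a uniform random sample of k records from a stream of unknown
    total size, in a single pass and with O(k) memory."""
    reservoir = []
    for i, record in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Replace an existing record with probability k / (i + 1),
            # which keeps the sample uniform over everything seen so far.
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = record
    return reservoir

# Memory stays at k records even though the stream has a million.
sample = reservoir_sample(range(1_000_000), k=10)
```

Because the algorithm never needs the total stream length in advance, it fits the "volume is unknown" property of streaming data directly.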

Fields such as social media applications, e-commerce, mobile applications, the Internet of Things, operations tracking systems, and advertising are examples of streaming data sources (Kolajo et al., 2019). With the development of e-commerce and web applications, web analytics has gained an increasingly important role. Big data processing tools are used to analyze data such as the number of visitors to a website, the relationships between the products examined, and the user profile, making it possible to collect and process this data in real time. Physically monitored operations tracking systems are one of the main sources of streaming data. Essentially, the metrics that affect the overall performance of individual computer systems are monitored: a large amount of data is processed and recorded, such as the status of disk drives, processor load and performance, network usage, storage unit performance, and access times. Monitoring these systems is important for overall system performance and for identifying potential problems. Advertising is one of the most critical areas where data is produced and evaluated in real time. Metrics such as purchases and ad clicks in different environments, together with real-time bidding systems, offer the opportunity to reach the right customer group at the right time. The data produced for this purpose is collected and processed against metrics determined from the system, and the valuable output is used in new recommendations.
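Operations-tracking metrics such as processor load are typically evaluated over a sliding window of recent readings rather than over the whole history. The sketch below is a hypothetical, minimal monitor (names are illustrative): it maintains a rolling average of the last few readings in constant memory, which is how a potential problem (e.g., sustained high load) can be flagged from a stream.

```python
from collections import deque

class SlidingWindowMonitor:
    """Rolling average of a monitored metric (e.g. processor load) over the
    last `size` readings; old readings drop out automatically."""

    def __init__(self, size):
        self.window = deque(maxlen=size)  # deque evicts the oldest when full
        self.total = 0.0

    def add(self, value):
        if len(self.window) == self.window.maxlen:
            # Subtract the reading that maxlen is about to evict.
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value

    def average(self):
        return self.total / len(self.window) if self.window else 0.0

# Readings arrive one at a time; only the last 3 are kept.
mon = SlidingWindowMonitor(size=3)
for load in [0.2, 0.4, 0.6, 0.8]:
    mon.add(load)
# The window now holds the three most recent readings: 0.4, 0.6, 0.8.
```

A real monitoring pipeline would attach an alert threshold to `average()` or track several such windows per metric; the constant-memory window is the key idea.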
