Classification Algorithms and Control-Flow Implementation

DOI: 10.4018/978-1-7998-8350-0.ch002

Abstract

Supervised classification algorithms rely on many features that are tightly coupled to control-flow architecture, which limits how readily they can be applied to dataflow architecture. This chapter gives an overview of features characteristic of various classification algorithms that cannot be implemented on dataflow architecture, and provides examples of applying various classification algorithms to three datasets with different types of material.

Introduction

Supervised classification (hereafter simply classification) is the most commonly used method among those developed in data mining and machine learning, and a large number of classification algorithms are used to solve problems in various fields. When a new area that requires data classification emerges, algorithms suited to it are developed relatively quickly, as has been the case with big data and data streaming. All these algorithms are based on the control-flow paradigm, on the assumption that they will be implemented on computers with von Neumann architecture.

The development of computers with dataflow architecture began thirty years after the emergence of computers with control-flow architecture. Because of low use (the application space had already been occupied by control-flow computers, especially for commercial purposes), less intensive development, and strong, well-established competition from control-flow computers, relatively little software has been developed with dataflow computers as the target platform. This is especially visible in mathematically based algorithms, including those in the field of classification. Although some of the ideas behind current classification algorithms can be applied to the dataflow paradigm with minor modifications, a number of essential features of these algorithms are tied directly to computer architecture and the corresponding software support. If the development of dataflow computers keeps the momentum it has gained in recent years, the number of algorithms that can be mapped to the dataflow architecture with minor changes can be expected to grow. At present, dataflow software support for solving various problems is poor compared to the software developed for the control-flow paradigm.

A large number of applications require classification but, for various reasons, cannot be implemented, even partially, on a purely dataflow architecture (an architecture that does not include control-flow components such as a control-flow CPU). The reasons why these algorithms cannot be implemented in a straightforward way fall into several groups. The first group is related to data types that cannot be represented on dataflow computers. The need for these data types arises during the classification of:

  • text data (text classification as part of text mining);

  • multimedia data (images, sound, hybrid material, etc.);

  • data based on specific data properties (e.g. working with time series);

  • materials contained in databases (e.g. relational databases);

  • categorical and discrete data;

  • missing (non-existent) data, which causes problems during processing.
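
As an illustration of the last two items in the list above, the following minimal Python sketch shows how a control-flow program commonly converts categorical and missing values into purely numeric vectors before classification. It is not taken from the chapter: the function names `one_hot` and `impute_mean` and the sample data are hypothetical, and mean imputation is only one of several possible strategies.

```python
from statistics import mean


def one_hot(values):
    """Encode each categorical value as a 0/1 vector over the sorted categories.

    Hypothetical helper, written for illustration only.
    """
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Illustrative data (an assumption, not from the chapter's datasets).
colors = ["red", "green", "red", "blue"]
ages = [20, None, 40, 30]

encoded, categories = one_hot(colors)
filled = impute_mean(ages)
```

Both helpers depend on data-dependent branching and on structures whose size is known only at run time (the set of categories, the number of observed values), which is the kind of behavior the text associates with control-flow execution.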

The second group is related to the methods on which algorithms or their parts are based. This group includes:

  • Preprocessing and preparation of input material. Preprocessing can involve different types of dimensional reductions, sampling, class balancing, normalization, discretization, and so on, as well as work with multidimensional data of different types and structures.

  • The need for complex mathematical calculations. Such calculations are often performed via built-in functions from software libraries (e.g., various statistical calculations or activation functions in neural network nodes).

  • The need for classification in a distributed environment.

  • Use of methods that have different execution requirements, such as classification through association rules.

  • Variation of different algorithm parameters and hyperparameter tuning.

  • Visual classification of materials.
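
Two of the preprocessing steps named in the first item above, normalization and discretization, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: min-max scaling and equal-width binning are chosen here only as common examples, and the function names and sample values are hypothetical rather than taken from the chapter.

```python
def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map everything to 0.0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value the index of one of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value falls on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Illustrative data (an assumption, not from the chapter's datasets).
heights = [150, 160, 170, 180, 190]
normalized = min_max_normalize(heights)
bins = equal_width_bins(heights, 2)
```

Both steps need a full pass over the data to find the minimum and maximum before any single value can be transformed, so the whole input must be available before the transformation starts.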

Key Terms in this Chapter

KNN: K-nearest neighbors, a classification algorithm.

Control-flow: Conventional general-purpose programming paradigm.
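
The KNN term above can be illustrated with a minimal Python sketch, assuming Euclidean distance and a simple majority vote among the k nearest training points. The function name `knn_predict`, the default `k=3`, and the sample points are hypothetical and are not drawn from the chapter.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Pair every training point's distance to the query with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Sort by distance only, then keep the labels of the k closest points.
    distances.sort(key=lambda pair: pair[0])
    nearest = [label for _, label in distances[:k]]
    # Majority vote over the neighbor labels.
    return Counter(nearest).most_common(1)[0][0]


# Illustrative data (an assumption, not from the chapter's datasets).
points = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
labels = ["a", "a", "b", "b"]
prediction = knn_predict(points, labels, (1.2, 1.4), k=3)
```

In this sketch the sorting step and the vote are sequential, data-dependent operations, which is characteristic of a control-flow formulation of the algorithm.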
