Online Machine Learning

Óscar Fontenla-Romero (University of A Coruña, Spain), Bertha Guijarro-Berdiñas (University of A Coruña, Spain), David Martinez-Rego (University of A Coruña, Spain), Beatriz Pérez-Sánchez (University of A Coruña, Spain) and Diego Peteiro-Barral (University of A Coruña, Spain)
Copyright: © 2013 | Pages: 28
DOI: 10.4018/978-1-4666-3942-3.ch002

Abstract

Machine Learning (ML) addresses the problem of adjusting mathematical models so that they accurately predict a characteristic of interest of a given phenomenon. These models achieve this by extracting information from the regularities contained in a data set. Since its beginnings, two visions have coexisted in ML: batch and online learning. The former assumes full access to all data samples in order to adjust the model, whilst the latter overcomes this limiting assumption, thus expanding the applicability of ML. In this chapter, the general framework and methods of online learning since its inception are reviewed, and its applicability in current application areas is explored.

Introduction

Since the pioneering work in Machine Learning (ML), two visions of how to tackle the problem of automatic learning have coexisted: batch and online learning. The batch learning paradigm assumes, directly or indirectly, the following restrictions:

  • The whole training data set can be accessed in order to adjust the model. Each time the learning process needs access to the complete data set, access is immediate and complete.

  • There are no time restrictions. This means that we have enough time to wait until the model is completely adjusted.

  • The process that generates the data does not change. Once the model is adjusted, no further updates are necessary to obtain accurate results.

If we assume all these restrictions, ML’s applicability narrows significantly. Many significant applications of learning methods during the last 50 years could not have been addressed without relaxing them. The following are some examples:

  • Environments where Data Arrives Continuously: In this kind of application, data arrives continuously and an up-to-date model is needed at all times; otherwise the learning process would be redundant. In this case, access to the whole data set can be neither complete (the volume of data grows continuously) nor immediate (one must wait for the data to arrive). In fact, it can be argued that the concept of a training data set vanishes: since the set is never complete, we never have access to it.

  • Massive ML Applications: When the amount of data available makes its centralization impossible (access cannot be complete) or impractical (due to time restrictions), two approaches can be taken: parallelize the training process while maintaining the batch formulation (see, for example, the formulation of ML algorithms in terms of the Map-Reduce paradigm (Diel & Cauwenberghs, 2003)), or follow a different formulation able to optimize the model using reduced data subsets.

  • The Process Underlying the Data Generation Changes: In this case, the initial training set loses its validity as time passes, because the conditions of the target task change. Thus, a mechanism is needed to update a given model so that it adapts to the new conditions.
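
The three environments above share one requirement: the model must be adjustable from one sample (or a small subset) at a time, without revisiting past data. As a minimal sketch of that idea, and not a method taken from this chapter, the following Python example updates a linear model with a least-mean-squares step per sample. The class OnlineLinearRegressor, the stream helper, the learning rate and the synthetic data are all assumptions made for illustration. The generating process changes halfway through the stream to imitate the third environment.

```python
import random


class OnlineLinearRegressor:
    """Illustrative linear model adjusted one sample at a time (LMS rule)."""

    def __init__(self, n_features, learning_rate=0.05):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate  # assumed value, not tuned

    def predict(self, x):
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias

    def update(self, x, y):
        """Take one gradient step on the squared error of a single sample."""
        error = self.predict(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error
        return error


def stream(n_samples, slope, rng):
    """Hypothetical data source: yields noisy samples of y = slope * x."""
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        yield [x], slope * x + rng.gauss(0.0, 0.01)


rng = random.Random(0)
model = OnlineLinearRegressor(n_features=1)

# Data arrives continuously: each sample is used once and then discarded,
# so no complete training set is ever stored.
for x, y in stream(2000, slope=2.0, rng=rng):
    model.update(x, y)
slope_before_drift = model.weights[0]

# The generating process changes: the same update rule keeps adapting.
for x, y in stream(2000, slope=-1.0, rng=rng):
    model.update(x, y)
slope_after_drift = model.weights[0]

print(f"slope learned before the change: {slope_before_drift:.2f}")
print(f"slope learned after the change:  {slope_after_drift:.2f}")
```

Because each update touches only the current sample, memory use stays constant however long the stream runs, and a usable model is available after every step. A batch learner fitted once on the first half of the stream would keep predicting with the outdated slope.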

There is no clear agreement on the meaning of the term online learning. Some authors use it only for the first of the aforementioned environments, but in other sources the term also refers to the last two. The controversy possibly stems from the fact that the philosophy of the solutions given to the three environments is similar. In this chapter, it will be assumed that the term online learning applies to any of the aforesaid situations.
