Selection of the Best Subset of Variables in Regression and Time Series Models

Nicholas A. Nechval, Konstantin N. Nechval, Maris Purgailis, Uldis Rozevskis
DOI: 10.4018/978-1-61520-668-1.ch016

Abstract

The problem of variable selection is one of the most pervasive model selection problems in statistical applications. Often referred to as the problem of subset selection, it arises when one wants to model the relationship between a variable of interest and a subset of potential explanatory variables or predictors, but there is uncertainty about which subset to use. Several papers have dealt with various aspects of the problem, but it appears that the typical regression user has not benefited appreciably. One reason for the lack of resolution is that the problem has not been well defined. Indeed, there is not a single problem, but rather several problems for which different answers might be appropriate. The intent of this chapter is not to give specific answers but to present a new, simple multiplicative variable selection criterion based on the parametrically penalized residual sum of squares, addressing the subset selection problem in multiple linear regression analysis, where the objective is to select a minimal subset of predictor variables without sacrificing any explanatory power. The variables that optimize this criterion are chosen as the best variables. The authors find that the proposed criterion performs consistently well across a wide variety of variable selection problems. The practical utility of the criterion is demonstrated by numerical examples.
Chapter Preview

Introduction

Variable selection refers to the problem of selecting the input variables that are most predictive of a given outcome. Variable selection problems arise in all supervised and unsupervised machine learning tasks: classification, regression, time series prediction, and pattern recognition.

In recent years, variable selection has become the focus of considerable research in several areas of application for which datasets with tens or hundreds of thousands of variables are available. These areas include text processing, particularly of Internet documents, and genomics, particularly gene expression array data. The objective of variable selection is threefold: to improve the prediction performance of the predictors, to provide faster and more cost-effective predictors, and to provide a better understanding of the underlying process that generated the data.

A number of studies in the statistical literature discuss the problem of selecting the best subset of predictor variables in regression. Such studies focus on subset selection methodologies, selection criteria, or a combination of both. The traditional selection methodologies can be enumerative (e.g. all subsets and best subsets procedures), sequential (e.g. forward selection, backward elimination, stepwise regression, and stagewise regression procedures), and screening-based (e.g. ridge regression and principal components analysis). Standard texts like Draper and Smith (1981) and Montgomery and Peck (1992) provide clear descriptions of these methodologies.
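To make the sequential approach concrete, the sketch below implements plain forward selection scored by the residual sum of squares. It is a minimal illustration, not the chapter's method: the function names are invented for this example, the design matrix X is assumed to already contain an intercept column if one is wanted, and the stopping rule (a cap on the number of selected variables) is deliberately simple.

```python
import numpy as np

def rss(Xw, y):
    """Residual sum of squares of the least-squares fit of y on Xw."""
    beta, *_ = np.linalg.lstsq(Xw, y, rcond=None)
    r = y - Xw @ beta
    return float(r @ r)

def forward_selection(X, y, max_vars=None):
    """Greedy forward selection: at each step, add the predictor
    whose inclusion yields the smallest residual sum of squares."""
    v = X.shape[1]
    max_vars = v if max_vars is None else max_vars
    selected, remaining = [], list(range(v))
    while remaining and len(selected) < max_vars:
        best_rss, best_j = min(
            (rss(X[:, selected + [j]], y), j) for j in remaining
        )
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```

Backward elimination is the mirror image (start with all v predictors and repeatedly drop the one whose removal increases the RSS least), while the enumerative procedures simply score every candidate subset.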

Some of the reasons for using only a subset of the available predictor variables (given by Miller, 2002) are:

  • To estimate or predict at a lower cost by reducing the number of variables on which data are to be collected;

  • To predict more accurately by eliminating uninformative variables;

  • To describe multivariate data sets parsimoniously; and

  • To estimate regression coefficients with smaller standard errors (particularly when some of the predictors are highly correlated).

These objectives are of course not completely compatible. Prediction is probably the most common objective, and here the range of values of the predictor variables for which predictions will be required is important. The subset of variables giving the best predictions in some sense, averaged over the region covered by the calibration data, may be very inferior to other subsets for extrapolation beyond this region. For prediction purposes, the regression coefficients are not the primary objective, and poorly estimated coefficients can sometimes yield acceptable predictions. On the other hand, if process control is the objective then it is of vital importance to know accurately how much change can be expected when one of the predictors changes or is changed.

Suppose that y, a variable of interest, and x1, ..., xv, a set of potential explanatory variables or predictors, are vectors of n observations. The problem of variable selection, or subset selection as it is often called, arises when one wants to model the relationship between y and a subset of x1, ..., xv, but there is uncertainty about which subset to use. Such a situation is of particular interest when v is large and x1, ..., xv is thought to contain many redundant or irrelevant variables.

The variable selection problem is most familiar in the linear regression context, where attention is restricted to normal linear models. Letting w index the subsets of x1, ..., xv and letting pw be the number of parameters of the model based on the wth subset, the problem is to select and fit a model of the form

y = Xw aw + ε, (1)

where Xw is an n × pw matrix whose columns correspond to the wth subset, aw is a pw × 1 vector of regression coefficients, and ε ~ Nn(0, σ²I). More generally, the variable selection problem is a special case of the model selection problem, where each model under consideration corresponds to a distinct subset of x1, ..., xv. Typically, a single model class is simply applied to all possible subsets.
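As a minimal sketch of this formulation, the code below enumerates every non-empty subset w of the columns of X, fits model (1) by least squares, and scores each fit with a multiplicatively penalized residual sum of squares. Akaike's final prediction error, FPE = (RSS/n)(n + pw)/(n − pw), is used here only as a familiar stand-in penalty; the chapter's own multiplicative criterion is developed later, and the function names are illustrative.

```python
import numpy as np
from itertools import combinations

def rss(Xw, y):
    """Residual sum of squares of the least-squares fit of y on Xw."""
    beta, *_ = np.linalg.lstsq(Xw, y, rcond=None)
    r = y - Xw @ beta
    return float(r @ r)

def best_subset(X, y):
    """Exhaustive search over all non-empty column subsets of X,
    scored by FPE = (RSS/n) * (n + pw) / (n - pw), a standard
    multiplicatively penalized RSS (a stand-in criterion)."""
    n, v = X.shape
    best_score, best_w = np.inf, None
    for pw in range(1, v + 1):
        for w in combinations(range(v), pw):
            score = (rss(X[:, list(w)], y) / n) * (n + pw) / (n - pw)
            if score < best_score:
                best_score, best_w = score, w
    return best_w, best_score
```

The penalty factor (n + pw)/(n − pw) grows with the subset size pw, so the search trades a lower RSS against a larger parameter count, mirroring the role of the parametric penalty in the criterion proposed in this chapter.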
