Classification Trees as Proxies

Anthony Scime, Nilay Saiya, Gregg R. Murray, Steven J. Jurek

Source Title: International Journal of Business Analytics (IJBAN) 2(2)

DOI: 10.4018/IJBAN.2015040103

Abstract

In data analysis, when data are unattainable, it is common to select a closely related attribute as a proxy. But sometimes substituting one attribute for another is not sufficient to satisfy the needs of the analysis. In these cases, a classification model based on one dataset can be investigated as a possible proxy for another closely related domain's dataset. If the model's structure is sufficient to classify data from the related domain, the model can be used as a proxy tree. Such a proxy tree also provides an alternative characterization of the related domain. Just as important, if the original model does not successfully classify the related domain's data, the domains are not as closely related as believed. This paper presents a methodology for evaluating datasets as proxies, along with three cases that demonstrate the methodology and the three types of results.

1. Introduction

One of the goals of data analysis is to find factors that characterize events and situations in the world. By understanding a current situation, policy makers, decision makers, and researchers can take actions directed toward changing society and individual lives for the better.

Sometimes decision makers cannot obtain all the data necessary to make a sound decision. For instance, a particular data point may be too costly, or it may be unobservable. In these cases, a proxy variable or attribute may be used. A proxy attribute is an attribute used in place of the unacquirable data. While not a direct measure of the desired data point, a good proxy attribute should be strongly related to the unobserved attribute of interest (Clinton, 2004). Proxies can introduce error in measuring the outcome (Kimball, Sahm, & Shapiro, 2008) but are necessary because the desired value is needed although unattainable. Sometimes multiple proxies exist, and using a combination of these proxies can reduce proxy-introduced error (Lubotsky & Wittenberg, 2006; Trickett, Persky, & Espino, 2009). The extent of error is difficult or impossible to measure because the baseline attribute is itself unattainable.

At other times, decision makers may want to assess the similarities between datasets. Here the decision maker hopes to make the decision using one dataset as a proxy for another dataset. Braslow and Humez (2014) and Hargittai (2005) investigated using survey data as a proxy for observation data, which are more difficult and expensive to collect. Saunders, Bex, and Woods (2013) investigated the use of crowdsourced data as a proxy for lab-collected data in the medical domain. Crowdsourcing is well established in medical research for assembling large normative datasets. Of course, using one dataset as a proxy for another can also result in errors.

The ability to compare datasets could conceivably have great utility and real-world ramifications. One might want to know, for example, if gender or ethnic differences mattered in terms of election outcomes in one year but not in another, or if systemic differences like the institutional structure of a regime could correspond with the level of freedom in a state. Those studying the causes of war might be interested in whether or not the causes of civil and international war are similar and compare datasets on each to analyze the question. Comparing datasets on the causes of religious and secular terrorism would indicate if the determinants of both kinds of terrorism are the same. Analyzing a question in this way and showing differences between similar domains can carry ramifications for policy makers and researchers seeking to address the root causes of particular types of problems.

When a proxy attribute is used, regression analysis and other statistical techniques can test hypotheses to evaluate data against an expected outcome. These techniques inform a researcher about relationships between the independent attributes, including the proxy attributes, and the dependent attribute or class attribute. This analysis can be used to study how an attribute influences an outcome while accounting for the other attributes that also influence the outcome. However, when attempting to substitute one dataset for another, regression and other statistical techniques may not provide sufficient information. Other techniques can be more insightful and practical than regression when predicting the interaction of attributes on the dependent attribute or class attribute (Andoh-Baidoo & Osei-Bryson, 2007; Chang, 2006). Classification can be used as an analysis technique when proxy attributes are used, and the classification tree itself may act as a proxy tree for a similar domain.

This paper offers a methodology for evaluating the use of a dataset’s classification model as a proxy model for a similar dataset; it then presents three cases that demonstrate the methodology and the three types of results. To that end, the next sections describe classification analysis and its use as a mechanism for identifying proxy models. The paper then presents the three case studies, on executive leadership, voter turnout, and terrorism, followed by a conclusion.
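
The general idea of a proxy tree can be illustrated with a minimal sketch: build a classification tree on one dataset, then check how well that same tree classifies records from a related dataset. This is not the authors' procedure, which is not shown in this preview. The tiny tree builder, the function names, the toy records, and the 0.8 accuracy threshold are all assumptions chosen for illustration.

```python
from collections import Counter

def majority(rows, target):
    """Return the most common class label among the rows."""
    return Counter(r[target] for r in rows).most_common(1)[0][0]

def gini(rows, target):
    """Gini impurity of the class labels in rows."""
    n = len(rows)
    counts = Counter(r[target] for r in rows)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def group_by(rows, attr):
    """Partition rows by their value of a categorical attribute."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r)
    return groups

def build_tree(rows, attrs, target, max_depth=3):
    """Greedy tree on categorical attributes; a leaf is just a class label."""
    if len({r[target] for r in rows}) == 1 or not attrs or max_depth == 0:
        return majority(rows, target)

    def split_cost(attr):
        # Weighted Gini impurity of the partitions produced by this attribute.
        parts = group_by(rows, attr).values()
        return sum(len(p) / len(rows) * gini(p, target) for p in parts)

    best = min(attrs, key=split_cost)
    rest = [a for a in attrs if a != best]
    return {
        "attr": best,
        "default": majority(rows, target),  # used for unseen attribute values
        "children": {v: build_tree(p, rest, target, max_depth - 1)
                     for v, p in group_by(rows, best).items()},
    }

def classify(tree, row):
    """Walk the tree until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["attr"]], tree["default"])
    return tree

def accuracy(tree, rows, target):
    """Fraction of rows whose label the tree predicts correctly."""
    return sum(classify(tree, r) == r[target] for r in rows) / len(rows)

def is_proxy(tree, related_rows, target, threshold=0.8):
    """Hypothetical criterion: the tree is a usable proxy if it classifies
    the related domain at or above an assumed accuracy threshold."""
    return accuracy(tree, related_rows, target) >= threshold

# Toy records (invented): in the original domain the label follows "x".
original = [
    {"x": "a", "y": "p", "label": "yes"},
    {"x": "a", "y": "q", "label": "yes"},
    {"x": "b", "y": "p", "label": "no"},
    {"x": "b", "y": "q", "label": "no"},
]
# A related domain that follows the same rule.
related_close = [dict(r) for r in original]
# A domain believed to be related, where the label actually follows "y".
related_far = [
    {"x": "a", "y": "p", "label": "yes"},
    {"x": "a", "y": "q", "label": "no"},
    {"x": "b", "y": "p", "label": "yes"},
    {"x": "b", "y": "q", "label": "no"},
]

tree = build_tree(original, ["x", "y"], "label")
print(is_proxy(tree, related_close, "label"))
print(is_proxy(tree, related_far, "label"))
```

In this sketch, a tree that fails the threshold on the second dataset is read as a sign that the two domains are less closely related than assumed, which mirrors the interpretation given in the abstract. A real analysis would need a principled choice of threshold and evaluation measure.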
