
An Optimized Semi-Supervised Learning Approach for High Dimensional Datasets

Nesma Settouti (Tlemcen University, Algeria), Mostafa El Habib Daho (Tlemcen University, Algeria), Mohammed El Amine Bechar (Tlemcen University, Algeria), and Mohammed Amine Chikh (Tlemcen University, Algeria)

Source Title: Applying Big Data Analytics in Bioinformatics and Medicine

DOI: 10.4018/978-1-5225-2607-0.ch012

Abstract

Semi-supervised learning is one of the most active research areas in machine learning, extending beyond purely supervised learning from data. Medical diagnosis is mostly carried out in supervised mode, but in reality we face a large number of unlabeled samples and a small set of labeled examples, each characterized by thousands of features. This problem is known as “the curse of dimensionality”. In this study, we propose as a solution a new semi-supervised learning approach that we call Optim Co-forest. The Optim Co-forest algorithm combines data re-sampling (Bagging; Breiman, 1996) with two selection strategies. The first selects a random subset of parameters to construct the ensemble of classifiers, following the principle of Co-forest (Li & Zhou, 2007). The second is an extension of the importance measure of Random Forest (RF; Breiman, 2001). Experiments on high dimensional datasets confirm the contribution of the adopted selection strategies to the scalability of our method.
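As a rough illustration of two of the ingredients named in the abstract, the sketch below draws a bootstrap sample of rows (bagging-style re-sampling) and a random subset of features for each ensemble member. It is not the Optim Co-forest algorithm: the importance-based strategy is not shown, and the function names, the seed, and the choice of 31 features (roughly the square root of 1000) are assumptions made only for this example.

```python
import random

def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (bagging-style re-sampling)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

def random_feature_subset(n_features, k, rng):
    """Pick k distinct feature indices at random."""
    return sorted(rng.sample(range(n_features), k))

def make_ensemble_views(n_samples, n_features, n_members, k, seed=0):
    """Build one (row indices, feature indices) pair per ensemble member.

    All parameter names here are hypothetical; they only illustrate the idea.
    """
    rng = random.Random(seed)
    return [
        (bootstrap_indices(n_samples, rng), random_feature_subset(n_features, k, rng))
        for _ in range(n_members)
    ]

# Few labeled samples, many features: the setting described in the abstract.
views = make_ensemble_views(n_samples=6, n_features=1000, n_members=3, k=31)
for rows, cols in views:
    print(len(rows), "rows,", len(cols), "features")
```

Each member would then train its own classifier on its rows and columns, so that the members differ from one another even when the labeled set is small.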

Chapter Preview

Introduction

One of the most serious problems afflicting current machine learning techniques is dataset dimensionality. With advances in technology, many real-world applications now deal with data ranging from a few dozen to many thousands of dimensions. The analysis of high dimensional datasets is difficult, not only because they are large in terms of the number of observations, but also because of the large number of variables (features) that modern automatic acquisition methods can generate. In fact, most applications make it possible to obtain many features and samples at low cost. However, the relevant features are often more difficult to obtain than the others. This is particularly true in prediction problems.

In these application fields, the learning task faces another important difficulty: new samples are easily generated, but labeling them can be costly and time consuming. For example, with the fast development of the Internet, it is easy to get billions of Web pages from Web servers. However, classifying these pages is a long and difficult task. Likewise, in speech recognition, recording yields a huge amount of audio data at negligible cost, but labeling it requires someone to listen to it and understand it. Similar situations apply to remote sensing, face recognition, medical imaging, image search by content (Zhou and Goldman, 2004) and intrusion detection in computer networks (Roli, 2005).

The availability of unlabeled data and the difficulty of obtaining labels have made semi-supervised learning methods increasingly important. The question that arises is whether the knowledge of points with labels is sufficient to construct a decision function that can correctly predict the labels of unlabeled points. Different approaches propose to extract additional information from the unlabeled points and include it in the learning problem.
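One common way of drawing additional information from unlabeled points is pseudo-labeling: an ensemble votes on each unlabeled point, and only points with strong agreement are added to the training set with the majority label. The sketch below illustrates that general idea only. The function name, the 0.75 threshold, and the toy votes are assumptions for the example, not the selection rule used in this chapter.

```python
from collections import Counter

def confident_pseudo_labels(votes_per_point, threshold=0.75):
    """Return {index: label} for unlabeled points whose majority label
    receives at least `threshold` of the ensemble's votes."""
    accepted = {}
    for idx, votes in enumerate(votes_per_point):
        label, count = Counter(votes).most_common(1)[0]
        if count / len(votes) >= threshold:
            accepted[idx] = label
    return accepted

# Hypothetical votes of a four-member ensemble on three unlabeled points.
votes = [
    ["sick", "sick", "sick", "healthy"],     # 3 of 4 agree: accepted
    ["sick", "healthy", "sick", "healthy"],  # tie: rejected
    ["healthy"] * 4,                         # unanimous: accepted
]
pseudo = confident_pseudo_labels(votes)
print(pseudo)  # {0: 'sick', 2: 'healthy'}
```

Points that fail the threshold stay unlabeled and can be reconsidered in a later round, once the ensemble has been retrained on the enlarged labeled set.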

Different kinds of approaches have been developed for the semi-supervised learning task. There are mainly three paradigms (Chapelle et al., 2006; Cornuéjols and Miclet, 2010) that address the problem of combining labeled and unlabeled data to improve performance. We briefly describe these categories:

• Semi-Supervised Learning (SSL): Refers to methods that attempt to exploit unlabeled data for supervised learning, where the unlabeled examples are different from the test examples, or to exploit labeled data for unsupervised learning.
• Transductive Learning: Groups methods that also attempt to exploit unlabeled examples, but assume that the unlabeled examples are exactly the test examples.
• Active Learning: Refers to methods that select the most important unlabeled examples, for which an oracle can be asked to provide labels; the objective is to minimize the amount of data that must be labeled (Freund et al., 1997). It is sometimes called selective sampling or sample selection.
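To make the selection step of active learning concrete, the sketch below uses uncertainty sampling, one simple selection rule among several: for a binary classifier, it queries the unlabeled examples whose predicted probability is closest to 0.5. This is an illustrative assumption and not necessarily the rule used in the cited work. The function name and the probabilities are hypothetical.

```python
def most_uncertain(probabilities, n_queries=1):
    """Return the indices of the n_queries examples whose predicted
    positive-class probability is closest to 0.5."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:n_queries]

# Hypothetical predicted probabilities for four unlabeled examples.
probs = [0.95, 0.52, 0.10, 0.40]
queries = most_uncertain(probs, n_queries=2)
print(queries)  # [1, 3]
```

The returned indices are the examples that would be sent to the oracle for labeling; confidently classified examples (here indices 0 and 2) are left alone, which is how the labeling effort is kept small.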

In this paper, we focus on improving the performance of supervised classification using unlabeled data (SSL). In this context, the two main contributions of this work address the following questions: “How can the relevance of a model be judged using unlabeled data?” and “How can the performance of the model be improved?”
