Particle Swarm Optimizer for High-Dimensional Data Clustering

Yanping Lu, Shaozi Li

Source Title: Kansei Engineering and Soft Computing: Theory and Practice

DOI: 10.4018/978-1-61692-797-4.ch002

Abstract

This chapter aims to develop effective particle swarm optimization (PSO) for two problems commonly encountered in high-dimensional data clustering: the variable weighting problem in soft projected clustering with a known number of clusters k, and the problem of automatically determining the number of clusters k. Each problem is formulated as the minimization of a nonlinear continuous objective function subject to bound constraints. Specialized encoding schemes and search strategies are also proposed to tailor PSO to these two problems. Experimental results on both synthetic and real high-dimensional data show that the two proposed algorithms greatly improve cluster quality. In addition, the results of the new algorithms are much less dependent on the initial cluster centroids. The experimental results indicate that PSO has promising potential for clustering high-dimensional data.
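Both problems in the abstract reduce to minimizing a continuous function inside a box of lower and upper bounds. As a rough illustration of how a basic PSO respects such bound constraints, the sketch below implements a generic global-best PSO with an inertia weight and clamps every position to the box. It is a textbook variant, not the chapter's specialized encoding or search strategy; the parameter values (`w=0.7`, `c1=c2=1.5`), the velocity limit, and the shifted sphere test function are assumptions chosen only for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    f: Callable[[Sequence[float]], float],
    lower: Sequence[float],
    upper: Sequence[float],
    n_particles: int = 20,
    n_iters: int = 200,
    w: float = 0.7,    # inertia weight (assumed value)
    c1: float = 1.5,   # cognitive (personal-best) coefficient (assumed value)
    c2: float = 1.5,   # social (global-best) coefficient (assumed value)
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Generic global-best PSO; every position is clamped to [lower, upper]."""
    rng = random.Random(seed)
    dim = len(lower)
    span = [u - l for l, u in zip(lower, upper)]
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lower[d], upper[d]) for d in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Limit the step to the width of the box in this dimension.
                vel[i][d] = max(-span[d], min(span[d], vel[i][d]))
                # Enforce the bound constraints by clamping the new position.
                pos[i][d] = max(lower[d], min(upper[d], pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Usage: a shifted sphere function on the unit box [0, 1]^5 (toy objective).
best, val = pso_minimize(lambda x: sum((xi - 0.3) ** 2 for xi in x), [0.0] * 5, [1.0] * 5)
```

Clamping is the simplest way to keep particles feasible; other boundary-handling rules (reflection, random re-initialization) are also used in the PSO literature, and which one suits a given clustering objective is a design choice.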

1. Introduction

Clustering high-dimensional data is a common and important task in many data mining applications. A fundamental starting point for data mining is the assumption that a data object can be represented as a high-dimensional feature vector. Text clustering is a typical example. In text mining, a text data set is viewed as a matrix in which each row represents a document and each column represents a unique term. The number of dimensions corresponds to the number of unique terms, which usually runs into the hundreds or thousands. Another application involving high-dimensional data is customer prediction for insurance companies, where potential customers must be separated into groups to help predict who would be interested in buying an insurance policy. Many other applications, such as bankruptcy prediction, web mining, and protein function prediction, present similar data analysis problems.
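To make the matrix view of a text data set concrete, the following sketch builds a small document-term count matrix in plain Python. The toy documents and the lowercase whitespace tokenizer are assumptions for illustration; practical pipelines usually also remove punctuation and stop words, and with a realistic corpus the number of columns grows into the hundreds or thousands.

```python
from typing import Dict, List, Tuple


def document_term_matrix(docs: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Build a term-count matrix: one row per document, one column per unique term."""
    # Naive tokenization (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index: Dict[str, int] = {term: j for j, term in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in docs]
    for i, toks in enumerate(tokenized):
        for tok in toks:
            matrix[i][index[tok]] += 1
    return vocab, matrix


# Toy corpus (hypothetical): two electronics documents and one sports document.
docs = [
    "signal circuit electronics",
    "circuit signal signal",
    "athlete sports match",
]
vocab, X = document_term_matrix(docs)
```

Even in this tiny example each row is mostly zeros, and the electronics documents and the sports document have nonzero counts on disjoint sets of columns.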

Clustering high-dimensional data is difficult because clusters in such data are usually embedded in lower-dimensional subspaces, and the feature subspaces of different clusters can overlap. In a text data set, documents related to a particular topic are characterized by one subset of terms. For example, a group of documents is categorized under the topic electronics because they contain a subset of terms such as electronics, signal, and circuit. The terms describing another topic, athlete, may not occur in the documents on electronics but will occur in the documents relating to sports.

Traditional clustering algorithms struggle with high-dimensional data because the curse of dimensionality degrades the quality of their results. As the number of dimensions increases, the data become very sparse and distance measures computed over the whole dimension space become meaningless. Irrelevant dimensions spread out the data points until, in very high dimensions, they are almost equidistant from each other. The phenomenon is exacerbated when objects are related in different ways in different feature subsets. In fact, some dimensions may be irrelevant or redundant for certain clusters, and different sets of dimensions may be relevant for different clusters. Thus, clusters should often be searched for in subspaces rather than in the whole dimension space.
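The claim that points become almost equidistant can be checked empirically. One simple measure, assumed here rather than taken from the chapter, is the relative contrast (max distance minus min distance) divided by the min distance, from a random query to a set of random points. The sketch draws points uniformly from the unit hypercube (also an assumption) and reports the contrast as the dimension grows.

```python
import math
import random
from typing import List


def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """(max - min) / min of Euclidean distances from a random query to random points."""
    rng = random.Random(seed)
    # Uniform data in [0, 1]^dim (assumption for illustration).
    points: List[List[float]] = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, p) for p in points]
    return (max(dists) - min(dists)) / min(dists)


for d in (2, 10, 100, 1000):
    print(d, round(relative_contrast(200, d), 3))
```

The contrast shrinks steadily as `d` grows, so "nearest" and "farthest" neighbors become hard to tell apart when distances are computed over all dimensions.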

Such data sets are clustered with an approach called subspace clustering or projected clustering, which aims to find clusters in different subspaces. Subspace clustering in general seeks to identify all the subspaces of the dimension space in which clusters are best separated (see, for instance, Goil, 1999, Woo, 2004). The terms subspace clustering and projected clustering are not always used consistently in the literature, but as a general rule, subspace clustering algorithms compute overlapping clusters, whereas projected clustering aims to partition the data set into disjoint clusters (see, for instance, Procopiuc, 2002, Achtert, 2008, Moise, 2008). Projected clustering algorithms often search for clusters in subspaces, each of which is spanned by a number of base vectors (main axes). The performance of many subspace/projected clustering algorithms drops quickly with the size of the subspaces in which the clusters are found (Parsons, 2004). Also, many of them require user-supplied domain knowledge to select and tune their settings, such as the maximum distance between dimensional values (Procopiuc, 2002), the thresholds of input parameters (Moise, 2008), and the minimum density (Agrawal, 2005), all of which are difficult to establish.

Recently, a number of soft projected clustering algorithms have been developed to identify clusters by assigning an optimal variable weight vector to each cluster (Domeniconi, 2007, Huang, 2005, Jing, 2007). Each of these algorithms iteratively minimizes an objective function. Although the cluster membership of an object is calculated in the whole variable space, the similarity between each pair of objects is based on weighted variable differences. The variable weights transform distance so that the associated cluster is reshaped into a dense hypersphere and can be separated from other clusters. Soft projected clustering algorithms are driven by evaluation criteria and search strategies. Consequently, defining the objective function and efficiently determining the optimal variable weights are the two most important issues in soft projected clustering.
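The idea of computing cluster membership over all variables while weighting each variable difference per cluster can be sketched as follows. Each cluster l has its own weight vector, and the dissimilarity of object x to center z_l is the sum over dimensions j of w_lj^beta * (x_j - z_lj)^2. The exponent `beta`, the toy centers, points, and weights are all assumptions; this is a generic illustration, not the objective function of this chapter or of any specific cited algorithm.

```python
from typing import List, Sequence


def weighted_sq_distance(x: Sequence[float], center: Sequence[float],
                         weights: Sequence[float], beta: float = 2.0) -> float:
    """Sum over dimensions of w_j**beta * (x_j - z_j)**2 (beta is an assumed exponent)."""
    return sum((w ** beta) * (xj - zj) ** 2 for xj, zj, w in zip(x, center, weights))


def assign(points: Sequence[Sequence[float]], centers: Sequence[Sequence[float]],
           weights: Sequence[Sequence[float]], beta: float = 2.0) -> List[int]:
    """Assign each object to the cluster with the smallest weighted distance.

    Membership is evaluated over all dimensions, but cluster l uses its own
    weight vector weights[l]."""
    labels = []
    for x in points:
        dists = [weighted_sq_distance(x, c, w, beta) for c, w in zip(centers, weights)]
        labels.append(min(range(len(centers)), key=dists.__getitem__))
    return labels


def objective(points: Sequence[Sequence[float]], centers: Sequence[Sequence[float]],
              weights: Sequence[Sequence[float]], labels: Sequence[int],
              beta: float = 2.0) -> float:
    """Total weighted within-cluster dispersion for a given assignment."""
    return sum(weighted_sq_distance(x, centers[l], weights[l], beta)
               for x, l in zip(points, labels))


# Toy example (hypothetical values): cluster 0 is compact along dimension 0,
# cluster 1 along dimension 1, so those dimensions receive the largest weights.
centers = [[0.0, 5.0, 5.0], [5.0, 0.0, 5.0]]
weights = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]
points = [[0.1, 9.0, 1.0], [4.8, 0.2, 0.5]]
labels = assign(points, centers, weights)
```

In a soft projected clustering method, an optimizer would search over the weights (and centers) to minimize a function like `objective`, typically with the weights kept within bounds such as [0, 1]; this is where a bound-constrained optimizer such as PSO can be applied.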
