Hilbert Index-based Outlier Detection Algorithm in Metric Space

Honglong Xu, Haiwu Rong, Rui Mao, Guoliang Chen, Zhiguang Shan

Source Title: International Journal of Grid and High Performance Computing (IJGHPC) 8(4)

DOI: 10.4018/IJGHPC.2016100103

Abstract

Big data is profoundly changing the lifestyles of people around the world. Driven by the requirements of applications across many industries, research on big data has been growing. Methods for managing and analyzing big data to extract valuable information are the key to big data research. Starting from the variety challenge of big data, this paper proposes a universal big data management and analysis framework based on metric space. Within this framework, the Hilbert Index-based Outlier Detection (HIOD) algorithm is proposed. HIOD can handle all datatypes that can be abstracted to metric space and achieves higher detection speed. Experimental results indicate that HIOD can effectively overcome the variety challenge of big data and achieves a speedup of 2.02 times over iORCA on average and, in certain cases, up to 5.57 times. The number of distance calculations is reduced by 47.57% on average and by up to 89.10%.

1. Introduction

There is a saying that firms can be data-rich but information-poor. While this phenomenon can be observed even for ordinary data, it is especially notable for big data. A large amount of data is produced by, for instance, people's social activities and various types of equipment, but notably little useful information is obtained from it. If these data cannot be processed efficiently, ever larger quantities will accumulate, wasting storage space.

With the development of data mining technology, this problem has been readily solved (Shanshan, Jindian, Pengfei, & Hao, 2016). Data mining technologies, such as clustering, classification, and association analysis, are making it easier for people to obtain the common patterns from the data. However, “one person's noise may be another person's signal” (Kriegel, Kröger, & Zimek, 2010). The uncommon patterns in mass data may have amazing value.

With the arrival of the big data era, data mining has become much more challenging (Kun-Ming, Sheng-Hui, Li-Wei, & Shu-Hao, 2015). Many industries have begun applying big data technology in order to mine more valuable information from big data (García-Recuero, Esteves, & Veiga, 2014). However, because of the limitations imposed by big data's complex datatypes, known as the variety challenge (Xiaofeng & Xiang, 2013), industries often end up building duplicate big data analysis systems, which wastes money. The variety of datatypes seriously tests data mining ability (Xuejiao, Xiaofeng, & Yang, 2013).

Outlier detection, one of the most important data mining methods, can detect uncommon patterns in mass data (Aggarwal, 2015). The most influential definition is Hawkins's: "An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism" (Hawkins, 1980). Outlier detection is increasingly widely used in many fields (Pimentel, Clifton, Clifton, & Tarassenko, 2014), such as credit card fraud detection (Yu & Wang, 2009), public health (Srimani & Koti, 2012), and network intrusion detection (Othman, Bakar, Ibrahim, Hassim, & Ain, 2013).

Among various outlier detection methods, distance-based algorithms have excellent universality (Pimentel et al., 2014). When a distance-based outlier definition is paired with a detection algorithm that is itself purely distance-based, only distance information is used. This approach is known as metric space outlier detection, or MSOD. MSOD has striking advantages in overcoming the variety challenge of big data. Among MSOD methods, index-based outlier detection has a higher detection speed than other detection methods (Bhaduri, Matthews, & Giannella, 2011).
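To make the idea of using only distance information concrete, here is a minimal sketch, not the HIOD algorithm itself. It assumes one common distance-based outlier degree (the distance to the k-th nearest neighbor) and a brute-force scan; the function names `edit_distance`, `knn_outlier_degree`, and `top_n_outliers` are illustrative and do not come from the article. The same detector runs unchanged on strings and on numbers because it only ever calls the supplied distance function.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, which is a metric on strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def knn_outlier_degree(data: Sequence[T],
                       dist: Callable[[T, T], float],
                       k: int) -> List[float]:
    """Outlier degree of each object: distance to its k-th nearest neighbor.

    Assumes len(data) > k. Only `dist` is used, never the objects' internals.
    """
    degrees = []
    for i, x in enumerate(data):
        ds = sorted(dist(x, y) for j, y in enumerate(data) if j != i)
        degrees.append(ds[k - 1])
    return degrees


def top_n_outliers(data: Sequence[T],
                   dist: Callable[[T, T], float],
                   k: int,
                   n: int) -> List[Tuple[T, float]]:
    """Return the n objects with the largest outlier degree."""
    degrees = knn_outlier_degree(data, dist, k)
    ranked = sorted(zip(degrees, data), key=lambda p: p[0], reverse=True)
    return [(obj, deg) for deg, obj in ranked[:n]]


# Same detector, two datatypes: only the distance function changes.
words = ["cat", "bat", "hat", "rat", "mat", "elephant"]
numbers = [1.0, 1.1, 0.9, 1.2, 10.0]
word_outliers = top_n_outliers(words, edit_distance, k=2, n=1)
number_outliers = top_n_outliers(numbers, lambda a, b: abs(a - b), k=2, n=1)
```

The brute-force scan costs a quadratic number of distance calculations, which is the cost that index-based methods aim to reduce.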

However, most existing index-based methods rely to some degree on domain-specific information. Further, certain index-based methods use a pivot technique but provide no pivot selection method and use only a single pivot, leading to spatial warping. In addition, the triangle inequality of the distance function is not fully exploited. Finally, the sparse region of the dataset is ignored, so the cutoff value of the outlier degree increases slowly.
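The role of pivots and the triangle inequality can be illustrated with a short sketch. It is a generic pivot-filtering example under stated assumptions, not the pruning rules of the article: for any pivot p, the triangle inequality gives |d(x, p) - d(y, p)| <= d(x, y), so precomputed pivot distances yield a lower bound on d(x, y), and several pivots give a tighter bound than one. The helper names `pivot_table`, `lower_bound`, and `count_neighbors_within` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def pivot_table(data: Sequence[T],
                pivots: Sequence[T],
                dist: Callable[[T, T], float]) -> List[List[float]]:
    """Precompute the distance from every object to every pivot."""
    return [[dist(x, p) for p in pivots] for x in data]


def lower_bound(row_x: Sequence[float], row_y: Sequence[float]) -> float:
    """Largest |d(x, p) - d(y, p)| over all pivots; never exceeds d(x, y)."""
    return max(abs(a - b) for a, b in zip(row_x, row_y))


def count_neighbors_within(i: int,
                           data: Sequence[T],
                           table: Sequence[Sequence[float]],
                           dist: Callable[[T, T], float],
                           r: float) -> Tuple[int, int]:
    """Count objects within distance r of data[i].

    Returns (neighbor count, number of real distance calculations).
    """
    count = 0
    calls = 0
    for j in range(len(data)):
        if j == i:
            continue
        if lower_bound(table[i], table[j]) > r:
            continue  # pruned: the true distance must also exceed r
        calls += 1
        if dist(data[i], data[j]) <= r:
            count += 1
    return count, calls


points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
pivots = [(0, 0), (10, 10)]  # chosen by hand for this illustration
table = pivot_table(points, pivots, math.dist)
count, calls = count_neighbors_within(0, points, table, math.dist, r=2.0)
```

In this toy dataset the three far-away points are rejected from the pivot table alone, so only the two nearby points require a real distance calculation.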

To solve these problems, we propose a metric space-based big data abstraction framework. Based on this framework, the Hilbert Index-based Outlier Detection (HIOD) algorithm is proposed together with a pivot selection method. Specifically, we make the following contributions.

1. A metric space-based big data abstraction framework.
2. A pivot selection method, which can quickly select pivots from approximate dense regions and takes into consideration the distances between different pivots.
3. A Hilbert index-based outlier detection algorithm, which first detects sparse regions so that the cutoff value of the outlier degree can be raised as early as possible.
4. Three pruning rules, which are applied in order to reduce the number of distance calculations.
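Why scanning sparse regions first helps can be shown with a small sketch. It assumes the cutoff pruning commonly used by top-n distance-based detectors (a candidate is dropped as soon as its running k-th nearest neighbor distance falls below the weakest outlier degree found so far); it does not reproduce HIOD's Hilbert index or its three pruning rules, and `top_n_with_cutoff` is a hypothetical name. The data and processing orders are made up for illustration.

```python
import heapq
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def top_n_with_cutoff(data: Sequence[T],
                      order: Sequence[int],
                      dist: Callable[[T, T], float],
                      k: int,
                      n: int) -> Tuple[List[Tuple[float, int]], int]:
    """Top-n outliers by k-th nearest neighbor distance, with cutoff pruning.

    Candidates are processed in `order`. Assumes len(data) > k.
    Returns ([(degree, index), ...] sorted descending, distance calculations).
    """
    top: List[Tuple[float, int]] = []  # min-heap of current top-n outliers
    cutoff = 0.0                       # weakest degree in a full top-n
    calls = 0
    for i in order:
        knn: List[float] = []          # negated k smallest distances so far
        pruned = False
        for j in range(len(data)):
            if j == i:
                continue
            calls += 1
            d = dist(data[i], data[j])
            if len(knn) < k:
                heapq.heappush(knn, -d)
            elif d < -knn[0]:
                heapq.heapreplace(knn, -d)
            # The running k-th NN distance only shrinks, so once it is below
            # the cutoff this candidate can never enter the top-n.
            if len(knn) == k and -knn[0] < cutoff:
                pruned = True
                break
        if pruned:
            continue
        degree = -knn[0]
        if len(top) < n:
            heapq.heappush(top, (degree, i))
        elif degree > top[0][0]:
            heapq.heapreplace(top, (degree, i))
        if len(top) == n:
            cutoff = top[0][0]
    return sorted(top, reverse=True), calls


# Ten clustered points followed by two isolated ones (illustrative data).
data = list(range(10)) + [500, 1000]
abs_diff = lambda a, b: abs(a - b)
natural_order = list(range(len(data)))
sparse_first = [11, 10] + list(range(10))

res_natural, calls_natural = top_n_with_cutoff(data, natural_order, abs_diff, k=2, n=2)
res_sparse, calls_sparse = top_n_with_cutoff(data, sparse_first, abs_diff, k=2, n=2)
```

Both orders return the same two outliers, but visiting the isolated points first raises the cutoff immediately, so the clustered points are pruned after very few distance calculations.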
