Parallel Outlier Detection for Streamed Data Using Non-Parameterized Approach

Harshad Dattatray Markad, S. M. Sangve

Source Title: International Journal of Synthetic Emotions (IJSE) 8(2)

DOI: 10.4018/IJSE.2017070102

Abstract

Outlier detection is used in applications such as fraud detection, network analysis, network traffic monitoring, manufacturing, and environmental software. The data streams generated in these settings are continuous and change over time, which makes it difficult to detect outliers in data that is both huge and continuous in nature. Because streamed data arrives in real time and changes over time, it is impractical to store it in the data space and analyze it for abnormal behavior. The limitations of the data space have led to the problem of analyzing data in real time and processing it on a first-come, first-served (FCFS) basis. Results about abnormal behavior must be produced quickly, within a limited time frame, and over an unbounded set of data streams arriving over the network. Detecting outliers on a real-time basis is therefore a challenging task, and it is addressed here with the help of the processing power of graphics processing units. The algorithm used in this paper relies on a kernel function to accomplish the task. It produces timely outcomes on high-speed, multi-dimensional data. This method increases the speed of outlier detection by 20 times, and the speedup grows as the number of data attributes and the input data rate increase.

Introduction

Outlier detection is the identification of an unusual pattern that differs from the rest of the normal data set. It is usually done to identify defective data or abnormal behavior. Sometimes outlier detection is used to analyze data for security purposes or scientific interest. Instead of discarding the data, researchers sometimes record the pattern using data mining techniques so that the same pattern can be detected easily in the future.

Data arrives at the system in the form of chunks or streams. Data in chunks usually arrives as datagrams. Datagrams are similar in size, so outliers in them can be detected and identified easily. When the data arrives as a stream, it becomes difficult to analyze on a regular basis because the stream is continuous and an enormous volume of data enters the system. Because streamed data is continuous in nature, it is impractical to store it in memory and analyze it thoroughly for abnormal behavior. This gave rise to stream outlier detection mechanisms that work on a one-pass basis. The streamed data is passed through the outlier detection mechanism so that outliers are detected in a single pass, which removes the need to store the data in memory and analyze it fragment by fragment.
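To make the one-pass idea concrete, here is a minimal sketch in Python. It is not the method of this paper: it swaps in a simple running z-score test (using Welford's online mean and variance update) purely to show how a value can be judged as it arrives without storing the stream. The class name `OnePassDetector` and the parameters `z_threshold` and `warmup` are assumptions made for illustration.

```python
import math


class OnePassDetector:
    """Flags values far from the running mean; keeps O(1) state, stores no stream data."""

    def __init__(self, z_threshold=3.0, warmup=5):
        self.z_threshold = z_threshold  # hypothetical cut-off, in standard deviations
        self.warmup = warmup            # values to observe before flagging anything
        self.n = 0                      # number of values seen so far
        self.mean = 0.0                 # running mean
        self.m2 = 0.0                   # running sum of squared deviations

    def observe(self, x):
        """Return True if x looks abnormal relative to everything seen before it."""
        flagged = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            flagged = std > 0 and abs(x - self.mean) > self.z_threshold * std
        # Welford's update: fold x into the running statistics in a single pass.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return flagged


detector = OnePassDetector()
stream = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 25.0, 10.1]  # made-up readings
flags = [detector.observe(x) for x in stream]
```

Each value is examined once and then discarded; only three numbers of state survive between observations, which is the property the single-pass requirement asks for.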

Although density-based outlier detection approaches are proven to be accurate, they are also known to be computationally demanding. Therefore, when using kernel density estimation to detect outliers in a high-volume, high-speed data stream, we need to speed up the computation to keep up with the input rate of the stream. Graphics Processing Units (GPUs) are designed to handle highly parallel workloads and to execute thousands of concurrent threads. With the introduction of CUDA (Compute Unified Device Architecture) (Nair et al., 2011; Hudlicka, 2011; Zhang et al., 2017), a general-purpose parallel computing architecture and programming model, GPU computing has become increasingly popular in general-purpose data mining applications. To meet the real-time processing requirements of streaming data, we use the parallel processing power of GPUs to accelerate kernel density estimation and produce results in a timely manner. Our experimental results show that, compared to a multi-core implementation, the method achieves a 20 times higher speed on real-world datasets.
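The density-based test described above can be sketched as follows. This is a sequential, CPU-only illustration in Python, not the GPU implementation evaluated in the paper. The Gaussian kernel, the function names, and the `bandwidth` and `threshold` values are all assumptions chosen for the example; the preview does not specify them.

```python
import math


def gaussian_kde_density(query, sample, bandwidth):
    """Estimate the density at `query` from `sample` using an isotropic Gaussian kernel."""
    dims = len(query)
    # Normalising constant of a d-dimensional Gaussian with standard deviation `bandwidth`.
    norm = (bandwidth * math.sqrt(2.0 * math.pi)) ** dims
    total = 0.0
    for point in sample:
        sq_dist = sum((q - p) ** 2 for q, p in zip(query, point))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / (len(sample) * norm)


def is_outlier(query, sample, bandwidth=0.5, threshold=0.01):
    """A point is treated as an outlier when its estimated density falls below `threshold`."""
    return gaussian_kde_density(query, sample, bandwidth) < threshold


# Made-up two-dimensional window of recent points, clustered near the origin.
window = [(0.0, 0.0), (0.1, -0.1), (-0.2, 0.1), (0.05, 0.2), (0.15, 0.0)]
dense = gaussian_kde_density((0.0, 0.0), window, bandwidth=0.5)
sparse = gaussian_kde_density((5.0, 5.0), window, bandwidth=0.5)
```

The cost of scoring one point is proportional to the sample size times the number of attributes, and the contribution of each sample point is independent of the others. That independence is what makes the inner loop a natural fit for the thousands of concurrent GPU threads mentioned above.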

In outlier detection, the nature of the data sometimes changes for a prolonged period of time. Hence it is difficult to mark a behavior as an outlier as soon as it is detected. This gave rise to the mechanism of dividing the data into batches, also called windows. Each window, a portion of the data stream, is analyzed for outlier behavior.

Once the detection algorithm finds a candidate outlier, before marking it as outlier behavior, it checks the series of windows before and after the outlier window, which makes it easier to decide whether an outlier has occurred. In other words, the data set over a period of time has to be considered before reaching a final decision about abnormal behavior.
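A minimal Python sketch of this windowed check follows. It assumes a one-dimensional stream and a simple neighbor-count rule, and it looks only one window back and one window ahead; none of these choices comes from the paper. The function names and the `size`, `radius`, and `min_neighbors` parameters are hypothetical.

```python
from itertools import chain


def windows(stream, size):
    """Split a stream into consecutive fixed-size batches (the last one may be shorter)."""
    batch = []
    for value in stream:
        batch.append(value)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def confirmed_outliers(stream, size, radius, min_neighbors):
    """Flag a value only if it lacks neighbors in the previous, current, and next windows."""
    result = []
    prev, cur = [], None
    # An empty sentinel window lets the loop also process the final real window.
    for nxt in chain(windows(stream, size), [[]]):
        if cur is not None:
            context = prev + cur + nxt
            for value in cur:
                # Subtract 1 so the value does not count itself as a neighbor.
                neighbors = sum(1 for other in context if abs(other - value) <= radius) - 1
                if neighbors < min_neighbors:
                    result.append(value)
            prev = cur
        cur = nxt
    return result


# Made-up stream: 9.0 is alone in the first window but has company in the second.
stream = [1.0, 1.1, 0.9, 9.0, 9.1, 1.0, 9.2, 1.1, 0.95, 50.0, 1.05, 1.0]
flagged = confirmed_outliers(stream, size=4, radius=0.5, min_neighbors=1)
print(flagged)  # [50.0]
```

The value 9.0 would look abnormal if its window were judged in isolation, but the following window contains 9.1 and 9.2, so it is not flagged. Only 50.0, which has no neighbors in any adjacent window, is reported. At most three windows are held in memory at a time.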

As shown in Figure 1, there are two windows containing clusters named A, A’, B, B’ and H. In the first window, there are two clusters, A and A’. Because these clusters are separated from each other and each has a sufficient number of members, neither of them can be considered an outlier.

The same holds for the second window, which contains B and B’. It would be premature to consider either of them an outlier, because the behavior of the network, and therefore the data generated over it, varies. This gave rise to the more dynamic solution of cumulative results, where the results of both windows are considered together to identify outliers or abnormal behavior.

Figure 1. Cluster and cumulative results

Once the cumulative result is obtained, the cluster named H neither merges with any other cluster nor has enough members to be treated as a cluster. This leads to the final conclusion that cluster H represents abnormal behavior identified in the streamed data and can be considered an outlier.
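The cumulative decision can be sketched in Python as below. This is only one possible reading of the description: per-window clusters are summarized by a centroid and a member count, clusters from different windows are merged when their centroids are close, and a cluster is reported as an outlier when it merged with nothing and remains too small. The `Cluster` type, the `merge_radius` and `min_members` parameters, the coordinates, the member counts, and the placement of H in the second window are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    centroid: tuple
    size: int


def cumulative_outliers(window_clusters, merge_radius, min_members):
    """Merge clusters across windows, then report small clusters that merged with nothing."""
    merged = []  # each group tracks the names it absorbed, a weighted centroid, and a size
    for clusters in window_clusters:
        for c in clusters:
            for group in merged:
                if math.dist(group["centroid"], c.centroid) <= merge_radius:
                    total = group["size"] + c.size
                    # Update the centroid as a size-weighted average of both clusters.
                    group["centroid"] = tuple(
                        (g * group["size"] + x * c.size) / total
                        for g, x in zip(group["centroid"], c.centroid)
                    )
                    group["size"] = total
                    group["names"].append(c.name)
                    break
            else:
                merged.append({"names": [c.name], "centroid": c.centroid, "size": c.size})
    return [
        g["names"][0]
        for g in merged
        if len(g["names"]) == 1 and g["size"] < min_members
    ]


# Made-up cluster summaries loosely following the A, A', B, B', H naming of Figure 1.
window_1 = [Cluster("A", (0.0, 0.0), 40), Cluster("A'", (10.0, 10.0), 35)]
window_2 = [
    Cluster("B", (0.5, 0.2), 38),
    Cluster("B'", (9.8, 10.1), 30),
    Cluster("H", (20.0, -5.0), 2),
]
outliers = cumulative_outliers([window_1, window_2], merge_radius=1.0, min_members=5)
print(outliers)  # ['H']
```

In this toy data, B merges with A and B’ merges with A’, so all four are supported by the cumulative result, while H stays isolated with two members and is the only cluster reported.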
