Density-Based Clustering Method for Trends Analysis Using Evolving Data Stream

Umesh Kokate, Arviand V. Deshpande, Parikshit N. Mahalle

Source Title: International Journal of Synthetic Emotions (IJSE) 11(2)

DOI: 10.4018/IJSE.2020070102

Abstract

The evolution of data in a data stream environment generates patterns at different time instances. Cluster formation changes over time as the behaviour and membership of clusters change. Data stream clustering (DSC) allows us to investigate these changes in group behaviour. Changes in the behaviour of group members over time lead to the formation of new clusters and may make old clusters extinct; extinct clusters may also recur over time. The problem is to identify and record these change patterns in evolving data streams. The knowledge obtained from these change patterns is then used for trend analysis over evolving data streams. To address this flexible clustering requirement, a density-based clustering method is proposed to dynamically cluster evolving data streams. A decay factor identifies the formation of new clusters and the diminishing of older clusters as data points arrive, which indicates trends in the evolving data stream.

Introduction

Nowadays, huge volumes of high-dimensional data are generated in real time across various domains. Multi-dimensional data streams are generated by most applications deployed for weather monitoring, stock trading, telecommunication, network intrusion detection, remote sensing of planetary data, and web analysis. Data streams have a temporal order and can be scanned only once (Guha, S. et al., 1998; Yang, J., 2003). There has been active research on the storage, querying, and analysis of evolving data streams.

Clustering is one of the major tasks in data mining. Here we consider data stream clustering, where the stream is a time-ordered sequence of time-stamped, multi-dimensional data points. Data stream clustering poses more issues and challenges than traditional data clustering. For example, because data arrive in streams, they can be scanned and examined in only one pass. In many applications, it is essential to capture the evolving nature of the data rather than to represent clusters for the whole data stream. In most earlier work, data streams were treated as a continuous version of static data, and clustering algorithms were implemented in a single phase (Stonebraker, M. et al., 1993). Such algorithms divide the data stream into batches, and most of them apply k-means clustering to each finite batch (Guha, S. and Mishra, N., 2016; O'callaghan, L. et al., 2002). These algorithms were unable to identify the evolving characteristics of a data stream. Some algorithms try to solve this issue with a moving window technique, but in most cases this again gives only partial results (Guha, S. and Mishra, N., 2016; O'callaghan, L. et al., 2002).

Aggarwal, C.C. et al. (2004) implemented data stream clustering using a two-phase method with an online phase and an offline phase. During the online phase, the data stream is processed quickly and a statistical summary is calculated; during the offline phase, the same summary is used to generate clusters. The methodology and procedures for dividing the time horizon and managing statistics are implemented, as shown in CluStream (Guha, S. et al., 1998). Most data stream algorithms use a two-phase approach similar to CluStream. A semi-partitioning method is deployed to improve the offline phase by Wang, Z. et al. (2004). Clustering of sets of data streams, as well as of distributed data streams, is also mentioned as an extension of this work. Because CluStream and related algorithms use the k-means method during the offline phase, they have a number of limitations: k-means identifies only spherical clusters and cannot detect clusters of arbitrary shape; k-means may not detect noise or outliers effectively; and it requires multiple scans of the data, so it cannot be applied directly to a large-volume data stream. In the CluStream algorithm, the online phase processes raw data to generate micro-clusters, and these micro-clusters are then used as the basic elements during the offline phase for further refinement of clusters.
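The online phase described above can be sketched as follows. This is a minimal illustration, not the CluStream algorithm itself: the `MicroCluster` class, the `online_phase` function, and the `max_radius` parameter are hypothetical names introduced here, and the sketch omits the time-horizon bookkeeping and the merging or deletion of micro-clusters. It only shows how an additive summary (count, linear sum, squared sum) lets each point be examined once and then discarded.

```python
import math
from typing import Iterable, List, Sequence


class MicroCluster:
    """Additive summary (count, linear sum, squared sum) of absorbed points.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self, point: Sequence[float]) -> None:
        self.n = 1
        self.linear_sum = [float(x) for x in point]
        self.square_sum = [float(x) ** 2 for x in point]

    def absorb(self, point: Sequence[float]) -> None:
        # Update the summary in O(d); the raw point is not stored.
        self.n += 1
        for i, x in enumerate(point):
            self.linear_sum[i] += x
            self.square_sum[i] += x * x

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.linear_sum]


def online_phase(stream: Iterable[Sequence[float]],
                 max_radius: float) -> List[MicroCluster]:
    """Single pass over the stream.

    Each point joins its nearest micro-cluster if the centroid lies within
    max_radius (an assumed threshold); otherwise it starts a new one.
    """
    clusters: List[MicroCluster] = []
    for point in stream:
        nearest = min(
            clusters,
            key=lambda c: math.dist(c.centroid(), point),
            default=None,
        )
        if nearest is not None and math.dist(nearest.centroid(), point) <= max_radius:
            nearest.absorb(point)
        else:
            clusters.append(MicroCluster(point))
    return clusters
```

An offline phase would then cluster the micro-cluster centroids, weighted by their counts `n`, instead of rescanning the raw stream.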

Density-based strategies are another major methodology in clustering algorithms and have been widely used for clustering data streams. Density-based clustering can identify clusters of arbitrary shape, can remove noise or outliers, and needs to scan the raw data only once. This method is natural and is regarded as a basic clustering technique for data stream clustering applications. Unlike k-means methods, density-based clustering does not require prior knowledge of the number of probable clusters. The DenStream algorithm (Cao, F. et al., 2006) calculates the density of each data point and, based on certain threshold values, groups the data points to form clusters. It forms clusters in two phases. During the first phase, online computations are carried out to gather statistical information; this step must be fast, because the evolving nature of the data stream does not allow data records to be retained for long, and so micro-clusters are formed. During the second phase, offline processing is performed on the micro-clusters to generate macro-clusters, which leads to the formation of arbitrary-shaped clusters.
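To make the density-based idea concrete, the sketch below uses a plain DBSCAN-style procedure on a small static batch of points. This is an assumption chosen for illustration, not the DenStream algorithm: the names `dbscan`, `eps`, and `min_pts` are introduced here, and a stream algorithm would apply a comparable step to micro-cluster summaries, not to raw points. It shows the two properties mentioned above: the number of clusters is not given in advance, and isolated points are labelled as noise.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1


def region_query(points: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep growing
        cluster_id += 1
    return [NOISE if label is None else label for label in labels]
```

Because a cluster grows by chaining neighbourhoods of dense points, an elongated run of points is kept as one cluster, which a centroid-based method such as k-means would tend to split.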

In this research work, we propose algorithms to identify trends in evolving data streams that use the D-Stream algorithm (Chen, Y. and Tu, L., 2007), a density grid-based clustering framework for data streams. In the k-means algorithm, the data stream is treated as a long sequence of static data, but the main interest lies in identifying evolving patterns or trends in the temporal features of the data stream. The concept of a decay factor applied to the density of data points is introduced to detect the dynamic nature of clusters.
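The role of a decay factor in a density grid can be illustrated with a small sketch. It assumes an exponential decay, where the contribution of a point that arrived `dt` time steps ago is `decay ** dt` with `0 < decay < 1`; the preview does not state the exact formula used by the authors, so this form, the `DecayingGrid` class, and its parameters are assumptions for illustration only.

```python
import math
from typing import Dict, Sequence, Set, Tuple

Cell = Tuple[int, ...]


class DecayingGrid:
    """Grid of cells whose densities fade over time (hypothetical helper)."""

    def __init__(self, cell_width: float, decay: float) -> None:
        if not 0.0 < decay < 1.0:
            raise ValueError("decay must lie strictly between 0 and 1")
        self.cell_width = cell_width
        self.decay = decay
        # cell -> (density at last update, time of last update)
        self._cells: Dict[Cell, Tuple[float, float]] = {}

    def cell_of(self, point: Sequence[float]) -> Cell:
        return tuple(math.floor(x / self.cell_width) for x in point)

    def add(self, point: Sequence[float], t: float) -> None:
        """Decay the cell's stored density up to time t, then add 1 for the new point."""
        cell = self.cell_of(point)
        density, last = self._cells.get(cell, (0.0, t))
        self._cells[cell] = (density * self.decay ** (t - last) + 1.0, t)

    def density(self, cell: Cell, t: float) -> float:
        """Density of a cell as seen at time t (t should not precede its last update)."""
        density, last = self._cells.get(cell, (0.0, t))
        return density * self.decay ** (t - last)

    def dense_cells(self, t: float, threshold: float) -> Set[Cell]:
        return {c for c in self._cells if self.density(c, t) >= threshold}
```

Comparing the set of dense cells at successive times gives a simple view of a trend: cells that stop receiving points fall below the threshold, while cells that start receiving points rise above it.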
