
E-Government Documents and Data Clustering

Goran Šimić

Source Title: Handbook of Research on Democratic Strategies and Citizen-Centered E-Government Services

DOI: 10.4018/978-1-4666-7266-6.ch010


Abstract

This chapter treats document and data clustering as a way of preparing the information resources stored in e-government systems for advanced search. These resources are mainly textual data, stored as field values in databases or as documents in file repositories. As their number grows, searching for specific information takes more time. Different techniques are used to address this, most of them based on information retrieval with a variety of text similarity measures. The cost of such processing depends on how the resources are prepared for searching. Clustering is the most commonly used technique for this preparation, and that fact is the basic motivation for this chapter.


Introduction

Given the complexity of contemporary life, governments try to establish relations with citizens that are as close as possible in order to better understand their needs and problems, to offer them different kinds of help, and to satisfy their expectations. To this end, institutions offer a variety of Web services, known as e-government or open government, that make the information their systems already hold accessible to people regardless of time and location. Information retrieval (IR) from the big data collections stored in e-government information systems (IS) is an important part of the solution. The data in such collections are heterogeneously structured and presented, so they can hardly be categorized according to the information they contain. Clustering is a way of grouping the data based on their mutual similarity, even when the metadata contained in the documents and other content do not describe them satisfactorily.

Clustering can be performed on every kind of data: textual, visual, and audio data, as well as combinations of the three. It is especially useful for huge amounts of data that are poorly structured or not structured at all. When the data are well structured, some filtering and sorting are enough to prepare them for information retrieval. Unfortunately, today's information systems are congested with different kinds of content because of long periods of data accumulation, their distributed nature, demands to exchange data with other systems, different types of data, and the various formats in which the data are stored. Software developers therefore face a complex problem: how to make the same system functions applicable to such heterogeneous content. One of the most important pieces of the solution to this problem is clustering.

Basically, clustering is a process of grouping data by means of some algorithm or mathematical function. In both cases, calculating the similarity between data items is the main principle.
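To make the idea of a similarity calculation concrete, the following minimal sketch compares two short texts by the cosine of the angle between their term-count vectors. Cosine similarity is one widely used text similarity measure; this preview does not say which measures the chapter relies on, so the choice of measure, the whitespace tokenization, and the function name `cosine_similarity` are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between two texts, using raw term counts.

    Assumption: terms are lowercase whitespace-separated tokens; a real
    system would normally add stemming, stop-word removal, and so on.
    """
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())

    # Dot product over the terms that occur in both texts.
    shared_terms = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[term] * counts_b[term] for term in shared_terms)

    # Euclidean length of each term-count vector.
    norm_a = math.sqrt(sum(count * count for count in counts_a.values()))
    norm_b = math.sqrt(sum(count * count for count in counts_b.values()))

    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Two texts with identical wording score close to 1, texts with no terms in common score 0, and a clustering procedure would place items with high mutual scores in the same group.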

The considerations in this chapter mainly concern the clustering of textual content. In e-government IS, data are commonly stored in databases (DB), while documents can be held in both databases and repositories. Generally, a DB provides an easier way of grouping data and retrieving information. The chapter has eight parts. After a brief presentation of the background, the basic concepts used in clustering are described, together with the common measures used in clustering, such as term frequency and inverse document frequency; some modifications of these measures and their combination are also explained. The third section is about clustering taxonomies. Many of them can be found in research papers, and the most common approach is followed here: hierarchical and partition clustering form the basic classification. Another distinction is also important: ‘hard’ (discrete) and ‘soft’ (fuzzy) clustering. For clarity, the considerations are richly illustrated with examples. The fourth section describes clustering techniques and algorithms and presents two important techniques: K-means and Fuzzy C-means. The fifth section is about the different formats and structures used for representing text content. A case study about clustering in the ADVANSE system follows. Finally, future plans and conclusions are presented in the last two sections.
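As a rough illustration of how term frequency and inverse document frequency can be combined, the sketch below computes one common textbook variant of the TF-IDF weight. The exact formulas and modifications used in the chapter are not shown in this preview, so the definitions here (TF as a term's share of the tokens in a document, IDF as the natural logarithm of the number of documents divided by the number of documents containing the term) and the function name `tf_idf` are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents: list[str]) -> list[dict[str, float]]:
    """Return one {term: weight} mapping per document.

    Assumed variant: tf = count / tokens in document,
    idf = ln(number of documents / documents containing the term).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

With this variant, a term that occurs in every document receives a weight of zero, while a term that is frequent in one document and rare elsewhere receives a high weight, which is what makes the measure useful for telling documents apart.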

Key Terms in this Chapter

Clustering: Unsupervised grouping of data.

Hierarchical Clustering: During an iterative process, clusters are formed either by splitting one cluster into two new clusters or by merging two clusters into a new one.

Soft Clustering: An item can belong to more than one cluster.

Partition Clustering: Starting from a predefined number of clusters, the clusters are initially formed as 2D regions that change shape over the iterations according to some measure of central tendency.

Inverse Document Frequency (IDF): Measure used in text content clustering.

Term Frequency (TF): One of the basic measures used in text content clustering.

SOM Clustering: Clustering based on neural network principles, in which the weights of the connections between input and output nodes are changed.
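The partition clustering entry above can be illustrated with a minimal K-means sketch, one of the two techniques named in the introduction. It starts from a predefined number of clusters and repeatedly recomputes each cluster's mean (a measure of central tendency). This is a simplified illustration, not the chapter's implementation: the function name `k_means`, the use of the first `k` points as initial centroids, and the fixed iteration count are assumptions chosen to keep the example deterministic.

```python
def k_means(points, k, iterations=10):
    """Hard partition of numeric points into k clusters.

    Assumption: the first k points serve as the initial centroids; real
    implementations usually pick them randomly or with a seeding heuristic.
    """
    centroids = [tuple(point) for point in points[:k]]
    assignments = [0] * len(points)

    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest
        # centroid (squared Euclidean distance).
        assignments = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
            )
            for point in points
        ]

        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))

    return assignments, centroids


points = [(0, 0), (10, 10), (0, 1), (10, 11)]
assignments, centroids = k_means(points, k=2)
# assignments == [0, 1, 0, 1]
# centroids == [(0.0, 0.5), (10.0, 10.5)]
```

Each point here belongs to exactly one cluster, which is the ‘hard’ case; a soft technique such as Fuzzy C-means would instead give each point a degree of membership in every cluster.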
