Pattern Matching Techniques to Identify Syntactic Variations of Tags in Folksonomies

F. Echarte; J. J. Astrain; A. Córdoba; J. Villadangos

doi:10.4018/978-1-60566-272-5.ch011

Source Title: Social Web Evolution: Integrating Semantic Applications and Web 2.0 Technologies

Abstract

Folksonomies offer an easy method to organize information in the current Web. This ease of use and their collaborative features have led to their extensive adoption in many Social Web projects. However, they present important drawbacks regarding their limited exploring and searching capabilities, in contrast with other methods such as taxonomies, thesauri and ontologies. One of these drawbacks is an effect of their flexibility for tagging, which frequently produces multiple syntactic variations of the same tag. In this chapter we study the application of two classical pattern matching techniques, Levenshtein distance for imperfect string matching and Hamming distance for perfect string matching, to identify syntactic variations of tags.

Introduction

Folksonomies (Vander Wal, 2008) are based on the assignment of text tags to different resources, such as photos, web pages and documents, in order to classify these resources in Web 2.0. Users annotate resources with these tags, collaboratively defining the meaning of both the annotated resources and the tags used.

Folksonomies make new search and exploration approaches possible, based on the use of tags (Millen, 2006; Golder, 2005). Users can search for tags, or use navigation systems such as word clouds, to locate resources tagged by other users and to find information.

Although folksonomies enjoy great success on the current Web, mainly due to their simplicity of use, they also have important disadvantages. Because users create tags and assign them freely to resources, there is no structure among these tags. As folksonomies become larger, more problems appear regarding the use of synonyms, syntactic tag variations and different granularity levels (Gruber, 1993). All these problems make the exploration and retrieval of information increasingly difficult (Mathes, 2004; Guy, 2006), decreasing the quality of folksonomies. Thus, reducing syntactic tag variations helps to improve the quality of folksonomies.

There are different types of syntactic variations of tags: typographical misspellings in the annotation process (semanticweb/semnticwev/zemantcweb); grammatical number (singular or plural) of the same word (semanticweb/semanticwebs); separators (semantic-web/semanticweb); or a combination of them (semntic-web/smanticweb, semntic-webs, etc.). These variations cause resources to be classified under different tags when they should be classified under just one. This makes word clouds, the location of information and navigation of the folksonomy more confusing. However, by identifying all of them as variations of the same label “semantic web” and grouping them under the same tag, a user can access this tag and obtain all the information concerning the resources associated with it and its syntactic variations.
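The grouping step described above can be illustrated with a minimal Python sketch. It assumes the variations have already been identified by some means; the function name `merge_variations`, the dictionary layout and the sample data are hypothetical and do not come from the chapter.

```python
from collections import defaultdict


def merge_variations(tagged, canonical_of):
    """Group the resources of tag variations under their canonical tag.

    tagged: dict mapping each tag to the set of resources annotated with it.
    canonical_of: dict mapping a variation to its canonical tag (assumed to
    be known already); tags missing from it are kept unchanged.
    """
    merged = defaultdict(set)
    for tag, resources in tagged.items():
        merged[canonical_of.get(tag, tag)] |= resources
    return dict(merged)


# Hypothetical folksonomy fragment: tag -> annotated resources.
tagged = {
    "semanticweb": {"page1"},
    "semnticwev": {"page2"},
    "semantic-web": {"page1", "page3"},
    "folksonomy": {"page4"},
}
# Hypothetical variation-to-tag assignments.
canonical_of = {"semnticwev": "semanticweb", "semantic-web": "semanticweb"}

merged = merge_variations(tagged, canonical_of)
print(sorted(merged["semanticweb"]))  # ['page1', 'page2', 'page3']
```

After merging, a user who selects "semanticweb" reaches every resource that was annotated with any of its variations, while unrelated tags such as "folksonomy" are left untouched.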

This chapter focuses on the application of pattern matching techniques to identify syntactic variations of tags. We study two classical pattern matching techniques, the Levenshtein (Levenshtein, 1966) and Hamming (Hamming, 1950) distances, on a large real dataset, evaluating how well these techniques identify both variations of known tags and new (non-existing) tags.
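For readers unfamiliar with the two distances, the following Python sketch gives their textbook definitions. It is not the chapter's implementation: in particular, the standard Hamming distance is only defined for strings of equal length, and how the authors treat tags of different lengths is not stated in this preview, so the sketch simply raises an error in that case.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, x in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


print(levenshtein("semnticwev", "semanticweb"))    # 2 (misspelling)
print(levenshtein("semanticwebs", "semanticweb"))  # 1 (plural)
print(levenshtein("semantic-web", "semanticweb"))  # 1 (separator)
print(hamming("semanticwev", "semanticweb"))       # 1 (one substituted letter)
```

The examples reuse the tag variations listed in the introduction: a plural or a separator costs one edit under Levenshtein, while Hamming can only compare variations that keep the length of the original tag.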

We show the percentages of correct identification achieved with each distance for different types of variations, such as typographic errors, transpositions of adjacent characters, singulars and plurals, and substitution/deletion of separators.

To our knowledge, there is no study about the application of pattern matching techniques to the identification of syntactic variations of tags. Only in (Specia, 2007) is a pre-filtering of the tags performed before applying an algorithm for tag clustering. This is used to minimize the effects of syntactic variations and to increase the quality of tag clustering. The authors group similar tags using the Levenshtein similarity metric to determine morphological variations, although over a reduced experimental data set and following a process that is not described in detail. Another way to represent these variations is presented in (Gruber, 1993), where an ontology with three properties associated with tags (prefLabel, altLabel and hiddenLabel) is used.

The use of pattern matching techniques designed to automatically recognize syntactic variations of tags provides mechanisms to improve the quality of folksonomies.

Approximate string matching techniques make it possible to deal with the problem introduced by syntactic variations in folksonomies. The problem consists of comparing a candidate input string α, which may contain errors, with a pattern string ω in order to transform α into ω (Navarro, 2001).
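To make the transformation of α into ω concrete, the sketch below recovers one minimal sequence of edit operations from the standard edit-distance table. It assumes the usual unit-cost insert, delete and substitute operations; the function name `edit_script` and the tuple format of the operations are illustrative choices, not taken from the chapter.

```python
def edit_script(alpha, omega):
    """Return one minimal list of edit operations turning alpha into omega."""
    n, m = len(alpha), len(omega)
    # dist[i][j] = edit distance between alpha[:i] and omega[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete all i characters
    for j in range(m + 1):
        dist[0][j] = j  # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = alpha[i - 1] != omega[j - 1]
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # match/substitution

    # Walk back from the bottom-right cell, recording each real edit.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        differs = i > 0 and j > 0 and alpha[i - 1] != omega[j - 1]
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + differs:
            if differs:
                ops.append(("substitute", alpha[i - 1], omega[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("delete", alpha[i - 1]))
            i -= 1
        else:
            ops.append(("insert", omega[j - 1]))
            j -= 1
    ops.reverse()
    return ops


# Candidate tag (alpha) with a missing letter and a separator, and the
# known tag (omega) it should be mapped to.
print(edit_script("semntic-web", "semanticweb"))
# [('insert', 'a'), ('delete', '-')]
```

The length of the returned list equals the Levenshtein distance between the two strings, so a short script indicates that the candidate is likely a syntactic variation of the pattern.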
