Towards Disambiguating Social Tagging Systems

Antonina Dattolo (University of Udine, Italy), Silvia Duca (University of Bologna, Italy), Francesca Tomasi (University of Bologna, Italy) and Fabio Vitali (University of Bologna, Italy)
DOI: 10.4018/978-1-60566-384-5.ch020

Abstract

Social tagging of resources is one of the innovations introduced by Web 2.0 and one of the new challenges of the (semantic) Web 3.0. Social tagging, also known as user-generated keywords or folksonomies, means that keywords drawn from an arbitrarily large and uncontrolled vocabulary are used by a large community of readers to describe resources. Despite the undeniable success and usefulness of social tagging systems, they also suffer from some drawbacks: the proliferation of social tags, coming as they do from an unrestricted vocabulary, leads to ambiguity in determining their intended meaning; the lack of predefined schemas or structures for entering metadata leads to confusion about their roles and justification; and the flat structure of the keywords, with no relationships among them, makes it difficult to relate different keywords that describe the same or similar concepts. To increase precision in the searches and classifications that folksonomies make possible, this chapter considers experiences and results from formal classification and subject indexing systems, in order to help solve, if not prevent altogether, the ambiguities intrinsic to such systems. Some successful and less successful approaches proposed in the scientific literature are discussed, and a few more are introduced here to help deal with special cases. In particular, we believe that adding depth and structure to the terms used in folksonomies could help in word sense disambiguation, as well as in correctly identifying and classifying proper names, metaphors, and slang words when used as social tags.

Introduction

The purpose of this chapter is to introduce the reader to the problems of extracting meaningful, organized information from user-generated folksonomies, and to expose a number of limitations of current approaches that will need to be solved in the near future.

In the Web 2.0 era, social tagging refers to the activity of a large number of human readers who associate descriptive terms (often called tags) with the Web resources they are reading or searching. Usually no rules, restrictions, or even suggestions are offered to readers when they tag these resources, so as to preserve the spontaneity and the statistically relevant frequency of the terms that real people actually think of. The tags entered are then analysed with statistical tools to help other users who use the same terms find the same documents. In this context, folksonomies are the classifications of Web resources that emerge from identifying the statistical prominence of some tags over others.
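The statistical step described above can be illustrated with a minimal sketch. It assumes a hypothetical record format (user, resource URL, tag) with made-up data, folds case and surrounding spaces, and treats a tag as "prominent" for a resource when it accounts for at least 30% of that resource's tag assignments. The function names, the normalization and the 30% cutoff are arbitrary choices for illustration, not the way any particular service actually computes its folksonomy.

```python
from collections import Counter, defaultdict

# Hypothetical record format: (user, resource URL, free-text tag).
Annotation = tuple[str, str, str]


def tag_counts(annotations: list[Annotation]) -> dict[str, Counter]:
    """Count how often each tag is applied to each resource (case and spacing folded)."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for _user, resource, tag in annotations:
        counts[resource][tag.strip().lower()] += 1
    return counts


def prominent_tags(counts: Counter, min_share: float = 0.3) -> list[str]:
    """Tags whose share of a resource's tag assignments reaches min_share (an assumed cutoff)."""
    total = sum(counts.values())
    return [tag for tag, n in counts.most_common() if n / total >= min_share]


def emergent_classification(annotations: list[Annotation]) -> dict[str, set[str]]:
    """Group resources under their prominent tags: a crude, statistically derived folksonomy."""
    classes: dict[str, set[str]] = defaultdict(set)
    for resource, counts in tag_counts(annotations).items():
        for tag in prominent_tags(counts):
            classes[tag].add(resource)
    return dict(classes)


# Made-up annotations: no vocabulary is imposed on the readers.
annotations = [
    ("ann", "http://example.org/a", "Python"),
    ("bob", "http://example.org/a", "python "),
    ("eve", "http://example.org/a", "programming"),
    ("joe", "http://example.org/a", "toread"),
    ("ann", "http://example.org/b", "jaguar"),
    ("bob", "http://example.org/b", "Jaguar"),
    ("eve", "http://example.org/b", "jaguar"),
    ("joe", "http://example.org/b", "cars"),
]
print(emergent_classification(annotations))
# prints {'python': {'http://example.org/a'}, 'jaguar': {'http://example.org/b'}}
```

Idiosyncratic tags such as "toread" fall below the cutoff and drop out, while the terms most readers agree on surface as classes; note that nothing here tells the animal "jaguar" from the car.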

Traditional document classification (both on the Web and for printed collections), on the other hand, has preferred stricter and more precise methods of subject indexing and classification. Enumerative systems, taxonomies, thesauri and ontologies are built by dedicated human professionals, who define construction rules for the classification (at minimum, a controlled vocabulary) and then painstakingly read, digest and reflect on each document's content before manually adding metadata values. These values match both the content of the documents themselves and the expectations and slant of the collection in which the document ends up.

Although this manual process usually achieves high-quality classification for traditional document collections, it does not scale to the enormous size of the Web in terms of the cost, time, and expertise of the human personnel required, and as such it cannot practically be applied to the Web as a whole.

If generating a complete classification system through a third-party army of professionals is inappropriate and hard to scale, the alternative approach of author-created metadata falls short on another important point: the intended and unintended users of the information are disconnected from the classification process (Mathes, 2004).

Social tagging (i.e., reader-created metadata), by contrast, addresses this limitation: the added value of folksonomies is that classification is entrusted to the mass actions of the readers themselves, which naturally average out the extremes and coalesce on a limited number of terms, most probably the same ones that subsequent users will use when searching for the same documents. Pioneered by Web social bookmarking services (such as Del.icio.us, http://delicious.com/; Digg, http://digg.com/; Furl, http://www.furl.net/), folksonomies add not just information to resources, but concretely relevant information. Once a critical mass is reached, the tags that individual readers use to describe a document, however unconstrained and subjective, tend to cluster around particularly frequent terms, which become the most meaningful terms that could be used, have been used, and will be used to describe that document. Thus end users are not only connected to the classification process; they are in fact its main actors.
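The convergence described above can be observed by tracking which tags lead as annotations accumulate on a single document. The sketch below uses hypothetical function names and a made-up tag stream: it records the k most frequent tags after each new assignment and reports the index from which that set stops changing. Treating that index as the "critical mass", and the choice of k and of the minimum stable window, are illustrative assumptions rather than a measure proposed in this chapter.

```python
from collections import Counter
from typing import Optional


def top_tags_over_time(tag_stream: list[str], k: int = 2) -> list[tuple[str, ...]]:
    """After each new tag assignment, record the k most frequent tags so far (sorted)."""
    counts: Counter = Counter()
    history = []
    for tag in tag_stream:
        counts[tag.strip().lower()] += 1
        # most_common breaks ties by first occurrence, so the history is deterministic
        history.append(tuple(sorted(t for t, _ in counts.most_common(k))))
    return history


def stabilization_point(history: list[tuple[str, ...]], window: int = 3) -> Optional[int]:
    """Index from which the top-k set stays unchanged to the end of the stream,
    or None if that final stable run is shorter than `window` assignments."""
    if not history:
        return None
    start = len(history) - 1
    while start > 0 and history[start - 1] == history[-1]:
        start -= 1
    return start if len(history) - start >= window else None


# Made-up sequence of tags assigned to one document by successive readers.
stream = ["cool", "toread", "toread", "web", "web", "web",
          "ajax", "ajax", "ajax", "web", "ajax", "web"]
history = top_tags_over_time(stream, k=2)
print(history[-1], stabilization_point(history, window=3))  # prints ('ajax', 'web') 8
```

Early, personal tags ("cool", "toread") lead briefly and are then displaced by the terms most readers share; with real data one would look for stability over far longer streams than this toy example.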

Of course, this flexibility comes at a price: social tagging does not handle issues that earlier classification methods handle easily:

Key Terms in this Chapter

Web 2.0: Web 2.0 is the popular term for advanced Internet technology and applications including blogs, wikis, RSS and social bookmarking. The expression was popularized by O'Reilly Media and MediaLive International in 2004 through a conference dealing with next-generation Web concepts and issues.

Tags: A tag is a generic term for a language element descriptor. The set of tags for a document or other unit of information is sometimes referred to as markup, a term that dates to pre-computer days when writers and copy editors marked up document elements with copy editing symbols or shorthand.

Web 3.0: Web 3.0 is defined as the creation of high-quality content and services produced by gifted individuals using Web 2.0 technology as an enabling platform. Web 3.0 refers to specific technologies that should be able to create the Semantic Web.

Folksonomies: Folksonomy is the result of personal free tagging of information and objects (anything with a URL) for one's own retrieval. The tagging is done in a social environment (usually shared and open to others). Folksonomy is created from the act of tagging by the person consuming the information.

Taxonomies: Taxonomy is the science of classification according to a pre-determined system, with the resulting catalogue used to provide a conceptual framework for discussion, analysis, or information retrieval. In theory, the development of a good taxonomy takes into account the importance of separating elements of a group (taxon) into subgroups (taxa) that are mutually exclusive, unambiguous, and, taken together, include all possibilities.

Ontologies: In computer science, an ontology is a collection of concepts and the relations among them, based on the principles of classes (identified by categories), properties (the different aspects of a class), and instances, that is, the things themselves.

Classification and categorization: The basic cognitive process of arranging things into classes or categories. The word classification refers especially to the system used in libraries to describe the content of a book with a specific notation. Categorization is a more theoretical notion.

Thesaurus: A thesaurus is the vocabulary of an indexing language, that is, a controlled list of accepted terms. The role of a thesaurus is to specify a preferred term (descriptor) to be used in indexing and to establish relationships between concepts at different levels: defining synonyms, specifying hierarchies, and identifying related terms.

Metadata: Data that describes other data. The term may refer to detailed compilations such as data dictionaries and repositories that provide a substantial amount of information about each data element. It may also refer to any descriptive item about data, such as a title field in a media file, a keywords field in a written article, or the content of a meta tag in an HTML page.
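The abstract suggests that giving folksonomy terms the kind of depth and structure a thesaurus provides (preferred terms, synonyms, hierarchies, related terms) could help with word sense disambiguation. The minimal sketch below illustrates one simple way this could work; it is not the method proposed in this chapter. The thesaurus fragment, the names `Descriptor`, `ENTRY_TERMS` and `disambiguate`, and the scoring rule (count how many of the other tags on the same resource match a candidate sense's broader or related terms) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Descriptor:
    """A preferred term in a hypothetical controlled vocabulary."""
    label: str
    broader: Optional[str] = None                   # hierarchy: broader term
    related: set[str] = field(default_factory=set)  # associative links: related terms


# Hypothetical thesaurus fragment with two senses of the ambiguous tag "jaguar".
DESCRIPTORS = {
    "Jaguar (animal)": Descriptor("Jaguar (animal)", "Felines", {"wildlife", "rainforest", "big cats"}),
    "Jaguar (car)": Descriptor("Jaguar (car)", "Automobiles", {"cars", "motoring", "luxury cars"}),
}

# Entry terms (free tags, synonyms, variants) pointing to the descriptors they may stand for.
ENTRY_TERMS = {
    "jaguar": {"Jaguar (animal)", "Jaguar (car)"},
    "panthera onca": {"Jaguar (animal)"},
}


def neighbourhood(d: Descriptor) -> set[str]:
    """Terms structurally close to a descriptor: its broader and related terms."""
    terms = {t.lower() for t in d.related}
    if d.broader:
        terms.add(d.broader.lower())
    return terms


def disambiguate(tag: str, co_tags: list[str]) -> Optional[str]:
    """Map a free tag to one descriptor, using the other tags on the same resource as context.
    Returns None when the tag is unknown or the context does not single out one sense."""
    candidates = sorted(ENTRY_TERMS.get(tag.strip().lower(), set()))
    if len(candidates) == 1:
        return candidates[0]  # synonym or variant: nothing to disambiguate
    context = {t.strip().lower() for t in co_tags}
    scores = {label: len(neighbourhood(DESCRIPTORS[label]) & context) for label in candidates}
    best = max(scores.values(), default=0)
    winners = [label for label, s in scores.items() if s == best]
    return winners[0] if best > 0 and len(winners) == 1 else None


print(disambiguate("Jaguar", ["wildlife", "photo"]))    # Jaguar (animal)
print(disambiguate("jaguar", ["cars", "Automobiles"]))  # Jaguar (car)
print(disambiguate("jaguar", ["toread"]))               # None
```

Returning None when the evidence is missing or tied, rather than guessing, keeps unresolved homonyms visible so that other strategies (or a human) can handle them.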
