A Web Metadata Based-Model for Information Quality Prediction
Ricardo Barros (COPPE - Federal University of Rio de Janeiro, Brazil), Geraldo Xexéo (COPPE - Federal University of Rio de Janeiro, Brazil), Wallace A. Pinheiro (COPPE - Federal University of Rio de Janeiro, Brazil) and Jano de Souza (COPPE - Federal University of Rio de Janeiro, Brazil)
Copyright: © 2008
Currently, in the Web environment, users have to deal with an enormous amount of information. In a Web search, they often receive useless, replicated, outdated, or false data, which, at first, they have no means to assess. Web search engines provide good examples of these problems: in reply to a query, users usually find links to replicated or conflicting information. Further, in these cases, information is spread out among heterogeneous and unrelated data sources, which normally take different approaches to information quality. This chapter addresses those issues by proposing a Web Metadata-Based Model to evaluate and recommend Web pages based on their information quality, as predicted by their metadata. We adopt a fuzzy theory approach to obtain the values of quality dimensions from metadata values and to evaluate the quality of information, taking advantage of fuzzy logic’s ability to capture humans’ imprecise knowledge and deal with different concepts.
Key Terms in this Chapter
Fuzzy Sets: are an extension of classical set theory and are used in fuzzy logic. In classical set theory, the membership of elements in relation to a set is assessed in binary terms according to a crisp condition: An element either belongs or does not belong to the set. By contrast, fuzzy set theory permits the gradual assessment of the membership of elements in relation to a set (Klir & Yuan, 1995).
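The contrast between crisp and gradual membership can be illustrated with a minimal sketch. The function names, the triangular shape, and the numeric parameters below are assumptions chosen for illustration; they are not taken from the chapter's model.

```python
def crisp_membership(x, threshold):
    """Classical set: an element either belongs (1.0) or does not (0.0)."""
    return 1.0 if x >= threshold else 0.0


def triangular_membership(x, a, b, c):
    """Fuzzy set: membership degree in [0, 1].

    Rises linearly from a to the peak at b, then falls linearly to c.
    Assumes a < b < c (illustrative shape, not the chapter's own function).
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical values: the same input is "out" of the crisp set
# but partially "in" the fuzzy set.
crisp_degree = crisp_membership(2.5, threshold=5)
fuzzy_degree = triangular_membership(2.5, a=0, b=5, c=10)
print(crisp_degree, fuzzy_degree)
```

With these assumed parameters, the crisp set rejects the value outright, while the fuzzy set assigns it a partial degree of membership.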
Context: context and contextualization specify a scope or a boundary for a knowledge domain (Lee, 2004).
Fuzzy Logic: is derived from fuzzy set theory and deals with reasoning that is approximate rather than precisely deduced from classical predicate logic. It can be thought of as the application side of fuzzy set theory, dealing with well-thought-out real-world expert values for a complex problem (Klir & Yuan, 1995).
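As a rough illustration of approximate reasoning, the sketch below uses the standard fuzzy operators (minimum for AND, maximum for OR, one minus the degree for NOT) to evaluate a single rule. The rule, the dimension names `up_to_date` and `reputable`, and the degrees are hypothetical; the chapter's actual rule base is not shown here.

```python
def fuzzy_and(a, b):
    """Standard fuzzy intersection: the minimum of the two degrees."""
    return min(a, b)


def fuzzy_or(a, b):
    """Standard fuzzy union: the maximum of the two degrees."""
    return max(a, b)


def fuzzy_not(a):
    """Standard fuzzy complement."""
    return 1.0 - a


# Hypothetical membership degrees for one Web page.
up_to_date = 0.8
reputable = 0.6

# Hypothetical rule: IF the page is up to date AND reputable
# THEN its quality is high. The rule fires to the degree of its
# weakest condition.
rule_strength = fuzzy_and(up_to_date, reputable)
print(rule_strength)
```

The rule is satisfied to a degree rather than being simply true or false, which is what lets imprecise expert judgments be combined.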
Metadata: is data about data, and Web metadata “is machine-understandable description of things on (and about) the Web”.
Quality dimensions: represent the adopted information quality evaluation criteria and factors able to represent users’ quality expectations (Pipino et al., 2002).
PICS™: specification enables labels (metadata) to be associated with Internet content. It was originally designed to help parents and teachers control what children access on the Internet, but it also facilitates other uses for labels, including code signing and privacy. The PICS platform is one on which other rating services and filtering software have been built.
Information Quality: is a multidimensional concept, since users must deal with both the subjective perceptions of the individuals involved with the data and the objective measurements based on the dataset under evaluation (Pipino et al., 2002).