Advanced Techniques for Web Content Filtering
Elisa Bertino (Purdue University, USA), Elena Ferrari (Università degli Studi dell’Insubria, Italy), Andrea Perego (Università degli Studi dell’Insubria, Italy) and Gian Piero Zarri (Université Paris IV, France)
Copyright: © 2008
In this chapter, we discuss the current strategies for Web content filtering, outlining their advantages and drawbacks, and present an approach, originally developed in the framework of the EU project EUFORBIA, that addresses the main drawbacks of existing systems and can be applied to both user protection and quality assurance. The main features of this approach are its support for multiple metadata vocabularies for rating and filtering Web resources, and the possibility of specifying policies that allow the system to decide whether a resource is appropriate for a given user based on his/her preferences and characteristics.
Key Terms in this Chapter
Metadata-Based Web Rating: A manual or semiautomatic description of Web resources with respect to their content and/or characteristics, based on a set of descriptors defined by a metadata vocabulary.
Metadata Vocabulary: A formal definition of a set of descriptors to be used for denoting the characteristics of resources (e.g., an ontology is a metadata vocabulary). Usually, metadata vocabularies are domain-specific.
Web Content Rating: The classification of Web resources with respect to their content and/or characteristics. Resource classification is performed by using two orthogonal strategies, namely, list-based Web rating and metadata-based Web rating.
Web Trust Mark: A third-party certification that a resource satisfies a given set of requirements (e.g., the VeriSign seal is a Web trust mark). A trust mark may be just a graphical symbol attached to the resource or even a Web content label.
List-Based Web Rating: A classification of Web resources, with respect to their content and/or characteristics, into two distinct groups, corresponding to resources considered appropriate (white lists) or inappropriate (black lists) for a given set of end users.
Web Content Filtering: The analysis of the content and/or characteristics of Web resources with respect to the preferences expressed by end users. Such analysis is based on the resource classification carried out by either a list-based or a metadata-based Web rating approach.
Web Content Label: A formal description of the content and/or characteristics of Web resources, by using descriptors defined in one or more metadata vocabularies. Currently, content labels are encoded by using two W3C standards, namely, PICS and RDF/OWL.
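To make the relationship between these terms more concrete, the following is a minimal Python sketch of how a filter might combine list-based and metadata-based rating. It is an illustration only, not the EUFORBIA implementation: the class and function names (ContentLabel, UserProfile, is_appropriate), the vocabulary name "demo-vocab", and the rule that list entries take precedence over labels are all assumptions introduced here. Labels are modelled as plain in-memory objects, and no PICS or RDF/OWL encoding is attempted.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class ContentLabel:
    """Hypothetical content label: descriptors grouped by vocabulary name."""
    resource: str
    descriptors: dict  # vocabulary name -> set of descriptor strings


@dataclass
class UserProfile:
    """Hypothetical end-user preferences (not taken from the chapter)."""
    white_list: set = field(default_factory=set)  # hosts always allowed
    black_list: set = field(default_factory=set)  # hosts always denied
    blocked: dict = field(default_factory=dict)   # vocabulary -> unwanted descriptors


def list_based_decision(url, profile):
    """Return True/False if the host is on a list, or None if it is on neither."""
    host = urlsplit(url).hostname or ""
    if host in profile.white_list:
        return True
    if host in profile.black_list:
        return False
    return None


def metadata_based_decision(label, profile):
    """Reject the resource if any of its descriptors is unwanted by the user."""
    for vocabulary, unwanted in profile.blocked.items():
        if label.descriptors.get(vocabulary, set()) & unwanted:
            return False
    return True


def is_appropriate(url, label, profile, default=False):
    """Assumed precedence: lists first, then the label, then a default."""
    decision = list_based_decision(url, profile)
    if decision is not None:
        return decision
    if label is not None:
        return metadata_based_decision(label, profile)
    return default


profile = UserProfile(
    white_list={"kids.example.org"},
    black_list={"bad.example.com"},
    blocked={"demo-vocab": {"violence"}},
)
label = ContentLabel(
    "http://news.example.net/a",
    {"demo-vocab": {"violence", "news"}},
)
print(is_appropriate("http://news.example.net/a", label, profile))
```

In this sketch, a resource on neither list and without a label falls back to the default decision, which is set to "not appropriate" here; a real system might well choose the opposite default or a richer policy language.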