Given today's information overload, automatic quality assessment of Web pages is needed to complement human information work. Several systems for this task have been developed and evaluated. Automatic quality assessments are most often based on the features of a Web page itself or on external information. Promising results have been achieved by systems that learn to associate human judgments with Web page features. Automatic evaluation of Internet resources according to various quality criteria is a new research field emerging from several disciplines. This chapter presents the most prominent systems and prototypes implemented so far and analyzes the knowledge sources exploited by these approaches.
Key Terms in this Chapter
Information Retrieval: Information retrieval is concerned with the representation of knowledge and the subsequent search for relevant information within these knowledge sources. Information retrieval provides the technology behind search engines.
Latent Semantic Indexing: LSI is a dimensionality reduction technique for objects which are represented by large and sparsely populated vectors. The original vector space is formally transformed into a space with fewer, artificial dimensions. The new vector space is an approximation of the original space.
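The transformation can be illustrated with a truncated singular value decomposition, which is the usual way LSI is computed. The following is a minimal sketch, not an implementation taken from any of the systems discussed in this chapter. It assumes NumPy is available; the function name `lsi_reduce` and the toy term-document matrix are hypothetical and chosen only for illustration.

```python
import numpy as np

def lsi_reduce(term_doc, k):
    """Map documents into a k-dimensional latent space via truncated SVD.

    term_doc: 2-D array with one row per term and one column per document.
    Returns (doc_vectors, approx), where doc_vectors has one row per
    document and k columns, and approx is the rank-k approximation of
    the original matrix.
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep only the k largest singular values and their vectors.
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    doc_vectors = (np.diag(s_k) @ Vt_k).T
    approx = U_k @ np.diag(s_k) @ Vt_k
    return doc_vectors, approx

# Toy term-document matrix (assumption): 5 terms, 4 documents.
A = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
], dtype=float)

doc_vectors, approx = lsi_reduce(A, k=2)
print(doc_vectors.shape)  # each document is now described by 2 artificial dimensions
```

Each document is now described by two artificial dimensions instead of five term dimensions, and `approx` shows how closely the reduced space reproduces the original matrix.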
Accessibility: Accessibility is a subfield of human-computer interaction and deals with users with impairments. These impairments mostly concern perceptual capabilities. For example, users who cannot see or hear as well as others require special consideration during the implementation of user interfaces.
Link Analysis: The links between pages on the Web are a large knowledge source which is exploited by link analysis algorithms for many purposes. Many algorithms similar to PageRank determine a quality or authority score based on the number of incoming links of a page. Furthermore, link analysis is applied to identify thematically similar pages, Web communities and other social structures.
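As an illustration of how such a score can be derived from link structure, the following sketch computes a PageRank-style score by repeated redistribution of rank along links. It is a simplified version under stated assumptions: the damping factor of 0.85, the fixed number of iterations, the function name `pagerank` and the four-page toy graph are all chosen here for illustration and do not describe any particular search engine. Note that in this scheme a link from a highly ranked page counts more than a link from a weakly ranked one.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank-style scores.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its score; scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small base score independent of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal shares to its targets.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Toy Web graph (assumption): pages a, b and d all link to c; c links to a.
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(web)
best = max(scores, key=scores.get)
print(best)
```

In this toy graph, page `c` has the most incoming links and ends up with the highest score.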
Machine Learning: Machine learning is a subfield of artificial intelligence which provides algorithms for the discovery of relations or rules in large data sets. Machine learning leads to functions which can automatically classify or categorize objects based on their features. Inductive learning from labeled examples is the best-known application.
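Inductive learning from labeled examples can be sketched with a very simple classifier that assigns a page to the class whose average feature vector is closest. This nearest-centroid method is used here only because it is short; it is not claimed to be the method of any system discussed in this chapter. The two page features (ratio of broken links, ratio of spelling errors), the labels and all numbers are invented toy data.

```python
def train_centroids(examples):
    """Learn one average feature vector (centroid) per label.

    examples: list of (feature_vector, label) pairs.
    """
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def classify(centroids, features):
    """Return the label whose centroid is closest to the feature vector."""
    def squared_distance(centroid):
        return sum((c - f) ** 2 for c, f in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

# Invented training data: (broken-link ratio, spelling-error ratio) with a
# human quality judgment as the label.
training = [
    ([0.00, 0.01], "high"),
    ([0.05, 0.02], "high"),
    ([0.40, 0.20], "low"),
    ([0.30, 0.15], "low"),
]

centroids = train_centroids(training)
prediction_clean = classify(centroids, [0.02, 0.01])
prediction_messy = classify(centroids, [0.35, 0.18])
print(prediction_clean, prediction_messy)
```

The learned function generalizes from the four judged pages to unseen pages, which is the sense in which such systems associate human judgments with Web page features.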
Quality: In the context of information systems, quality describes the degree to which a product or service fulfills certain requirements. Quality measures the excellence of a product or system. Quality is usually context dependent.
Human-Computer Interaction: HCI deals with the optimization of interfaces between human users and computing systems. Technology needs to be adapted to the properties and needs of users. The knowledge sources available for this endeavor are guidelines, rules, standards and results from psychological research on human perception and cognitive capabilities. Evaluation is necessary to validate the success of interfaces.