Mining techniques can play an important role in automatic image classification and content-based retrieval. This chapter presents a novel method for image classification based on feature elements and association rule mining. The effectiveness of this method has two sources. First, the visual meanings of images can be well captured by discrete feature elements. Second, the associations between the description features and the image contents can be properly discovered with mining technology. Experiments with real images show that the new approach provides not only lower classification and retrieval error but also higher computational efficiency.
Along with the progress of imaging modalities and the wide use of digital images (including video) in various fields, many potential content producers have emerged, and many image databases have been built. In addition, the growth of the Internet and of storage capacity has not only made images an increasingly widespread information format on the World Wide Web (WWW), but has also dramatically expanded the number of images on the WWW and made the search for required images more complex and time consuming. To search images on the WWW efficiently, effective image search engines need to be developed.
Since images require large amounts of storage space and processing time, how to quickly and efficiently access and manage these databases, which are large in both information content and data volume, has become an urgent problem. Research on solving this problem with content-based image retrieval (CBIR) techniques was initiated in the last decade (Kato, 1992). An international standard for multimedia content description, MPEG-7, was also established in 2001 (MPEG). With its advantages of comprehensive description of image contents and consistency with human visual perception, research in this direction is considered one of the hottest research topics of the new century (Castelli, 2002; Zhang, 2003; Deb, 2004; Zhang, 2007).
Among the many research topics in CBIR, automatic image classification (categorization) plays an important role in both searching and retrieving Web images (classification and retrieval are closely related), as it is time consuming for users to browse through and process the huge amount of data on the Web. Successful image classification significantly enhances the performance of a content-based image retrieval system by filtering out images from irrelevant classes during matching. Classification has been used to provide more efficient access to large image collections, because it reduces the search space by filtering out images in unrelated categories (Hirata, 2000).
The heterogeneous nature of Web images makes their classification a challenging task. A functional classification scheme should take the contents of images into consideration. Web mining is a tool well suited to helping image classification and retrieval on the Web. It consists of the following steps (Scime, 2005):
Pre-processing: It is one of the most important steps in Web mining. It includes data purging, user recognition, dialog recognition, and event recognition.
Pattern discovering (Mining algorithm): It uses statistical analysis, association rule, clustering, and classification.
Pattern analysis: It transforms the rules, patterns and statistical values into knowledge. By using this knowledge, valuable patterns (interesting rules, patterns) can be obtained.
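The pattern-discovering step can be illustrated with a minimal sketch of association rule mining over discrete feature elements. The toy data, the function name `mine_rules`, and the thresholds are all hypothetical choices made for illustration; the sketch simply enumerates small feature sets and keeps rules of the form "feature set implies class" whose support and confidence pass the thresholds. It is not the specific algorithm of this chapter.

```python
from itertools import combinations

# Hypothetical toy data: each image is a set of discrete feature elements
# paired with a class label (names are illustrative only).
images = [
    ({"red", "round"}, "flower"),
    ({"red", "round", "smooth"}, "flower"),
    ({"green", "round"}, "leaf"),
    ({"green", "elongated"}, "leaf"),
    ({"red", "elongated"}, "flower"),
]

def mine_rules(data, min_support=0.4, min_confidence=0.8, max_size=2):
    """Return rules (antecedent, label, support, confidence).

    support    = fraction of all images containing the antecedent AND the label
    confidence = fraction of images containing the antecedent that have the label
    """
    n = len(data)
    items = sorted({f for feats, _ in data for f in feats})
    rules = []
    for size in range(1, max_size + 1):
        for antecedent in combinations(items, size):
            wanted = set(antecedent)
            # Labels of all images whose feature set contains the antecedent.
            covered = [label for feats, label in data if wanted <= feats]
            if not covered:
                continue
            for label in sorted(set(covered)):
                both = covered.count(label)
                support = both / n
                confidence = both / len(covered)
                if support >= min_support and confidence >= min_confidence:
                    rules.append((antecedent, label, support, confidence))
    return rules

rules = mine_rules(images)
for antecedent, label, support, confidence in rules:
    print(f"{set(antecedent)} -> {label} "
          f"(support={support:.2f}, confidence={confidence:.2f})")
```

Exhaustive enumeration is only practical for tiny feature vocabularies; a real system would prune candidates, for example in the style of Apriori.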
Traditional mining techniques often generate huge amounts of numeric data that could be difficult to interpret and use. Visual mining transforms raw data into visualization and makes it easier to understand the meaning of data and make suitable decisions, in addition to opening the world of visual tools to a much broader audience (Soukup, 2002).
Key Terms in this Chapter
Web Mining: Concerned with the mechanism for discovering the correlations among the references to various files that are available on the server by a given client. Each transaction is comprised of a set of URLs accessed by a client in one visit to the server.
Web Image Search Engine: A kind of search engine that starts from several initially given URLs and follows complex hyperlinks to collect images on the WWW. A Web search engine is also known as a Web crawler.
Content-Based Image Retrieval (CBIR): A process framework for efficiently retrieving images from a collection by similarity. The retrieval relies on extracting the appropriate characteristic quantities describing the desired contents of images. In addition, suitable querying, matching, indexing and searching techniques are required.
Classification Error: Error produced by incorrect classifications, which consists of two types: false negative (wrongly classifying an item that belongs to the current class into another class) and false positive (wrongly classifying an item from another class into the current class).
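As a small illustration of the two error types, the following sketch counts, for one class, the items of that class that were assigned elsewhere (commonly called false negatives) and the items of other classes that were assigned to it (false positives). The labels and the function name `error_counts` are hypothetical.

```python
def error_counts(true_labels, predicted_labels, target):
    """Count both error types for the class `target`.

    Returns (missed, false_positive):
      missed         - items of `target` assigned to another class
      false_positive - items of other classes assigned to `target`
    """
    pairs = list(zip(true_labels, predicted_labels))
    missed = sum(1 for t, p in pairs if t == target and p != target)
    false_positive = sum(1 for t, p in pairs if t != target and p == target)
    return missed, false_positive

# Hypothetical ground truth and classifier output.
truth = ["cat", "cat", "dog", "dog", "bird"]
predicted = ["cat", "dog", "cat", "dog", "cat"]

missed, false_positive = error_counts(truth, predicted, "cat")
overall_error = sum(t != p for t, p in zip(truth, predicted)) / len(truth)
print(missed, false_positive, overall_error)
```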
Multi-Resolution Analysis: A process to treat a function (i.e., an image) at various levels of resolutions and/or approximations. In such a way, a complicated function could be divided into several simpler ones that can be studied separately.
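One simple way to picture this is a Haar-style decomposition, sketched here on a 1-D signal for brevity (an image would be processed along rows and columns). The choice of Haar averaging, the function names, and the sample values are assumptions made for illustration, and the input length is assumed to be a power of two.

```python
def haar_step(signal):
    """Split a signal into a coarser approximation and the detail lost."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_decompose(signal):
    """Return (coarsest value, details ordered from finest to coarsest level)."""
    current = list(signal)
    details = []
    while len(current) > 1:
        current, detail = haar_step(current)
        details.append(detail)
    return current[0], details

def haar_reconstruct(coarsest, details):
    """Invert haar_decompose by re-adding details from coarsest to finest."""
    current = [coarsest]
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(current, detail):
            rebuilt.extend([a + d, a - d])
        current = rebuilt
    return current

signal = [4, 6, 10, 12]
coarsest, details = haar_decompose(signal)
restored = haar_reconstruct(coarsest, details)
print(coarsest, details, restored)
```

Each level can be studied separately: the coarsest value summarizes the whole signal, while each detail list records what distinguishes one resolution from the next coarser one.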
Classification Rule Mining: A technique/procedure that aims to discover a small set of rules in the database to form an accurate classifier.
Similarity Transformation: A group of transformations that preserve the angles between any two curves at their intersection points. It is also called an equi-form transformation, because it preserves the form of curves. A planar similarity transformation has four degrees of freedom, which can be computed from two point correspondences.
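The four degrees of freedom (scale, rotation angle, and two translation components) can be recovered from two point correspondences, as the following sketch shows. It represents planar points as complex numbers, so the transformation becomes z' = a*z + b; this representation, the function names, and the sample points are choices made here for illustration.

```python
import cmath

def similarity_from_two_points(p1, p2, q1, q2):
    """Estimate (scale, angle, tx, ty) such that p1 -> q1 and p2 -> q2."""
    zp1, zp2 = complex(*p1), complex(*p2)
    zq1, zq2 = complex(*q1), complex(*q2)
    if zp1 == zp2:
        raise ValueError("source points must be distinct")
    a = (zq2 - zq1) / (zp2 - zp1)   # combined scale and rotation
    b = zq1 - a * zp1               # translation
    return abs(a), cmath.phase(a), b.real, b.imag

def apply_similarity(params, point):
    """Apply the transformation (scale, angle, tx, ty) to a 2-D point."""
    scale, angle, tx, ty = params
    z = cmath.rect(scale, angle) * complex(*point) + complex(tx, ty)
    return z.real, z.imag

# Hypothetical correspondences: (0,0) -> (1,1) and (1,0) -> (1,3).
params = similarity_from_two_points((0, 0), (1, 0), (1, 1), (1, 3))
mapped = apply_similarity(params, (0, 1))
print(params, mapped)
```

In this example the recovered transformation scales by 2, rotates by 90 degrees, and translates by (1, 1).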
Pattern Recognition: Concerned with the classification of individual patterns into pre-specified classes (i.e., supervised pattern recognition), or with the identification and characterization of pattern classes (i.e., unsupervised pattern recognition).
Pattern Detection: Concerned with locating patterns in the database to maximize/minimize a response variable or minimize some classification error (i.e., supervised pattern detection), or with not only locating occurrences of the patterns in the database but also deciding whether such an occurrence is a pattern (i.e., unsupervised pattern detection).