Abstract
Fuzzy query processing for XML database systems is an important issue. Based on the fuzzy set theory, XML fuzzy query can be expressed exploiting fuzzy predicates. To deal with the ranking problem of XML fuzzy query results, this chapter proposes a novel ranking approach. Firstly, according to the workload of XML documents, this chapter speculates how much the users care about each attribute node and assign a corresponding weight to it. Then, a membership degree ranking method, which ranks the fuzzy query results according to corresponding membership degree, is presented. Furthermore, this chapter proposes the probabilistic ranking method, which improves the PIR method. The improved probabilistic ranking method considers the relevance between the nodes specified by fuzzy query and the nodes unspecified by fuzzy query. Finally, top-k ranking algorithm of XML fuzzy query results is presented. The efficiency and effectiveness of the approach are also demonstrated by experimental results.
TopIntroduction
With the rapid development of the World Wide Web, the XML data has been recognized as a standard for data representation and transmission over the Internet. Many data have been produced and transformed into the XML data format, especially in scientific data and Web logs the information is kept in XML. Several query languages have been proposed until XPath and XQuery received a general consensus, becoming the standard query language. Traditional query processing method of XML data requires users to specify the precise queries, and the query processing systems return precise query results to the users. However, in many cases, the users can not express their query intentions accurately. Therefore, the traditional XML query processing systems lack flexibility to the users. The users would like to issue fuzzy query with fuzzy predicates, then XML database system is more convenient to the users.
For example, published information of the books is stored in an XML document. The user needs to search the books that were published in year around 2012 as well as the price is less than $60. In the XML document, the user may request the following fuzzy query:
//book[{@year close to 2012}][{@price at most 60}]
where, fuzzy predicates “close to” and “at most” express users’ fuzzy query intension.
In the practical application, the users don’t know exactly what they are looking for. When searching the XML document, the users often don’t know the XML Schema in detail. The XML data format has the structure and content information. The same data sometimes can be described using different structure. Therefore, the users’ query intensions are usually fuzzy or imprecise. How to extend XML query language as well as provide some flexibility to the users is an important issue. The XPath query language can select XML data nodes effectively through the expression of query traversal, then it able to formulate a query expression that accurately expresses users’ query intensions. This chapter uses fuzzy predicates to expend XPath query language, to make the fuzzy query expression to express users’ fuzzy query intension. Moreover, the users may propose the fuzzy query with fuzzy terms or fuzzy relations. Thus, the extensional fuzzy query expression can help users to improve their interaction with system in order to obtain more query results.
After fuzzy query of XML data, another problem faced by the users will be that how to use efficient ranking method to rank the fuzzy query results. This chapter uses the membership ranking method to rank the fuzzy query results according to their membership degree. For the nodes with the same membership degree, each node specified by fuzzy query is assigned a score according to its desirableness to the user. This chapter also considers the workload information of the XML documents, which is a log of past users’ queries. Based on the workload, the chapter speculates how much the user cares about each node and assigns a corresponding weight to the attribute nodes according to their importance.
This chapter presents an automatic estimative approach via workload as well as data analysis to resolve above problem. Fuzzy query and probabilistic ranking (FQPR) method developed the ranking functions, which are based on probabilistic information retrieval (PIR) model (Chaudhuri et al. 2004, 2006). The architecture of FQPR ranking model has a pre-processing component that collects workload statistics to determine the appropriate ranking scores. The extracted ranking scores are materialized in an intermediate knowledge representation layer, to be used later by a query processing component for ranking the fuzzy query results. Although FQPR ranking approach derives these quantities automatically, the architecture allows users or domain experts to tune these quantities further. Therefore, FQPR ranking model may customize the ranking functions for different applications.
The objectives of the chapter are summarized as follows:
- •
This chapter proposes an XML fuzzy query method, and use fuzzy predicates to expand XPath query language.
- •
In order to resolve the raking problem of fuzzy query results, the chapter proposes a novel membership degree ranking method, which can rank the fuzzy query results according to their membership degree.
- •
Based on workload statistics, the chapter presents probabilistic ranking method, which considers the relevance between the nodes specified by fuzzy query and the nodes unspecified by fuzzy query.
Key Terms in this Chapter
Fuzzy Relations: The fuzzy relations as operators and crisp values as operands. For A Y , where A is an attribute, is a fuzzy relation, and Y is a crisp value, Y is a fuzzy number. In this chapter three types of fuzzy relations, which are “ close to ( around )”, “ at least ”, and “ at most ”.
Compound Fuzzy Term: A compound fuzzy term such as “ young ? very young ” is represented by simple fuzzy terms or composite fuzzy terms connected by union (?), intersection (n) or complementation connectors.
XML Fuzzy Query: An XML document D with attributes A = { A 1 , A 2 , …, A m } and a fuzzy query Q = { Q 1 , Q 2 , …, Q s } over D , where each Q i can be a precise condition (such as “ A i ? a i ”) or a fuzzy condition (such as “ A i ? ” or “ A i a i ”), but at least one is the fuzzy condition.
XML: Extensible Markup Language (XML) is a markup language. XML defines a set of rules for encoding documents in a format.
Composite Fuzzy Term: A composite fuzzy term such as “ very young ” or “ more or less tall ” is described by a fuzzy number with membership function.
Fuzzy Set: Fuzzy sets are sets whose elements have degrees of membership. Fuzzy sets were introduced by Lotfi A. Zadeh in 1965 as an extension of the classical notion of set.
Simple Fuzzy Term: A simple fuzzy term such as “ young ” or “ tall ” is defined by a fuzzy number with membership function.
PIR Model: The model is a probabilistic information retrieval method, which has probabilistic ranking functions to rank query results of databases.