1. Introduction
Keyword querying is becoming a popular way to retrieve information from relational databases, in line with its widespread use on the Web. In real applications, however, most ordinary Web database users have insufficient knowledge of the database content and schema, and they also lack keywords related to the search domain. Thus, it is not easy for them to find appropriate keywords to express their query intentions. To explore the database, a user may first issue a query with a few general keywords and then gradually refine it by inspecting the query results. In each such iteration, the user must check every result to decide whether it is relevant to his or her interest, which is time-consuming and tedious.
Consider a DBLP database consisting of three relations connected through primary-key/foreign-key relationships, as shown in Figure 1.
Figure 1. An example of a DBLP database
Suppose a master's student who is new to XML knows only a few keywords about the XML research field and wants to find chapters about XML search techniques on the DBLP website. Based on the DBLP database, he/she would issue a query Q containing the keywords “XML” and “search”. On receiving Q, a traditional keyword search approach returns a set of minimal total joining networks of tuples (MTJNTs), each of which
1. is obtained from a single relation or by joining several relations, and
2. contains all the query keywords.
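The two conditions above can be made concrete with a minimal sketch. It assumes a toy database whose tuple identifiers follow the naming used in the example (a for authors, w for the linking relation, p for chapters); the tuple texts, the foreign-key edges, and all function names are hypothetical and are not taken from Figure 1. Minimality is read here as "removing any single tuple either disconnects the network or loses a query keyword", which is an assumption about the definition rather than something stated in the text.

```python
from itertools import combinations

# Hypothetical toy tuples: the texts are invented for illustration only.
TUPLES = {
    "a1": "Jeffrey",
    "w1": "",
    "w2": "",
    "p1": "XML search with XPath",
    "p4": "keyword search over XML",
}

# Hypothetical primary-key/foreign-key links between tuples.
EDGES = [("a1", "w1"), ("w1", "p1"), ("a1", "w2"), ("w2", "p4")]


def is_connected(nodes):
    """True if the tuples form one connected component under EDGES."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        cur = stack.pop()
        for u, v in EDGES:
            for a, b in ((u, v), (v, u)):
                if a == cur and b in nodes and b not in seen:
                    seen.add(b)
                    stack.append(b)
    return seen == nodes


def is_total(nodes, keywords):
    """True if every keyword occurs in at least one tuple (case-insensitive)."""
    return all(
        any(k.lower() in TUPLES[t].lower() for t in nodes) for k in keywords
    )


def is_mtjnt(nodes, keywords):
    """Connected, total, and no single tuple can be dropped (assumed reading)."""
    nodes = set(nodes)
    if not (is_connected(nodes) and is_total(nodes, keywords)):
        return False
    for t in nodes:
        rest = nodes - {t}
        if rest and is_connected(rest) and is_total(rest, keywords):
            return False
    return True


def find_mtjnts(keywords, max_size=3):
    """Brute-force enumeration; only sensible for a toy database."""
    ids = sorted(TUPLES)
    return [
        combo
        for size in range(1, max_size + 1)
        for combo in combinations(ids, size)
        if is_mtjnt(combo, keywords)
    ]


if __name__ == "__main__":
    print(find_mtjnts(["XML", "search"]))
    print(find_mtjnts(["Jeffrey", "XML", "search"]))
```

On this toy data, the two-keyword query is already answered by single chapter tuples, while adding the author name forces a join across all three relations. The exhaustive enumeration is chosen only for readability; it is not how a keyword search system would generate results at scale.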
Because many chapters in the DBLP dataset contain the keywords “XML” and “search”, the query results contain too many MTJNTs. In such a case, the user would like the system to suggest a list of keywords that are semantically related to Q in order to narrow the search scope. From Figure 1, it is clear that the author “Jeffrey” and the keywords “XPath”, “XQuery”, and “twig pattern” are highly relevant to Q, which means that these terms can refine Q into a more selective query. For example, the user could execute a query Q’=[Jeffrey, XML, search] to retrieve only the chapters by author Jeffrey on XML search; the query results are “a1w1p1” and “a1w2p4”. Additionally, the tuples p2 and p3, which contain “full-text”, “semi-structured data”, and “twig pattern”, are also related to Q. However, these tuples would not be returned by the system, because the terms they contain are not specified explicitly in the user query. If the user is also interested in these topics, he/she can choose the keywords “full-text”, “semi-structured data”, and/or “twig pattern” to explore the database. Hence, it is necessary to provide a list of terms semantically related to the given query, so that the user can refine or reformulate his/her query according to the terms in the list.