![]() |
XKeyword Homepage
|
| Main Idea | People | Publications | Presentations | Demo |
Keyword search is the most popular information discovery method because the user does not need to know either a query language or the underlying structure of the data. The search engines available today provide keyword search on top of sets of documents. In addition to documents, a huge amount of information is stored in databases, but no simple way is available to discover information in them, except by using structured languages (such as XQuery or SQL) where the type(s) of each keyword must be specified, as well as the types of connections between the objects. The result of a keyword query is a list of trees of nodes from the database, which is viewed as a labeled graph. For example, a keyword query “Smith, Miller” may return that “Smith” is the name of a sales agent who took an order from customer “Miller”, while a second solution may indicate that “Smith” and “Miller” are customers served by the “San Diego” sales unit.
As another example, consider a bibliographic database (see demo below), with publications, authors and conferences. Assume we want to know how author “Widom” is connected to author “Ullman”. Using a language like SQL, this is expressed as a set of queries of the type: the two authors have co-authored a paper, or the first references a paper of the second, or the second references a paper of the first, or they published in the same conference and so on. The same query, using XKeyword, is simply expressed as “Widom Ullman”.
XKeyword is built on a relational database and, hence, can accommodate very large graphs. Query evaluation is optimized by using the schema of the database. In particular, XKeyword consists of two stages. In the preprocessing stage a set of keyword indices are built along with indexed path relations that describe particular patterns of paths in the graph. In the query processing stage plans are developed that use a near optimal set of path relations to efficiently locate the keyword query results.
|
|
Yannis Papakonstantinou, Professor |
|
|
Andrey Balmin, PhD student |
|
|
Vagelis Hristidis, PhD student |
|
|
Tem Wang, MS student |
|
|
Patrick Lightbody, undergraduate student |
|
|
Vagelis Hristidis, Luis Gravano, Yannis Papakonstantinou: Efficient IR-Style Keyword Search over Relational Databases. VLDB, 2003 |
|
|
Andrey Balmin, Vagelis Hristidis, Nick Koudas, Yannis Papakonstantinou, Divesh Srivastava, Tianqiu Wang: A System for Keyword Search on XML Databases. Demo at VLDB, 2003 |
|
|
Vagelis Hristidis, Yannis Papakonstantinou, Andrey Balmin: Keyword Proximity Search on XML Graphs (extended version). ICDE, 2003 |
|
|
Vagelis Hristidis, Yannis Papakonstantinou: DISCOVER: Keyword Search in Relational Databases. VLDB, 2002 |
|
|
Efficient IR-style Keyword Search over Relational Databases, at VLDB 2003, Berlin |
|
|
XKeyword's presentation, at IEEE ICDE 2003, Bangalore, India |
|
|
DISCOVER's presentation, at VLDB 2002, Hong Kong. |
The XKeyword demo is operating on a subset of the DBLP database containing the 12 major conferences including VLDB, SIGMOD, ICDE, PODS, up to year 2001. The schema of the database is the following
The user inputs a set of keywords, which may belong to any type (relation), and the maximum Candidate Network size, which is the maximum size of the result trees in number of edges. For example, for the keyword query "Widom Ullman" the results include:
| Result | Size |
| author["Widom"] ← paper["TSIMMIS System"] → author["Ullman"] | 2 |
| author["Widom"] ← paper["Production Rules"] ← paper["Constraint Checking"] → author["Ullman"] | 3 |
| author["Widom"] ← paper["LORE"] → paper["TSIMMIS System"] → author["Ullman"] | 3 |
A thread is generated for every Candidate Network (CN), that is for every possible join tree on the schema containing all keywords. These threads execute in parallel and output results as they come. The threads of smaller CNs execute faster since they correspond to simpler queries. Hence the smaller results, which are intuitively the most relevant, are output first.
Inquiries for commercial use of this software should be directed to invent@ucsd.edu.