CMPS 278 Projects
Surveys
A survey should involve a topic that is not covered in the
course. The following list presents some suggested topics, but you
can also propose your own (as long as it adheres to the previous
rule).
- Similarity search over graph-structured
data. Graph databases, which store and process
large collections of graph-structured data, are becoming
increasingly popular, especially with the rapid growth of
biological sciences (e.g., genome research) and the advent of
XML. Similarity queries offer a powerful querying paradigm in
this context, as they allow the retrieval of graphs that
"resemble" a specific input pattern. The goal of
this project is to survey techniques for performing similarity
search over large collections of graph-structured data.
- The gap between academia and industry. How do modern
commercial systems handle the topics that we cover in this
course? What techniques are implemented and what compromises
have been made?
Implementation
An implementation project can involve the experimental evaluation
of existing techniques, an extension of a previously proposed
technique, or its application in a new context. In any case,
an implementation project must have a strong research
component. If you are already doing research, then you can
propose a project that is relevant to your work as long as it
involves a database-related problem.
- XML-based relational synopses. A key problem in
relational query optimization is accurately estimating the
result cardinalities of different relational algebra
expressions. Typically, the system computes these estimates by
essentially evaluating the expression over a concise
statistical summary of the base data. This project
will investigate the application of XML-based
techniques in order to build an effective summary of the
relational database. The key idea is that a relational
database can be modeled as a graph structure, where tuples are
nodes and the join relationships become the edges. This approach
adopts a semi-structured view of relational data and enables
the application of XML-based techniques in order to summarize
the join structure of the database.
- Sketching techniques for XML data. Randomized
sketching techniques, which have been developed in the context of
relational databases, can build a concise statistical summary
of the base data by performing a single scan over the
database. The goal of this project is to investigate the
application of these techniques in the domain of XML databases.
- XML Sampling. Sampling can provide a concise
statistical summary of the base data at a low cost. Even
though sampling has been studied thoroughly in the relational
context, not much work has been done in the XML domain. The
goal of this project will be to investigate the application of
relational sampling techniques in the context of XML
databases.
- Speculative Partial Indexes. In speculative query
processing, the database predicts the characteristics of the
upcoming query workload and uses spare cycles in order to
prepare itself. The goal of this project will be to
investigate the application of this idea in the context of
partial index creation. A partial index covers only a specific
range in the underlying value domain and is thus faster to
process, but also less general. The project will involve the
design and development of a speculative subsystem that
observes the current query workload, predicts the
characteristics of upcoming queries, and dynamically builds
partial indexes that will speed up query processing.
- Experimental evaluation of different XML summarization
techniques. XML summarization has become an important
problem in the realm of XML query processing, and a host of
different techniques have been proposed. The goal of this
project is to perform a thorough and methodical experimental
study in order to compare the performance of several different
techniques.
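As a starting point for the XML-based relational synopses topic above,
the graph view of a relational database (tuples as nodes, join
relationships as edges) can be sketched in a few lines of Python. This
is only an illustrative sketch under stated assumptions: the function
name build_join_graph, the in-memory table layout, and the toy "dept"
and "emp" tables are all hypothetical and not part of any course
material or existing system.

```python
from collections import defaultdict


def build_join_graph(tables, foreign_keys):
    """Model a relational database as a graph (illustrative sketch).

    tables: {table_name: [row_dict, ...]}  (hypothetical in-memory layout)
    foreign_keys: [(child_table, child_col, parent_table, parent_col), ...]

    Returns (nodes, edges): each node is (table_name, row_position) and
    each edge links a child tuple to a parent tuple that it joins with.
    """
    nodes = [(name, i) for name, rows in tables.items() for i in range(len(rows))]
    edges = []
    for child, child_col, parent, parent_col in foreign_keys:
        # Index the parent table on the join column for fast lookups.
        index = defaultdict(list)
        for j, row in enumerate(tables[parent]):
            index[row[parent_col]].append(j)
        # One edge per pair of tuples that satisfies the join predicate.
        for i, row in enumerate(tables[child]):
            for j in index.get(row[child_col], []):
                edges.append(((child, i), (parent, j)))
    return nodes, edges


# Hypothetical toy database: employees reference departments.
tables = {
    "dept": [{"id": 1, "name": "CS"}, {"id": 2, "name": "EE"}],
    "emp": [
        {"eid": 10, "dept_id": 1},
        {"eid": 11, "dept_id": 1},
        {"eid": 12, "dept_id": 2},
    ],
}
foreign_keys = [("emp", "dept_id", "dept", "id")]

nodes, edges = build_join_graph(tables, foreign_keys)
```

In this toy model, the number of edges generated for a foreign key
equals the cardinality of the corresponding join, which is the kind of
quantity a summary of the graph would need to approximate.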
Neoklis Polyzotis
Last modified: Tue Jan 4 09:12:54 PST 2005