CS290S: Management of XML and (Semi)structured Data
Instructor: Wang-Chiew Tan
Email: wctan@cs.ucsc.edu
When: MW 5-6.45pm
Where: Stevenson 221
Office: BE359A
Office hours: By appointment only.
Course Description:
The widespread use of XML data and the Web has placed new demands on
traditional database engines. Backed by more than 20 years of research and
engineering, these engines are well known for efficiently managing large
volumes of relational data (i.e., data organized in rigid, table-like
structures). However, their ability to efficiently manage XML data, which may
not conform to a table structure, is still in its infancy. In this course, we
shall study the technical issues that arise in managing relational and XML
data through selected papers on the following themes: data integration, XML
publishing and storage systems, XML toolkits, XML Schema and queries, data
exchange and peer-to-peer systems, fundamentals of relational query
evaluation, and normalization of XML documents. There will likely also be one
or two guest lecturers.
Grading Scheme:
Class participation: 10%
Paper presentations: 30%
Reviews: 30%
Project: 30%
- Class participation: Students are expected to participate actively in class
and in the newsgroup/mailing list.
- Presentations: Each student will present one paper in class during the
course.
- Paper Reviews: Each student is expected to write 5 paper reviews, on papers
other than the one they are designated to present. You are free to write more
reviews; only the best 5 will count toward the grade.
- Project: Students are also expected to complete a project. See the Project
section for details.
Textbooks:
There is no required textbook for this class. For additional reading, you may
refer to the books below, as well as to your usual database course textbooks.
- Data on the Web. S. Abiteboul, P. Buneman, and D. Suciu. Morgan Kaufmann.
- Foundations of Databases. S. Abiteboul, R. Hull, and V. Vianu. Addison-Wesley.
Prerequisites:
By interview only. Please see the instructor. Note that CMPS221 is not
a prerequisite for this course.
Tentative Syllabus:
- History and various concepts (Object Exchange Model, UnQL, XML data model,
and some language flavors in these data models) (1-2 lectures)
- A Web Odyssey - From Codd to XML - by Vianu (PODS'01)
- Object Fusion in Mediator Systems - by Papakonstantinou, Abiteboul,
Garcia-Molina (VLDB'96)
- A Data Transformation System for Biological Data Sources - by Buneman,
Davidson, Hart, Overton, Wong (VLDB'95)
- SilkRoute: Trading Between Relations and XML - by Fernandez, Suciu, Tan
(WWW9)
- Efficiently Publishing Relational Data as XML Documents - by
Shanmugasundaram, Shekita, Barr, Carey, Lindsay, Pirahesh, Reinwald (VLDB'00)
- Storing Semistructured Data with STORED - by Deutsch, Fernandez, Suciu
(SIGMOD'99)
- Storing and Querying XML Data Using an RDBMS - by Florescu, Kossmann
(IEEE Data Engineering Bulletin '99)
- Information Integration Using Logical Views - by Ullman (ICDT'97)
- Answering Queries Using Views: A Survey - by Halevy (VLDB Journal '01)
- Processing XML Streams with Deterministic Automata - by Green, Miklau,
Onizuka, Suciu (ICDT'03)
- May 16 - CS Seminar: "Query Processing over Data Streams" - Mike Franklin,
UC Berkeley
- XMLTK: An XML Toolkit for Scalable Stream Processing - by Avila-Campillo,
Green, Gupta, Onizuka, Raven, Suciu (PlanX'02)
- Translating Web Data - by Popa, Velegrakis, Miller, Hernandez, Fagin
(VLDB'02)
- May 21 - Lucian Popa, IBM Almaden
- A Normal Form for XML Documents - by Arenas, Libkin (PODS'02)
- An Information-Theoretic Approach to Normal Forms for Relational and XML
Data - by Arenas, Libkin (PODS'03)
- WOL: A Language for Database Transformation and Constraints - by Davidson,
Kosky (ICDE'97)
Project:
The course project can take one of the following forms:
- A detailed study of a database problem:
  - You are expected to study a few existing papers in depth. Deliverable: a
  report of about 15 pages that describes the problem you are studying,
  details the existing solution(s), and gives a critical analysis of those
  solutions.
- An implementation and/or experimental analysis of techniques proposed in
the existing literature:
  - Deliverables: a demo of your implementation to the instructor and, where
  appropriate, an experimental analysis of its performance. A 5-page report
  on your implementation is also expected.
- A novel solution to an existing database problem:
  - Deliverables: to be discussed with the instructor.
Please see the instructor for project suggestions. You are also free to
propose your own project, but you should discuss your proposal with the
instructor first.