A schema mapping is a high-level, declarative specification that describes how data structured under one schema (the
source schema) is to be transformed into data structured under a different schema (the target schema).
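As a minimal sketch of what such a specification can look like, the example below applies one illustrative source-to-target dependency, Emp(n, d) -> exists M. Works(n, d) and Manager(d, M), to a small source instance. The relation names, the data, and the function `apply_mapping` are hypothetical and chosen only for illustration; they are not taken from SPIDER or Clio.

```python
# Hypothetical source instance over a source schema with one relation Emp(name, dept).
SOURCE = {"Emp": [("alice", "sales"), ("bob", "hr")]}


def apply_mapping(source):
    """Apply the illustrative dependency
    Emp(n, d) -> exists M. Works(n, d) and Manager(d, M).

    The existential variable M has no source value, so each firing of the
    dependency invents a fresh labeled null (N0, N1, ...).
    """
    target = {"Works": [], "Manager": []}
    for i, (name, dept) in enumerate(source["Emp"]):
        target["Works"].append((name, dept))
        target["Manager"].append((dept, f"N{i}"))
    return target


TARGET = apply_mapping(SOURCE)
```

The mapping states only which target facts must exist for each source fact; how an engine materializes them is left open, which is what makes the specification declarative.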
We developed a non-intrusive, data-driven approach for debugging schema mappings that
employs (test) data to drive the process of exploring, understanding and refining a schema mapping.
At the core of our approach lies the notion of routes.
Routes are a form of provenance: they describe the relationship between
source and target data through the schema mapping. Routes have declarative semantics that are
independent of the implementation of the data exchange engine; therefore, our
techniques apply to any data exchange system based on schema mappings, and also to data integration systems.
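To make the idea concrete, here is a deliberately simplified sketch, assuming a single-atom mapping with no existential variables and no target dependencies. It looks for one witness (mapping name, variable assignment, source fact) that explains a selected target fact. The names `MAPPINGS`, `match`, and `find_route` are hypothetical, and this is not the SPIDER algorithm, which handles multi-step routes over richer mappings.

```python
# Hypothetical mapping m1: Emp(n, d) -> Works(n, d)
MAPPINGS = {
    "m1": {"source": ("Emp", ("n", "d")), "target": ("Works", ("n", "d"))},
}


def match(atom_vars, fact_values, assignment=None):
    """Extend `assignment` so each variable maps to the fact's value, or return None."""
    if len(atom_vars) != len(fact_values):
        return None
    result = dict(assignment or {})
    for var, value in zip(atom_vars, fact_values):
        # setdefault binds an unbound variable; a bound one must agree with the value.
        if result.setdefault(var, value) != value:
            return None
    return result


def find_route(target_fact, source_instance, mappings=MAPPINGS):
    """Return a one-step route witnessing `target_fact`, or None if there is none."""
    rel, values = target_fact
    for name, mapping in mappings.items():
        t_rel, t_vars = mapping["target"]
        if t_rel != rel:
            continue
        partial = match(t_vars, values)
        if partial is None:
            continue
        s_rel, s_vars = mapping["source"]
        for fact in source_instance.get(s_rel, []):
            full = match(s_vars, fact, partial)
            if full is not None:
                return [{"mapping": name,
                         "assignment": full,
                         "source_facts": [(s_rel, fact)]}]
    return None


SOURCE = {"Emp": [("alice", "sales"), ("bob", "hr")]}
ROUTE = find_route(("Works", ("bob", "hr")), SOURCE)
```

The witness is defined purely in terms of the mapping and the two instances, so it can be computed without knowing how the exchange engine produced the target data.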
We have designed polynomial-time algorithms for computing one route and all routes, respectively,
for selected source or target data. The latter algorithm produces a complete, polynomial-size representation of the
(possibly exponential) set of all routes. We have implemented our algorithms in a prototype system
called SPIDER (see main features
and demo).
SPIDER works on top of the
Clio schema mapping management system
from the IBM Almaden Research Center and operates
with schema mappings based on a nested extension of tuple generating dependencies
and equality generating dependencies.
The schema mappings used in our experiments can be found here.
Acknowledgements: This work is supported in part by NSF CAREER Award IIS-0347065 and NSF grant IIS-0430994.