2004 Denver Annual Meeting (November 7–10, 2004)

Paper No. 10
Presentation Time: 10:35 AM

A COMMUNITY APPROACH TO DATA INTEGRATION: BUILDING MEANINGFUL LINKS ACROSS DIVERSE DATASETS


KANSA, Eric Christopher, The Alexandria Archive Institute and Stanford Univ, 135 El Verano Way, San Francisco, CA 94127, ekansa@alexandriaarchive.org

Heterogeneity is common to databases generated in the social sciences, humanities and some environmental sciences. The data, authors, research methods, uses, and audiences of many datasets are all highly diverse. Such heterogeneity requires database integration to be a theoretically informed interpretive process. The University of Chicago’s XSTAR project is leading development of new methods of collaborative, community-based data integration for the field of archaeology. With XSTAR, data integration takes place in two steps:

(1) Syntactic integration: Legacy datasets are migrated into the data structures defined by the Archaeological Markup Language (ArchaeoML).

(2) Semantic integration: Thesaurus relationships are established between related terms and “classes” in each source database. Different archaeological databases may use different human languages and different terminological and typological systems. A human expert must match terms between such datasets, because the nuances of meaning in a given context are often very subtle.
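The two steps can be illustrated with a minimal Python sketch. It is not the XSTAR implementation and does not reproduce the ArchaeoML schema: the `Item`, `Term`, and `ThesaurusLink` classes, the `migrate` helper, and the sample records (including the project names and terms) are all hypothetical and serve only to show the separation between restructuring data and asserting meaning.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical common container (NOT the actual ArchaeoML structure)."""
    dataset: str
    item_id: str
    properties: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Term:
    """A term exactly as it is used in one source dataset."""
    dataset: str
    label: str


@dataclass(frozen=True)
class ThesaurusLink:
    """An expert's assertion that two source terms are related."""
    source: Term
    target: Term
    relation: str      # e.g. "equivalent", "broader", "narrower" (assumed vocabulary)
    asserted_by: str   # who made the judgment, so it can be evaluated later


def migrate(dataset, row, id_column):
    """Step 1 (syntactic): copy a legacy row into the common structure.

    Original field names and values are kept unchanged; no meaning is
    reinterpreted at this stage.
    """
    props = {key: value for key, value in row.items() if key != id_column}
    return Item(dataset=dataset, item_id=str(row[id_column]), properties=props)


# Step 1: two invented legacy records with different layouts and languages.
item_a = migrate("ProjectA", {"id": 17, "type": "cooking pot"}, "id")
item_b = migrate("ProjectB", {"no": "B-3", "Typ": "Kochtopf"}, "no")

# Step 2 (semantic): a human expert records how the two terms relate.
link = ThesaurusLink(
    source=Term("ProjectA", "cooking pot"),
    target=Term("ProjectB", "Kochtopf"),
    relation="equivalent",
    asserted_by="expert-1",
)
```

In this sketch the migration step never alters a source term, so every semantic claim lives in a separate, attributable `ThesaurusLink` record.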

XSTAR technologies enable multiple thesaurus “mappings” between diverse project datasets to coexist. Thus, the same information architecture can allow multiple, evolving data integration schemes to develop and keep pace with changing research agendas. No single integration scheme should be considered “definitive,” because each one is the result of potentially contestable theoretical judgments and interpretations.
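One way to picture coexisting mappings is a registry that stores each integration scheme under its own name, so that a query always states which scheme it relies on. The following Python sketch is an assumption-laden illustration, not a description of XSTAR: the `MappingRegistry` class, the scheme names, and the sample terms and concepts are all hypothetical.

```python
class MappingRegistry:
    """Hypothetical store for several independent thesaurus mappings."""

    def __init__(self):
        # scheme name -> {(dataset, term): shared concept}
        self._schemes = {}

    def assert_match(self, scheme, dataset, term, concept):
        """Record, within one named scheme, that a source term maps to a concept."""
        self._schemes.setdefault(scheme, {})[(dataset, term)] = concept

    def translate(self, scheme, dataset, term):
        """Return the concept for a term under the chosen scheme, or None."""
        return self._schemes.get(scheme, {}).get((dataset, term))

    def related(self, scheme, concept):
        """List every (dataset, term) pair that a scheme groups under a concept."""
        entries = self._schemes.get(scheme, {})
        return sorted(key for key, value in entries.items() if value == concept)


registry = MappingRegistry()

# Scheme 1 (invented): terms grouped by presumed function.
registry.assert_match("functional", "ProjectA", "cooking pot", "food preparation vessel")
registry.assert_match("functional", "ProjectB", "Kochtopf", "food preparation vessel")
registry.assert_match("functional", "ProjectA", "storage jar", "storage vessel")

# Scheme 2 (invented): the same terms grouped by ware instead.
registry.assert_match("ware-based", "ProjectA", "cooking pot", "coarse ware")
registry.assert_match("ware-based", "ProjectB", "Kochtopf", "coarse ware")
registry.assert_match("ware-based", "ProjectA", "storage jar", "coarse ware")

functional_group = registry.related("functional", "food preparation vessel")
ware_group = registry.related("ware-based", "coarse ware")
```

Because neither scheme overwrites the other, the same source data can be regrouped under whichever interpretation a researcher wishes to examine or contest.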

These tools will help to integrate diverse datasets that are now often viewed in relative isolation. The tools to link datasets will also create a forum for discussion and debate within the field of archaeology. Experts will be encouraged to define explicitly how the results of diverse excavation projects and surveys relate to each other. In our view, this will make an important contribution to the discipline, since the reasoning implicit in diachronic and regional syntheses will be made much more transparent and open to evaluation. Similar methods can be applied in several other disciplines, including the environmental sciences and geology. Bringing together diverse datasets will likely catalyze new research of greater scope and analytic rigor.