A COMMUNITY APPROACH TO DATA INTEGRATION: BUILDING MEANINGFUL LINKS ACROSS DIVERSE DATASETS
Integration proceeds in two stages. (1) Syntactic integration: legacy datasets are migrated into the data structures defined by the Archaeological Markup Language (ArchaeoML). (2) Semantic integration: thesaurus relationships are established between related terms and classes in each source database. Different archaeological databases may use different human languages and different terminological and typological systems. Because the nuances of meaning in a given context are often very subtle, a human expert must match terms between such datasets.
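The two stages can be illustrated with a toy Python sketch. It rests on loudly stated assumptions. The `Item` record is a hypothetical stand-in for ArchaeoML's item structure, not its actual schema. The project names, field names, terms, and the single "equivalent" relation are all invented for illustration. The expert's judgment is represented simply as a hand-written set of links.

```python
from dataclasses import dataclass


# Hypothetical common record; a stand-in for ArchaeoML's item structure, not its schema.
@dataclass(frozen=True)
class Item:
    project: str
    item_id: str
    term: str  # the project's own label, kept verbatim


# Hypothetical legacy rows: each project has its own field names and language.
LEGACY_A = [{"find_no": "A-1", "vessel": "cooking pot"}, {"find_no": "A-2", "vessel": "bowl"}]
LEGACY_B = [{"nr": "B-7", "form": "Kochtopf"}, {"nr": "B-8", "form": "Schale"}]


# (1) Syntactic integration: move each legacy row into the common structure.
def from_a(row: dict) -> Item:
    return Item("A", row["find_no"], row["vessel"])


def from_b(row: dict) -> Item:
    return Item("B", row["nr"], row["form"])


ITEMS = [from_a(r) for r in LEGACY_A] + [from_b(r) for r in LEGACY_B]

# (2) Semantic integration: links between (project, term) pairs, asserted by an expert.
EQUIVALENT = {
    (("B", "Kochtopf"), ("A", "cooking pot")),
    (("B", "Schale"), ("A", "bowl")),
}


def linked_terms(project: str, term: str) -> set:
    """Return the given (project, term) plus every pair directly linked to it."""
    key = (project, term)
    linked = {key}
    for left, right in EQUIVALENT:
        if key == left:
            linked.add(right)
        elif key == right:
            linked.add(left)
    return linked


def find_across(items: list, project: str, term: str) -> list:
    """Find items in any project whose term is linked to the given one."""
    wanted = linked_terms(project, term)
    return [it for it in items if (it.project, it.term) in wanted]


for item in find_across(ITEMS, "A", "cooking pot"):
    print(item.project, item.item_id, item.term)
```

Querying project A's "cooking pot" also returns project B's item B-7 ("Kochtopf"). Each project's original term stays untouched, and the equivalences are recorded as separate assertions. As a result, a link can be revised without altering the source data. This sketch follows only direct, one-hop links.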
XSTAR technologies allow multiple thesaurus mappings between diverse project datasets to coexist. The same information architecture can therefore support multiple, evolving data integration schemes that keep pace with changing research agendas. No single integration scheme should be considered definitive, because each one rests on potentially contestable theoretical judgments and interpretations.
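The idea of coexisting mappings can be sketched as several named schemes held side by side over the same records. This is a hypothetical illustration and does not reflect how XSTAR actually stores mappings. The scheme names, concept labels, and records are invented. The two schemes differ only in how they read project B's "Schale", which stands in for a contestable expert judgment.

```python
from collections import Counter

# Hypothetical integrated records as (project, item_id, term), e.g. the output of
# syntactic integration.
ITEMS = [
    ("A", "A-1", "cooking pot"),
    ("A", "A-2", "bowl"),
    ("B", "B-7", "Kochtopf"),
    ("B", "B-8", "Schale"),
]

# Two coexisting, hypothetical schemes mapping (project, term) to a shared concept.
# Neither overwrites the other; they disagree only on project B's "Schale".
SCHEMES = {
    "scheme_1": {
        ("A", "cooking pot"): "cooking pot",
        ("B", "Kochtopf"): "cooking pot",
        ("A", "bowl"): "bowl",
        ("B", "Schale"): "bowl",
    },
    "scheme_2": {
        ("A", "cooking pot"): "cooking pot",
        ("B", "Kochtopf"): "cooking pot",
        ("A", "bowl"): "bowl",
        ("B", "Schale"): "dish",
    },
}


def tally(items: list, scheme_name: str) -> Counter:
    """Count items per shared concept under one scheme; unmatched terms count as 'unmapped'."""
    mapping = SCHEMES[scheme_name]
    return Counter(mapping.get((project, term), "unmapped") for project, _, term in items)


for name in SCHEMES:
    print(name, dict(sorted(tally(ITEMS, name).items())))
```

The same records yield two bowls under `scheme_1`, but one bowl and one dish under `scheme_2`. Adding a third scheme means adding an entry to `SCHEMES` without touching the data or the existing schemes. This mirrors the point that no single integration scheme needs to be treated as definitive.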
These tools will help integrate diverse datasets that are now often viewed in relative isolation. Tools for linking datasets will also create a forum for discussion and debate within archaeology, encouraging experts to define explicitly how the results of different excavation projects and surveys relate to one another. In our view, this will make an important contribution to the discipline, because the reasoning behind diachronic and regional syntheses, now often left implicit, will become far more transparent and open to evaluation. Similar methods can be applied in other disciplines, including the environmental sciences and geology. Bringing together diverse datasets will likely catalyze innovative research of greater scope and analytic rigor.