SEMANTIC INTEGRATION OF HETEROGENEOUS VOLCANIC AND ATMOSPHERIC DATA
Most investigations of the Earth system are limited in their ability to address the most important (and often most difficult) problems because they are forced to interconnect at the data-element, or syntactic, level rather than at a higher scientific, or semantic, level. In many cases, syntax-only interoperability is the state of the art. Currently, for scientists and non-scientists to discover, access, and use data from unfamiliar sources, they must learn the details of the data schema as well as other people's naming schemes and syntax decisions. These constraints are limiting even when researchers are looking for information in their own discipline, and they present even greater challenges when researchers are looking for information spanning multiple disciplines, including some in which they are not extensively trained. Our project, Semantically-Enabled Scientific Data Integration (hereafter SESDI), aims to demonstrate how ontologies implemented within existing distributed technology frameworks can provide the essential, reusable, and robust support needed for interdisciplinary scientific research.
Our project is aimed at enabling the next generation of interdisciplinary and discipline-specific data and information systems. Our initial focus is the integration of volcanology and atmospheric data sources in support of investigations into relationships between volcanic activity and global climate.
This work aims to give scientists the option of describing what they are looking for in terms that are meaningful and natural to them, instead of in a syntax that is not. The goal is not simply to facilitate search and retrieval, but also to provide an underlying framework that contains information about the semantics of the scientific terms used. We expect the system to be used by scientists who want to further process the integrated data; the system must therefore expose how the integration is done and which definitions it uses. The element missing from previous systems, and the one needed to enable higher-level semantic interconnections, is the technology of ontologies, ontology-equipped tools, and semantically aware interfaces between science components. We present initial results of using semantic technologies to integrate data between these two discipline areas, both to assist in establishing causal connections and to explore as yet unknown relationships.
Semantic Data Integration Methodology
Our effort depends on machine-operational specifications of the science terms used in the disciplines of interest. We are following a methodology that we believe is yielding candidate reference ontologies in our chosen domains. We have identified specific ontology modules that need to be constructed in the areas of volcanoes, plate tectonics, atmosphere, and climate, and we have begun constructing two of these modules with the help of selected experts in those areas. Prior to a workshop, we identify a small set of subject matter experts and provide them with background reading on ontology basics. Before our face-to-face meetings, we also identify foundational terms in the discipline and provide a simple starting point for organizing the basic terminology. While we do not want to influence the domain experts' choice of terminology, we find that we make more progress if we provide simple starting points that use widely agreed-upon terms. We then bring together a small group of the chosen domain experts and science ontology experts with the goal of generating an initial ontology containing the terms and phrases these experts typically use. We use our task of researching the impact of volcanoes on global climate to focus the discussions and to help determine scope and level of granularity.
We held a meeting with volcano experts and generated an initial ontology containing terms and phrases used to classify volcanoes, volcanic activities, and eruption phenomena. We use a relatively simple graphical tool (CMAP) for capturing the terms and their relationships. A portion of the initial volcano ontology is shown in Figure 1 (from Sinha et al., 2006; McGuinness et al., 2006).
Figure 1. Volcano Ontology CMAP Fragment
Volcanoes can be classified by composition, tectonic setting, environmental setting, eruption type, activity, geologic setting, and landform. The ontology currently contains upper-level terms in these areas; it is being expanded according to the needs of the project and reviewed by additional domain experts. The initial focus is on gathering terms, putting them into a generalization hierarchy (using isa links in the diagram), connecting the terms through properties (using has links), identifying equivalence relationships (using sameas links), and capturing partonomic information (using ispartof links).
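As a minimal sketch of how the four link types described above might be held in a machine-readable form, the Python fragment below stores a handful of links as (subject, link, object) triples and derives generalizations, equivalences, and inherited properties from them. The specific terms (Stratovolcano, Crater, and so on) and all function names are illustrative assumptions chosen for this example; they are not taken from the project's CMAP diagram, and the project's actual representation may differ.

```python
from collections import defaultdict

# Illustrative triples only: these terms are placeholders for this sketch,
# not the contents of the project's volcano ontology.
TRIPLES = [
    ("Stratovolcano", "isa", "Volcano"),
    ("ShieldVolcano", "isa", "Volcano"),
    ("Volcano", "isa", "Landform"),
    ("Volcano", "has", "EruptionType"),
    ("Volcano", "has", "TectonicSetting"),
    ("CompositeVolcano", "sameas", "Stratovolcano"),
    ("Crater", "ispartof", "Volcano"),
]


def build_index(triples):
    """Group triples as index[link][subject] -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, link, obj in triples:
        index[link][subject].add(obj)
    return index


def ancestors(term, index):
    """Return every generalization reachable from term through isa links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in index["isa"].get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def equivalents(term, index):
    """Return terms linked to term by sameas, treating the link as symmetric."""
    found = set(index["sameas"].get(term, ()))
    for subject, objects in index["sameas"].items():
        if term in objects:
            found.add(subject)
    return found


def inherited_properties(term, index):
    """Collect has-properties declared on term or on any of its ancestors."""
    properties = set()
    for candidate in {term} | ancestors(term, index):
        properties |= index["has"].get(candidate, set())
    return properties
```

With these placeholder triples, asking for the ancestors of Stratovolcano walks the isa hierarchy up through Volcano to Landform, and a ShieldVolcano inherits the properties declared on Volcano. Treating sameas as symmetric is a design choice of the sketch, not a statement about how the project handles equivalence.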
We held a second workshop to create a plate tectonics ontology. We identified domain experts and worked with the same science ontology experts who took part in our volcano ontology meeting. The resulting terminology description is shown in Figure 2. In this meeting we again focused on gathering the primary class terms (e.g., plate boundary, lithosphere), putting them into a generalization hierarchy, and identifying important properties relating the terms.
Figure 2. Plate Tectonics Ontology CMAP Fragment
Both in preparation for and in follow-up to the domain workshops, we are reviewing relevant existing vocabularies and ontologies. We have reused terminology from SWEET[1], the GEON ontologies[2], and the Virtual Solar-Terrestrial Observatory (McGuinness et al., 2006) instrument and observatory ontologies. We are also gathering some of the starting points for the atmosphere and climate ontologies from SWEET and related ontologies.
Adding semantics to integrate data sources
Our candidate datasets for registration with the developed ontologies, and for use in our data integration use case, include the WOrld VOlcano DATabase (WOVODAT), in collaboration with Yellowstone researchers and the USGS, and the Nevada Test Site database, in collaboration with Los Alamos researchers. We are presently identifying the corresponding atmospheric and climate record databases. Our approach is first to establish a web portal based on one of the current best-of-breed semantic data frameworks (e.g., VSTO, GEON) to prototype access to the data from the volcano and atmosphere disciplines separately. We will then provide web service access to both sources so that the statistical application we will use for the data integration can query for and retrieve data.
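To illustrate what registering datasets with an ontology could look like in practice, the sketch below maps shared ontology terms onto the differing field names of two sources, so that a client asks for data by term rather than by schema. Everything in it is hypothetical: the source names, field names, ontology terms, and records are invented for the example and do not describe WOVODAT, the Nevada Test Site database, or any real service interface.

```python
# Hypothetical registrations: ontology term -> native field name per source.
REGISTRY = {
    "volcano_db": {"VolcanoName": "vname", "EruptionDate": "erupt_dt"},
    "atmosphere_db": {"ObservationDate": "obs_date", "TropopauseHeight": "trop_h_km"},
}

# Made-up records standing in for what a web service might return.
SOURCES = {
    "volcano_db": [
        {"vname": "Example Volcano A", "erupt_dt": "2000-01-01"},
        {"vname": "Example Volcano B", "erupt_dt": "2000-03-01"},
    ],
    "atmosphere_db": [
        {"obs_date": "2000-01-01", "trop_h_km": 16.0},
        {"obs_date": "2000-02-01", "trop_h_km": 16.5},
    ],
}


def fetch(source, terms):
    """Return records from one source, re-keyed by ontology term."""
    mapping = REGISTRY[source]
    return [{term: record[mapping[term]] for term in terms}
            for record in SOURCES[source]]


def join_on_date(eruptions, observations):
    """Pair each eruption with the observations that share its date."""
    by_date = {}
    for observation in observations:
        by_date.setdefault(observation["ObservationDate"], []).append(observation)
    return [{**eruption, **observation}
            for eruption in eruptions
            for observation in by_date.get(eruption["EruptionDate"], [])]


eruptions = fetch("volcano_db", ["VolcanoName", "EruptionDate"])
observations = fetch("atmosphere_db", ["ObservationDate", "TropopauseHeight"])
integrated = join_on_date(eruptions, observations)
```

The point of the design is that the statistical application only sees ontology terms such as EruptionDate and TropopauseHeight; the mapping to each source's native schema lives in the registry, where it can be inspected by scientists who need to know how the integration was done.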
Discussion and Conclusion
We have begun an effort that uses ontologies to capture term meanings in distinct but related science domains, with the goal of facilitating research into relationships between the domains. We currently have starting points for reference ontologies in volcanoes and plate tectonics. We have also begun the preparatory work on atmosphere and climate and will hold workshops with domain experts to generate reference ontologies. We will also hold workshops to vet the ontologies among the multiple communities. Our findings so far are that our methodology for creating starting points for reference ontologies is working well, both in gathering terms and relationships and in reaching agreement among the initial domain and science ontology experts.
Based on the successful use of semantics in data integration for the VSTO and GEON projects, our next step for SESDI is to articulate a use case that determines the kind of data integration needed to solve a specific scientific problem. Our candidate is to examine the statistical relation between the height of the tropopause and related forcings. This height is very sensitive to forcing, and in such a way that the fingerprints of volcanic and (for example) solar forcings are very distinct.
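As a hedged illustration of the simplest form such a statistical relation could take, the sketch below computes a Pearson correlation coefficient between a forcing series and a tropopause-height series. The choice of Pearson correlation is an assumption made for this example, not the method the project has committed to, and the two short series are synthetic numbers with no physical meaning.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Synthetic, purely illustrative series (arbitrary units and kilometres).
forcing_index = [0.0, 1.0, 2.0, 3.0]
tropopause_height_km = [16.0, 15.5, 15.0, 14.5]

r = pearson(forcing_index, tropopause_height_km)
```

A real analysis would need to separate several forcings acting at once, which a single pairwise correlation cannot do; the sketch only shows the kind of computation the integrated data would feed.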
References
Peter Fox, Deborah L. McGuinness, Don Middleton, Luca Cinquini, J. Anthony Darnell, Jose Garcia, Patrick West, James Benedict, and Stan Solomon. Semantically-Enabled Large-Scale Science Data Repositories. In the Proceedings of the Fifth International Semantic Web Conference, Athens, GA, November 5-9, 2006.
Deborah McGuinness, Peter Fox, Luca Cinquini, Patrick West, Jose Garcia, James L. Benedict, and Don Middleton. The Virtual Solar-Terrestrial Observatory: A Deployed Semantic Web Application Case Study for Scientific Research. In the proceedings of the Nineteenth Conference on Innovative Applications of Artificial Intelligence (IAAI-07). Vancouver, British Columbia, Canada, July 22-26, 2007.
Deborah L. McGuinness, A. Krishna Sinha, Peter Fox, Rob Raskin, Grant Heiken, Calvin Barnes, Ken Wohletz, Dina Venezky, and Kai Lin. Towards a Reference Volcano Ontology for Semantic Scientific Data Integration. American Geophysical Union Joint Assembly, Baltimore, Maryland, May 23-26, 2006.
Sinha, A.K., Heiken, G., Barnes, C., Wohletz, K., Venezky, D., Fox, P., McGuinness, D.L., Raskin, R., and Lin, K., 2006, Towards an ontology for Volcanoes, U.S. Geological Survey Scientific Investigations Report 2006-5201, p. 51.