DIA ENGINE: SEMANTIC DISCOVERY, INTEGRATION, AND ANALYSIS OF EARTH SCIENCE DATA
Geoscientists have generated massive volumes of earth science data for decades. Most of these data, however, remain isolated knowledge islands; the ability to find, access, and properly interpret these large data repositories has been very limited. There are two main reasons: the absence of data-sharing infrastructures that scientists can use to advertise their data, and the lack of a common language that scientists can use to properly interpret other providers' data. As a result, the discovery, integration, and analysis of earth science data have remained difficult. This, in turn, has precluded the meaningful use of the available data in answering complex questions that require information from several data sources. To address this problem, we have developed DIA, a service-oriented, Web-based computational infrastructure that enables scientists to use semantically enabled technologies to discover, integrate, and analyze earth science data. DIA also promotes tool sharing through Web services and provides a collaborative environment where scientists can share their resources for discovery and integration by registering them through well-defined ontologies (Sinha et al., 2006). DIA is developed using a variety of technologies, including ESRI's ArcGIS Server 9.1, Web services, .NET, Java, and JNBridge 3.
<>Architecture of the DIA Engine
The DIA engine is a Web-accessible system that provides three classes of functionalities: discovery, integration, and analysis. Data discovery enables users to retrieve data sets, while data integration enables users to query multiple resources along some common attributes to generate previously unknown information called data products. Data analysis may be used to verify certain hypotheses or refine the data product.
Figure 1. DIA's software architecture
To describe DIA's architecture (Fig. 1), we will use the following query as our running example: (Q) Find A-type plutons in
User Interface: This is an ArcGIS Server .NET map viewer Web application. DIA provides a menu-based interface that enables users to specify a large number of complex queries. Map-based queries can be refined by specifying a bounding box, defined by a pair of latitude-longitude points, that delimits the query's spatial scope. After the spatial scope is specified, the user selects from DIA's drop-down menu the filters (the A-type igneous rock filter in our running example) and/or tools to be applied to the data samples discovered within that scope.
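The bounding-box refinement described above can be illustrated with a minimal sketch. It is not DIA's code: the `BoundingBox` class, the `samples_in_scope` helper, and the `lat`/`lon` field names are hypothetical, and the sketch assumes a box that does not cross the antimeridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Spatial scope delimited by two corner points (hypothetical structure)."""
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

    @classmethod
    def from_corners(cls, corner_a, corner_b):
        """Build a box from any two opposite (lat, lon) corners."""
        (lat_a, lon_a), (lat_b, lon_b) = corner_a, corner_b
        return cls(min(lat_a, lat_b), min(lon_a, lon_b),
                   max(lat_a, lat_b), max(lon_a, lon_b))

    def contains(self, lat, lon):
        """True if the point lies inside the box (edges included)."""
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


def samples_in_scope(samples, box):
    """Keep only the samples whose coordinates fall inside the query scope."""
    return [s for s in samples if box.contains(s["lat"], s["lon"])]
```

Because `from_corners` normalizes the two points, the user may pick the corners in either order.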
Web Servers: Typically, DIA uses two Web servers. The first routes users' queries to DIA's query processor, and the second handles communication between the query processor and DIA's map server. In a minimal deployment, a single Web server may serve both purposes. DIA's current implementation uses a single instance of the Windows IIS Web server.
Map Server: This component is an ArcGIS map server that provides maps to DIA's query processor.
Registry Servers: These servers may be distributed worldwide and provide directory functions (registration of data and tools, indexing, search, etc.). Resource providers advertise their resources on registry servers.
Query Processor (QP): This is the core component of the DIA engine. It is responsible for producing the results of users' queries and delivering them to the Web server. The QP consists of two sub-components: (i) the query interpreter and (ii) the geology and mapping filters and tools. The former is a .NET module that interprets queries and identifies the appropriate filters and/or tools to be invoked to answer each query. The latter is a large set of .NET modules that perform DIA's core functions, including filters (e.g., the A-Type igneous rock filter), tools (e.g., kriging), and map management routines (e.g., coloring of geological bodies and sample points).
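The interpreter's role, mapping a query to the filters and tools it must invoke, can be sketched as a simple lookup. This is an illustration under stated assumptions, not DIA's .NET implementation: the `Query` structure, the `OPERATIONS` catalog, the operation names, and the `Gravity` keyword are hypothetical; only the `GeoChemistry` keyword comes from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    """A user query: spatial scope plus the selected filters and tools."""
    bbox: tuple  # (min_lat, min_lon, max_lat, max_lon)
    filters: list = field(default_factory=list)
    tools: list = field(default_factory=list)


# Hypothetical catalog: operation name -> kind and the data keyword it needs.
OPERATIONS = {
    "a_type_filter": {"kind": "filter", "keyword": "GeoChemistry"},
    "kriging": {"kind": "tool", "keyword": "Gravity"},  # keyword is assumed
}


def interpret(query):
    """Return an ordered plan of (operation name, data keyword) steps."""
    plan = []
    for name in query.filters + query.tools:
        if name not in OPERATIONS:
            raise KeyError(f"unknown filter or tool: {name}")
        plan.append((name, OPERATIONS[name]["keyword"]))
    return plan
```

Each step of the resulting plan tells the query processor which kind of data to discover before the named filter or tool can run.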
Query processing consists of two phases: (i) data and tool discovery and (ii) filtering and integration.
<>Data Discovery
During this operation, the DIA engine identifies and retrieves the resources (data and tools) required to answer the user's query. To illustrate, consider the previous A-Type query. When the QP receives the query from the Web server, it determines the type of data required to answer it. In this case, the QP determines that data associated with the keyword GeoChemistry is the query's target. The QP then interacts with one or several registry servers to retrieve the needed data. An example of a registry server is available at www.geongrid.org. To interact with the GEON server, DIA invokes a GEON Web service called GEONResources, which provides functions for searching and retrieving the metadata of resources registered through the GEON portal. When invoking GEONResources, DIA's QP indicates that it is searching for data sets that are registered with the keyword GeoChemistry and that contain data samples in the query's spatial bounding box. For each returned database, the DIA system executes a two-step process. First, it builds a virtual query (expressed in SOQL, a language developed by GEON researchers at SDSC) that requests all the data (i.e., columns) necessary to apply the filter specified by the user. The DIA system then invokes a GEON Web service called SoqlToSql that translates this SOQL query into an SQL query. In the second step, DIA submits the SQL query to the GEON server, which interacts with the actual database server, obtains a record set containing the relevant data samples, and returns the data to the DIA engine.
<>Filtering and integration
Data filtering is a process in which the DIA engine transforms raw data into a data product. After DIA retrieves the data sets relevant to the user's query, it determines whether the filters to apply or the tools to use are locally available. If so, the filter or tool is applied to the data sets and the query result is displayed to the user. If not, DIA searches for the needed filter or tool in registry servers. DIA is able to invoke any external tool that is wrapped as a Web service. In the case of the given A-type query, the A-Type filter is already available in DIA and is also made available as a Web service for external users.
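The two query-processing phases described above (discovering data sets by keyword and bounding box, then resolving the filter locally before falling back to a registry) can be sketched as follows. Everything here is a stand-in: the in-memory registry, its entries, the function names, and the `index` threshold used by the placeholder filter are hypothetical and do not represent the GEONResources interface or DIA's actual A-Type criterion. Bounding-box overlap is used as an approximation of "contains samples in the query's scope".

```python
# In-memory stand-in for a registry server; all entries are invented.
REGISTRY_DATASETS = [
    {"name": "ds_geochem_a", "keyword": "GeoChemistry",
     "bbox": (36.0, -84.0, 40.0, -75.0)},
    {"name": "ds_geochem_b", "keyword": "GeoChemistry",
     "bbox": (45.0, -120.0, 49.0, -110.0)},
    {"name": "ds_gravity", "keyword": "Gravity",
     "bbox": (30.0, -90.0, 42.0, -70.0)},
]


def boxes_overlap(a, b):
    """True if two (min_lat, min_lon, max_lat, max_lon) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def discover_datasets(keyword, query_bbox, registry=REGISTRY_DATASETS):
    """Phase 1: names of data sets matching the keyword and spatial scope."""
    return [d["name"] for d in registry
            if d["keyword"] == keyword and boxes_overlap(d["bbox"], query_bbox)]


def resolve_operation(name, local_ops, registry_ops):
    """Phase 2: prefer a local filter/tool; otherwise look in the registry."""
    if name in local_ops:
        return local_ops[name], "local"
    if name in registry_ops:
        return registry_ops[name], "registry"
    raise LookupError(f"no filter or tool named {name!r} was found")


def placeholder_a_type_filter(samples):
    """Placeholder criterion only; not the geochemical test DIA applies."""
    return [s for s in samples if s["index"] >= 0.5]
```

In a real deployment the registry branch would return a client for a remote Web service rather than a local callable; the sketch keeps both branches as plain functions so that the lookup order is easy to see.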
Integration in DIA is a process in which the results of several sub-queries are produced and then overlaid in the user interface. In the case of our A-Type query, DIA produces the kriged gravity data by following the same workflow it uses to determine A-Type bodies. DIA searches registry servers for gravity data in the selected area of interest and then retrieves the raw gravity data from its provider(s) (e.g., http://paces.geo.utep.edu). DIA then determines whether a kriging tool is locally available. Since such a tool is already included in DIA's implementation, it is invoked and no external registry servers are searched. When the output of the kriging tool is generated, DIA overlays it on the previously generated results (i.e., A-Type plutons), giving the user a natural and easily interpretable view of the integration's result (Fig. 2).
Figure 2. Semantically enabled integration of data products, in which A-type plutons and gravity fields have been merged through the DIA engine
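The overlay step of integration can be sketched as stacking sub-query results as map layers in the order they are produced. The `Layer` and `MapView` classes and the `integrate` function are hypothetical names used for illustration only; DIA's actual map management is handled by its ArcGIS-based components.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """One sub-query result, ready to be drawn on the map."""
    name: str
    features: list


@dataclass
class MapView:
    """An ordered stack of layers; later layers are drawn on top."""
    layers: list = field(default_factory=list)

    def overlay(self, layer):
        """Place a new layer on top of those already present."""
        self.layers.append(layer)
        return self

    def layer_names(self):
        """Layer names from bottom to top."""
        return [layer.name for layer in self.layers]


def integrate(sub_query_results):
    """Overlay each (name, features) sub-query result in the order given."""
    view = MapView()
    for name, features in sub_query_results:
        view.overlay(Layer(name, features))
    return view
```

For the running example, the A-type pluton result would be passed first and the kriged gravity surface second, so that the gravity layer is overlaid on the plutons.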
<>Conclusion
We suggest that the semantic integration of data and tools can be implemented through the DIA engine, a system that enables geoscientists to discover, integrate, and analyze earth science data. DIA also demonstrates the potential of the service-oriented design paradigm to enable scientists to share tools in addition to data. The DIA engine is now in its final pre-release phase. Its beta version is currently accessible at: http://mapserver.geos.vt.edu/DIA.
We expect that as the Semantic Web matures, more geoscientists will adopt ontologies as a means for data and service sharing and integration. The DIA engine is designed for Web-based geoinformatics systems. These systems would provide an infrastructure where scientists worldwide would be able to discover, integrate, and analyze data.
Acknowledgements: This research is supported by the National Science Foundation, award EAR 0225588 to A. K. Sinha.
References Cited
Sinha, A. K., Malik, Z., Rezgui, A., and Dalton, A., 2006, Developing the Ontologic Framework and Tools for the Discovery and Integration of Earth Science Data: Annual Report, June 2006. http://geon.geol.vt.edu/pubreps/Virginia Tech Annual Report 2006.doc