USING WDO-IT TO BUILD A GEOSCIENCE ONTOLOGY
Introduction
Workflow-Driven Ontologies (WDOs) is an approach to ontology development based on scientist-level terminology, such as "dataset" and "methods", that aims to facilitate the scientist's process of encoding knowledge from their domain (Salayandia, December 2006). In addition, ontologies produced with the WDO approach may include properties that enable the automatic generation of suggested workflow specifications. These suggested workflow specifications, once refined and endorsed by scientists, can be used as a training tool, since they provide a graphical representation that the scientist can easily relate to. The endorsed workflow specifications can be refined into fully computable workflows that facilitate the discovery and integration of resources available over cyberinfrastructures.
WDO-it is a prototype tool developed in Java that supports the WDO approach. Ontologies created with the WDO-it tool are encoded in the Web Ontology Language (OWL). OWL is the standard ontology language proposed by the W3C and the main framework language used by the semantic web community (OWL, 2004).
In this abstract, we discuss the creation of an ontology about gravity data processing as defined and used in the domain of geophysics. The ontology, called GravityWDO, was created with the current version of the WDO-it tool, available at http://trust.utep.edu/ciminer/software/wdoit/. In its current state, WDO-it can generate suggested abstract workflow specifications from the knowledge provided by scientists. We call these abstract workflow specifications Model-Based Workflows (MBWs), since they are instantiations of an abstract workflow model. A graphical representation of an MBW can be visualized and can serve as a training tool by itself. Work is underway to allow MBWs to be migrated to an executable workflow language, such as the Modeling Markup Language (MoML), which the Kepler scientific workflow engine uses to represent workflows (Ludäscher, 2005). The claim is that scientists can relate more easily to MBWs than to executable workflows because MBWs describe only the essential properties required at the scientist level. Nevertheless, WDO-it will provide mechanisms to use MBWs as the basis for creating executable workflow specifications.
Building Ontologies Using WDO-it
WDO-it provides three basic modes for building ontologies: (i) brainstorming mode; (ii) harvesting mode; and (iii) relationship elicitation mode. In the brainstorming mode, scientists enter concepts from their domain of interest, and these concepts are classified as either Information concepts or Method concepts. WDO-it does not use the term "concept" in its user interface. Instead, it provides a very simple interface where scientists can see that Information concepts range from Raw Data, e.g., concepts that represent data measured in the field, to Products, e.g., concepts that represent models or maps of interest to the scientist. The interface also provides a way to enter Method concepts, which represent the algorithms, applications, and tools used to retrieve or transform information concepts. For example, an application that retrieves gravity data about a specified region of interest from a database can be classified as an information retrieval concept. A tool that employs the nearest-neighbor algorithm to create a grid of uniformly distributed data points from a collection of scattered points can be classified as an information transformation concept. Some of these concepts, however, may already be specified in an existing ontology that the scientist wants to reuse. In this case, concepts in the existing ontology can be imported and later classified as information or method concepts accordingly. This is referred to as the harvesting mode.
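The brainstorming and harvesting modes described above can be illustrated with a minimal sketch. It is not WDO-it code: the class `ConceptPool`, its method names, and the category constants are hypothetical names introduced here, and the sketch assumes that harvested concepts stay unclassified until the scientist assigns them a category.

```python
from dataclasses import dataclass, field

# Hypothetical category labels; the text distinguishes Information and Method concepts.
INFORMATION = "Information"
METHOD = "Method"


def _check(category):
    if category not in (INFORMATION, METHOD):
        raise ValueError(f"unknown category: {category}")


@dataclass
class ConceptPool:
    """Concept names mapped to their category (None means not yet classified)."""
    categories: dict = field(default_factory=dict)

    def brainstorm(self, name, category):
        # Brainstorming mode: a concept is entered and classified in one step.
        _check(category)
        self.categories[name] = category

    def harvest(self, names):
        # Harvesting mode: import concept names from an existing ontology.
        # Concepts that are already known keep their current category.
        for name in names:
            self.categories.setdefault(name, None)

    def classify(self, name, category):
        # Later classification of a harvested concept.
        _check(category)
        if name not in self.categories:
            raise KeyError(name)
        self.categories[name] = category

    def unclassified(self):
        return sorted(n for n, c in self.categories.items() if c is None)


pool = ConceptPool()
pool.brainstorm("CompleteBouguerAnomaly", INFORMATION)
pool.brainstorm("Gridding", METHOD)
pool.harvest(["Grid", "Gridding"])  # assumed to come from an existing ontology
pending = pool.unclassified()       # concepts still waiting for a category
pool.classify("Grid", INFORMATION)
```

Keeping harvested concepts in an explicit unclassified state is one possible design; it makes it easy to list what the scientist still has to sort into information and method concepts.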
Any time after at least one method has been created in the brainstorming or harvesting mode, the scientist can switch to the relationship elicitation mode to identify relationships between the information concepts and the method concepts. These relationships are of the types IsInputTo and IsOutputFrom: the scientist indicates which information concepts are required as input to a method concept, and which information concepts are the output of a given method concept. Figure 1 shows a snapshot of the ontology relationship tab. Notice that the tool does not show all the relationships available in the ontology being created. Instead, the user selects a method concept, and the input and output information concepts are shown for the selected method concept only. By focusing on one method at a time, the scientist has better control of the relationships between concepts than with the cluttered diagram of all relationships between all concepts that is typical of general-purpose ontology editors. Figure 1 shows the creation of a gravity ontology, where the scientist has selected a method called Gridding, for which the information concept CompleteBouguerAnomaly is shown as its only input and the concept Grid is shown as its output.
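As a rough illustration of the relationship elicitation mode, the sketch below stores IsInputTo and IsOutputFrom relationships and returns them for one selected method at a time. The class `RelationStore` and its methods are hypothetical, and so are the concepts RawGravityData and BouguerCorrection; only Gridding, CompleteBouguerAnomaly, and Grid come from the example in Figure 1.

```python
from collections import defaultdict


class RelationStore:
    """Holds IsInputTo / IsOutputFrom relationships, indexed by method concept."""

    def __init__(self):
        self.inputs = defaultdict(set)   # method -> information concepts that are IsInputTo it
        self.outputs = defaultdict(set)  # method -> information concepts that are IsOutputFrom it

    def is_input_to(self, information, method):
        self.inputs[method].add(information)

    def is_output_from(self, information, method):
        self.outputs[method].add(information)

    def view(self, method):
        # Per-method view: only the relationships of the selected method are returned.
        return {
            "inputs": sorted(self.inputs.get(method, ())),
            "outputs": sorted(self.outputs.get(method, ())),
        }


store = RelationStore()
store.is_input_to("CompleteBouguerAnomaly", "Gridding")
store.is_output_from("Grid", "Gridding")
# Hypothetical second method, present only to show that it stays out of the Gridding view.
store.is_input_to("RawGravityData", "BouguerCorrection")
store.is_output_from("CompleteBouguerAnomaly", "BouguerCorrection")

gridding_view = store.view("Gridding")
```

Indexing the relationships by method mirrors the one-method-at-a-time presentation of the ontology relationship tab.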
Additionally, the scientist can create new types of properties that can be used to customize relationships between concepts. For example, a scientist may create a HAS property to indicate that a SeismicEvent HAS a Location and HAS a Time. This functionality is available through the Advanced View button shown in Figure 1. Moreover, since WDOs are OWL ontologies, more generic ontology editors such as Protégé (Gennari, 2002) and SWOOP (SWOOP, 2006) can also be used to work with them.
Generating workflow specifications through WDO-it
Once the scientist has built a WDO about their domain, WDO-it can use this WDO to automatically generate a suggested workflow specification for a given information concept of interest. For example, if a scientist is interested in obtaining a workflow that describes the steps necessary to create a Grid of gravity data, the scientist chooses the concept of interest from the information concepts available in the captured knowledge about gravity data, i.e., a gravity ontology. The WDO-it tool creates a suggested workflow specification, i.e., an MBW, based on the relationships available in the captured knowledge (Salayandia, October 2006). The scientist is presented with a graphical representation of the workflow, and the workflow specification can be saved as an OWL file that is separate from the WDO. The MBW is not formally considered a workflow specification until it is endorsed by a scientist as an accurate representation of a process in the scientist's domain. Corrections and refinements may be needed before the scientist endorses the MBW. The WDO-it evaluation mode is responsible for enabling the scientist to critique, refine, and endorse suggested MBWs.
Figure 2 shows a snapshot of the workflow generator tab of the WDO-it tool and the resulting diagram generated for the Grid information concept, according to the knowledge captured in the loaded gravity ontology. Notice that some methods have multiple inputs. Multiple inputs going into a method pass through an AND operator, indicating that all inputs are necessary for the given method to produce a given output. XOR operators are used where a method can take different inputs but needs only one of them to create an output.
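One way to picture the generation step is as a backward traversal from the information concept of interest through the IsOutputFrom and IsInputTo relationships. The sketch below is an illustration under stated assumptions, not the WDO-it algorithm: the text does not say how the tool decides between AND and XOR, so each method here carries an explicit operator label; each concept is assumed to have at most one producing method; and every concept other than Gridding, CompleteBouguerAnomaly, and Grid is hypothetical.

```python
# method name -> (output concept, input concepts, operator joining the inputs)
# Only Gridding, CompleteBouguerAnomaly and Grid appear in the text; the rest is hypothetical.
METHODS = {
    "Gridding": ("Grid", ["CompleteBouguerAnomaly"], "AND"),
    "BouguerCorrection": ("CompleteBouguerAnomaly", ["ObservedGravity", "Elevation"], "AND"),
    "ElevationLookup": ("Elevation", ["SurveyedHeight", "DigitalElevationModel"], "XOR"),
}


def suggest_workflow(target, methods, _visiting=None):
    """Return a nested description of how `target` can be obtained.

    A concept with no producing method is treated as raw data and returned
    as a plain string; otherwise a dict describing the producing step is returned.
    """
    visiting = set() if _visiting is None else _visiting
    if target in visiting:
        raise ValueError(f"cyclic dependency at {target}")

    producers = [name for name, (out, _, _) in methods.items() if out == target]
    if not producers:
        return target  # leaf: nothing produces it, so it must be supplied

    method = producers[0]  # assumption: at most one producer per concept
    _, inputs, operator = methods[method]

    visiting.add(target)
    children = [suggest_workflow(i, methods, visiting) for i in inputs]
    visiting.discard(target)

    node = {"produces": target, "method": method, "inputs": children}
    if len(inputs) > 1:
        # The operator only matters when several inputs feed the same method.
        node["operator"] = operator
    return node


workflow = suggest_workflow("Grid", METHODS)
```

In this sketch, `workflow` is a nested structure rooted at Gridding. A real MBW would be saved as OWL and reviewed by the scientist before being treated as a workflow specification.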
References Cited
Gennari, J., Musen, M.A., Fergerson, R.W., Grosso, W.E., Crubezy, M., Eriksson, H., Noy, N.F., Tu, S.W. The Evolution of Protégé: An Environment for Knowledge-Based Systems Development. 2002.
Ludäscher, B., Altintas, I., Berkley, C., Higgins, D., Jaeger, E., Jones, M., Lee, E.A., Tao, J. and Zhao, Y. (2005) 'Scientific workflow management and the Kepler system', Concurrency and Computation: Practice and Experience, Special Issue on Workflow in Grid Systems, John Wiley & Sons Ltd., Vol. 18, No. 10, pp. 1039-1065.
OWL Web Ontology Language Overview, http://www.w3.org/TR/owl-features/, February 2004.
Salayandia, L., Pinheiro da Silva, P., Gates, A.Q., and Salcedo, F. Workflow-Driven Ontologies: An Earth Sciences Case Study, In Proceedings of the 2nd Intl Conference on e-Science and Grid Computing, Amsterdam, Netherlands, December 2006.
Salayandia, L., Pinheiro da Silva, P., Gates, A.Q., and Rebellon, A. A Model-Based Workflow Approach for Scientific Applications, In Proceedings of the 6th OOPSLA Workshop on Domain-Specific Modeling, Portland, Oregon, October 2006.
SWOOP Hypermedia-based OWL Ontology Browser and Editor http://www.mindswap.org/2004/SWOOP/, 2006.
Figure 1. WDO-it tool, ontology relationship tab.
Figure 2. WDO-it tool, workflow generator tab.