Geoinformatics 2007 Conference (17–18 May 2007)

Paper No. 22
Presentation Time: 2:30 PM-4:30 PM

SCIENTIFIC WORKFLOWS FOR ONTOLOGY-BASED DATA MINING OF GEOLOGICAL DATA


ALTINTAS, Ilkay1, GRAVES, Sara2, RAMACHANDRAN, Rahul2, SEBER, Dogan3 and SINHA, Krishna4, (1)San Diego Supercomputer Center, UCSD, La Jolla, CA 92093, (2)Huntsville, AL 35899, (3)San Diego Supercomputer Center, Univ of California, San Diego, 9500 Gilman Drive, Mail Code 0505, La Jolla, CA 92093-0505, (4)GSA Geoinformatics Division, 4044 Derring Hall, Dept of Geol, Virginia Tech, Blacksburg, VA 24061, altintas@sdsc.edu

As scientific research becomes more interdisciplinary and requires integrative approaches, domain scientists need advanced technology tools to analyze and integrate diverse and voluminous data sets for their research activities. Although a variety of technology-based resources exist, such as data mining tools, scientific workflow management systems, and portal frameworks, building a complete cyberinfrastructure framework that incorporates all these technologies is challenging and requires extensive collaboration among domain scientists and technology developers. In this abstract, we explain our approach to developing strategic cyberinfrastructure technologies that will integrate workflow technologies with data mining resources and portal frameworks in a semantically enabled work environment. This is a unique effort in the sense that each component will have to be well integrated into the system while remaining flexible enough for the developed services to be applicable in other scientific disciplines. Services to be developed in these efforts will enable scientists to manage large and heterogeneous data sets in a timely fashion and extract new knowledge from existing and future data sets. We chose the geosciences as our demonstration domain because geoscience data are extremely heterogeneous and complex, and significant expertise and resources are available for such activities.

Each cyberinfrastructure component identified in this effort is now sophisticated enough to be used in an integrative environment. Workflows significantly improve data analysis, especially when data are obtained from multiple sources and/or processed with various analysis tools. By their nature, workflows are effective at integrating different technologies and formalizing processes; hence they form a natural integration environment. Despite many existing efforts in workflow development, integrating workflows with ontology-based data mining poses unique challenges for developers and requires collaborative research efforts among technology developers. Traditional data mining techniques (clustering, classification) have been applied for a long time; they have extracted novel information from large scientific databases (e.g. in atmospheric sciences, genetics) and have helped to manage costs and design effective sampling/experimental strategies. As scientific data become more complex, however, as in the geosciences, it is important to relate data to the concepts within disciplines. Data mining services, including new ones developed to deal with sparse data, can be applied at different levels of abstraction and help the user discover more meaningful patterns, leading to a more robust capability to answer scientific questions. For example, a geoscientist may want to use these types of resources to study the prediction of volcanic eruptions, or apply the technologies to identify the plate tectonic setting of a former volcano. As future steps, we plan to apply the ontology-driven data mining approach to global geoscience datasets such as GeoRoc, EarthChem, Petros, Pluto and volcano databases from the U.S. Geological Survey and NASA, with the aim of discovering patterns and trends between plate tectonic settings and volcanism that have not been recognized by individual scientists.
We will utilize workflows in a portal environment to integrate semantic data management and data mining technologies seamlessly to facilitate a more comprehensive understanding of the nature of volcanism and its plate tectonic settings.
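
The idea of mining at different levels of abstraction can be illustrated with a minimal sketch. It is not the system described in this abstract: the tiny concept hierarchy, the sample records, and the function names (`generalize`, `count_patterns`) are all hypothetical and simplified, chosen only to show how an ontology lets the same records be summarized at finer or coarser conceptual levels before patterns are counted.

```python
from collections import Counter

# Hypothetical, simplified concept hierarchy: each concept maps to its
# parent concept (None marks the root). A real ontology would be far richer.
PARENT = {
    "basalt": "mafic volcanic rock",
    "andesite": "intermediate volcanic rock",
    "dacite": "felsic volcanic rock",
    "rhyolite": "felsic volcanic rock",
    "mafic volcanic rock": "volcanic rock",
    "intermediate volcanic rock": "volcanic rock",
    "felsic volcanic rock": "volcanic rock",
    "volcanic rock": None,
}

# Made-up sample records for illustration only: (rock type, tectonic setting).
SAMPLES = [
    ("basalt", "mid-ocean ridge"),
    ("basalt", "mid-ocean ridge"),
    ("basalt", "ocean island"),
    ("andesite", "subduction zone"),
    ("dacite", "subduction zone"),
    ("rhyolite", "subduction zone"),
    ("rhyolite", "continental rift"),
]


def ancestors(concept):
    """Return the path from a concept up to the root, inclusive."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def generalize(concept, levels_up):
    """Replace a concept by its ancestor `levels_up` steps higher.

    If the hierarchy is shallower than requested, the root is returned.
    """
    path = ancestors(concept)
    return path[min(levels_up, len(path) - 1)]


def count_patterns(samples, levels_up):
    """Count (generalized rock concept, tectonic setting) co-occurrences."""
    return Counter(
        (generalize(rock, levels_up), setting) for rock, setting in samples
    )


if __name__ == "__main__":
    for level in (0, 1, 2):
        print(f"abstraction level {level}:")
        for pattern, n in count_patterns(SAMPLES, level).most_common():
            print(f"  {pattern}: {n}")
```

In this toy data, dacite and rhyolite each occur only once in a subduction setting, but one level up they combine into a single "felsic volcanic rock" pattern with a higher count. This is the sense in which relating sparse data to disciplinary concepts can surface patterns that are hard to see in the raw labels.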