SCIENTIFIC WORKFLOWS FOR ONTOLOGY-BASED DATA MINING OF GEOLOGICAL DATA
Each cyberinfrastructure component identified in this effort has reached a level of sophistication at which it can now be used in an integrative environment. Workflows significantly improve data analysis, especially when data are obtained from multiple sources and/or processed with various analysis tools. By their nature, workflows are effective at integrating different technologies and formalizing the analysis process; hence they form a natural integration environment. Despite many existing efforts in workflow development, integrating workflows with ontology-based data mining poses unique challenges for developers and requires collaborative research efforts among technology developers.
Traditional data mining techniques (e.g., clustering, classification) have long been applied to large scientific databases (e.g., in the atmospheric sciences and genetics), where they have extracted novel information and helped to manage costs and design effective sampling and experimental strategies. As scientific data become more complex, however, as in the geosciences, it is important to relate data to the concepts within disciplines. Data mining services, including new ones developed to deal with sparse data, can be applied at different levels of abstraction and help the user discover more meaningful patterns, leading to a more robust capability to answer scientific questions. For example, a geoscientist may want to use these resources to study the prediction of volcanic eruptions, or apply the technologies to identify the plate tectonic setting of a former volcano. As future steps, we plan to apply the ontology-driven data mining approach to global geoscience datasets such as GeoRoc, EarthChem, Petros, Pluto, and volcano databases from the U.S. Geological Survey and NASA, with the aim of discovering patterns and trends between plate tectonic settings and volcanism that have not been recognized by individual scientists.
We will use workflows in a portal environment to seamlessly integrate semantic data management and data mining technologies, facilitating a more comprehensive understanding of the nature of volcanism and its plate tectonic settings.
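The idea of applying data mining at different levels of abstraction can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy concept hierarchy `ROCK_ONTOLOGY`, the helper names `generalize` and `cooccurrence`, and the sample records are hypothetical and are not drawn from any of the databases or ontologies named above. The sketch rolls each rock type up a concept hierarchy by a chosen number of levels and then counts how often each generalized concept co-occurs with a tectonic setting.

```python
from collections import Counter

# Hypothetical toy concept hierarchy (child concept -> parent concept).
# A real ontology would be far richer; this is for illustration only.
ROCK_ONTOLOGY = {
    "basalt": "mafic volcanic rock",
    "basaltic andesite": "intermediate volcanic rock",
    "andesite": "intermediate volcanic rock",
    "dacite": "felsic volcanic rock",
    "rhyolite": "felsic volcanic rock",
    "mafic volcanic rock": "volcanic rock",
    "intermediate volcanic rock": "volcanic rock",
    "felsic volcanic rock": "volcanic rock",
}


def generalize(concept, levels, ontology=ROCK_ONTOLOGY):
    """Climb `levels` steps up the hierarchy, stopping early at the root
    or at any concept the hierarchy does not know about."""
    for _ in range(levels):
        if concept not in ontology:
            break
        concept = ontology[concept]
    return concept


def cooccurrence(samples, levels):
    """Count (generalized rock concept, tectonic setting) pairs."""
    return Counter(
        (generalize(rock, levels), setting) for rock, setting in samples
    )


# Made-up sample records (rock type, tectonic setting), not real data.
SAMPLES = [
    ("basalt", "mid-ocean ridge"),
    ("basalt", "mid-ocean ridge"),
    ("andesite", "subduction zone"),
    ("basaltic andesite", "subduction zone"),
    ("dacite", "subduction zone"),
    ("rhyolite", "continental rift"),
]

if __name__ == "__main__":
    for levels in (0, 1):
        print(f"abstraction level {levels}:")
        counts = cooccurrence(SAMPLES, levels)
        for (concept, setting), n in sorted(counts.items()):
            print(f"  {concept} / {setting}: {n}")
```

At level 0 the andesite and basaltic andesite samples are counted separately; at level 1 both fall under the same broader concept, so a pattern that is too sparse to notice at the finest level may become visible one level up. A workflow step in a portal could, under these assumptions, expose `levels` as a user-selectable parameter.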