2004 Denver Annual Meeting (November 7–10, 2004)

Paper No. 3
Presentation Time: 1:30 PM-5:30 PM

ONTOLOGY DRIVEN DATA MINING FOR GEOSCIENCES


TADEPALLI, Satish (1), SINHA, A.K. (2) and RAMAKRISHNAN, Naren (1), (1) Department of Computer Science, Virginia Tech, 660 McBryde Hall, Blacksburg, VA 24061, (2) Department of Geosciences, Virginia Tech, Blacksburg, VA 24061, stadepal@vt.edu

The application of data mining algorithms is a critical part of the knowledge discovery process. Traditional data mining techniques (clustering, classification) have extracted novel information from large scientific databases (e.g., atmospheric sciences, genetics) and have helped to manage costs and to design effective sampling and experimental strategies. As scientific data becomes more complex, as in solid earth science, it is important to relate data to the concepts within disciplines. Domain knowledge is represented in the form of an ontology, which describes the concepts or terms in a domain and the hierarchical relationships that exist between them. The data sets of the domain can then be structured by associating them with the concepts of the ontology. The user can then retrieve the relevant data sets to be compared by navigating the ontology. Thus, data mining algorithms can be applied at different levels of abstraction, helping the user discover more meaningful patterns. We applied this ontology-driven data mining approach to the GeoRoc (http://georoc.mpch-mainz.gwdg.de/) database. The data set corresponding to the class of convergent margins was ontologically structured into different subclasses. Convergent margin settings include geometrical relationships between the upper plate and the subducted (lower) plate. In addition, the composition of both plates (continental, oceanic) leads to well-recognized geochemical affinities. Similarities and differences between these environments are further constrained by the rate of subduction, the angle of subduction, and the age of the plates involved in convergent margin settings. An ontology was developed taking these properties of convergent margins into consideration, and the data set was structured accordingly.
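The idea of attaching data sets to ontology concepts and retrieving them at different levels of abstraction can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Concept` class, the two-level hierarchy (plate composition, then individual arc), and the sample values are all assumptions made for the example, and the numbers are placeholders rather than GeoRoc measurements.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    """One node of a hypothetical ontology; samples are attached to nodes."""
    name: str
    children: List["Concept"] = field(default_factory=list)
    samples: List[Dict[str, float]] = field(default_factory=list)

    def add(self, child: "Concept") -> "Concept":
        """Attach a sub-concept and return it so calls can be chained."""
        self.children.append(child)
        return child

    def find(self, name: str) -> Optional["Concept"]:
        """Depth-first search for a concept by name."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None

    def all_samples(self) -> List[Dict[str, float]]:
        """Samples attached to this concept and to every descendant."""
        collected = list(self.samples)
        for child in self.children:
            collected.extend(child.all_samples())
        return collected


# Assumed hierarchy for illustration only.
root = Concept("convergent_margin")
continental = root.add(Concept("continental"))
oceanic = root.add(Concept("oceanic"))
cascades = continental.add(Concept("Cascades"))
andes = continental.add(Concept("Andean arc"))
tonga = oceanic.add(Concept("Tonga"))
mariana = oceanic.add(Concept("Mariana"))

# Placeholder samples (NOT real GeoRoc data).
cascades.samples.append({"Si": 60.0, "Fe": 6.0})
andes.samples.append({"Si": 62.0, "Fe": 5.5})
tonga.samples.append({"Si": 52.0, "Fe": 9.0})
mariana.samples.append({"Si": 54.0, "Fe": 8.5})

# The same query works at any level of abstraction.
all_margin_samples = root.all_samples()
continental_samples = root.find("continental").all_samples()
tonga_samples = root.find("Tonga").all_samples()
```

Selecting a higher-level concept such as `continental` pools the data of every arc beneath it, while selecting a leaf such as `Tonga` restricts the analysis to a single setting; a mining algorithm can then be run on whichever subset the user navigates to.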
Correlation analysis of this data set showed a strong negative correlation between Si and Fe in continental convergent margins (absolute value of the correlation coefficient >0.9 for the Cascades and Andean arcs), in sharp contrast to the moderate negative correlation in oceanic convergent margins (<0.6 for Tonga and Mariana). This example demonstrates the role ontologies can play in data mining, helping to explore the underlying processes responsible for similarities and differences between various geologic environments.
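A per-class correlation analysis of this kind could look something like the sketch below, which computes the Pearson correlation coefficient between Si and Fe separately for each ontology class. The abstract does not say which correlation measure was used, so Pearson's r is an assumption here. The class labels `class_A` and `class_B` and all numeric values are synthetic, chosen only to exercise the code; they are not GeoRoc data and do not reproduce the reported coefficients.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Synthetic samples grouped by a hypothetical ontology class.
samples_by_class: Dict[str, List[Dict[str, float]]] = {
    "class_A": [
        {"Si": 50.0, "Fe": 10.0},
        {"Si": 55.0, "Fe": 8.0},
        {"Si": 60.0, "Fe": 6.0},
        {"Si": 65.0, "Fe": 4.0},
    ],
    "class_B": [
        {"Si": 50.0, "Fe": 9.0},
        {"Si": 55.0, "Fe": 10.0},
        {"Si": 60.0, "Fe": 6.0},
        {"Si": 65.0, "Fe": 8.0},
    ],
}

# One coefficient per class, so classes can be compared side by side.
si_fe_correlation = {
    name: pearson([s["Si"] for s in rows], [s["Fe"] for s in rows])
    for name, rows in samples_by_class.items()
}

for name, r in si_fe_correlation.items():
    print(f"{name}: r(Si, Fe) = {r:.2f}")
```

Because the coefficient is computed after the data have been partitioned by ontology class, differences between classes stay visible instead of being averaged away in a single pooled correlation.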