2003 Seattle Annual Meeting (November 2–5, 2003)

Paper No. 13
Presentation Time: 11:15 AM

APPLYING COST-SENSITIVE MODELS TO PREDICTING GEOLOGIC PROCESSES: A COMPARISON OF MACHINE LEARNING TECHNIQUES AND LINEAR STATISTICS


MORET, Stephanie L. (1), MARGINEANTU, Dragos (2) and LANGFORD, Bill T. (2); (1) Civil Engineering, Oregon State University, Corvallis, OR 97331; (2) Dept of Computer Science, Oregon State Univ, Corvallis, OR 97331; stephanie.moret@orst.edu

Natural resource data are typically non-linear and complex, yet they are often modeled with statistical techniques, such as regression, that are poorly suited to this type of data. This research used a modeling method based on pattern recognition techniques from the field of machine learning. These techniques make no assumptions about the data distribution, can fit non-linear data, can be effective on small data sets, and can be weighted to reflect the relative costs of different prediction errors.
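As a rough illustration of weighting a prediction by the relative cost of errors, the sketch below chooses the label with the lower expected cost. The function name, the two-class "stable"/"unstable" framing, and the cost values are hypothetical and are not taken from the study.

```python
def min_cost_label(p_unstable, cost_false_negative, cost_false_positive):
    """Return 'unstable' or 'stable', whichever has the lower expected cost.

    p_unstable: estimated probability that a site is unstable.
    cost_false_negative: assumed cost of calling an unstable site stable.
    cost_false_positive: assumed cost of calling a stable site unstable.
    """
    # Predicting 'stable' only costs something if the site is really unstable.
    cost_if_stable = p_unstable * cost_false_negative
    # Predicting 'unstable' only costs something if the site is really stable.
    cost_if_unstable = (1.0 - p_unstable) * cost_false_positive
    return "unstable" if cost_if_unstable < cost_if_stable else "stable"
```

With equal costs, a site with a 20% estimated chance of instability is labeled stable; if a missed unstable site is assumed to cost ten times as much as a false alarm, the same site is labeled unstable.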

A modeling method was developed to predict the outcome (the dependent variable) of a rapid assessment protocol (RAP) for channel stability, with the aim of improving predictive accuracy while weighting predictions for cost-effectiveness and safety. The purpose of the research was to determine 1) the relationships between channel stability and major land-use and biogeomorphic features, and 2) whether a predictive model could be developed to help identify unstable areas while minimizing costs, for the purpose of land management.

This research used Pearson’s correlations and chi-squared tests to identify associations between channel stability and major land-use and biogeomorphic features. The results of the Pearson’s correlations were used to build and test classification models on randomly selected training and test sets. The modeling techniques assessed were regression, single decision trees, and bagged (bootstrap aggregated) decision trees. A cost analysis / prediction (CAP) model was developed to incorporate cost-effectiveness and safety into the models. The models were compared on the basis of 1) their performance and 2) their operational advantages and disadvantages. The research determined that a method integrating a CAP model, receiver operating characteristic (ROC) curves, and bagged decision trees produced a reliable predictive model. This system can be used in conjunction with a GIS to produce maps to guide field investigations.
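The combination of bagging, ROC operating points, and error costs can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's CAP model: it uses one-split trees (stumps) on a single hypothetical feature instead of full decision trees, and the function names and cost parameters are invented for the example.

```python
import random

def fit_stump(rows):
    """Fit a one-split tree on (feature_value, label) pairs, labels 0 or 1.

    Returns (threshold, left_label, right_label) with the fewest training errors.
    """
    best = None
    for threshold in sorted({x for x, _ in rows}):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= threshold else right) != y for x, y in rows)
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    return best[1:]

def fit_bagged_stumps(rows, n_trees=25, seed=0):
    """Bootstrap aggregation: each stump is fit on a resample drawn with replacement."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]

def score(ensemble, x):
    """Fraction of stumps that vote for class 1 (e.g. 'unstable')."""
    votes = sum(left if x <= t else right for t, left, right in ensemble)
    return votes / len(ensemble)

def cheapest_cutoff(scores, labels, cost_fn, cost_fp):
    """Scan score cutoffs (each is one ROC operating point) for the lowest total cost.

    cost_fn and cost_fp are assumed per-error costs for false negatives and
    false positives. A case is predicted as class 1 when score >= cutoff.
    """
    best_cutoff, best_cost = None, None
    for cutoff in sorted(set(scores)) + [float("inf")]:
        fp = sum(s >= cutoff and y == 0 for s, y in zip(scores, labels))
        fn = sum(s < cutoff and y == 1 for s, y in zip(scores, labels))
        cost = cost_fn * fn + cost_fp * fp
        if best_cost is None or cost < best_cost:
            best_cutoff, best_cost = cutoff, cost
    return best_cutoff, best_cost
```

In use, one would fit the ensemble on training rows, score held-out rows, and pass those scores and their true labels to `cheapest_cutoff`. Raising the assumed false-negative cost moves the chosen cutoff lower, so more sites are flagged for field investigation.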