APPLYING COST-SENSITIVE MODELS TO PREDICTING GEOLOGIC PROCESSES: A COMPARISON OF MACHINE LEARNING TECHNIQUES AND LINEAR STATISTICS
A modeling method was developed to predict the dependent variable of a rapid assessment protocol (RAP) for channel stability, with the aim of improving prediction accuracy while weighting predictions for cost-effectiveness and safety. The purpose of the research was to determine: 1) the relationships between channel stability and major land-use and biogeomorphic features, and 2) whether a predictive model could be developed to help land managers identify unstable areas while minimizing costs.
This research used Pearson correlation and chi-squared analyses to determine associative relationships between channel stability and major land-use and biogeomorphic features. The results of the Pearson correlations were used to build and test classification models on randomly selected training and test sets. The modeling techniques assessed were regression, single decision trees, and bagged (bootstrap aggregated) decision trees. A cost analysis / prediction (CAP) model was developed to incorporate cost-effectiveness and safety into the models. The models were compared on 1) their performance and 2) their operational advantages and disadvantages. The research determined that a method integrating a CAP model, receiver operating characteristic (ROC) curves, and bagged decision trees produced a reliable predictive model. This system can be used in conjunction with a GIS to produce maps that guide field investigations.
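The idea of combining ROC curves with a cost weighting can be illustrated with a minimal sketch. It is not the CAP model itself, whose details are not given here. It assumes that a classifier (for example, a bagged decision tree ensemble) has already produced an instability score for each site, and that the two kinds of error carry different costs. The function names, the example scores and labels, and the cost values are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float, float]]:
    """Return (threshold, false positive rate, true positive rate) for each
    distinct score. Label 1 = unstable site, label 0 = stable site."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        # A site is flagged as unstable when its score is at or above t.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points


def min_cost_threshold(scores: List[float], labels: List[int],
                       cost_fn: float, cost_fp: float) -> Tuple[float, float]:
    """Return (threshold, total cost) for the threshold with the lowest total
    misclassification cost. Ties keep the highest such threshold.

    cost_fn: assumed cost of missing an unstable site (false negative).
    cost_fp: assumed cost of an unnecessary field visit (false positive).
    """
    best = None
    for t in sorted(set(scores), reverse=True):
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        cost = cost_fn * fn + cost_fp * fp
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


# Hypothetical classifier scores and field-verified labels for eight sites.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

roc = roc_points(scores, labels)

# Equal error costs versus a missed unstable site costing ten times more.
equal_costs = min_cost_threshold(scores, labels, cost_fn=1.0, cost_fp=1.0)
safety_weighted = min_cost_threshold(scores, labels, cost_fn=10.0, cost_fp=1.0)
```

With these illustrative numbers, weighting missed unstable sites more heavily lowers the chosen threshold from 0.8 to 0.3, so more sites are flagged for field investigation. This is the kind of trade-off a cost-sensitive model is meant to make explicit.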