Northeastern Section - 59th Annual Meeting - 2024

Paper No. 24-26
Presentation Time: 9:00 AM-1:00 PM

A SPATIAL MACHINE LEARNING MODEL DEVELOPED FROM NOISY DATA REQUIRES MULTISCALE PERFORMANCE EVALUATION: PREDICTING DEPTH TO BEDROCK IN THE DELAWARE RIVER BASIN, USA


GOODLING, Phillip, U.S. Geological Survey, Earth System Processes Division, 5522 Research Park Drive, Baltimore, MD 21228, BELITZ, Kenneth, U.S. Geological Survey, Earth System Processes Division, Carlisle, MA 01741-1460, STACKELBERG, Paul, U.S. Geological Survey, Earth System Processes Division, 425 Jordan Road, Troy, NY 12180 and FLEMING, Brandon, U.S. Geological Survey, Pennsylvania Water Science Center, 215 Limekiln Road, New Cumberland, PA 17070

Data-driven machine learning models are now widely applied in the environmental sciences to predict conditions in unmonitored areas. These models are often developed from data of varying quality. Unexplainable variability, sometimes called ‘noise’, can arise from several sources, including reporting uncertainty, definition uncertainty, and heterogeneity occurring at small spatial scales. This noise is substantial for some environmental datasets. Traditional point-scale metrics (e.g., R²) alone can be misleading when evaluating the performance of models developed from noisy data. In this presentation, we describe a multiscale performance evaluation (MPE) approach and use it to assess a machine learning model trained to predict depth to bedrock (DTB) in the Delaware River basin. The model was trained on a large dataset of drillers’ logs that is highly variable at the local point scale. The MPE framework uses two scales (distributional and geostatistical) in addition to the point scale. We use the MPE framework to evaluate our DTB model, to evaluate the effectiveness of a bias-correction post-processing step, and to compare our model against a global model of DTB. Geostatistical analysis shows that approximately one third of the DTB variance occurs at a smaller spatial scale than can be modeled. Hence, one cannot achieve a point-scale R² of 1. When judged within this context, we find that our point-scale R² of 0.3 (testing data), while seemingly poor, is sufficient. Bias correction applied to the Delaware River basin model improves MPE performance: there is negligible change in point-scale R², an improved match between the cumulative distribution functions of the observed and modeled data, and an improved match between variograms of the observed and modeled data, without introducing spatially autocorrelated residuals. In contrast, bias correction applied to a global model of DTB does not improve MPE performance.
We conclude by encouraging environmental modelers to quantify the noise in their datasets and to think critically about model performance metrics and data handling techniques that enable effective model intercomparison.
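
The three evaluation scales described above (point, distributional, geostatistical) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names are hypothetical, the maximum distance between empirical cumulative distribution functions is only one possible distributional metric, and the variogram is a simple binned estimator. The last function assumes that variance occurring below the modelable scale (the variogram nugget) is irreducible for a spatial model, so that a noise fraction of about one third would imply a point-scale R² ceiling of roughly two thirds.

```python
import numpy as np


def point_r2(obs, pred):
    """Point scale: coefficient of determination, 1 - SS_res / SS_tot."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def cdf_distance(obs, pred):
    """Distributional scale: largest gap between the two empirical CDFs.

    Assumption: the abstract does not name a metric; this maximum-gap
    (Kolmogorov-Smirnov style) distance is an illustrative choice.
    """
    obs = np.sort(np.asarray(obs, dtype=float))
    pred = np.sort(np.asarray(pred, dtype=float))
    grid = np.concatenate([obs, pred])
    cdf_obs = np.searchsorted(obs, grid, side="right") / obs.size
    cdf_pred = np.searchsorted(pred, grid, side="right") / pred.size
    return float(np.max(np.abs(cdf_obs - cdf_pred)))


def empirical_variogram(coords, values, bin_edges):
    """Geostatistical scale: binned semivariance, 0.5 * mean squared difference.

    Uses all point pairs, so memory grows with the square of the sample size;
    a large dataset would need subsampling or a spatial index.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)  # each unordered pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2
    gamma = np.full(len(bin_edges) - 1, np.nan)  # NaN marks empty lag bins
    for k in range(len(bin_edges) - 1):
        in_bin = (dist >= bin_edges[k]) & (dist < bin_edges[k + 1])
        if in_bin.any():
            gamma[k] = 0.5 * sq_diff[in_bin].mean()
    return gamma


def r2_ceiling(nugget, sill):
    """Approximate upper bound on point-scale R2 when the nugget is unmodelable.

    Assumption: the nugget is the part of the total variance (sill) that
    no spatial model can explain.
    """
    return 1.0 - nugget / sill
```

Under these assumptions, computing `empirical_variogram` for the observed values, the modeled values, and the residuals gives the variogram comparisons the abstract refers to, and `r2_ceiling` puts a point-scale R² in context by showing how much of the variance a model could explain at best.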