USING MACHINE LEARNING TO PREDICT HIGH ARSENIC OR HIGH MANGANESE AT DRINKING WATER DEPTHS OF THE GLACIAL AQUIFER SYSTEM, NORTHERN CONTINENTAL US
We use boosted regression tree (BRT) models, a type of machine learning (ML), to predict the occurrence of high arsenic (As) and high manganese (Mn) across the glacial aquifer system (GLAC). We include new three-dimensional model predictions of redox condition and pH, known mechanistic drivers of As and Mn mobilization, as predictor variables. We also demonstrate a successful approach for significantly improving ML predictions for imbalanced datasets; in our case only about 10% of samples have high As, and about 30% have high Mn. We predicted the occurrence of high-As and high-Mn groundwater conditions and calculated uncertainty at two moving median depth surfaces that represent domestic and public supply well depths. Typical drinking water supply depths vary greatly across the GLAC.
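The imbalance described above (about 10% high-As samples) can be illustrated with a minimal sketch of one common remedy, inverse-frequency class weighting. The abstract does not state which technique the authors used, so this is an assumption for illustration only; the function name `balanced_class_weights` and the label data are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight}, where weight = n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so each class contributes the
    same total weight during model fitting.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels (not the study's data): 1 = high As, 0 = not high As,
# with 10% positives to mirror the imbalance described in the text.
labels = [1] * 10 + [0] * 90
weights = balanced_class_weights(labels)

# Per-sample weights that could be passed to any learner accepting sample weights.
sample_weights = [weights[y] for y in labels]
```

With these weights the 10 high-As samples and the 90 other samples carry equal total weight, so a tree-based learner is not rewarded for simply predicting the majority class.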
The Mn model had 55 predictor variables and a sensitivity of 70%. The As model had 79 predictor variables and a sensitivity of 51%. Several of the most influential variables are common to the Mn and As models, including predicted likelihood of anoxic conditions and predicted pH, although their prediction responses differ, illustrating discernible mechanistic differences in model results. Predicted high Mn is more common at domestic well depths; predicted high As is more common at public supply well depths, which is consistent with the known differences in Mn and As mobilization mechanisms. Both high Mn and high As are commonly predicted in the central part of the GLAC, but high As is predicted over a smaller proportion of the area. Results can be used to direct water quality characterization efforts toward areas of the GLAC that have little data but are predicted to have high As or Mn.
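For readers unfamiliar with the sensitivity figures quoted above, the following minimal sketch shows how sensitivity (the fraction of truly high samples that the model flags) and specificity are computed from binary labels. The function name and the example labels are hypothetical and are not taken from the study's data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) for binary labels coded 1 = high, 0 = not high."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # Guard against division by zero when a class is absent.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Hypothetical example: 10 truly high samples, 90 truly not high.
y_true = [1] * 10 + [0] * 90
# The model flags 7 of the 10 high samples and wrongly flags 9 of the 90 others.
y_pred = [1] * 7 + [0] * 3 + [1] * 9 + [0] * 81

sens, spec = sensitivity_specificity(y_true, y_pred)
```

In this illustrative case the sensitivity is 7/10 and the specificity is 81/90. On an imbalanced dataset, sensitivity is usually more informative than overall accuracy, because a model that never predicts the rare class can still score high accuracy.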