GSA 2020 Connects Online

Paper No. 212-7
Presentation Time: 3:25 PM

USING MACHINE LEARNING TO PREDICT HIGH ARSENIC OR HIGH MANGANESE AT DRINKING WATER DEPTHS OF THE GLACIAL AQUIFER SYSTEM, NORTHERN CONTINENTAL US


ERICKSON, Melinda L.1, ELLIOTT, Sarah M.1, STACKELBERG, Paul E.2, BROWN, Craig J.3, RANSOM, Katherine M.4 and REDDY, James E.5, (1)U.S. Geological Survey, Upper Midwest Water Science Center, 2280 Woodale Drive, Mounds View, MN 55112, (2)U.S. Geological Survey, 425 Jordan Road, Troy, NY 12180, (3)New England Water Science Center, U.S. Geological Survey, 101 Pitkin Street, East Hartford, CT 06108, (4)California Water Science Center, U.S. Geological Survey, 6000 J St., Sacramento, CA 95819, (5)U.S. Geological Survey, 30 Brown Road, Ithaca, NY 14850-1573

Globally, over 200 million people are chronically exposed to arsenic (As) and/or manganese (Mn) from drinking water. Inorganic As exposure is linked to increased risk of cancer and non-cancer adverse health outcomes. Manganese is an essential element, but exposure of infants and children to higher concentrations of Mn through drinking water is associated with adverse survival, neurological, and intellectual outcomes. Here we present predicted probabilities of high As (>10 µg/L) or high Mn (>300 µg/L) in the glacial aquifer system (GLAC, 1.87 million km²). The GLAC spans the northern U.S. coast-to-coast and provides drinking water to 30 million people, an estimated 4 million of whom rely on groundwater with high Mn and/or As.

We use boosted regression tree (BRT) models, a type of machine learning (ML), to predict the occurrence of high As and high Mn across the GLAC. We include new three-dimensional model predictions of redox condition and pH, known mechanistic drivers of As and Mn mobilization, as predictor variables. We also demonstrate a successful approach for significantly improving ML predictions for imbalanced datasets; in our case, only about 10% of samples have high As and about 30% have high Mn. We predicted the occurrence of high-As and high-Mn groundwater conditions and calculated uncertainty at two moving median depth surfaces that represent domestic and public supply well depths. Typical drinking water supply depths vary greatly across the GLAC.
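The abstract does not state which technique was used to handle class imbalance. As an illustration only, one common option is to weight each training sample inversely to its class frequency, so that the rare high-As class contributes as much to the fit as the common class. The sketch below assumes a hypothetical label vector with 10% positives; the function name and data are illustrative and are not taken from the study.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count).

    This is a widely used "balanced" heuristic, shown here as an assumption;
    it is not necessarily the approach used in the study.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 1 = high As (>10 µg/L), 0 = not high; 10% positives.
labels = [1] * 10 + [0] * 90
weights = balanced_class_weights(labels)

# Per-sample weights that could be passed to a tree-boosting fit routine.
sample_weights = [weights[y] for y in labels]
```

With these weights, the ten high-As samples and the ninety other samples carry the same total weight, so a model is penalized equally for missing either class.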

The Mn model had 55 predictor variables and a sensitivity of 70%. The As model had 79 predictor variables and a sensitivity of 51%. Several of the most influential variables are common to the Mn and As models, including predicted likelihood of anoxic condition and predicted pH, although their prediction responses differ, illustrating discernible mechanistic differences in model results. Predicted high Mn is more common at domestic well depths, whereas predicted high As is more common at public supply well depths, which is consistent with the known differences in Mn and As mobilization mechanisms. High Mn and high As are commonly predicted in the central part of the GLAC, but high As is predicted over a smaller proportion of the area. Results can be used to direct water quality characterization efforts in areas of the GLAC that have little data but are predicted to have high As or Mn.
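Sensitivity, as reported above, is the fraction of truly high samples that a model correctly flags as high: TP / (TP + FN). The sketch below shows that calculation on hypothetical labels and predictions; the data are invented for illustration and do not reproduce the study's results.

```python
def sensitivity(y_true, y_pred):
    """True-positive rate: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("no positive samples in y_true")
    return tp / (tp + fn)


def specificity(y_true, y_pred):
    """True-negative rate: TN / (TN + FP)."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tn + fp == 0:
        raise ValueError("no negative samples in y_true")
    return tn / (tn + fp)


# Hypothetical hold-out set: 10 truly high samples, 10 truly not-high samples.
y_true = [1] * 10 + [0] * 10
# Hypothetical predictions: 7 of the 10 high samples are caught, 1 false alarm.
y_pred = [1] * 7 + [0] * 3 + [0] * 9 + [1] * 1

sens = sensitivity(y_true, y_pred)
spec = specificity(y_true, y_pred)
```

Reporting specificity alongside sensitivity helps when classes are imbalanced, because a model can reach high overall accuracy while still missing most of the rare high-concentration samples.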