GSA 2020 Connects Online

Paper No. 212-6
Presentation Time: 3:10 PM

COUPLED PHYSICAL SIMULATION (PS) AND MACHINE LEARNING (ML) FOR MAPPING GROUNDWATER AGE


STARN, J. Jeffrey, US Geological Survey, 101 Pitkin Street, East Hartford, CT 06108

We used a novel PS/ML approach to map groundwater residence time distributions (RTDs) throughout the Glacial Principal Aquifer in the U.S. (GLAC). A volume of groundwater is characterized by an RTD rather than a single age because it contains water that has traveled along different flow paths at different velocities. RTDs characterize (1) the lag time between contamination (at recharge) and exposure (at discharge) and (2) the time available for biogeochemical processes to occur. RTDs cannot be measured directly; they must be inferred from tracer data using models. One approach is to use three-dimensional PS models, but these can be very resource intensive. An additional challenge in the GLAC is that the aquifer materials are spatially discontinuous and have highly variable properties. PS models that cover smaller areas are less resource intensive and more tractable. ML methods can detect patterns relating RTDs derived from small-area PS models to large-area geospatial data. In this way, ML methods “learn” how RTDs are related to the physical processes of groundwater flow.
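To illustrate why a volume of water is described by a distribution rather than a single age, the sketch below uses the textbook exponential (Vogel) model. This is an idealization, not the PS/ML method of this study. It assumes uniform recharge R, porosity n, and saturated thickness b, with water sampled across the full saturated thickness, so that ages follow an exponential distribution with mean τ = nb/R. The function names and parameter values are hypothetical and chosen only for illustration.

```python
import math

# Illustrative textbook idealization (exponential / Vogel model), NOT the PS/ML
# method of this study. Assumes uniform recharge, porosity, and saturated
# thickness, with water sampled across the full saturated thickness.

def exponential_mean_age(porosity, sat_thickness_m, recharge_m_per_yr):
    """Mean age tau (yr) = n * b / R for the exponential RTD."""
    return porosity * sat_thickness_m / recharge_m_per_yr

def rtd_pdf(t_yr, tau_yr):
    """Probability density of age t for an exponential RTD with mean tau."""
    return math.exp(-t_yr / tau_yr) / tau_yr

def fraction_younger_than(t_yr, tau_yr):
    """Cumulative RTD: fraction of the water volume younger than t."""
    return 1.0 - math.exp(-t_yr / tau_yr)

def median_age(tau_yr):
    """Age below which half of the water falls: tau * ln 2."""
    return tau_yr * math.log(2.0)

if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    tau = exponential_mean_age(porosity=0.3, sat_thickness_m=20.0, recharge_m_per_yr=0.3)
    print(f"mean age         {tau:.1f} yr")
    print(f"median age       {median_age(tau):.1f} yr")
    print(f"fraction < 65 yr {fraction_younger_than(65.0, tau):.3f}")
```

With these made-up inputs the mean age is about 20 years, yet half the water is younger than about 14 years and a few percent is older than 65 years. No single "age" captures that spread. Real glacial aquifers, with discontinuous materials and variable properties, depart strongly from this idealization, which is why the study relies on PS models instead.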

RTDs were calculated with 115 computationally inexpensive small-area PS models distributed throughout the GLAC. Four RTD metrics (fraction of water younger than 65 years; mean age of the young fraction; median age of the old fraction; and mean path length) were calculated at 130,740 randomly selected PS model cells using particle tracking. A subset consisting of 80% of the data was used to train the ML model (eXtreme Gradient Boosting; XGBoost) on the RTD metrics, with explanatory features drawn from large-area geospatial datasets available throughout the GLAC. Predictions made on the remaining 20% of the data had Nash-Sutcliffe Efficiency (NSE) values, comparing ML predictions with PS results, of 0.82, 0.82, 0.46, and 0.79 for the four metrics, respectively. In addition to the expected importance of aquifer thickness and recharge rate, Multi-Order Hydrologic Position and hydrogeologic terrane were important features that, by themselves, produced ML models with NSE close to that of the full model. RTD metrics were mapped throughout the GLAC with the trained ML model. Predictions showed that the volume of young groundwater stored in the GLAC is about 6,000 km³, or about 0.5% of the young groundwater stored globally. Most of the groundwater is less than 65 years old and usually travels less than one kilometer, but in some cases up to 30 kilometers, before discharging to the surface.
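The sketch below makes the four RTD metrics, the 80/20 holdout, and the NSE score concrete. It is a minimal illustration under stated assumptions, not the study's code. It assumes each tracked particle reaching a cell carries equal weight, although the actual workflow may weight particles by flux. Ages are in years and path lengths are in meters. The random shuffle used for the split is an assumption about how the subset was drawn. All function names are hypothetical, and the XGBoost model itself is not reproduced. NSE is computed with its standard definition, 1 − Σ(o − p)² / Σ(o − ō)², where o are the PS values and p are the ML predictions.

```python
import random
import statistics

YOUNG_CUTOFF_YR = 65.0  # threshold separating "young" from "old" water

def rtd_metrics(ages_yr, path_lengths_m):
    """Summarize one cell's RTD from equally weighted particles (an assumption)."""
    young = [a for a in ages_yr if a < YOUNG_CUTOFF_YR]
    old = [a for a in ages_yr if a >= YOUNG_CUTOFF_YR]
    return {
        "frac_young": len(young) / len(ages_yr),
        "mean_age_young": statistics.mean(young) if young else float("nan"),
        "median_age_old": statistics.median(old) if old else float("nan"),
        "mean_path_length": statistics.mean(path_lengths_m),
    }

def nse(observed, predicted):
    """Nash-Sutcliffe Efficiency of predictions against observed (here, PS) values."""
    mean_obs = statistics.mean(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("NSE is undefined when observed values are constant")
    return 1.0 - sse / sst

def train_test_split(items, train_frac=0.8, seed=0):
    """Shuffle items reproducibly, then split into training and holdout lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(train_frac * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]

if __name__ == "__main__":
    # Hypothetical particle results for a single model cell.
    print(rtd_metrics([5, 20, 40, 80, 200], [100, 300, 500, 900, 1200]))
    train, test = train_test_split(range(10))
    print(len(train), len(test))
    print(nse([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
```

An NSE of 1 means a perfect match. An NSE of 0 means the predictions are no better than the mean of the PS values, as in the last line of the usage example. By this measure, the reported value of 0.46 for median age of the old fraction is noticeably weaker than the values near 0.8 for the other three metrics.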