USE OF META-MODELING TO ESTIMATE REGIONAL GROUND WATER RESIDENCE TIME DISTRIBUTIONS
Targets for machine learning were created from simulated RTDs by fitting one- and two-component, 3-parameter (shape, location, and scale) distributions (Weibull, gamma, and inverse Gaussian). Basin-wide RTDs were best fit by the one-component Weibull distribution. Stressed receptors, such as wells, often produced more complicated RTDs that required a two-component mixture to fit. Machine learning was performed with a form of penalized linear regression called Multitask LASSO (Least Absolute Shrinkage and Selection Operator). “Multitask” refers to a method that estimates multiple outcomes simultaneously, in this case the three Weibull parameters, which ensured that the same features (but different coefficients) were used to estimate each parameter. LASSO was trained on the parameters of the best-fit Weibull distribution using hydrogeographic variables of the modeled domains as explanatory features. LASSO features were standardized, enabling comparison of coefficient magnitudes to determine the relative importance of the features. The shape, location, and scale parameters of the parametric RTDs were strongly related to the mean exponential age. The shape parameter of the distribution, which controls deviation from an exponential distribution, was also a function of aquifer heterogeneity and hydrologic features. The results show that aquifer heterogeneity and exchange of water between glacial deposits and bedrock and surface water are important for estimating basin-wide RTDs. The quantitative understanding gained from the meta-model enables RTDs to be estimated across the glaciated U.S.
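The multitask regression step described above can be illustrated with a minimal sketch. It assumes scikit-learn's MultiTaskLasso as the estimator, which the abstract does not confirm, and uses entirely synthetic data: the feature matrix, the coefficients, the number of domains, and the penalty strength alpha are hypothetical stand-ins, not values from the study. The sketch shows only the two properties the text relies on: one shared set of selected features across the three Weibull parameters, and standardized features so that coefficient magnitudes can be compared.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical inputs: 200 model domains, 6 hydrogeographic features each.
n_domains, n_features = 200, 6
X = rng.normal(size=(n_domains, n_features))

# Synthetic targets standing in for the fitted Weibull parameters
# (columns: shape, location, scale). By construction, only features
# 0 and 2 influence the targets; the rest are pure noise.
true_coef = np.zeros((3, n_features))
true_coef[:, 0] = [1.5, -0.8, 2.0]
true_coef[:, 2] = [0.5, 1.2, -1.0]
Y = X @ true_coef.T + 0.05 * rng.normal(size=(n_domains, 3))

# Standardize features so coefficient magnitudes are comparable.
X_std = StandardScaler().fit_transform(X)

# alpha is an illustrative penalty strength, not a value from the study.
model = MultiTaskLasso(alpha=0.1).fit(X_std, Y)

# coef_ has shape (n_targets, n_features). The multitask penalty zeroes
# a feature for all targets at once, so a feature counts as selected
# if its column contains any nonzero coefficient.
selected = np.flatnonzero(np.any(model.coef_ != 0, axis=0))

# One simple importance measure: the norm of each feature's coefficients
# across the three targets.
importance = np.linalg.norm(model.coef_, axis=0)

print("selected features:", selected.tolist())
print("importance:", np.round(importance, 3))
```

In practice the penalty strength would be chosen by cross-validation (for example with scikit-learn's MultiTaskLassoCV) rather than fixed by hand.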