Paper No. 15-1
Presentation Time: 1:35 PM
A MACHINE LEARNING MODEL TO ESTIMATE THE OCCURRENCE OF LITHIUM IN GROUNDWATER USED AS DRINKING WATER FOR THE CONTIGUOUS UNITED STATES
Lithium (Li) concentrations in drinking water supplies are not currently regulated in the United States; however, Li is included in the U.S. Environmental Protection Agency list of unregulated contaminants to be monitored by public water systems. Li is used as a pharmaceutical to treat bipolar disorder, and previous studies have linked its relatively low-level occurrence in drinking water to beneficial human health outcomes, including reduced suicide rates. However, too much Li is detrimental to human health, as is sometimes seen when it is used as a medication. A machine learning model was developed to estimate the geogenic occurrence of Li in drinking water supply wells throughout the contiguous United States. The model was trained using Li measurements from ~16,000 wells and independent predictor variables related to the sources and drivers of its occurrence in groundwater. The model predicts the probability of Li occurring in four concentration categories: ≤4 µg/L, >4 and ≤10 µg/L, >10 and ≤30 µg/L, and >30 µg/L. These concentration thresholds were chosen to investigate health impacts of low-level exposure. Extreme gradient boosting methods were used to develop the model and provide the ability to interpret relationships between the independent predictor variables and model predictions. Model predictions were evaluated using wells held out from model training and have an accuracy of ~65%. Important predictor variables include average annual precipitation, well depth, and soil geochemistry. The model has a spatial resolution of 1 km² and represents well depths associated with public-supply and private-supply wells, resulting in a model prediction map for each type of groundwater supply well. Comparison of the predictive map for private-supply wells with newly available Li measurements from private wells in Nevada indicates that model predictions in this area have a lower accuracy (47%) and tend to underpredict Li concentrations.
This suggests that model accuracy varies regionally and may be improved by testing regional variables in future updates to the model. This model was developed as a collaboration between hydrologists and epidemiologists with the intent of using the model prediction maps as a tool to quantify Li exposure from drinking water and compare it to human health data at the national scale.
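As an illustration of the four concentration categories and the hold-out accuracy metric described above, the following is a minimal Python sketch. It is not the authors' code. The function names (`li_category`, `most_probable_category`, `accuracy`) and all sample values are hypothetical, and accuracy is assumed here to mean the fraction of held-out wells whose most probable predicted category matches the observed category.

```python
from bisect import bisect_left

# Upper bounds (µg/L) of the first three categories, as stated in the abstract.
THRESHOLDS = [4.0, 10.0, 30.0]
LABELS = ["<=4", ">4 and <=10", ">10 and <=30", ">30"]


def li_category(conc_ug_per_l: float) -> int:
    """Return the category index (0 to 3) for a Li concentration in µg/L."""
    if conc_ug_per_l < 0:
        raise ValueError("concentration must be non-negative")
    # bisect_left places a value equal to a threshold in the lower category,
    # which matches the "<=" upper bounds of the categories.
    return bisect_left(THRESHOLDS, conc_ug_per_l)


def most_probable_category(probabilities: list[float]) -> int:
    """Return the index of the category with the highest predicted probability."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


def accuracy(observed: list[int], predicted: list[int]) -> float:
    """Fraction of wells whose predicted category equals the observed category."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(o == p for o, p in zip(observed, predicted))
    return hits / len(observed)


# Hypothetical held-out measurements (µg/L) and predicted class probabilities.
measured = [2.5, 8.0, 30.0, 45.0]
predicted_probs = [
    [0.70, 0.20, 0.05, 0.05],
    [0.10, 0.60, 0.20, 0.10],
    [0.05, 0.15, 0.60, 0.20],
    [0.10, 0.20, 0.45, 0.25],  # underpredicts the highest category
]

observed_classes = [li_category(c) for c in measured]
predicted_classes = [most_probable_category(p) for p in predicted_probs]
holdout_accuracy = accuracy(observed_classes, predicted_classes)
```

In this made-up example, the last well is assigned to a lower category than observed, the kind of underprediction reported for the Nevada comparison.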