GSA 2020 Connects Online

Paper No. 212-1
Presentation Time: 1:35 PM

NOVEL HYDROGEOLOGICAL PREDICTIONS AND INFERENCES USING MACHINE LEARNING ALGORITHMS: THREE ILLUSTRATIVE EXAMPLES


JAMEEL, Mohd Yusuf1, STAHL, Mason O.2, GEHRING, Jaclyn2 and VALLE, Denis3, (1)Civil and Environmental Engineering, Massachusetts Institute of Technology, 280 Vassar St, Apartment J2, Cambridge, MA 02139, (2)Department of Geology, Union College, Schenectady, NY 12308, (3)School of Forest Resources and Conservation, University of Florida, Gainesville, FL 32603

The amount and variety of environmental data have increased dramatically in recent years owing to improved data acquisition capabilities, the deployment of large measurement networks, and the creation of large, easily accessible databases. Applying machine learning (ML) techniques to these large and heterogeneous datasets has allowed researchers to ask new questions and gain insight into hydrological processes that would often have been intractable without ML. In this talk, I’ll present three examples in which we applied ML algorithms to large hydrogeological datasets for spatial prediction and inference, and highlight the advantages of ML over traditional approaches to analyzing such datasets. Stable isotopes in groundwater provide important insights into ecohydrological processes; however, their application is limited in regions with no data. We used the random forest algorithm to predict groundwater stable isotopes across the contiguous United States (CONUS) at a spatial resolution of 4 km, which will allow groundwater stable isotopes to be applied in regions with sparse or missing data. In a similar application, we used a support vector machine algorithm to predict the probability of groundwater arsenic exceeding 10 ppb in the Indian subcontinent, where high groundwater arsenic concentrations are a major public health issue. Our predictive maps help identify high- and low-risk areas, which could be useful in planning targeted testing campaigns and developing strategies to reduce arsenic exposure in high-arsenic regions. In both analyses, ML predictions outperformed the commonly used ‘kriging’ approach, with lower uncertainty and an ability to capture small-scale variability. Lastly, we used a stochastic block model to analyze a large database of pharmaceutical concentrations in surface water around the world. The model suggested that country-level variation in pharmaceutical detection rates was due not to poor or nonexistent wastewater treatment facilities in developing countries but rather to differing pharmaceutical consumption habits, a surprising conclusion that would not have been identified using a traditional clustering approach. Our work highlights the ability of ML to provide novel inferences and better predictions than conventional approaches.