GEOLOGIC RANKING AS AN EXPLANATORY VARIABLE IN ESTIMATING THE PROBABILITY OF ELEVATED ARSENIC CONCENTRATIONS IN PENNSYLVANIA GROUNDWATER
Predicted probabilities of elevated arsenic concentrations in groundwater are calculated and mapped using a logistic regression model. The binary dependent variable is derived from arsenic concentrations measured in 5,023 water wells, and the explanatory data for each well are extracted using a Geographic Information System (GIS). The spatial dataset that supplies the explanatory variable consists of geologic units ranked by the likelihood of arsenic presence. The rankings were assigned by professional judgment informed by previous arsenic studies, arsenic concentrations in stream sediments, and geologic characteristics.
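The following minimal Python sketch shows how a logistic regression links a binary well outcome to a single ranked geologic predictor. It fits an intercept and one slope by Newton-Raphson and converts the linear predictor into a probability. The rank scale (1 to 4), the toy well data, and the names `fit_logistic` and `predict` are hypothetical illustrations. They are not the study's data, threshold, or software.

```python
import math

# Hypothetical toy data: geologic rank per well (higher = more likely arsenic)
# and a 0/1 outcome (1 = concentration above some assumed threshold).
toy_ranks = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4]
toy_exceeds = [0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1]


def predict(b0, b1, xi):
    """Logistic probability P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))


def fit_logistic(x, y, iterations=50, tol=1e-10):
    """Maximum-likelihood fit of intercept b0 and slope b1 by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0  # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            p = predict(b0, b1, xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system [h00 h01; h01 h11] * delta = [g0; g1].
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1


if __name__ == "__main__":
    b0, b1 = fit_logistic(toy_ranks, toy_exceeds)
    for rank in (1, 2, 3, 4):
        print(f"rank {rank}: predicted probability {predict(b0, b1, rank):.2f}")
```

Because the geologic rank is the only predictor, every well in the same ranked unit receives the same predicted probability. This is consistent with the original observation that further explanatory variables would be needed to sharpen the predictions.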
The logistic regression model identifies ranked geologic units as an influential variable for predicting the probability of elevated arsenic concentrations in areas with sparse data. Model results include a standardized estimate of 0.34 for geologic ranking, a max-rescaled R-square of 0.09, and a satisfactory fit according to the Hosmer-Lemeshow goodness-of-fit test. Predicted probabilities of elevated arsenic concentrations range from 1 to 48 percent, and the resulting map shows the highest predicted probabilities, 35 to 48 percent, in eastern and north-central Pennsylvania. Pearson residuals were calculated and plotted, revealing clusters of poor predictions in southeastern, southwestern, and northwestern Pennsylvania. Although the model provides a reasonable geologic portrayal of elevated arsenic concentrations, its predictive power could be improved by adding explanatory variables such as geochemical parameters, land cover characteristics, or soil properties.
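The fit statistics quoted above can be illustrated with the short Python sketch below. It computes each statistic from 0/1 outcomes and predicted probabilities and rests on several assumptions:

- The max-rescaled R-square is taken as the Cox-Snell R-square divided by its maximum attainable value (the Nagelkerke form).
- The Hosmer-Lemeshow grouping is a simplified near-equal split of the sorted probabilities, not any particular package's tie handling.
- The standardized estimate follows the convention documented for SAS PROC LOGISTIC: slope × sd(x) / (π/√3).

The function names and demo values are hypothetical and do not come from the study.

```python
import math
import statistics


def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y under probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))


def max_rescaled_r2(y, p):
    """Cox-Snell R-square divided by its maximum (assumed Nagelkerke form)."""
    n = len(y)
    ybar = sum(y) / n
    ll0 = log_likelihood(y, [ybar] * n)  # intercept-only model
    ll1 = log_likelihood(y, p)           # fitted model
    r2 = 1 - math.exp(2 * (ll0 - ll1) / n)
    r2_max = 1 - math.exp(2 * ll0 / n)
    return r2 / r2_max


def pearson_residuals(y, p):
    """(observed - predicted) / binomial standard deviation, one value per well."""
    return [(yi - pi) / math.sqrt(pi * (1 - pi)) for yi, pi in zip(y, p)]


def hosmer_lemeshow(y, p, groups=10):
    """Simplified Hosmer-Lemeshow chi-square statistic and its degrees of freedom."""
    pairs = sorted(zip(p, y))
    n = len(pairs)
    stat, used = 0.0, 0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        used += 1
        observed = sum(yi for _, yi in chunk)
        expected = sum(pi for pi, _ in chunk)
        # Combines the "elevated" and "not elevated" cells of this group.
        stat += (observed - expected) ** 2 / (expected * (1 - expected / len(chunk)))
    return stat, used - 2


def standardized_estimate(b1, x):
    """Slope scaled by sd(x) and by the logistic SD, pi / sqrt(3) (assumed convention)."""
    return b1 * statistics.stdev(x) / (math.pi / math.sqrt(3))


# Hypothetical demo outcomes and predicted probabilities for 20 wells.
demo_y = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1]
demo_p = [0.05, 0.08, 0.10, 0.12, 0.15, 0.18, 0.20, 0.22, 0.25, 0.28,
          0.30, 0.33, 0.35, 0.38, 0.40, 0.42, 0.44, 0.45, 0.47, 0.48]

if __name__ == "__main__":
    print("max-rescaled R-square:", round(max_rescaled_r2(demo_y, demo_p), 3))
    print("Hosmer-Lemeshow (stat, df):", hosmer_lemeshow(demo_y, demo_p))
    print("largest |Pearson residual|:", max(abs(r) for r in pearson_residuals(demo_y, demo_p)))
```

A low R-square alongside an acceptable Hosmer-Lemeshow result is not contradictory. The first measures how sharply the model separates outcomes, while the second checks whether predicted probabilities agree with observed frequencies within groups. Mapping the Pearson residuals by well location is what exposes spatial clusters of poor predictions.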