Northeastern Section - 59th Annual Meeting - 2024

Paper No. 37-9
Presentation Time: 11:00 AM

ASSESSING EXPECTED ANALYTE DISTRIBUTIONS USING LOGRATIO TRANSFORMATION, MAHALANOBIS DISTANCE EVALUATION AND SOIL PHYSICAL PROPERTIES: AN EXAMPLE FROM A FARM SOIL DATASET


RIDENOUR, James, New York State Department of Health, Center of Environmental Health, Bureau of Toxic Substance Assessment, Corning Tower, Empire State Plaza, Room 1743, Albany, NY 12237

Standard approaches for defining outliers in conventional datasets are potentially problematic for concentration data, because these methods are rooted in geometries that do not describe concentration space. Concentrations are one example of “compositions”, which convey information on the relative abundance (proportions) of their constituent parts. They (and other kinds of compositional data) occupy a constrained sample space whose properties differ from those of the Euclidean space for which conventional statistical methods were derived. This also has implications for conventionally derived measures of “typical” concentrations or ranges for individual analytes, because atypical cases can go unrecognized when the constituent parts of a compositional dataset are evaluated as independent variables. Although some practitioners remain unaware of these issues, tools designed specifically for compositional data have been available since the 1980s. An Exploratory Data Analysis of a New York State farm soil dataset demonstrates how compositional methods can distinguish typical (“core”) multipart cases from nonrepresentative ones, after which familiar methods are used to describe the single-part distributions for these groups. The analysis also recognizes the soil textural class of each sample as a relevant factor affecting the initial analyte distributions. Accounting for soil texture differences and applying approaches developed for compositional data are both important considerations in applications that seek to define background soil levels and other prevalence-based descriptive comparison values.
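
As a rough illustration of the general idea of combining a logratio transformation with Mahalanobis distance, the following minimal Python sketch may be useful. It is not the analysis presented here: the abstract does not state which logratio transformation, covariance estimator, or cutoff was used, so the additive logratio (alr), the classical sample mean and covariance, and the chi-square 0.975 cutoff are all assumptions made for the example. The function names and the three-part data are hypothetical and synthetic, and do not come from the farm soil dataset.

```python
import numpy as np


def alr(compositions):
    """Additive logratio: log of every part divided by the last part.

    Assumed choice for this sketch; other logratio transformations exist.
    """
    x = np.asarray(compositions, dtype=float)
    if np.any(x <= 0):
        raise ValueError("logratios require strictly positive parts")
    return np.log(x[:, :-1] / x[:, -1:])


def squared_mahalanobis(z):
    """Squared Mahalanobis distance of each row from the sample mean.

    Uses the classical mean and covariance (an assumption; robust
    estimators are a common alternative when outliers are expected).
    """
    centered = z - z.mean(axis=0)
    cov = np.atleast_2d(np.cov(z, rowvar=False))
    solved = np.linalg.solve(cov, centered.T).T
    return np.einsum("ij,ij->i", centered, solved)


# Synthetic three-part compositions (illustrative values only):
# 30 "core" samples plus one deliberately atypical sample in the last row.
rng = np.random.default_rng(0)
core_z = rng.normal(loc=[1.0, 0.5], scale=0.1, size=(30, 2))
atypical_z = np.array([[4.0, -2.5]])
z_true = np.vstack([core_z, atypical_z])
raw = np.exp(np.hstack([z_true, np.zeros((31, 1))]))
parts = raw / raw.sum(axis=1, keepdims=True)  # close each row to sum to 1

d2 = squared_mahalanobis(alr(parts))

# Hypothetical cutoff: 0.975 quantile of chi-square with 2 degrees of
# freedom, which has the closed form -2 * ln(1 - p).
cutoff = -2.0 * np.log(1.0 - 0.975)
flagged = d2 > cutoff
print("most atypical sample index:", int(np.argmax(d2)))
```

Because the logratios depend only on ratios between parts, rescaling a sample (for example, changing concentration units) leaves its distance unchanged, which is one reason the transformation is applied before the distance calculation. Samples below the cutoff would form the candidate “core” group whose single-part distributions can then be summarized with familiar methods.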