CALIBRATING MICROFOSSIL INDICATORS: HOW MANY SAMPLES ARE ENOUGH?
Two large diatom-based training sets from the Great Lakes, (1) coastal periphyton and (2) pelagic phytoplankton, were investigated to determine optimal sample sizes for inference models. Weighted-averaging models were developed to infer phosphorus concentrations from diatom assemblage data. Training subsets ranging in size from 10 samples to the full set were created through random selection, and the performance of each resulting model was evaluated. For each iteration, diatom-inferred nutrient data were related to stressor data (e.g., adjacent agricultural activity) to characterize the model's ability to track human activities. At least 40-80 samples were needed to capture environmental conditions well enough that non-analogue situations should be rare, so that the diatom model should provide an unambiguous result when applied to any sample assemblage. Caution is warranted with smaller training sets unless it is certain that the selected samples reflect the regional variability in species assemblages and environmental conditions. We encourage training set users to employ a similar evaluation to determine whether they have effectively sampled their region of interest. We also encourage the use of a similar optimizing procedure for any microfossil indicator that uses taxonomic information in paleoecological reconstructions.
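
The subsampling evaluation described above can be sketched in Python as follows. This is a minimal illustration, not the procedure used in the study. The data are synthetic, the function names (`fit_wa`, `raw_infer`, `subsample_curve`) are hypothetical, and the sketch assumes simple weighted averaging with inverse linear deshrinking and RMSE on held-out samples as the performance measure, none of which is specified in the abstract. The step relating inferred nutrients to stressor data is omitted.

```python
import numpy as np

def raw_infer(abund, optima):
    """Abundance-weighted mean of taxon optima for each sample (no deshrinking)."""
    known = ~np.isnan(optima)            # ignore taxa never seen in training
    w = abund[:, known]
    with np.errstate(invalid="ignore", divide="ignore"):
        return (w * optima[known]).sum(axis=1) / w.sum(axis=1)

def fit_wa(abund, env):
    """Fit WA optima plus inverse-deshrinking coefficients (an assumed choice)."""
    totals = abund.sum(axis=0)
    seen = totals > 0
    optima = np.full(abund.shape[1], np.nan)
    # Taxon optimum = abundance-weighted mean of the environmental variable.
    optima[seen] = (abund[:, seen] * env[:, None]).sum(axis=0) / totals[seen]
    raw = raw_infer(abund, optima)
    ok = ~np.isnan(raw)
    slope, intercept = np.polyfit(raw[ok], env[ok], 1)
    return optima, slope, intercept

def subsample_curve(abund, env, sizes, n_iter=50, seed=0):
    """Mean held-out RMSE for each training-set size (assumed metric)."""
    rng = np.random.default_rng(seed)
    n = len(env)
    curve = {}
    for size in sizes:
        if size >= n:
            raise ValueError("size must leave at least one held-out sample")
        errs = []
        for _ in range(n_iter):
            idx = rng.permutation(n)
            train, test = idx[:size], idx[size:]
            optima, slope, intercept = fit_wa(abund[train], env[train])
            pred = intercept + slope * raw_infer(abund[test], optima)
            errs.append(np.sqrt(np.nanmean((pred - env[test]) ** 2)))
        curve[size] = float(np.mean(errs))
    return curve

# Synthetic stand-in for a training set: 200 samples, 40 taxa with
# Gaussian responses along a standardized nutrient gradient.
rng = np.random.default_rng(1)
env = rng.uniform(0.0, 1.0, 200)
true_optima = rng.uniform(0.0, 1.0, 40)
expected = 50 * np.exp(-((env[:, None] - true_optima) ** 2) / (2 * 0.15 ** 2))
abund = rng.poisson(expected)

sizes = [10, 20, 40, 80, 160]
curve = subsample_curve(abund, env, sizes)
for size in sizes:
    print(size, round(curve[size], 4))
```

With real data, the training-set size at which such an error curve levels off may give a rough indication of how many samples a region requires, although the appropriate metric and resampling scheme depend on the study design.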