TOWARD BIAS CORRECTION IN THE MEASUREMENT OF OCCUPANCY IN ECOLOGY AND PALEONTOLOGY
I develop a maximum-likelihood method to estimate the distribution of occupancy probabilities of all taxa based only on the sample of observed taxa with non-zero occupancy. The method is based on determining the probability that the number of occupied sites will take on any specific value for a given occupancy probability, integrated over the distribution of probabilities. The target is the underlying distribution of probabilities, not the distribution of the number of sites occupied. I give examples using data on marine animal genera from the Paleobiology Database; the sampling units are equal-area cells and the data are aggregated at the stage level. For these data, a log-normal distribution of occupancy probabilities fits well. If we focus on genera sampled in both the stage immediately before and the stage immediately after a given stage, we know which ones existed but were not sampled; we can therefore test the method. The number of unsampled taxa and the mean occupancy of all taxa, sampled and unsampled, are predicted well, even though only sampled taxa are taken into account in the model fitting.
The bias correction sometimes requires substantial reinterpretation. For example, the "rise-and-fall" pattern of occupancy within the history of individual genera is much more pronounced with the bias correction than in the raw data. In addition, an Induan peak in occupancy may be partly an artifact of the small number of sites in that stage, with the true peak falling in the preceding Changhsingian stage.
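
The likelihood described above can be illustrated with a minimal Python sketch. It is not the author's implementation and rests on several assumptions of my own: each taxon occupies each of n equally sampled sites independently with probability p (a binomial count), the log-normal distribution of p is truncated at p = 1 by renormalizing, and the integral over p is approximated with a midpoint rule on ln(p). The function names (log_p_weights, marginal_pmf, truncated_loglik, unsampled_and_mean) are hypothetical.

```python
import math


def log_p_weights(mu, sigma, grid=400):
    """Midpoint grid over x = ln(p), x < 0, with normalized normal weights.

    Normalizing the weights truncates the log-normal at p = 1
    (an assumption of this sketch, not stated in the source).
    """
    x_lo = min(mu, 0.0) - 8.0 * sigma
    dx = -x_lo / grid
    xs = [x_lo + (i + 0.5) * dx for i in range(grid)]
    w = [math.exp(-0.5 * ((x - mu) / sigma) ** 2) for x in xs]
    total = sum(w)
    return xs, [wi / total for wi in w]


def marginal_pmf(n, mu, sigma, grid=400):
    """P(K = k) for k = 0..n: binomial in p, averaged over the distribution of p."""
    xs, ws = log_p_weights(mu, sigma, grid)
    pmf = [0.0] * (n + 1)
    for x, w in zip(xs, ws):
        p = math.exp(x)
        for k in range(n + 1):
            pmf[k] += w * math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
    return pmf


def truncated_loglik(counts, n, mu, sigma):
    """Log-likelihood using observed taxa only.

    counts maps k (number of occupied sites, k >= 1) to the number of taxa
    with that k. Each term is conditioned on K >= 1, because taxa with
    K = 0 are never observed.
    """
    pmf = marginal_pmf(n, mu, sigma)
    p_seen = 1.0 - pmf[0]
    return sum(m * math.log(pmf[k] / p_seen) for k, m in counts.items())


def unsampled_and_mean(n_observed, n, mu, sigma):
    """Expected number of unsampled taxa and mean occupancy probability of all taxa."""
    pmf = marginal_pmf(n, mu, sigma)
    xs, ws = log_p_weights(mu, sigma)
    mean_p = sum(w * math.exp(x) for x, w in zip(xs, ws))
    return n_observed * pmf[0] / (1.0 - pmf[0]), mean_p
```

In use, one would maximize truncated_loglik over mu and sigma (for example by a grid search or a general-purpose optimizer) and then pass the fitted values to unsampled_and_mean. The fitted parameters describe the distribution of occupancy probabilities of all taxa, which is the target of the method, rather than the distribution of observed site counts.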