Southeastern Section - 70th Annual Meeting - 2021

Paper No. 10-12
Presentation Time: 4:45 PM

WHO'S AFRAID OF BAD BIG DATA? CONFUSION AND COVERAGE IN THE COASTAL PLAIN OF THE CAROLINAS


CAMPBELL, David, Paleontological Research Institution, 1259 Trumansburg Rd, Ithaca, NY 14850 and CAMPBELL, Timothy, T D Campbell Homeschool, c/o Department of Natural Sciences, 110 S Main St #7270, Boiling Springs, NC 28017

Analyses of paleontological data increasingly rely on large aggregate datasets accessed online. For example, molecular clock calculations routinely draw on such sources for calibrations. However, these aggregate databases generally have limited capacity for correction, and significant gaps exist in their coverage. Outdated taxonomy and stratigraphy, confusion of similar or homonymous names, and mistakes can produce seriously incorrect records, such as European Jurassic species reported from the Pleistocene of the Carolinas. Input data may derive from older references and collections, which are valuable data sources but require updating for proper integration into a database. In our research on Cenozoic mollusks of the southeastern U.S., we have found several examples of incorrect or missing data. Because these incorrect citations are likely to be outliers, they can substantially distort summaries of stratigraphic and geographic distribution. In turn, uncritical use of such summaries can lead to highly inaccurate conclusions. Building reliable datasets will require support for the taxonomic and stratigraphic expertise necessary to ensure that the underlying information is credible.