GSA Connects 2023 Meeting in Pittsburgh, Pennsylvania

Paper No. 50-7
Presentation Time: 3:20 PM

MOBILIZING AND SHARING DIGITIZED FOSSIL AND EXTANT SPECIMEN IMAGES: INTEGRATING THE TREATISE ON INVERTEBRATE PALEONTOLOGY AND MORPHOSOURCE


LÓPEZ CARRANZA, Natalia, Biodiversity Institute, University of Kansas, 1345 Jayhawk Blvd., Lawrence, KS 66045 and LIEBERMAN, Bruce, Paleontological Institute, Biodiversity Institute and Department of Ecology & Evolutionary Biology, University of Kansas, 1345 Jayhawk Blvd., Lawrence, KS 66045

With the increasing need for accessible and reusable data in paleontological research, the Paleontological Institute aims to mobilize the invaluable data contained in the Treatise on Invertebrate Paleontology and make them FAIR (findable, accessible, interoperable, and reusable). The Treatise is one of the most comprehensive and authoritative resources on invertebrate fossil groups, describing their morphology, taxonomy, systematics, stratigraphy, biogeography, and more. Yet since its first publication in 1953, the scientific publishing and data dissemination landscape has changed radically. Treatise volumes are now available as open-access PDFs, but the raw data they contain remain difficult to obtain directly for research or other purposes. Our goal is to boost the discoverability of the published specimen images by depositing them on MorphoSource, a digital repository for 2D and 3D media. Although these images are already digitized as part of PDFs, there are significant benefits to making them more broadly accessible in different formats. PDF documents are highly human-readable, but other tools are better suited to quickly locating and querying images by contextual information (e.g., taxonomic determination, repository and catalog number, age, and locality). Further, images in the published PDFs are organized in plates containing multiple specimens, so accessing, contextualizing, and using individual images is not always straightforward. Our workflow consists of curating, archiving, and sharing the catalog of 2D specimen imagery, and of extracting and normalizing the associated metadata. To accomplish this, we have developed Treatise-specific Python scripts that facilitate text processing and metadata extraction from the PDFs; these scripts are instrumental in current and future efforts to make the Treatise text available and reusable.
Making the Treatise images and their respective specimen metadata widely available and individually discoverable on MorphoSource, a public and searchable resource, allows us to further integrate these media with specimen records through aggregators such as iDigBio, maximizing their potential for use. This integration will enable researchers to leverage image data and specimen metadata more efficiently, helping to strengthen and streamline research produced using these data.
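
As an illustration of the text-processing and metadata-extraction step described above, the following is a minimal Python sketch, not the Treatise scripts themselves. It assumes that caption text has already been pulled out of a PDF as a plain string and that a catalog entry looks like an uppercase repository acronym followed by digits. The caption layout, the `CATALOG_RE` pattern, the function names, and the example specimen are all hypothetical.

```python
import re
import unicodedata

# Assumed pattern (hypothetical): a repository acronym of 2-6 capital
# letters followed by a numeric catalog number, e.g. "XYZ 12345".
CATALOG_RE = re.compile(r"\b(?P<repository>[A-Z]{2,6})\s+(?P<catalog_number>\d+)\b")


def normalize_text(raw: str) -> str:
    """Clean text extracted from a PDF before parsing it."""
    # NFKC folds typographic ligatures (e.g. the single "fi" glyph) into plain letters.
    text = unicodedata.normalize("NFKC", raw)
    # Rejoin words hyphenated across a line break ("holo-\ntype" -> "holotype").
    # Caveat: this also removes genuine hyphens that happen to end a line.
    text = re.sub(r"-\s*\n\s*", "", text)
    # Collapse runs of whitespace, including remaining line breaks, to single spaces.
    return re.sub(r"\s+", " ", text).strip()


def extract_specimen_record(caption: str) -> dict:
    """Return the normalized caption plus repository and catalog number, if found."""
    text = normalize_text(caption)
    match = CATALOG_RE.search(text)
    return {
        "caption": text,
        "repository": match.group("repository") if match else None,
        "catalog_number": match.group("catalog_number") if match else None,
    }


# Usage with an invented caption (not taken from the Treatise):
record = extract_specimen_record("Examplus fictus, holo-\ntype, XYZ  12345,   Ordovician")
print(record)
```

A real pipeline would need patterns tuned to each volume's caption conventions, along with additional fields such as taxonomic determination, age, and locality; the sketch only shows the general normalize-then-extract shape.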