LINKING DATA SILOS VIA FUZZY MATCHING ALGORITHMS

LAUTERS, Jonathan David, Florida State University, 731 Rundell St, Iowa City, IA 52240 and NELSON, Gil, iDigBio, Florida State University, Tallahassee, FL 32306, jonathan.lauters@gmail.com

ePANDDA (enhanced PAleontological and Neontological Data Discovery API) is an EarthCube Integrative Activities project designed to increase the accessibility, linking, and discovery of paleontological and neontological data across existing siloed data stores. Initial collaborators include the Paleobiology Database (PBDB), iDigBio, and iDigPaleo. The ultimate goal of the project is to create an independent, transactional API that communicates with the APIs of participating databases, distributing and processing queries among them and returning formatted datasets. Parameter-driven, configurable RESTful API calls will allow search tools to be built that eliminate the need to search multiple data portals separately and then perform the time-consuming step of translating the results into a coherent dataset by hand. ePANDDA will enable users to leverage data-matching logic to create tailored apps for visualization, outreach, and collaboration. The data-matching logic in ePANDDA matches different types of data (e.g., specimens mentioned in publications to individual specimen records) through nontrivial means. Because of the complexity of these matching efforts and the volume of data, real-time access to the data providers' APIs was found to be unworkable. Distributed computing practices are employed instead to perform bulk matching of available datasets and cache the resulting data. Identifiers (e.g., UUIDs, DOIs, ORCIDs) are becoming increasingly valuable and will be used to help foster relations across data types. ePANDDA will use the OpenAnnotation model to push citation data back to iDigBio. Returned annotations will be incorporated into PBDB, iDigBio, and iDigPaleo, allowing each database to enhance data completeness for its users while eliminating the need for replicating, duplicating, or mirroring existing data across multiple data stores.
Future goals include building and demonstrating an innovative model for linking data, and providing avenues for bringing other collaborating databases online.
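
As a rough illustration of the bulk matching and caching described in the abstract, the sketch below scores free-text specimen mentions against specimen record labels and stores the best match for each mention. It is not ePANDDA's actual matching logic, which the abstract does not specify: the use of Python's standard-library difflib, the 0.85 threshold, the function names, and the record identifiers and labels are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and replace punctuation with spaces so trivial
    formatting differences do not lower the similarity score."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Return a similarity ratio between 0.0 and 1.0 for two strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def bulk_match(mentions, records, threshold=0.85):
    """Match every mention against every record label in one pass and
    cache the best record id per mention (None if below the threshold)."""
    cache = {}
    for mention in mentions:
        best_id, best_score = None, 0.0
        for record_id, label in records.items():
            score = similarity(mention, label)
            if score > best_score:
                best_id, best_score = record_id, score
        cache[mention] = best_id if best_score >= threshold else None
    return cache


# Hypothetical specimen records and publication mentions (illustrative only).
records = {
    "rec-001": "UF 12345 Carcharocles megalodon",
    "rec-002": "YPM 6789 Tyrannosaurus rex",
}
mentions = ["UF-12345, Carcharocles megalodon", "an unidentified bivalve"]

matches = bulk_match(mentions, records)
```

Precomputing the matches into a cache in this way means later queries can be answered by lookup rather than by calling each provider's API in real time. A production system would likely replace the pairwise loop with blocking or indexing to cope with large datasets.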

Session No. 105

T138. Fossil Specimens 0's and 1's: Databases, Standards, & Mobilization

Monday, 26 September 2016: 8:00 AM-12:00 PM

Mile High Ballroom 4C (Colorado Convention Center)

Geological Society of America Abstracts with Programs. Vol. 48, No. 7
doi: 10.1130/abs/2016AM-285169

© Copyright 2016 The Geological Society of America (GSA), all rights reserved. Permission is hereby granted to the author(s) of this abstract to reproduce and distribute it freely, for noncommercial purposes. Permission is hereby granted to any individual scientist to download a single copy of this electronic file and reproduce up to 20 paper copies for noncommercial purposes advancing science and education, including classroom use, providing all reproductions include the complete content shown here, including the author information. All other forms of reproduction and/or transmittal are prohibited without written permission from GSA Copyright Permissions.


GSA Annual Meeting in Denver, Colorado, USA - 2016
