NEPTUNE - DEVELOPING A DIGITAL INFORMATION INFRASTRUCTURE FOR MICROPALEONTOLOGY IN THE 21st CENTURY
Neptune is a relational database and a set of external tools that link raw occurrence data for marine microfossils, as given in several hundred selected original range charts of deep-sea drilling science reports, to the scientific information needed to retrieve and synthesize these data effectively. This information includes numeric geologic ages for every occurrence, based on quantitative age models for every sample/hole in the system, and master taxonomic name lists that link synonyms for the same taxon concept to each other and distinguish occurrence records of differing identification quality (e.g., clearly identified vs. 'cf.' or '?' observations). Neptune thus allows data to be retrieved from this important archive in a form suitable for large-scale synthesis of the deep-sea marine microfossil record, and provides tools for summarizing the information. More recently, Neptune has been linked to the successor of the Sepkoski database - the Paleobiology Database (PBDB) - allowing microfossil data from land sections to be combined with data from marine sections. The system is currently being used to study large-scale patterns of Cenozoic evolutionary change in the plankton, and as an age model and taxonomic reference library for other users of deep-sea drilling sections.
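The linkage described above (raw occurrence, then valid taxon name, identification quality and numeric age) can be illustrated with a minimal Python sketch. It is not Neptune's actual schema or code: the tie points in AGE_MODEL, the SYNONYMS entry, and the function names age_at_depth and resolve_occurrence are all hypothetical, and linear interpolation between tie points is assumed as the simplest form an age model could take.

```python
from bisect import bisect_right

# Hypothetical age model for one hole: (depth in metres, age in Ma) tie points,
# sorted by increasing depth. Real age models would be stored per hole.
AGE_MODEL = [(0.0, 0.0), (50.0, 5.0), (120.0, 15.0)]

# Hypothetical synonym list mapping a reported name to its valid name.
SYNONYMS = {"Globigerina pachyderma": "Neogloboquadrina pachyderma"}


def age_at_depth(depth, model):
    """Linearly interpolate a numeric age (Ma) for a sample depth."""
    depths = [d for d, _ in model]
    if not depths[0] <= depth <= depths[-1]:
        raise ValueError("depth lies outside the age model")
    i = bisect_right(depths, depth)
    if i == len(depths):  # depth equals the deepest tie point
        return model[-1][1]
    (d0, a0), (d1, a1) = model[i - 1], model[i]
    return a0 + (a1 - a0) * (depth - d0) / (d1 - d0)


def resolve_occurrence(name, depth):
    """Attach valid name, identification quality and age to a raw record."""
    # 'cf.' or '?' in the reported name marks an uncertain identification.
    quality = "uncertain" if ("cf." in name or "?" in name) else "certain"
    clean = " ".join(name.replace("cf.", "").replace("?", "").split())
    return {
        "taxon": SYNONYMS.get(clean, clean),
        "quality": quality,
        "age_ma": age_at_depth(depth, AGE_MODEL),
    }


record = resolve_occurrence("Globigerina cf. pachyderma", 85.0)
```

Under these assumed tie points, the record at 85 m is assigned to the valid name, flagged as an uncertain identification, and dated by interpolation between the 50 m and 120 m tie points.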
Neptune is currently implemented as a PostgreSQL relational database hosted on the Chronos server stack at ISU. It is searchable through the Chronos portal and seamlessly integrated with the Java-based Age Depth Plot and Age Range Chart applications.
Analysis of large, heterogeneous datasets inevitably raises problems of mixed data quality: data gaps, generally uneven sampling, outliers, and incorrectly entered primary observations all affect the validity of analyses. Via the link with PBDB, Neptune analyses can make use of PBDB's large library of paleobiologic tools for dealing with unevenly sampled data (range-through, subsampling, etc.). Current work is developing tools for dealing with age outliers in taxon ranges caused by taxonomic errors in the original data and by reworking of fossils, as well as with age model errors due to poorly resolved or mutually inconsistent primary chronostratigraphic information.
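As an illustration of the range-through approach mentioned above, the following Python sketch counts a taxon as present in every time bin between its oldest and youngest recorded occurrence, whether or not it was actually observed in that bin. It is a generic sketch of the technique, not PBDB's or Neptune's implementation; the function names, the occurrence tuples and the bin edges are hypothetical.

```python
def taxon_ranges(occurrences):
    """Return {taxon: (youngest_age, oldest_age)} from (taxon, age_ma) pairs."""
    ranges = {}
    for taxon, age in occurrences:
        young, old = ranges.get(taxon, (age, age))
        ranges[taxon] = (min(young, age), max(old, age))
    return ranges


def range_through_counts(occurrences, bin_edges):
    """Count taxa whose total range overlaps each bin [edge_i, edge_i+1)."""
    ranges = taxon_ranges(occurrences)
    counts = []
    for bin_young, bin_old in zip(bin_edges, bin_edges[1:]):
        n = sum(
            1
            for young, old in ranges.values()
            if young < bin_old and old >= bin_young
        )
        counts.append(n)
    return counts


# Hypothetical data: taxon A is recorded at 1 Ma and 12 Ma only,
# so it has no direct observation in the 5-10 Ma bin.
occurrences = [("A", 1.0), ("A", 12.0), ("B", 4.0), ("B", 6.0)]
counts = range_through_counts(occurrences, [0.0, 5.0, 10.0, 15.0])
```

Because the method relies on first and last occurrences alone, a single misdated or reworked record can stretch a taxon's range across many bins, which is why the age outliers discussed above matter for this kind of analysis.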
Future development of this system is envisioned as part of a gradually evolving network of digital resources in marine micropaleontology. Planned developments include stronger links to the primary deep-sea sediment core databases such as ODP's Janus system, the addition of biostratigraphic and lithologic data from all ODP sites, links to digital taxonomic catalogs of species images and descriptions (one such link has already been developed by Chronos), and links to major collections of marine microfossil materials held in museums and other institutions, such as the Micropaleontological Reference Center network of deep-sea marine microfossil slides. Effective networking of these resources will require developing funding mechanisms to maintain and regularly update a central registry of the shared key linking field data - the taxonomic and age model information. The benefits for research, however, will be substantial, offering major increases in data synthesis capacity, particularly for studies of global, long time scale processes, and improved efficiency in data retrieval and analysis in many other individual micropaleontologic research projects.