Paper No. 8
Presentation Time: 3:15 PM

USING NETWORKS OF NATURAL SCIENCE COLLECTIONS DATA FOR QUALITY CONTROL AND EFFICIENT DATA CAPTURE


MORRIS, Paul J.1, ILOABACHIE, Chinua2, KELLY, Maureen3, LOWERY, David2, MACKLIN, James A.3, MORRIS, Robert A.4 and WANG, Zhimin3, (1)Harvard University Herbaria and the Museum of Comparative Zoology, Harvard University, 22 Divinity Ave, Cambridge, MA 02138, (2)Department of Computer Science, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, MA 02125, (3)Harvard University Herbaria, Harvard University, 22 Divinity Ave, Cambridge, MA 02138, (4)Department of Computer Science, University of Massachusetts Boston, and Harvard University Herbaria, 22 Divinity Ave, Cambridge, MA 02138, mole@morris.net

The biodiversity collections community has a long history of developing networks to aggregate and distribute specimen data. These networks began with distributed queries and have since shifted to data harvesting and aggregation. The successful aggregation of more than 10⁸ specimen and observational records has brought the realization that the greatest problems in federating scientific data come not from the technologies for federation, but from the quality and validity of the data in the context of research problems.

Framed this way, as fitness of data for use, the critical quality issues in networks of distributed data include identifying and correcting errors, keeping data current, and providing enough temporal and spatial coverage to address the scientific questions being asked. Providing coverage for research needs is particularly daunting, because collections growth typically far outpaces digitization; highly efficient data-capture workflows are therefore integral to fitness for use. Driven by particular research goals, parts of the community have already addressed data quality directly by bulk georeferencing textual locality data under a set of uniform and rigorous standards.

To explore how aggregation can improve both the quality of existing data and the efficiency of data capture, we have designed and implemented a prototype network for aggregating and annotating collections data. We call it “Filtered Push” because a core function is to push annotations from arbitrary network points to the relevant authoritative repositories, where they are filtered and possibly accepted as changes to authoritative records. More broadly, Filtered Push is a domain-neutral framework for originating, distributing, and analyzing annotations. Network participants can subscribe to notifications arising from ontology-based analyses of new annotations or from purpose-built queries against the network's global history of annotations. We regard annotation as critical to improving the quality of distributed data.
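The push, filter, and subscribe flow described above can be illustrated with a toy, in-memory sketch. This is not the Filtered Push implementation or API: every name here (`Annotation`, `Repository`, `Network`, `accept`, `subscribe`, `push`, the record IDs and repository name) is hypothetical. The `accept` callback stands in for curatorial filtering at an authoritative repository, and a plain predicate stands in for the ontology-based analyses and purpose-built queries that drive notifications.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Annotation:
    """A proposed change to one field of one specimen record (hypothetical schema)."""
    record_id: str   # assumed to identify the record across the whole network
    field_name: str  # the field the annotator proposes to change
    new_value: str
    annotator: str


@dataclass
class Repository:
    """An authoritative repository that filters the annotations pushed to it."""
    name: str
    records: Dict[str, Dict[str, str]]
    accept: Callable[[Annotation], bool]  # stand-in for curatorial review
    log: List[Annotation] = field(default_factory=list)

    def receive(self, ann: Annotation) -> bool:
        self.log.append(ann)  # keep every annotation, accepted or not
        if ann.record_id in self.records and self.accept(ann):
            self.records[ann.record_id][ann.field_name] = ann.new_value
            return True
        return False


Predicate = Callable[[Annotation], bool]
Callback = Callable[[Annotation], None]


class Network:
    """Routes annotations to repositories holding the record and notifies subscribers."""

    def __init__(self) -> None:
        self.repositories: List[Repository] = []
        self.history: List[Annotation] = []  # global annotation history
        self.subscribers: List[Tuple[Predicate, Callback]] = []

    def subscribe(self, predicate: Predicate, callback: Callback) -> None:
        self.subscribers.append((predicate, callback))

    def push(self, ann: Annotation) -> Dict[str, bool]:
        self.history.append(ann)
        results: Dict[str, bool] = {}
        for repo in self.repositories:
            if ann.record_id in repo.records:
                results[repo.name] = repo.receive(ann)
        for predicate, callback in self.subscribers:
            if predicate(ann):
                callback(ann)
        return results


# Toy usage: a country-name typo is corrected by an annotation from elsewhere.
herbarium = Repository(
    name="herbarium-A",  # hypothetical repository
    records={"barcode-001": {"country": "Mexcio"}},
    accept=lambda a: a.field_name == "country",  # toy rule: accept country fixes only
)
net = Network()
net.repositories.append(herbarium)
alerts: List[Annotation] = []
net.subscribe(lambda a: a.field_name == "country", alerts.append)
result = net.push(Annotation("barcode-001", "country", "Mexico", "a.collector"))
print(result)                                       # {'herbarium-A': True}
print(herbarium.records["barcode-001"]["country"])  # Mexico
```

Keeping the repository's `accept` decision separate from the network's routing mirrors the abstract's point that annotations are only *possibly* accepted: the network delivers and records every annotation, while each authoritative repository decides what actually changes its records.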
