2003 Seattle Annual Meeting (November 2–5, 2003)

Paper No. 1
Presentation Time: 8:00 AM

PALEODAISY: AN ADAPTIVE SYSTEM FOR THE AUTOMATED RECOGNITION OF FOSSIL TAXA


MACLEOD, Norman, The Natural History Museum, Cromwell Road, London, SW7 5BD, United Kingdom, O'NEILL, M., Museum of Natural History, Oxford Univ, Oxford, OX1 3PW and WALSH, S.A., Department of Palaeontology, The Nat History Museum, Cromwell Road, London, SW7 5BD, N.MacLeod@nhm.ac.uk

A number of attempts have been made to design computer vision systems capable of identifying fossil morphologies reliably. While progress has been made, none of these has achieved a level of accuracy comparable to that of multivariate analysis based on carefully selected morphometric measurements. The latter approach is not viable as a generalized automated identification strategy because of the diversity of fossil morphologies and the limited number of morphometrically viable landmark points available for morphological characterization.

The DAISY system takes a different approach to this problem. By using data-compression algorithms to boost the signal-to-noise ratio of training-set scenes, and by treating the compressed files as sets of object-characterization variables, DAISY partitions the sampled shape space into group-specific domains using k-means clustering algorithms. Unknown specimens can then be projected into the group-partitioned morphospace and identified on a probabilistic basis. DAISY routinely achieves over ninety percent accuracy for datasets consisting of crudely oriented specimens. Unoriented specimens are handled by adding specimens photographed in multiple orientations to each training set. Because of the greater information content available from scenes (as opposed to traditional landmarks or linear distances), and because of its ability to record information from multiple test orientations, DAISY uses all of the visual information available to experienced systematic paleontologists.
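The workflow described above (compress each scene, cluster the compressed variables by group, then project an unknown into that space) can be illustrated with a minimal sketch. This is not the DAISY implementation: the block-averaging "compression", the softmax over distances used for the probabilities, and all function names (`downsample`, `kmeans`, `train`, `identify`) are assumptions made for illustration only.

```python
import math
import random


def downsample(image, block=2):
    """Assumed stand-in for the compression step: average non-overlapping blocks."""
    rows, cols = len(image), len(image[0])
    features = []
    for r in range(0, rows - block + 1, block):
        for c in range(0, cols - block + 1, block):
            cells = [image[r + i][c + j] for i in range(block) for j in range(block)]
            features.append(sum(cells) / len(cells))
    return features


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd-style k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster empties
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def train(training_set, k_per_group=1):
    """training_set maps a group label to a list of images (lists of pixel rows)."""
    return {
        label: kmeans([downsample(img) for img in images], k_per_group)
        for label, images in training_set.items()
    }


def identify(model, image):
    """Return {label: probability} for an unknown image.

    Assumption: probabilities come from a softmax over the negative distance
    to each group's nearest centroid; the abstract does not specify this.
    """
    features = downsample(image)
    dists = {
        label: min(math.sqrt(sq_dist(features, c)) for c in centroids)
        for label, centroids in model.items()
    }
    best = min(dists.values())
    weights = {label: math.exp(-(d - best)) for label, d in dists.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


# Toy 4x4 "scenes": group A is bright on top, group B is bright on the bottom.
training = {
    "taxon_A": [
        [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]],
        [[1, 1, 1, 0], [1, 1, 1, 1], [0, 0, 0, 0], [0, 1, 0, 0]],
    ],
    "taxon_B": [
        [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]],
        [[0, 0, 1, 0], [0, 0, 0, 0], [1, 1, 1, 1], [0, 1, 1, 1]],
    ],
}
model = train(training)
unknown = [[1, 1, 0, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
probabilities = identify(model, unknown)
print(max(probabilities, key=probabilities.get))
```

In this sketch, handling unoriented specimens would amount to adding images of each group in several orientations to `training` and raising `k_per_group`, so that each group's domain is covered by more than one centroid.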

In limited trials, DAISY already makes identifications much more consistently than humans, and it is itself limited only by the availability of adequate training sets. Moreover, because of its distributed architecture, DAISY becomes better with each correct identification and uses slack processor time to refine group-domain definitions continuously. With continued development, DAISY has the potential to (1) serve as a platform for the development of ever more refined object recognition algorithms, (2) free systematists from the drudgery of routine specimen identifications, (3) find new characters and character states useful in systematic analysis, and (4) improve substantially the accuracy and reproducibility of systematic paleontological data.