2007 GSA Denver Annual Meeting (28–31 October 2007)

Paper No. 16
Presentation Time: 5:15 PM

REASONS FOR, AND APPROACHES TO, AUTOMATING TAXONOMIC IDENTIFICATIONS IN PALEONTOLOGY


MACLEOD, Norman, The Natural History Museum, Cromwell Road, London, SW7 5BD, United Kingdom and KRIEGER, J., Department of Palaeontology, The Natural History Museum, Cromwell Road, London, SW7 5BD, N.MacLeod@nhm.ac.uk

At present one of the primary factors limiting application of paleontological data in many, if not all, systematic, ecological, and stratigraphic contexts is lack of access to large numbers of accurate specimen identifications. Several studies have now shown that identification consistencies, even among expert taxonomists, are surprisingly low (c. 60–70%). Because fewer paleontologists are being trained in the detailed taxonomy of key groups, one way of addressing this situation is to automate the taxon-identification process. Four analytic approaches are currently considered relevant to automated taxon identification: form factors, linear distances between landmark points, geometric morphometrics (based on landmarks and/or outlines in 2D or 3D), and artificial neural nets. Digital images of Neogene bivalve species and Neogene planktonic foraminiferal species were used to compare and contrast these approaches in terms of accuracy, generality, speed, and scalability. Form factors were the easiest to use as a basis for automated recognition, but performed least well in cross-tabulation accuracy tests (c. 70%). 2D distance and landmark approaches yielded better accuracy (c. 90%), but would be difficult to automate fully in terms of data collection. 2D outline approaches combined good generalizability and accuracy, but yielded results inferior to those obtained by a new method, eigensurface analysis, that supports direct comparison of 3D surfaces. All of these morphometric approaches exhibit superior geometric interpretability, but limited scalability. A plastic self-organizing map-based neural net provided the best overall combination of accuracy, generalizability, and scalability, though this approach required larger training-set sizes to achieve optimal performance. Based on these results it seems both desirable and possible to assemble automated taxon recognition systems using current technology. Such systems can achieve far more rapid, accurate, and consistent identifications than human taxonomists can. Moreover, this technology can also contribute directly to taxonomy by encouraging collaboration and allowing taxonomists to approach their studies in a more consistent and formally hypothetico-deductive manner.
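
To illustrate why form factors are the easiest of the four approaches to automate, the sketch below computes a few common descriptors from a digitized 2D outline. The abstract does not state which form factors were used, so the choice here (area, perimeter, circularity, bounding-box aspect ratio) and the function name form_factors are assumptions made purely for illustration.

```python
import math


def form_factors(outline):
    """Return simple shape descriptors for a closed 2D outline.

    `outline` is an ordered list of (x, y) vertices; the last vertex is
    implicitly joined back to the first. The descriptor set is an
    illustrative assumption, not the set used in the study.
    """
    n = len(outline)
    if n < 3:
        raise ValueError("an outline needs at least three vertices")

    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x0, y0 = outline[i]
        x1, y1 = outline[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0          # shoelace formula term
        perimeter += math.hypot(x1 - x0, y1 - y0)

    area = abs(twice_area) / 2.0
    if area == 0.0:
        raise ValueError("outline is degenerate (zero area)")

    xs = [p[0] for p in outline]
    ys = [p[1] for p in outline]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)

    return {
        "area": area,
        "perimeter": perimeter,
        # Circularity is 1 for a perfect circle and smaller otherwise.
        "circularity": 4.0 * math.pi * area / perimeter ** 2,
        # Axis-aligned bounding-box ratio; depends on specimen orientation.
        "aspect_ratio": width / height,
    }


# Example: a unit square standing in for a digitized specimen outline.
unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
square_factors = form_factors(unit_square)
```

Descriptors of this kind need only a segmented outline and no landmark placement, which is consistent with their ease of automation. They also discard most of the shape information, which may help explain the lower accuracy reported for them.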