ARLO: AN ADVANCED HUMAN AND MACHINE LEARNING SYSTEM FOR AUTOMATED POLLEN CLASSIFICATION

Tcheng, David K.

Paper No. 9

Presentation Time: 4:15 PM

TCHENG, David K., Illinois Informatics Institute, National Center for Supercomputing Applications, University of Illinois, 1205 W. Clark St., Room 1008, Urbana, IL 61801, HASELHORST, Derek, School of Integrative Biology, University of Illinois at Urbana-Champaign, 505 S. Goodwin Avenue, Urbana, IL 61801 and PUNYASENA, Surangi W., Department of Plant Biology, University of Illinois, 505 S. Goodwin Ave., Urbana, IL 61801, dtcheng@illinois.edu

We present ARLO, an advanced human and machine learning system for solving tropical pollen classification problems. In our case study, a human expert (Haselhorst) can identify a large number of pollen classes (n = 119). For the machine learning process, we created 13,650 training examples, each of which is a 3-D image pixel matrix representing a z-stack containing the pollen grain. The number of examples per class is naturally skewed: the most-observed class contains 2,708 examples, and the least-observed classes contain only a single example (12 cases).

Our prediction system, ARLO, has the following components: (1) a high-throughput automated slide scanner, (2) a virtual microscope and pollen tagger, (3) a human expert, (4) a space of image feature extraction algorithms, (5) a space of supervised learning algorithms, (6) a bias optimizer that searches these spaces for optimal system configurations, and (7) a two-tiered cross-validation framework to quantify and control overfitting.
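As an illustration of components (4) through (6), the following is a minimal sketch of how a configuration space and a search over it could be expressed. The abstract does not name ARLO's feature extractors, learners, parameters, or search strategy, so every option name below is a hypothetical placeholder, and random search is an assumed stand-in for the bias optimizer. The `evaluate` callback is likewise assumed to return an accuracy estimate for one configuration.

```python
import itertools
import random

# Hypothetical placeholders: the abstract does not list ARLO's actual options.
FEATURE_EXTRACTORS = ["extractor_a", "extractor_b", "extractor_c"]
LEARNERS = ["learner_x", "learner_y"]
PARAMETER_VALUES = [1, 3, 5]


def configuration_space():
    """Enumerate every (feature extractor, learner, parameter) combination."""
    return list(itertools.product(FEATURE_EXTRACTORS, LEARNERS, PARAMETER_VALUES))


def random_search(space, evaluate, budget, seed=0):
    """Score up to `budget` randomly chosen configurations and return the best.

    `evaluate` maps a configuration to a score (higher is better). Random
    search is an assumption here, not the optimizer described in the abstract.
    """
    rng = random.Random(seed)
    candidates = rng.sample(space, min(budget, len(space)))
    scored = [(evaluate(config), config) for config in candidates]
    best_score, best_config = max(scored, key=lambda pair: pair[0])
    return best_config, best_score
```

The more configurations such a search scores against the same data, the more likely the winning score is partly due to chance, which is the motivation for component (7).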

We demonstrate that as more bias optimization is performed, the divergence between the “estimated” and “true” accuracy widens. This overfitting makes comparisons between competing systems hazardous. We show how two-tiered cross-validation gives us “corrected” accuracy estimates and a more robust approach to comparing competing algorithms.
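The idea can be sketched in a few lines of Python. This is not ARLO's implementation: the toy learner (nearest class mean on a single feature), the fold counts, and all function names are assumptions made for illustration. The inner tier selects a configuration using only the training portion of each outer fold and yields the "estimated" accuracy. The outer tier scores the selected configuration on data that played no part in the selection and yields the "corrected" accuracy.

```python
import random


def k_folds(n, k):
    """Return k (train, test) pairs of positions over range(n)."""
    folds = [list(range(n))[i::k] for i in range(k)]
    return [
        ([j for m, fold in enumerate(folds) if m != i for j in fold], folds[i])
        for i in range(k)
    ]


def fit_predict(feature, X, y, train, test):
    """Toy learner (assumed): assign the class whose training mean is nearest."""
    means = {}
    for label in set(y[i] for i in train):
        values = [X[i][feature] for i in train if y[i] == label]
        means[label] = sum(values) / len(values)
    return [min(means, key=lambda c: abs(X[i][feature] - means[c])) for i in test]


def accuracy(predictions, y, test):
    return sum(p == y[i] for p, i in zip(predictions, test)) / len(test)


def cv_accuracy(feature, X, y, indices, k):
    """Mean k-fold accuracy of one configuration, restricted to `indices`."""
    scores = []
    for train_pos, test_pos in k_folds(len(indices), k):
        train = [indices[p] for p in train_pos]
        test = [indices[p] for p in test_pos]
        scores.append(accuracy(fit_predict(feature, X, y, train, test), y, test))
    return sum(scores) / len(scores)


def two_tier_cv(configs, X, y, outer_k=5, inner_k=4):
    """Return (estimated, corrected) accuracy averaged over the outer folds."""
    estimated, corrected = [], []
    for train, test in k_folds(len(y), outer_k):
        # Inner tier: pick the best configuration using training data only.
        inner = {c: cv_accuracy(c, X, y, train, inner_k) for c in configs}
        best = max(inner, key=inner.get)
        estimated.append(inner[best])
        # Outer tier: score the chosen configuration on untouched data.
        corrected.append(accuracy(fit_predict(best, X, y, train, test), y, test))
    return sum(estimated) / len(estimated), sum(corrected) / len(corrected)


if __name__ == "__main__":
    # Pure-noise data: labels are independent of every feature.
    rng = random.Random(0)
    X = [[rng.random() for _ in range(50)] for _ in range(60)]
    y = [rng.randint(0, 1) for _ in range(60)]
    est, corr = two_tier_cv(list(range(50)), X, y)
    print(f"estimated={est:.2f} corrected={corr:.2f}")
```

On noise data like this, the estimated accuracy tends to rise as more configurations are searched, while the corrected accuracy should stay near chance. The gap between the two is a measure of how much the search has overfit.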

Session No. 201

T146. The Future of Quantitative Paleontology: Biometry, Computer Vision, and Machine Learning

Tuesday, 6 November 2012: 1:30 PM-5:30 PM

217A (Charlotte Convention Center)

Geological Society of America Abstracts with Programs. Vol. 44, No. 7, p.480

© Copyright 2012 The Geological Society of America (GSA), all rights reserved. Permission is hereby granted to the author(s) of this abstract to reproduce and distribute it freely, for noncommercial purposes. Permission is hereby granted to any individual scientist to download a single copy of this electronic file and reproduce up to 20 paper copies for noncommercial purposes advancing science and education, including classroom use, providing all reproductions include the complete content shown here, including the author information. All other forms of reproduction and/or transmittal are prohibited without written permission from GSA Copyright Permissions.

The Geological Society of America
2012 GSA Annual Meeting in Charlotte
Charlotte, North Carolina, USA
