GENERATING TEACHING DATASETS FOR MACHINE-LEARNING ALGORITHMS USING CITIZEN SCIENCE: HOW MANY USERS ARE NEEDED TO IDENTIFY BEDDING PLANES?

Paper No. 59-7

Presentation Time: 2:00 PM-6:00 PM

SPROSS, Erin, Earth and Environmental Sciences, Temple University, 1801 N Broad St, Philadelphia, PA 19122, DAVATZES, Alexandra, Department of Earth and Environmental Science, Temple University, Philadelphia, PA 19122 and SHIPLEY, Thomas, Department of Psychology, Temple University, 1701 North 13th Street, 6th Floor Weiss Hall, Philadelphia, PA 19122

Citizen science has many useful applications in the earth and social sciences. It can help scientists analyze large amounts of data quickly, and in doing so it gives psychologists a source for examining how large groups of non-experts think. We employed citizen scientists to identify bedding planes in drone-captured images of outcrops, with the goal of building a dataset that could be used to train a machine-learning algorithm to identify bedding planes. Initial inspection of the data found that non-experts occasionally mistook linear features, such as erosional gullies, for bedding planes. To avoid including these errors in the training set, we collected annotations of rock outcrop images from multiple citizen scientists using the citizen science website Zooniverse. Our aim was to identify the optimal number of citizen scientists who should annotate an image in order to maximize the number of accurately indicated bedding planes while minimizing the risk of users incorrectly tracing a feature that is not a bedding plane. We analyzed the results of our initial trial and found high agreement among users: numerous users correctly indicated bedding planes, and the same bedding plane was often identified by more than one user. Three users per image were sufficient for multiple bedding planes to be identified by more than one user, with no single erroneous feature identified by more than one user. When more than six users annotated each image, the risk of two or more users making the same mistake increased, without an increase in the number of correct bedding planes identified. Although one could exclude data based on low agreement, these findings offer initial guidance on the minimum number of citizen scientists needed to efficiently develop an accurate dataset from which a machine-learning algorithm can learn to identify bedding planes, which will aid earth scientists in collecting large numbers of annotations quickly.
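
The agreement criterion described in the abstract (keep a feature only when more than one user identified it) can be illustrated with a minimal sketch. This is not the authors' analysis pipeline. It assumes the traced lines have already been matched to shared feature IDs, a step the abstract does not describe, and the function name, user IDs, and feature IDs are all hypothetical.

```python
from collections import Counter


def consensus_features(annotations, min_users=2):
    """Return the features marked by at least `min_users` distinct users.

    `annotations` maps a user ID to the collection of feature IDs that
    user traced. Matching traced lines to feature IDs is assumed to have
    been done beforehand (hypothetical preprocessing step).
    """
    counts = Counter()
    for features in annotations.values():
        # Count each user at most once per feature.
        counts.update(set(features))
    return {feature for feature, n in counts.items() if n >= min_users}


# Hypothetical example: three users annotate one image. Two of them each
# trace a different erosional gully by mistake.
example = {
    "user_a": {"bed_1", "bed_2", "gully_1"},
    "user_b": {"bed_1", "bed_3"},
    "user_c": {"bed_2", "bed_3", "gully_2"},
}

print(sorted(consensus_features(example)))
# prints ['bed_1', 'bed_2', 'bed_3']
```

In this made-up example each bedding plane is traced by two users and each gully by only one, so a two-user threshold keeps the bedding planes and drops the errors. If more annotators were added and two of them traced the same gully, that gully would pass the threshold, which is the risk the abstract reports for larger groups.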

Session No. 59--Booth# 33

T158. Harnessing Social Media, Crowd Source, and Other Modern Open Media to Advance Geoscience Research and Enhance Outreach and Education (Posters)

Sunday, 9 October 2022: 2:00 PM-6:00 PM

Exhibit Hall F (Colorado Convention Center)

Geological Society of America Abstracts with Programs. Vol 54, No. 5
doi: 10.1130/abs/2022AM-376976

© Copyright 2022 The Geological Society of America (GSA), all rights reserved. Permission is hereby granted to the author(s) of this abstract to reproduce and distribute it freely, for noncommercial purposes. Permission is hereby granted to any individual scientist to download a single copy of this electronic file and reproduce up to 20 paper copies for noncommercial purposes advancing science and education, including classroom use, providing all reproductions include the complete content shown here, including the author information. All other forms of reproduction and/or transmittal are prohibited without written permission from GSA Copyright Permissions.


GSA Connects 2022 meeting in Denver, Colorado
