Joint 72nd Annual Southeastern/58th Annual Northeastern Section Meeting - 2023

Paper No. 11-2
Presentation Time: 1:50 PM

AUTOMATED SOLUTIONS FOR GEOREFERENCING AND VECTORIZING GEOLOGICAL MAPS


LEDERER, Graham (1), ROSERA, Joshua M. (1) and GOLDMAN, Margaret (2); (1) U.S. Geological Survey, Reston, VA 20192, (2) U.S. Geological Survey, Denver, CO 80225

Geologic maps are a rich source of lithologic, structural, and other data used throughout the geosciences. Mineral resource assessments in particular require high-resolution detail that is typically available only on local-scale maps. Extracting information from dozens, or even hundreds, of separate maps is a significant impediment to integrating geologic data. Furthermore, despite substantial progress in digital geologic mapping techniques, a wealth of information remains in legacy map documents stored as nonspatial, high-resolution images of scanned paper maps. Recovering this information would support the synthesis of geologic data and resource assessments.

The U.S. Geological Survey (USGS), in partnership with DARPA, NASA-JPL, and MITRE, launched a pilot project to engage the artificial intelligence and machine learning communities in the recovery of legacy geologic map data through an open competition. Two map-processing challenges were designed: (1) automated georeferencing of map images and (2) legend-based feature extraction. Using an extensive training and validation dataset sourced from the National Geologic Map Database (NGMDB), competitors combined computer vision, image segmentation, text recognition, and other machine learning techniques to provide code-based, open-source, automated solutions for the two challenges. These solutions were evaluated using standard metrics.
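As an illustration of what the two challenges involve, the following Python sketch fits a georeferencing transform from control points and scores an extracted feature mask. It is a minimal sketch under stated assumptions, not the competition's method or scoring code: the affine model, the root-mean-square error (RMSE), and the pixel-wise F1 score are common choices assumed here, and the function names and control-point coordinates are hypothetical.

```python
import numpy as np


def fit_affine(pixel_xy, map_xy):
    """Least-squares affine transform from pixel (col, row) to map (x, y).

    Assumption: an affine model is adequate for the scanned map image.
    Returns a (3, 2) coefficient matrix.
    """
    pixel_xy = np.asarray(pixel_xy, dtype=float)
    map_xy = np.asarray(map_xy, dtype=float)
    # Design matrix with columns [col, row, 1].
    design = np.column_stack([pixel_xy, np.ones(len(pixel_xy))])
    coeffs, *_ = np.linalg.lstsq(design, map_xy, rcond=None)
    return coeffs


def apply_affine(coeffs, pixel_xy):
    """Map pixel coordinates to map coordinates with fitted coefficients."""
    pixel_xy = np.asarray(pixel_xy, dtype=float)
    design = np.column_stack([pixel_xy, np.ones(len(pixel_xy))])
    return design @ coeffs


def rmse(predicted_xy, true_xy):
    """Root-mean-square distance between predicted and true locations."""
    diff = np.asarray(predicted_xy, dtype=float) - np.asarray(true_xy, dtype=float)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


def f1_score(predicted_mask, true_mask):
    """Pixel-wise F1 score between two binary rasters of equal shape."""
    predicted = np.asarray(predicted_mask, dtype=bool)
    truth = np.asarray(true_mask, dtype=bool)
    true_pos = np.count_nonzero(predicted & truth)
    false_pos = np.count_nonzero(predicted & ~truth)
    false_neg = np.count_nonzero(~predicted & truth)
    denom = 2 * true_pos + false_pos + false_neg
    # Two empty masks agree perfectly, so score them as 1.0.
    return 2 * true_pos / denom if denom else 1.0


# Hypothetical control points: image corners and their map coordinates,
# assuming 2 map units per pixel with the y axis pointing up.
pixels = [(0, 0), (1000, 0), (0, 1000), (1000, 1000)]
coords = [(500000, 4300000), (502000, 4300000),
          (500000, 4298000), (502000, 4298000)]

coeffs = fit_affine(pixels, coords)
print("georeferencing RMSE:", rmse(apply_affine(coeffs, pixels), coords))
```

For feature extraction, `f1_score` would compare a predicted raster for one legend item against a reference raster of the same shape. In practice, georeferencing error would be measured on control points withheld from the fit rather than on the points used to fit it.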

The novel solutions developed during the competition phase of the pilot project represent promising methods for rapidly extracting detailed geologic information embedded within map images. With over 100,000 maps in the NGMDB and thousands of additional maps in USGS publications, industry technical reports, and scientific literature, applying these solutions would greatly accelerate USGS efforts to assimilate and derive additional value from source documents. Publication of the competition results, including the solution code and the training and validation datasets, in publicly accessible repositories will greatly benefit the broader geoscience community.