ASSESSING ADVANCED AI MODELS AS GEOSPATIAL DATA ANALYSTS: INTEGRATING NATURAL LANGUAGE WITH SQL IN SPATIAL DATABASES

Tso, Joseph

Paper No. 38-5

Presentation Time: 8:00 AM-5:30 PM

TSO, Joseph, Civil and Infrastructure Engineering, George Mason University, 4400 University Drive, Fairfax, VA 22030 and PAN, Hailey, Computer Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139

Over the past several years, advancements in large language models (LLMs) have enabled AI tools to perform an increasing variety of simple, repetitive digital tasks in fields ranging from marketing to healthcare. Despite their potential to simplify geospatial data analysis and enhance existing systems, however, there is minimal literature on leveraging LLMs within geographic information systems (GIS) technologies. Currently, geoinformation analysis relies on human interaction through desktop software or programming languages, and, as a result, nuanced analysis of complex datasets typically requires years of experience. This requirement can significantly hinder the identification of novel trends, the development of new technologies, and the overall advancement of the field. Thus, building on previous work, this study proposes a framework to test LLMs’ ability to automate database analysis by generating geospatial SQL queries. The goals of this study were fourfold: (1) use natural language to analyze geospatial datasets, (2) evaluate LLMs’ capabilities across four difficulty levels of questions, (3) perform a comparative analysis of current-generation models, and (4) offer insights on how LLM technology can be improved for geospatial analysis. The results show clear distinctions between model capabilities, with accuracies for OpenAI, Gemini, LLaMA, and Claude of 96%, 52%, 76%, and 92%, respectively. Further analysis shows that Gemini and LLaMA succeeded in generating code for basic information retrieval but struggled with multi-table joins and advanced analytical queries. We hope this framework will encourage the further development of LLMs for spatial data analysis and open up possibilities for novel tools for automated geospatial analytics.
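
The abstract does not describe how accuracy was computed, so the following is only an illustrative sketch of one common way such a framework could score models: run each generated query and a reference query against the same database, compare the result sets, and report the fraction matched per difficulty level. Everything here is an assumption made for illustration: the `sites` table, the questions, the `execution_accuracy` helper, and the canned "model" that stands in for a real LLM call. Plain SQLite is used in place of a spatial database so the example runs with the Python standard library alone.

```python
import sqlite3
from collections import defaultdict


def run_query(conn, sql):
    """Return the query's rows in a canonical order, or None if the SQL fails."""
    try:
        return sorted(conn.execute(sql).fetchall(), key=repr)
    except sqlite3.Error:
        return None


def execution_accuracy(conn, cases, generate_sql):
    """Score a model per difficulty level (hypothetical helper).

    cases: iterable of (level, question, reference_sql)
    generate_sql: callable mapping a question to a SQL string;
                  a stand-in for an LLM call, not a real API.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for level, question, reference_sql in cases:
        expected = run_query(conn, reference_sql)
        actual = run_query(conn, generate_sql(question))
        total[level] += 1
        # A generated query counts as correct only if it executes and
        # returns the same rows as the reference query (order ignored).
        if expected is not None and actual == expected:
            correct[level] += 1
    return {level: correct[level] / total[level] for level in total}


# Toy database: a hypothetical table of point locations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sites (id INTEGER PRIMARY KEY, name TEXT, lon REAL, lat REAL);
    INSERT INTO sites VALUES
        (1, 'A', -117.9, 33.8),
        (2, 'B', -77.3, 38.8),
        (3, 'C', -71.1, 42.4);
""")

# (difficulty level, natural-language question, reference SQL)
cases = [
    (1, "How many sites are there?", "SELECT COUNT(*) FROM sites"),
    (1, "List site names.", "SELECT name FROM sites"),
    (2, "Which sites lie east of longitude -100?",
     "SELECT name FROM sites WHERE lon > -100"),
]

# Canned answers standing in for model output; the last one is deliberately
# wrong (it filters on lat instead of lon).
canned = {
    "How many sites are there?": "SELECT COUNT(id) FROM sites",
    "List site names.": "SELECT name FROM sites ORDER BY name DESC",
    "Which sites lie east of longitude -100?":
        "SELECT name FROM sites WHERE lat > -100",
}

scores = execution_accuracy(conn, cases, canned.get)
print(scores)
```

Comparing result sets rather than SQL text means that differently written but equivalent queries are scored as correct, which matters when several models phrase the same query in different ways. A real evaluation of geospatial queries would need a spatially enabled database and a tolerance for floating-point geometry results, neither of which is shown here.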

Session No. 38--Booth# 145

T42. Open Science, Open Data: Geoinformatics and Why it Should be on Everyone’s Radar (Posters)

Sunday, 22 September 2024: 8:00 AM-5:30 PM

Hall D (Anaheim Convention Center)

Geological Society of America Abstracts with Programs. Vol. 56, No. 5
doi: 10.1130/abs/2024AM-402701

© Copyright 2024 The Geological Society of America (GSA), all rights reserved. Permission is hereby granted to the author(s) of this abstract to reproduce and distribute it freely, for noncommercial purposes. Permission is hereby granted to any individual scientist to download a single copy of this electronic file and reproduce up to 20 paper copies for noncommercial purposes advancing science and education, including classroom use, providing all reproductions include the complete content shown here, including the author information. All other forms of reproduction and/or transmittal are prohibited without written permission from GSA Copyright Permissions.

GSA Connects 2024 Meeting in Anaheim, California
