GSA Connects 2024 Meeting in Anaheim, California

Paper No. 38-5
Presentation Time: 8:00 AM-5:30 PM

ASSESSING ADVANCED AI MODELS AS GEOSPATIAL DATA ANALYSTS: INTEGRATING NATURAL LANGUAGE WITH SQL IN SPATIAL DATABASES


TSO, Joseph, Civil and Infrastructure Engineering, George Mason University, 4400 University Drive, Fairfax, VA 22030 and PAN, Hailey, Computer Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139

Over the past several years, advances in large language models (LLMs) have enabled AI tools to perform a growing variety of simple, repetitive digital tasks in fields from marketing to healthcare. Despite the potential of LLMs to simplify geospatial data analysis and enhance existing systems, however, there is minimal literature on leveraging them within geographic information systems (GIS) technologies. Currently, geoinformation analysis relies on human interaction through desktop software or programming languages, and, as a result, nuanced analysis of complex datasets typically requires years of experience. This reliance on expertise can significantly hinder the identification of novel trends, the development of new technologies, and the overall advancement of the field. Thus, building on previous work, this study proposes a framework to test LLMs' ability to automate database analysis by generating geospatial SQL queries. The goals of this study were fourfold: (1) use natural language to analyze geospatial datasets, (2) evaluate LLMs' capabilities across four difficulty levels of questions, (3) perform a comparative analysis of current-generation models, and (4) offer insights on how LLM technology can be improved for geospatial analysis. The results show clear distinctions between model capabilities, with accuracies for OpenAI, Gemini, LLaMA, and Claude of 96%, 52%, 76%, and 92%, respectively. Further analysis shows that Gemini and LLaMA succeeded in generating code for basic information retrieval but struggled with multi-table joins and advanced analytical queries. We hope this framework will encourage the further development of LLMs for spatial data analysis and open up possibilities for novel tools for automated geospatial analytics.
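
The abstract does not describe how generated queries were scored. As a minimal sketch of one plausible approach, the Python below computes execution accuracy per difficulty level: a generated query counts as correct when it runs and returns the same rows as a reference query. Everything here is an assumption made for illustration, including the `stations` table, its coordinates, the example queries, and the helper names `run_query` and `score`. It uses plain SQLite rather than a spatial database, so it should not be read as the study's actual framework.

```python
import sqlite3
from collections import defaultdict


def run_query(conn, sql):
    """Return the result rows in a canonical order, or None if the SQL fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    # Row order is ignored here; an order-sensitive check would skip the sort.
    return sorted(rows, key=repr)


def score(conn, cases):
    """Compute execution accuracy for each difficulty level.

    Each case is (level, reference_sql, generated_sql).
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for level, reference_sql, generated_sql in cases:
        totals[level] += 1
        expected = run_query(conn, reference_sql)
        actual = run_query(conn, generated_sql)
        if expected is not None and actual == expected:
            correct[level] += 1
    return {level: correct[level] / totals[level] for level in totals}


# Hypothetical toy table standing in for a spatial dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stations (id INTEGER, name TEXT, lat REAL, lon REAL)")
conn.executemany(
    "INSERT INTO stations VALUES (?, ?, ?, ?)",
    [(1, "A", 38.83, -77.31), (2, "B", 42.36, -71.09), (3, "C", 33.84, -117.91)],
)

# Hypothetical (level, reference, generated) triples; the "generated" SQL is
# hand-written here to stand in for model output.
cases = [
    (1, "SELECT name FROM stations WHERE lat > 40",
        "SELECT name FROM stations WHERE lat > 40.0"),
    (1, "SELECT COUNT(*) FROM stations",
        "SELECT COUNT(id) FROM stations"),
    (2, "SELECT name FROM stations WHERE lon < -100",
        "SELECT name FROM stations WHERE lon > -100"),  # wrong comparison
    (2, "SELECT name FROM stations ORDER BY lat",
        "SELECT nme FROM stations"),                    # invalid column name
]

accuracy_by_level = score(conn, cases)
print(accuracy_by_level)
```

Comparing result sets rather than SQL text lets differently worded but equivalent queries count as correct, while queries that fail to execute are simply scored as wrong.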