TY - GEN
T1 - Towards a foundation model for geospatial artificial intelligence (vision paper)
AU - Mai, Gengchen
AU - Cundy, Chris
AU - Choi, Kristy
AU - Hu, Yingjie
AU - Lao, Ni
AU - Ermon, Stefano
N1 - Publisher Copyright: © 2022 Owner/Author.
PY - 2022/11/1
Y1 - 2022/11/1
N2 - Large pre-trained models, also known as foundation models (FMs), are trained in a task-agnostic manner on large-scale data and can be adapted to a wide range of downstream tasks by fine-tuning, few-shot, or even zero-shot learning. Despite their successes in language and vision tasks, we have yet to see an attempt to develop foundation models for geospatial artificial intelligence (GeoAI). In this work, we explore the promises and challenges of developing multimodal foundation models for GeoAI. We first show the advantages of this idea by testing the performance of existing Large pre-trained Language Models (LLMs) (e.g., GPT-2 and GPT-3) on two geospatial semantics tasks. Results indicate that these task-agnostic LLMs can outperform task-specific fully supervised models on both tasks, with a 2-9% improvement in a few-shot learning setting. However, we also show the limitations of these existing foundation models given the multimodal nature of GeoAI, especially when dealing with geometries in conjunction with other modalities. We therefore discuss the possibility of a multimodal foundation model that can reason over various types of geospatial data through geospatial alignments. We conclude this paper by discussing the unique risks and challenges of developing such a model for GeoAI.
AB - Large pre-trained models, also known as foundation models (FMs), are trained in a task-agnostic manner on large-scale data and can be adapted to a wide range of downstream tasks by fine-tuning, few-shot, or even zero-shot learning. Despite their successes in language and vision tasks, we have yet to see an attempt to develop foundation models for geospatial artificial intelligence (GeoAI). In this work, we explore the promises and challenges of developing multimodal foundation models for GeoAI. We first show the advantages of this idea by testing the performance of existing Large pre-trained Language Models (LLMs) (e.g., GPT-2 and GPT-3) on two geospatial semantics tasks. Results indicate that these task-agnostic LLMs can outperform task-specific fully supervised models on both tasks, with a 2-9% improvement in a few-shot learning setting. However, we also show the limitations of these existing foundation models given the multimodal nature of GeoAI, especially when dealing with geometries in conjunction with other modalities. We therefore discuss the possibility of a multimodal foundation model that can reason over various types of geospatial data through geospatial alignments. We conclude this paper by discussing the unique risks and challenges of developing such a model for GeoAI.
KW - foundation models
KW - geospatial artificial intelligence
KW - large language models
UR - https://www.scopus.com/pages/publications/85140199263
U2 - 10.1145/3557915.3561043
DO - 10.1145/3557915.3561043
M3 - Conference contribution
AN - SCOPUS:85140199263
T3 - GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems
BT - 30th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2022
A2 - Renz, Matthias
A2 - Sarwat, Mohamed
A2 - Nascimento, Mario A.
A2 - Shekhar, Shashi
A2 - Xie, Xing
PB - Association for Computing Machinery
T2 - 30th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL GIS 2022
Y2 - 1 November 2022 through 4 November 2022
ER -