MAML-en-LLM: Model Agnostic Meta-Training of LLMs for Improved In-Context Learning

  • Sanchit Sinha
  • Yuguang Yue
  • Victor Soto
  • Mayank Kulkarni
  • Jianhua Lu
  • Aidong Zhang
  • University of Virginia
  • Amazon.com, Inc.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

9 Scopus citations

Abstract

Adapting large language models (LLMs) to unseen tasks with in-context training samples, without fine-tuning, remains an important research problem. To learn a robust LLM that adapts well to unseen tasks, multiple meta-training approaches have been proposed, such as MetaICL and MetaICT, which involve meta-training pre-trained LLMs on a wide variety of diverse tasks. These meta-training approaches essentially perform in-context multi-task fine-tuning and evaluate on a disjoint test set of tasks. Even though they achieve impressive performance, their goal is never to compute a truly general set of parameters. In this paper, we propose MAML-en-LLM, a novel method for meta-training LLMs, which can learn truly generalizable parameters that not only perform well on disjoint tasks but also adapt to unseen tasks. We observe an average performance increase of 2% on unseen domains and a 4% improvement in adaptation performance. Furthermore, we demonstrate that MAML-en-LLM outperforms baselines in settings with a limited amount of training data on both seen and unseen domains by an average of 2%. Finally, we discuss the effects of task type, optimizers, and task complexity, an avenue barely explored in the meta-training literature. Exhaustive experiments across 7 task settings and two data settings demonstrate that models trained with MAML-en-LLM outperform SOTA meta-training approaches.
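
The abstract does not spell out the training procedure, so the following is only a minimal sketch of the generic MAML two-level optimization that the method's name refers to, applied to a toy scalar regression problem rather than an LLM. The task distribution, the function names (`sample_task`, `maml_train`), and all hyperparameters are assumptions made for illustration; none of them come from the paper.

```python
import random

def grad(w, xs, ys):
    """Gradient of mean squared error for the scalar model y = w * x."""
    return 2.0 * sum(x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

def hessian(xs):
    """Second derivative of the same loss with respect to w."""
    return 2.0 * sum(x * x for x in xs) / len(xs)

def sample_task(rng, k=5):
    """Hypothetical task family: y = a * x with a task-specific slope a."""
    a = rng.uniform(1.0, 3.0)

    def draw():
        xs = [rng.uniform(-1.0, 1.0) for _ in range(k)]
        return xs, [a * x for x in xs]

    return draw(), draw()  # (support set, query set)

def maml_train(steps=200, tasks_per_step=4, inner_lr=0.1, outer_lr=0.05, seed=0):
    """Meta-train a shared initialization w with one inner step per task."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        meta_grad = 0.0
        for _ in range(tasks_per_step):
            (xs_s, ys_s), (xs_q, ys_q) = sample_task(rng)
            # Inner loop: adapt the shared parameter on the support set.
            w_adapted = w - inner_lr * grad(w, xs_s, ys_s)
            # Outer loop: differentiate the query loss through the inner step.
            # For this model d(w_adapted)/dw = 1 - inner_lr * hessian(xs_s).
            meta_grad += grad(w_adapted, xs_q, ys_q) * (1.0 - inner_lr * hessian(xs_s))
        w -= outer_lr * meta_grad / tasks_per_step
    return w
```

In this toy setting the meta-learned `w` drifts toward the middle of the assumed slope range, which is the initialization from which a single inner step adapts best on average. Plain multi-task training would instead minimize the unadapted loss directly, which is roughly the contrast the abstract draws with in-context multi-task fine-tuning.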

Original language: English
Title of host publication: KDD 2024 - Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 2711-2720
Number of pages: 10
ISBN (Electronic): 9798400704901
DOIs
State: Published - Aug 24 2024
Event: 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2024 - Barcelona, Spain
Duration: Aug 25 2024 to Aug 29 2024

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
ISSN (Print): 2154-817X

Conference

Conference: 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2024
Country/Territory: Spain
City: Barcelona
Period: 08/25/24 to 08/29/24

Keywords

  • LLMs
  • generalization
  • in-context learning
  • meta learning
  • optimization
