TY - GEN
T1 - NVCiM-PT
T2 - 2025 Design, Automation and Test in Europe Conference, DATE 2025
AU - Qin, Ruiyang
AU - Ren, Pengyu
AU - Yan, Zheyu
AU - Liu, Liu
AU - Liu, Dancheng
AU - Nassereldine, Amir
AU - Xiong, Jinjun
AU - Ni, Kai
AU - Hu, Sharon
AU - Shi, Yiyu
N1 - Publisher Copyright: © 2025 EDAA.
PY - 2025
Y1 - 2025
AB - Large Language Models (LLMs) deployed on edge devices, known as edge LLMs, need to continuously fine-tune their model parameters on user-generated data under tight resource constraints. However, most existing learning methods are not applicable to edge LLMs because of their high resource demands and low learning capacity. Prompt tuning (PT) has recently emerged as an effective fine-tuning method for edge LLMs because it modifies only a small portion of LLM parameters, but it suffers from user domain shifts, which lead to repetitive training and a loss of resource efficiency. Conventional techniques for addressing domain shift often involve complex neural networks and sophisticated training, which are incompatible with PT for edge LLMs. Therefore, an open research question is how to address domain shift for edge LLMs with limited resources. In this paper, we propose a prompt tuning framework for edge LLMs that exploits the benefits offered by non-volatile computing-in-memory (NVCiM) architectures. We introduce a novel NVCiM-assisted PT framework in which we narrow down the core operations to matrix-matrix multiplication, which can then be accelerated by performing in-situ computation on NVCiM. To the best of our knowledge, this is the first work employing NVCiM to improve edge LLM PT performance.
UR - https://www.scopus.com/pages/publications/105006919425
U2 - 10.23919/DATE64628.2025.10993249
DO - 10.23919/DATE64628.2025.10993249
M3 - Conference contribution
AN - SCOPUS:105006919425
T3 - Proceedings - Design, Automation and Test in Europe, DATE
BT - 2025 Design, Automation and Test in Europe Conference, DATE 2025 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 31 March 2025 through 2 April 2025
ER -