TY - GEN
T1 - LightToken
T2 - 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023
AU - Wang, Haoyu
AU - Li, Ruirui
AU - Jiang, Haoming
AU - Wang, Zhengyang
AU - Tang, Xianfeng
AU - Bi, Bin
AU - Cheng, Monica
AU - Yin, Bing
AU - Wang, Yaqing
AU - Zhao, Tuo
AU - Gao, Jing
N1 - Publisher Copyright:
© 2023 Owner/Author.
PY - 2023/8/4
Y1 - 2023/8/4
N2 - Pre-trained language models∼(PLMs) such as BERT, RoBERTa, and DeBERTa have achieved state-of-the-art performance on various downstream tasks. The enormous sizes of PLMs hinder their deployment in resource-constrained scenarios, e.g., on edge and mobile devices. To address this issue, many model compression approaches have been proposed to reduce the number of model parameters. This paper focuses on compressing the token embedding matrices of PLMs, which typically make up a large proportion∼(around 20-30%) of the entire model parameters. Existing efforts to compress token embedding usually require the introduction of customized compression architectures or the optimization of model compression processes for individual downstream tasks, limiting their applicability in both model and task dimensions. To overcome these limitations and adhere to the principle of "one-for-all", we propose a lightweight token embedding framework named LightToken, which is able to produce compressed token embedding in a task and model-agnostic fashion. LightToken is generally compatible with different architectures and applicable to any downstream task. Specifically, through an integration of low-rank approximation, novel residual binary autoencoder, and a new compression loss function, LightToken can significantly improve the model compression ratio. To demonstrate the effectiveness of LightToken, we conduct comprehensive experiments on natural language understanding and question answering tasks. In particular, LightToken improves the state-of-the-art token embedding compression ratio from 5 to 25 and outperforms the existing token embedding compression approaches by 11% and 5% on GLUE and SQuAD v1.1 benchmarks, respectively.
AB - Pre-trained language models∼(PLMs) such as BERT, RoBERTa, and DeBERTa have achieved state-of-the-art performance on various downstream tasks. The enormous sizes of PLMs hinder their deployment in resource-constrained scenarios, e.g., on edge and mobile devices. To address this issue, many model compression approaches have been proposed to reduce the number of model parameters. This paper focuses on compressing the token embedding matrices of PLMs, which typically make up a large proportion∼(around 20-30%) of the entire model parameters. Existing efforts to compress token embedding usually require the introduction of customized compression architectures or the optimization of model compression processes for individual downstream tasks, limiting their applicability in both model and task dimensions. To overcome these limitations and adhere to the principle of "one-for-all", we propose a lightweight token embedding framework named LightToken, which is able to produce compressed token embedding in a task and model-agnostic fashion. LightToken is generally compatible with different architectures and applicable to any downstream task. Specifically, through an integration of low-rank approximation, novel residual binary autoencoder, and a new compression loss function, LightToken can significantly improve the model compression ratio. To demonstrate the effectiveness of LightToken, we conduct comprehensive experiments on natural language understanding and question answering tasks. In particular, LightToken improves the state-of-the-art token embedding compression ratio from 5 to 25 and outperforms the existing token embedding compression approaches by 11% and 5% on GLUE and SQuAD v1.1 benchmarks, respectively.
KW - compression
KW - pre-trained language model
UR - https://www.scopus.com/pages/publications/85171359545
U2 - 10.1145/3580305.3599416
DO - 10.1145/3580305.3599416
M3 - Conference contribution
AN - SCOPUS:85171359545
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 2302
EP - 2313
BT - KDD 2023 - Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
Y2 - 6 August 2023 through 10 August 2023
ER -