Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability

1The Hong Kong University of Science and Technology
2Tencent AI Lab


Compression result by Selection-p on an in-context learning demonstration from the Subj task under a 10x compression rate.

Abstract

Large Language Models (LLMs) have demonstrated impressive capabilities in a wide range of natural language processing tasks when leveraging in-context learning. To mitigate the additional computational and financial costs associated with in-context learning, several methods have been proposed to compress the in-context learning prompts. Despite their success, these methods face challenges with transferability due to model-specific compression, or rely on external models such as GPT-4 for training data. In this paper, we investigate the ability of LLMs to develop a unified compression method that discards uninformative tokens, utilizing a self-supervised pre-training technique. By introducing a small number of parameters during continual pre-training, the proposed Selection-p produces a probability for each input token, indicating whether to preserve or discard it. Experiments show Selection-p achieves state-of-the-art performance across numerous classification tasks, reaching compression rates of up to 10 times while incurring only a marginal 0.8% decrease in performance. Moreover, it exhibits superior transferability to different models compared to prior work. We further analyze how Selection-p helps maintain performance on in-context learning with long contexts.
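The selection step implied by the abstract can be summarized as follows: once each prompt token has been assigned a preservation probability, compressing by a factor of r amounts to keeping the 1/r fraction of tokens with the highest probabilities. Below is a minimal sketch of that step only; the names compress_prompt and keep_probs are illustrative, not the released API.

# Illustrative sketch: keep the highest-probability tokens for a target compression rate.
# Assumes per-token preservation probabilities have already been computed by the model.
import torch

def compress_prompt(token_ids: torch.Tensor,
                    keep_probs: torch.Tensor,
                    compression_rate: float = 10.0) -> torch.Tensor:
    """Keep the 1/compression_rate fraction of tokens with the highest keep probability."""
    seq_len = token_ids.size(0)
    n_keep = max(1, int(seq_len / compression_rate))
    top_idx = torch.topk(keep_probs, n_keep).indices   # most informative positions
    top_idx, _ = torch.sort(top_idx)                   # restore left-to-right order
    return token_ids[top_idx]

# Example: a 40-token demonstration compressed 10x keeps its 4 best-scored tokens.
ids, probs = torch.arange(40), torch.rand(40)
print(compress_prompt(ids, probs).shape)               # torch.Size([4])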

Selection-p


Illustration of the training process. Areas in orange are learnable parameters. For the input context [x1, x2, . . . , xn-1], inference without parameter updates is performed first to create the attention mask p. The masked context then serves as the model input for LoRA training and for updating the parameters of the additional linear layer.
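The figure suggests a two-pass training step: a forward pass without gradients produces the keep probabilities p, which are used as an attention mask, and a second pass on the masked context drives continual pre-training of the LoRA adapters and the additional linear layer. The sketch below is a rough, assumption-laden rendering of that data flow using Hugging Face transformers and peft; SelectionHead, the 0.5 threshold, and the choice of base model are illustrative, and the exact objective that updates the selection head is not shown.

# Illustrative sketch of the two-pass training step, not the authors' released code.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

class SelectionHead(nn.Module):
    """Additional linear layer mapping a hidden state to a per-token keep probability."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.proj(hidden_states)).squeeze(-1)  # (batch, seq_len)

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # base model is an assumption
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))
head = SelectionHead(model.config.hidden_size)  # the orange linear layer in the figure

def training_step(input_ids: torch.Tensor) -> torch.Tensor:
    # Pass 1: inference without parameter updates produces p, here binarized
    # into an attention mask over the context (the 0.5 threshold is an assumption).
    with torch.no_grad():
        hidden = model(input_ids=input_ids, output_hidden_states=True).hidden_states[-1]
        p = head(hidden)
        attn_mask = (p > 0.5).long()

    # Pass 2: the masked context is the model input for the continual pre-training
    # (next-token prediction) step; the loss updates the LoRA adapters. How gradients
    # reach the selection head follows the paper's objective and is omitted here.
    out = model(input_ids=input_ids, attention_mask=attn_mask, labels=input_ids)
    return out.loss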

BibTeX

@article{chung2024selection,
  title={Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability},
  author={Chung, Tsz Ting and Cui, Leyang and Liu, Lemao and Huang, Xinting and Shi, Shuming and Yeung, Dit-Yan},
  journal={arXiv preprint arXiv:2410.11786},
  year={2024}
}