PROMPT PROTOTYPE LEARNING BASED ON RANKING INSTRUCTION FOR FEW-SHOT VISUAL TASKS
Li Sun, Liuan Wang, Jun Sun, Takayuki Okatani
Querying large language models (LLMs), such as GPT-3, for high-quality prompts and using pre-trained vision-language models, such as CLIP, to construct a zero-shot visual classifier offers promising performance across various downstream visual tasks. When applied to specific domains, however, efficacy is limited by the gap between the general prompts these models generate and the domain-specific knowledge the task requires. In this paper, we propose a novel, lightweight method for prompt prototype learning through ranking instruction, designed to bridge this gap in few-shot visual classification. We first generate domain-specific prompts by leveraging the knowledge contained in LLMs, and then fine-tune the prompt prototype with effective ranking instructions derived from a handful of domain images. Few-shot experiments on facial expression benchmarks demonstrate the efficacy of the prompt prototype. Notably, our method delivers results on par with state-of-the-art few-shot image classification techniques and can be integrated with them to further improve performance in the facial expression domain. Our approach thus offers a promising route to few-shot visual classification by distilling the knowledge in LLMs into domain-specific prompts.
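To make the pipeline concrete, the sketch below shows one plausible instantiation: class prototypes are formed by averaging CLIP text embeddings of LLM-generated prompts, then refined on a few labeled support images with a margin ranking loss. The prompt texts, the averaging step, and the margin ranking loss are all illustrative assumptions standing in for the paper's ranking instruction, whose exact form is not specified here.

```python
# Minimal sketch of prompt-prototype classification with CLIP (PyTorch).
# Everything below is an assumption for illustration, not the paper's
# implementation: prototypes are the mean of CLIP embeddings of several
# LLM-generated prompts per class, and the "ranking instruction" is
# approximated by a generic margin ranking loss over few support images.
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical LLM-generated prompts for two facial-expression classes.
prompts = {
    "happy": ["a photo of a smiling face", "a person looking joyful"],
    "sad": ["a photo of a frowning face", "a person looking sorrowful"],
}

def build_prototypes(prompts):
    """Average each class's normalized prompt embeddings into one prototype."""
    protos = []
    for texts in prompts.values():
        tokens = clip.tokenize(texts).to(device)
        with torch.no_grad():
            emb = model.encode_text(tokens).float()
        protos.append(F.normalize(emb, dim=-1).mean(dim=0))
    return F.normalize(torch.stack(protos), dim=-1)  # (num_classes, dim)

# Treat the prototypes as learnable parameters for few-shot refinement.
protos = torch.nn.Parameter(build_prototypes(prompts).clone())
optimizer = torch.optim.Adam([protos], lr=1e-3)

def ranking_step(images, labels, margin=0.2):
    """One update: each image's similarity to its own class prototype
    should exceed its similarity to every other prototype by `margin`.
    `images` are support images already mapped through `preprocess`."""
    with torch.no_grad():  # keep the CLIP image encoder frozen
        feats = F.normalize(model.encode_image(images).float(), dim=-1)
    sims = feats @ F.normalize(protos, dim=-1).t()   # (batch, num_classes)
    pos = sims.gather(1, labels.unsqueeze(1))        # true-class similarity
    neg_mask = torch.ones_like(sims).scatter_(1, labels.unsqueeze(1), 0.0)
    loss = (F.relu(margin - pos + sims) * neg_mask).sum(dim=1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def predict(images):
    """Assign each image to the class with the most similar prototype."""
    with torch.no_grad():
        feats = F.normalize(model.encode_image(images).float(), dim=-1)
        return (feats @ F.normalize(protos, dim=-1).t()).argmax(dim=1)
```

In this reading, `ranking_step` would be run for a few epochs over the small support set before `predict` is applied to query images; only the prototypes are updated, which keeps the method lightweight, consistent with the abstract's claim.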