Prismer: A Vision-Language Model with Multi-Modal Experts
by Shikun Liu, Linxi Fan, Edward Johns, Zhiding Yu, Chaowei Xiao, and Anima Anandkumar

We introduce Prismer, a data- and parameter-efficient vision-language model that leverages an ensemble of diverse, pre-trained domain experts. Prismer achieves fine-tuned and few-shot vision-language reasoning performance that is competitive with the current state of the art, whilst requiring up to two orders of magnitude…