Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
¹University of California San Diego, ²NVIDIA
(* work done during an internship at NVIDIA, † equal contribution)

Segment and categorize any object, even ones not seen during training

Abstract

We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies pre-trained text-image diffusion and discriminative models to perform open-vocabulary panoptic segmentation. Text-to-image diffusion models have…