Computer Vision Toolbox Model for OpenAI CLIP Network

The Contrastive Language-Image Pre-Training (CLIP) network is a vision-language model that can be used for joint image-text classification.


The CLIP network uses contrastive learning to encode image and text data into a shared feature space for joint classification. An image and a piece of text with high similarity are close together in this feature space and therefore have a high CLIP score. This also enables image search from input text, and text search from an input image.
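The similarity notion above can be made concrete: a CLIP score is typically the cosine similarity between the image embedding and each candidate text embedding, with the highest-scoring caption winning. A minimal, framework-free sketch (the vectors below are toy stand-ins, not real CLIP embeddings, and the variable names are illustrative only):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy stand-ins for embeddings in CLIP's shared feature space.
image_embedding = [0.9, 0.1, 0.2]
text_embeddings = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a photo of a car": [0.1, 0.9, 0.3],
}

# Score every caption against the image; the closest caption wins.
scores = {caption: cosine_similarity(image_embedding, emb)
          for caption, emb in text_embeddings.items()}
best = max(scores, key=scores.get)
```

In the real network, the image and text encoders are trained so that matching pairs score high and mismatched pairs score low; the same scoring step then drives classification, text-to-image search, and image-to-text search. For the actual MATLAB interface, consult the support package documentation rather than this sketch.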


MATLAB Release Compatibility

  • Compatible with R2026a

Platform Compatibility

  • Windows
  • macOS (Apple silicon)
  • macOS (Intel)
  • Linux