Architecture
Vision-Language Model
Definition
A vision-language model (VLM) is an AI model that accepts both images and text as input and reasons over them jointly. Vision-language models are used for image captioning, visual question answering, document analysis, and automated UI testing. Examples include GPT-4V, Claude 3.5 Sonnet, and Google's PaLI.