captionImage

Caption images using Moondream vision-language model (VLM)

Since R2026a

Syntax

captions = captionImage(mdModel,I)

captionsDS = captionImage(mdModel,imds)

[___] = captionImage(___,CaptionVerbosity=verbosity)

Description

Add-On Required: This feature requires the Computer Vision Toolbox Model for Moondream Vision Language Model add-on.

captions = captionImage(mdModel,I) generates a caption for image input I using the Moondream™ vision-language model mdModel.

captionsDS = captionImage(mdModel,imds) generates captions for images in the input image datastore imds using the Moondream vision-language model mdModel.

[___] = captionImage(___,CaptionVerbosity=verbosity) specifies the length of the generated captions, in addition to any combination of input arguments from the previous syntaxes.


Examples


Caption Image Using Moondream Vision-Language Model


Load the Moondream vision-language model.

mdModel = moondream;

Load an image to caption into the workspace, and display the image.

I = imread("peppers.png");
imshow(I)


Caption the image using the captionImage object function.

captions = captionImage(mdModel,I);

Display the generated image caption.

display(captions)

captions = 
" A purple tablecloth holds a vibrant array of red, green, yellow, and white peppers, onions, and garlic, arranged in a visually appealing composition."

Generate Descriptive Image Caption Using Moondream Vision-Language Model


Load the Moondream vision-language model.

mdModel = moondream;

Load an image to caption into the workspace, and display the image.

I = imread("visionteam.jpg");
imshow(I)

Generate a detailed caption for the image by specifying the CaptionVerbosity argument of the captionImage object function.

captions = captionImage(mdModel,I,CaptionVerbosity="detail");

Display the generated image caption.

display(captions)

captions = 
" The image shows six individuals standing in a row in what appears to be an office setting. The individuals are dressed in a variety of casual attire, including jeans, sweaters, and sweaters. The room has a neutral color scheme with beige walls and a dark gray or black carpet. The individuals are standing in front of a large painting or artwork depicting a serene landscape. The painting is framed in a dark brown or black frame. The individuals are standing in a relatively straight line, with their arms crossed, creating a sense of unity and camaraderie."

Input Arguments


`mdModel` — Moondream vision-language model
`moondream` object

Moondream vision-language model, specified as a moondream object.

`I` — Input RGB image data
H-by-W-by-3 numeric array | H-by-W-by-3-by-B numeric array

Input RGB image data, specified as one of these options:

H-by-W-by-3 numeric array representing a single truecolor image.
H-by-W-by-3-by-B numeric array representing a batch of B truecolor images. B is the number of images in the batch.
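
As a minimal sketch of the batch format, the following code stacks two images along the fourth dimension before captioning them. A 4-D numeric array requires every image to have the same height and width, so the second image is resized to match the first. That resizing step is an illustrative choice, not a documented requirement of the model, and it can distort the aspect ratio. The variable names are also illustrative.

```matlab
% Load the model and two sample images (assumed to be on the MATLAB path).
mdModel = moondream;
I1 = imread("peppers.png");
I2 = imread("visionteam.jpg");

% A 4-D array needs identical image sizes, so resize the second image
% to the height and width of the first (illustrative choice only).
I2 = imresize(I2,[size(I1,1) size(I1,2)]);

% Stack along the fourth dimension to form an H-by-W-by-3-by-2 batch.
batch = cat(4,I1,I2);

% Per the captions output description, this returns a 1-by-2 string array.
captions = captionImage(mdModel,batch);
```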

`imds` — Datastore of images
datastore

Datastore of images, specified as any type of datastore that returns image data. If calling the datastore with the read function returns a cell array, then the image data must be in the first cell.
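
The datastore syntax can be sketched as follows, assuming the two sample images used elsewhere on this page are available on the MATLAB path. The sketch builds an imageDatastore from their full paths, captions every image, and pairs each file name with its caption in a table. It assumes the captions are returned in the same order as imds.Files. The variable names files and results are illustrative.

```matlab
mdModel = moondream;

% Resolve full paths for two sample images (assumes both are on the path).
files = [string(which("peppers.png")); string(which("visionteam.jpg"))];
imds = imageDatastore(files);

% Caption every image in the datastore.
captionsDS = captionImage(mdModel,imds);

% Pair each file with its caption, assuming the output order
% matches the order of imds.Files.
results = table(string(imds.Files),captionsDS(:), ...
    VariableNames=["File","Caption"]);
disp(results)
```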

`verbosity` — Caption length
`"brief"` (default) | `"detail"`

Caption length, specified as one of these options:

"brief" — Returns a caption containing approximately 25 words or fewer.
"detail" — Returns a caption containing up to 60 words.

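To compare the two verbosity settings on your own images, one option is to caption the same image both ways and count the words in each result. The sketch below is a rough check only: it treats any run of non-whitespace characters as a word, and the variable names are illustrative. Treat the documented word counts as approximate and measure on representative images.

```matlab
mdModel = moondream;
I = imread("peppers.png");

% Caption the same image with each verbosity setting.
briefCaption = captionImage(mdModel,I,CaptionVerbosity="brief");
detailCaption = captionImage(mdModel,I,CaptionVerbosity="detail");

% Rough word counts: trim surrounding whitespace, then split at whitespace.
numWordsBrief = numel(split(strtrim(briefCaption)));
numWordsDetail = numel(split(strtrim(detailCaption)));
fprintf("brief: %d words, detail: %d words\n",numWordsBrief,numWordsDetail)
```
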
Output Arguments


`captions` — Image captions
string scalar | 1-by-B string array

Image captions, returned as one of these options, depending on the format of the input image I:

I is a single RGB image — String scalar.
I is a batch of RGB images — 1-by-B string array, in which each element is the caption for the corresponding image from the batch. B is the number of images in the batch.

`captionsDS` — Datastore image captions
N-element string array

Datastore image captions, returned as an N-element string array. N is the number of images in the image datastore imds.

Tips

The quality of Moondream outputs can vary across different data domains. Validate its predictions using a data set from a domain similar to your intended application.
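
One simple way to spot-check captions is to review each image next to its generated caption. The following sketch uses the sample images from this page as placeholders; replace them with images from your own domain. It assumes the captions are returned in the same order as imds.Files, and it is a manual review aid, not a quantitative evaluation.

```matlab
mdModel = moondream;

% Placeholder files; substitute images from your own application domain.
files = [string(which("peppers.png")); string(which("visionteam.jpg"))];
imds = imageDatastore(files);
captionsDS = captionImage(mdModel,imds);

% Show each image and print its file name and caption for manual review.
for k = 1:numel(imds.Files)
    figure
    imshow(readimage(imds,k))
    fprintf("%s\n  %s\n",imds.Files{k},strtrim(captionsDS(k)));
end
```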

Version History

Introduced in R2026a
