Input shape for image sequence classification
The following link has sample code for classifying a sequence of images. The network is built to classify sequences of 28-by-28 grayscale images. The shape of the input data was not clear to me. How does the network "understand" that the sequence in this case consists of one image? What is the shape of the matrix or object holding all the images? Is it [28 28 1 1 1000] for [height, width, channels, time, number of sequences]?
Answers (1)
Tarunbir Gambhir
2021-6-17
In this example, since the task is to create a classification LSTM network that classifies sequences of 28-by-28 grayscale images, the 'sequenceInputLayer' function takes the input size as [28 28 1] for height, width, and number of channels.
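A minimal sketch of such a network, for illustration only. The layer stack follows the general pattern of a convolutional front end folded over time and then fed to an LSTM; the values of filterSize, numFilters, numHiddenUnits and numClasses are assumptions and do not come from this thread. It requires Deep Learning Toolbox.

```matlab
% Assumed hyperparameters, chosen only for illustration.
inputSize      = [28 28 1];   % height, width, channels of ONE frame
filterSize     = 5;
numFilters     = 20;
numHiddenUnits = 200;
numClasses     = 10;

layers = [ ...
    sequenceInputLayer(inputSize,'Name','input')        % per-frame size only
    sequenceFoldingLayer('Name','fold')                 % treat frames as independent images
    convolution2dLayer(filterSize,numFilters,'Name','conv')
    batchNormalizationLayer('Name','bn')
    reluLayer('Name','relu')
    sequenceUnfoldingLayer('Name','unfold')             % restore the sequence structure
    flattenLayer('Name','flatten')                      % feature vector per time step
    lstmLayer(numHiddenUnits,'OutputMode','last','Name','lstm')
    fullyConnectedLayer(numClasses,'Name','fc')
    softmaxLayer('Name','softmax')
    classificationLayer('Name','classification')];

% The unfolding layer needs the mini-batch size reported by the folding layer.
lgraph = layerGraph(layers);
lgraph = connectLayers(lgraph,'fold/miniBatchSize','unfold/miniBatchSize');
```

Note that the sequence length does not appear anywhere in the layer definition: the input layer only fixes the size of a single frame, so the number of frames is taken from the data itself.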
The 'sequenceInputLayer' then interprets each input sequence as an H-by-W-by-C-by-S array, where H, W, C, and S are the height, width, number of channels, and number of frames in the sequence, respectively.
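To make the data layout concrete, here is a small sketch using random stand-in data. The sizes (1000 sequences, 5 frames each, 10 classes) and the variable names XTrain and YTrain are assumptions for illustration. As far as the trainNetwork documentation describes it, a set of image sequences is passed as an N-by-1 cell array with one H-by-W-by-C-by-S array per sequence, rather than as a single [28 28 1 1 1000] array.

```matlab
% Hypothetical sizes, chosen only for illustration.
numSequences = 1000;   % number of observations (sequences)
numFrames    = 5;      % frames per sequence; may differ between sequences

% One cell per sequence, each holding an H-by-W-by-C-by-S array.
XTrain = cell(numSequences,1);
for i = 1:numSequences
    XTrain{i} = rand(28,28,1,numFrames);   % random stand-in for real frames
end

% Hypothetical labels: one categorical label per sequence.
YTrain = categorical(randi(10,numSequences,1));

disp(size(XTrain))      % 1000 1
disp(size(XTrain{1}))   % 28 28 1 5

% A sequence consisting of a single frame: MATLAB drops trailing
% singleton dimensions, so the array is reported as 28-by-28.
oneFrame = rand(28,28,1,1);
disp(size(oneFrame))    % 28 28
disp(size(oneFrame,4))  % 1
```

The last two lines relate to the "sequence of one image" part of the question: a one-frame grayscale sequence is simply a 28-by-28 array whose fourth dimension has length 1.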
You can refer to this example for the full code to classify a sequence of RGB images. There, the input to the model is read using the helper function 'readVideo', which returns an H-by-W-by-C-by-S array, and the input size passed to 'sequenceInputLayer' is [inputSize 3], where inputSize is [224 224].
2 comments
Xie Shipley
2023-10-24
@HA have you tried the image sequence classification task on a GPU? I got a CUDNN_STATUS_EXECUTION_FAILED error when convolution2dLayer is used. Could you help me figure this out?