I am not sure this is exactly what you need, but the demo below shows video classification and may be relevant to your study.
In the demo, image features are extracted from the video frames with a pre-trained network, and the resulting time series of features is classified with an LSTM (Long Short-Term Memory) network.
Judging from your title, I suspect you want to do video classification, so I hope this demo helps.
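
In case it helps to see the idea in code, here is a minimal, dependency-free Python sketch of the second stage: an LSTM reads one feature vector per frame and the final hidden state is turned into class probabilities. This is not the demo's code. The class name TinyLSTMClassifier, the sizes (8 features, 4 hidden units, 3 classes), and the random "features" are all assumptions for illustration; the weights are untrained, and in a real pipeline the features would come from a pre-trained network and the weights would be learned from labelled videos.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_matrix(rows, cols, rng):
    """Small random weights; a trained model would load learned values instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMClassifier:
    """Hypothetical, untrained LSTM classifier over per-frame feature vectors."""

    def __init__(self, feature_dim, hidden_dim, num_classes, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to [frame features; previous hidden state].
        # Bias terms are omitted to keep the sketch short.
        self.gates = {
            name: init_matrix(hidden_dim, feature_dim + hidden_dim, rng)
            for name in ("input", "forget", "cell", "output")
        }
        self.readout = init_matrix(num_classes, hidden_dim, rng)

    def step(self, x, h, c):
        """Advance the LSTM by one frame and return the new (h, c)."""
        z = x + h  # list concatenation
        i = [sigmoid(a) for a in matvec(self.gates["input"], z)]
        f = [sigmoid(a) for a in matvec(self.gates["forget"], z)]
        g = [math.tanh(a) for a in matvec(self.gates["cell"], z)]
        o = [sigmoid(a) for a in matvec(self.gates["output"], z)]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c

    def predict_proba(self, frames):
        """Run over all frames, then apply a softmax to the last hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in frames:
            h, c = self.step(list(x), h, c)
        logits = matvec(self.readout, h)
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Stand-in for the output of a pre-trained network: 16 frames, 8 features each.
rng = random.Random(1)
video_features = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]

model = TinyLSTMClassifier(feature_dim=8, hidden_dim=4, num_classes=3)
probs = model.predict_proba(video_features)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

Because the weights are random, the predicted class here is meaningless; the sketch only shows how a sequence of frame features flows through the LSTM to a single label per video.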
