Since the animal runs in a maze, and I know the orientation of the maze frame, I can simply mount a camera above the maze.
The camera will provide the video frames.
1. I am assuming that since it's your dog, you have a photo of the dog.
In each frame, you simply have to detect that body photo and draw a boundary around it using any unsupervised learning method.
2. We can compare the frames over time and look at the temporal difference, i.e. what has changed in the maze. Since we know the layout of the maze, we can work out the movements from those changes.
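A rough sketch of option 2, in plain Python so it runs without any libraries. It assumes grayscale frames stored as lists of rows, a square grid of maze cells, and a reference frame of the empty maze. Those are all my assumptions, as are the function names and the threshold. Differencing consecutive frames works the same way, but it highlights both the old and the new position of the animal. With OpenCV you would normally use cv2.absdiff instead of the loop.

```python
# Frames are grayscale images stored as lists of rows (values 0-255).
# All names, sizes and thresholds here are illustrative assumptions.

def changed_pixels(ref_frame, curr_frame, threshold=30):
    """Return (row, col) of every pixel that differs from the reference by more than threshold."""
    changed = []
    for r, (ref_row, curr_row) in enumerate(zip(ref_frame, curr_frame)):
        for c, (p, q) in enumerate(zip(ref_row, curr_row)):
            if abs(p - q) > threshold:
                changed.append((r, c))
    return changed


def bounding_box(pixels):
    """Smallest box (top, left, bottom, right) containing all pixels, or None if empty."""
    if not pixels:
        return None
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return min(rows), min(cols), max(rows), max(cols)


def maze_cell(box, cell_size):
    """Map the centre of a box to a (row, col) maze cell, assuming a square grid."""
    top, left, bottom, right = box
    return ((top + bottom) // 2) // cell_size, ((left + right) // 2) // cell_size


# Toy example: a 6x6 empty maze and a frame with a bright 2x2 "animal".
empty_maze = [[0] * 6 for _ in range(6)]
frame = [row[:] for row in empty_maze]
for r in (2, 3):
    for c in (4, 5):
        frame[r][c] = 200

box = bounding_box(changed_pixels(empty_maze, frame))
print(box, maze_cell(box, cell_size=2))
```

Running this per frame gives a sequence of maze cells, and the movement is just the change from one cell to the next.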
There is really no need for much deep learning in this case.
Simple CV tools and some logic will handle it.
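To show how simple option 1 can be, here is a minimal template matching sketch in plain Python. It slides the photo over the frame and keeps the position with the smallest sum of absolute differences. This is just one way to "detect the photo in the frame", not the only one. It assumes the photo is a top-down crop at the same scale as the camera image and it does not handle rotation. The function name is made up for illustration. In OpenCV, cv2.matchTemplate does the same job much faster.

```python
# Frame and template are grayscale images stored as lists of rows.
# match_template is a hypothetical helper name, not a library function.

def match_template(frame, template):
    """Return ((top, left), score) of the best match by sum of absolute differences."""
    fh, fw = len(frame), len(frame[0])
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, None
    for top in range(fh - th + 1):
        for left in range(fw - tw + 1):
            score = sum(
                abs(frame[top + r][left + c] - template[r][c])
                for r in range(th)
                for c in range(tw)
            )
            # Lower score means a closer match; 0 is a pixel-perfect match.
            if best_score is None or score < best_score:
                best_pos, best_score = (top, left), score
    return best_pos, best_score


# Toy example: paste a 2x2 template into an empty 5x5 frame at (1, 3).
template = [[200, 50], [50, 200]]
frame = [[0] * 5 for _ in range(5)]
for r in range(2):
    for c in range(2):
        frame[1 + r][3 + c] = template[r][c]

(top, left), score = match_template(frame, template)
# The boundary is the template-sized box at the best position.
print((top, left, top + len(template) - 1, left + len(template[0]) - 1), score)
```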