Your velocity logic is incorrect.
You build up a list of centroids. There can be more than one centroid in any image, and you concatenate the centroid lists across all of the images. A different number of centroids can be detected in each image, and because of the way the labeling algorithm works, the order of objects can potentially shift a little between images.
But your velocity calculation is
(positions(i + 1, 2) - positions(i, 2)) / dt
This mostly compares different objects within the same image, and at the boundaries between images it compares the last object detected in one image to the first object detected in the next image.
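To see the failure concretely, here is a small Python/NumPy sketch (Python rather than MATLAB purely for illustration, with made-up data) of what differencing the flat concatenated list actually computes:

```python
import numpy as np

# Hypothetical data: two falling objects, centroids as (x, y), two frames.
frame1 = [(10.0, 100.0), (50.0, 100.0)]
frame2 = [(10.1, 90.0), (49.9, 90.0)]

# Concatenating per-frame centroid lists, as the posted code does,
# throws away the frame boundaries:
positions = np.array(frame1 + frame2)

dt = 0.1
# Naive "velocity" from consecutive rows of the flat list:
vy = np.diff(positions[:, 1]) / dt
# vy[0] compares the two different objects inside frame 1,
# vy[1] compares the last object of frame 1 to the first of frame 2.
# Only by luck would any entry be a real per-object velocity.
```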
Instead, you should approximately group objects based on x coordinate, and calculate velocities only within each group.
You should not group based on exact equality of centroid x coordinates: as objects fall, the discretized centroid is likely to drift a little from the ideal centroid. Group with a tolerance, using ismembertol() or the like.
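A sketch of tolerance-based grouping, in Python for illustration (the tolerance value and the group_by_x name are my own; in MATLAB, ismembertol() or uniquetol() with a tolerance does the membership test):

```python
def group_by_x(centroids_per_frame, tol=2.0):
    """Assign each centroid to a track by approximate x match.

    centroids_per_frame: list of lists of (x, y) tuples, one list per frame.
    tol: pixels of x drift to tolerate (assumed value; tune for your data).
    Returns a dict: track id -> list of (frame_index, x, y) samples.
    """
    refs = {}      # track id -> most recent x for that track
    tracks = {}    # track id -> list of (frame, x, y)
    next_id = 0
    for f, cents in enumerate(centroids_per_frame):
        for x, y in cents:
            # find an existing track whose reference x is within tol
            match = next((tid for tid, rx in refs.items()
                          if abs(rx - x) <= tol), None)
            if match is None:
                match = next_id
                next_id += 1
                tracks[match] = []
            refs[match] = x                 # update reference to latest x
            tracks[match].append((f, x, y))
    return tracks
```

Keeping the frame index with each sample (rather than merging everything into one flat list) is what makes the missing-frame case below tractable.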
When grouping, you potentially need to introduce placeholders for objects that are missing in some images, since they might show up again in later images. You cannot just smash all related objects together in a single list, because your velocity estimate depends on elapsed time: an object that disappears on the 4th frame and reappears on the 6th frame needs its velocity estimated with (dt*3) instead of (dt).
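If each tracked object is stored as a list of (frame, x, y) samples (a hypothetical format; any representation that keeps the frame index works), the gap handling falls out naturally because the divisor becomes the elapsed frame count times dt. A Python sketch:

```python
def track_velocities(track, dt):
    """Per-object y velocities from a track of (frame, x, y) samples.

    A missed detection simply widens the frame gap, so the divisor is
    automatically (gap * dt) rather than dt.  Sketch only; the track
    format is an assumption, not the poster's actual data structure.
    """
    vels = []
    for (f0, _, y0), (f1, _, y1) in zip(track, track[1:]):
        vels.append((y1 - y0) / ((f1 - f0) * dt))
    return vels
```

For example, an object seen in frames 0, 1, and 3 gets its second velocity computed over 2*dt, not dt.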
You also have an unanswered question about what to do if two different objects happen to have (approximately) the same x centroid coordinate.