There are too many variable factors for there to be a single "fastest" way to do anything.
Generally speaking, using binary files and writing the read routine in C or C++ would be faster, but I would not want to exclude the possibility that loading a -v7 .mat file would be faster yet. (I would not expect a -v7.3 .mat file to be fastest.)
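For illustration only, here is a minimal sketch of the sort of C read routine I mean. It assumes the file holds nothing but raw doubles in the machine's native byte order, with no header, and the name read_doubles is made up for the example. Whether something like this actually beats loading a .mat file on your data is something you would have to time.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from any library): read a whole file of raw,
   native-endian doubles in a single fread call.
   Returns a malloc'd buffer that the caller must free, and stores the
   number of elements in *count. Returns NULL on any failure, including
   an empty file, in which case *count is 0. */
double *read_doubles(const char *path, size_t *count)
{
    assert(path != NULL && count != NULL);
    *count = 0;

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    /* Find the file size so the buffer can be allocated once. */
    if (fseek(fp, 0, SEEK_END) != 0) {
        fclose(fp);
        return NULL;
    }
    long nbytes = ftell(fp);
    if (nbytes <= 0) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);

    /* Assumption: the file is a whole number of doubles; any trailing
       partial element is ignored. */
    size_t n = (size_t)nbytes / sizeof(double);
    if (n == 0) {
        fclose(fp);
        return NULL;
    }

    double *buf = malloc(n * sizeof *buf);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    /* One bulk read rather than element-by-element parsing. */
    size_t got = fread(buf, sizeof *buf, n, fp);
    fclose(fp);
    if (got != n) {
        free(buf);
        return NULL;
    }

    *count = n;
    return buf;
}
```

The point of the sketch is only that a binary read can be one allocation plus one bulk fread, with no text parsing. Disk speed, file size, and caching will still dominate in many cases, which is why I would not rule out a -v7 .mat file without measuring.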