Matrix multiplication (GEMM) is one of the most heavily optimized kernels around, and the larger the inputs, the more optimizations such as blocking and cache reuse can be applied.
The two extremes of this are the BLAS level 2 approach, where you multiply one column at a time (GEMV, matrix-vector multiply), versus the BLAS level 3 approach, GEMM (matrix-matrix multiply), which works on the whole matrix at once.
A naive GEMM (three nested for loops) usually reaches around 3-5% of the processor's peak performance. A blocked GEMM without any further optimization (six nested for loops) gets around 20% of peak. The matrix multiply MATLAB uses is Intel MKL's GEMM, which is tuned for different processors and can reach around 80-90% of the processor's peak.
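For reference, here is a minimal sketch of the naive version, assuming square row-major matrices of doubles. Note that with this loop order it is effectively one GEMV per column of B, which is why it behaves like a level 2 routine: every column of C re-reads all of A, with no reuse.

```c
#include <stddef.h>

/* Naive GEMM: C = A*B for n-by-n row-major matrices.
 * Three nested loops, no blocking: the innermost loop walks B with
 * stride n, so cache reuse is poor and memory traffic dominates. */
void gemm_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i*n + k] * B[k*n + j];
            C[i*n + j] = sum;
        }
}
```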
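And a sketch of the blocked version with six loops. The block size of 64 here is a made-up placeholder that would need tuning so that three tiles fit in cache, and the caller must zero C first since the tiles accumulate into it:

```c
#include <stddef.h>

#define BS 64  /* assumed block size; tune so three BS-by-BS tiles fit in cache */

/* Blocked GEMM: same C = A*B, but iterating over BS-by-BS tiles so each
 * tile of A and B is reused from cache many times before eviction.
 * C must be zero-initialized by the caller. */
void gemm_blocked(size_t n, const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += BS)
        for (size_t kk = 0; kk < n; kk += BS)
            for (size_t jj = 0; jj < n; jj += BS)
                /* multiply the (ii,kk) tile of A by the (kk,jj) tile of B */
                for (size_t i = ii; i < ii + BS && i < n; i++)
                    for (size_t k = kk; k < kk + BS && k < n; k++) {
                        double a = A[i*n + k];
                        for (size_t j = jj; j < jj + BS && j < n; j++)
                            C[i*n + j] += a * B[k*n + j];
                    }
}
```

The extra bound checks (`i < n`, etc.) just handle sizes that are not a multiple of BS. Tuned libraries like MKL go further: packed tile copies, SIMD micro-kernels, and per-architecture block sizes are where the remaining 60-70% of peak comes from.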
Now, all of the numbers above are for large matrix sizes, since cache reuse and SIMD need larger inputs to overcome their overheads. This is the reason for the differences you are seeing: the larger the matrices get, the more optimizations MKL is able to squeeze out of the input.
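If you want to see the size dependence yourself, a rough timing harness along these lines (assuming POSIX `clock_gettime` and the `gemm_blocked` sketch above) prints achieved GFLOP/s for growing sizes; the numbers climb toward the kernel's plateau as n grows and the overheads are amortized:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void gemm_blocked(size_t n, const double *A, const double *B, double *C);

static double seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

int main(void)
{
    for (size_t n = 128; n <= 2048; n *= 2) {
        double *A = malloc(n * n * sizeof *A);
        double *B = malloc(n * n * sizeof *B);
        double *C = calloc(n * n, sizeof *C); /* blocked kernel accumulates into C */
        for (size_t i = 0; i < n * n; i++) {
            A[i] = 1.0 / (double)(i + 1);
            B[i] = 2.0 / (double)(i + 1);
        }
        double t0 = seconds();
        gemm_blocked(n, A, B, C);
        double t1 = seconds();
        /* a matrix multiply performs 2*n^3 floating point operations */
        printf("n = %5zu: %6.2f GFLOP/s\n",
               n, 2.0 * (double)n * n * n / (t1 - t0) / 1e9);
        free(A); free(B); free(C);
    }
    return 0;
}
```

Swapping in `gemm_naive` or a BLAS `dgemm` call at the same spot shows the gap between them widening with n, which is exactly the effect you are observing in MATLAB.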