It makes sense that MEX could be faster. When generating MEX code, Coder has access to the high-performance libraries used by MATLAB. In standalone code that isn't always the case. Here, it may be that fft is the slow part, but you'd need to measure to be sure. Extract the FFTs being performed into their own function, generate both a MEX and an EXE from it, and compare the performance. You could also run a C profiler on the generated EXE to find the bottleneck.
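As a rough sketch of the measurement described above: the function name fftKernel, the script name compareFFT, and the input size are placeholders I made up, not anything from your code. Substitute the FFT calls and sizes you actually use.

```matlab
% --- fftKernel.m (hypothetical entry point holding only the FFT work) ---
function y = fftKernel(x) %#codegen
y = fft(x);
end

% --- compareFFT.m (hypothetical script; run from the same folder) ---
n = 2^16;                                  % assumed size; use your real one
x = complex(randn(n,1), randn(n,1));       % complex column vector input

% Generate a MEX function from the isolated kernel.
codegen fftKernel -args {x} -o fftKernel_mex

% Time MATLAB's fft against the generated MEX on the same input.
tMatlab = timeit(@() fft(x));
tMex    = timeit(@() fftKernel_mex(x));
fprintf('MATLAB fft: %.3g s, MEX: %.3g s\n', tMatlab, tMex);
```

The EXE has to be timed separately, for example by calling the generated function in a loop from your C main and timing that loop, since timeit only covers code called from MATLAB.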
Coder MEX and MATLAB use an optimized FFT library, whereas standalone code does not use one by default. If FFT computation is dominating the execution time in the EXE, you can link the optimized FFTW library into standalone code (EXE, LIB, DLL) starting in R2017b:
Provided that the FFTs are the bottleneck, using that should reclaim a fair bit of the performance.
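For illustration, linking FFTW is done through an FFT library callback class. The following is a minimal sketch only: the class name useMyFFTW, the install location, and the library file name are assumptions for a typical Linux install of double-precision FFTW, so adjust them to match your system and check the Coder documentation for your release.

```matlab
% --- useMyFFTW.m (hypothetical callback class name) ---
classdef useMyFFTW < coder.fftw.StandaloneFFTW3Interface
    methods (Static)
        function th = getNumThreads
            % Single-threaded FFTW calls in this sketch.
            coder.inline('always');
            th = int32(coder.const(1));
        end

        function updateBuildInfo(buildInfo, ctx)
            % Assumed install location; change to where FFTW lives for you.
            fftwLocation = '/usr/local';
            buildInfo.addIncludePaths(fullfile(fftwLocation, 'include'));

            % Assumed library name for the double-precision build.
            [~, libExt] = ctx.getStdLibInfo();
            libName = ['libfftw3' libExt];
            libPath = fullfile(fftwLocation, 'lib');
            buildInfo.addLinkObjects(libName, libPath, 1000, true, true);
        end
    end
end
```

With the class on the MATLAB path, you would point the code generation config at it before running codegen, for example cfg = coder.config('exe'); cfg.CustomFFTCallback = 'useMyFFTW';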
The Coder doc also provides a number of optimization techniques:
Are you enabling C compiler optimizations when compiling your EXE? When compiling the EXE using Coder, set the config setting BuildConfiguration to 'Faster Runs'.
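A short sketch of that setting, assuming a hypothetical entry-point function named myEntryPoint that takes a 65536-by-1 complex double input, and assuming you let Coder generate and compile an example main rather than supplying your own:

```matlab
% Config object for a standalone executable.
cfg = coder.config('exe');

% Ask the C compiler to optimize for execution speed.
cfg.BuildConfiguration = 'Faster Runs';

% Assumption: no hand-written main, so use the generated example main.
cfg.GenerateExampleMain = 'GenerateCodeAndCompile';

% myEntryPoint and the input size are placeholders for your own function.
codegen -config cfg myEntryPoint -args {complex(zeros(65536,1))}
```

If you already pass your own main.c to codegen, leave GenerateExampleMain at its default and keep only the BuildConfiguration line.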