I hesitate to provide a simple answer to this, but here it goes anyway.
Almost all Simulink library blocks are implemented either as built-in blocks or as S-functions written in C/C++, so you might say that these run as fast as possible. However, both Embedded MATLAB blocks and Stateflow also generate a C-based S-function from the user-defined code right before simulation starts. So there is some overhead at the start of simulation while that S-function is generated, but in the simplest sense, they should all have comparable performance during simulation.
Of course there are other tradeoffs: Stateflow and Simulink are good at solving different kinds of problems. Embedded MATLAB is great if you are more comfortable programming in MATLAB and can limit yourself to the set of supported functions. Note that calling unsupported functions via eml.extrinsic causes those calls to be dispatched to the MATLAB interpreter, so you lose performance.
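To make the eml.extrinsic point concrete, here is a rough sketch of what an Embedded MATLAB Function block body might look like. The name `my_unsupported_fcn` is hypothetical and stands in for any function outside the supported subset; the sketch also assumes that function returns a scalar double.

```matlab
function y = fcn(u)
%#eml
% Sketch of an Embedded MATLAB Function block body.
% my_unsupported_fcn is a hypothetical function outside the supported subset.
eml.extrinsic('my_unsupported_fcn');

y = 0;                        % pre-assign so the returned mxArray converts to a double scalar
y = my_unsupported_fcn(u);    % not compiled to C; dispatched to the MATLAB interpreter on each call
```

Everything else in the block is compiled into the generated S-function, so only the extrinsic call pays the interpreter cost. If that call sits in a hot path, it is probably worth rewriting it using supported functions.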