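There may be nothing you can do. Data transfer times to an external GPU are much higher than to an internal one. It might help to increase the MiniBatchSize training option, so that there are fewer, larger transfers per epoch, or to enable DispatchInBackground, so that data is read and preprocessed in the background while the GPU is busy.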
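As a rough sketch, the two options could be set like this. The solver ('sgdm') and the batch size of 256 are only placeholders I picked for illustration, not values from your setup, and DispatchInBackground needs Parallel Computing Toolbox:

```matlab
% Sketch only: the solver and the value 256 are assumptions, tune them for your problem.
options = trainingOptions('sgdm', ...
    'ExecutionEnvironment', 'gpu', ...   % train on the GPU
    'MiniBatchSize', 256, ...            % larger batches mean fewer host-to-GPU transfers per epoch
    'DispatchInBackground', true);       % read and preprocess data on parallel workers
```

Keep in mind that a larger mini-batch uses more GPU memory and can change how training converges, so you may need to adjust the learning rate as well.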
Of course, it's also plausible that your network is very small and isn't using the GPU efficiently, in which case fixed overhead dominates the training time and the choice of device makes no difference. What is the input size to your network?
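If you're not sure, one quick way to check is to read the InputSize property of the input layer. The layer array below is a made-up example standing in for your own network, and I'm assuming the input layer is the first layer:

```matlab
% Hypothetical small network, used only to show where the input size lives.
layers = [
    imageInputLayer([28 28 1])
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];

% Assumes the first layer is the input layer.
disp(layers(1).InputSize)
```

For a network you have already built or trained, the same property should be available through net.Layers(1).InputSize.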