By default, Parallel Computing Toolbox starts as many workers as you have physical cores on your system. The upper limit of 512 workers is intended to be "effectively unlimited" for all practical purposes on current systems. (I don't quite understand what's being "wasted" here.)
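If you want to check what the default works out to on your own machine, something like the following should do it. This is only a minimal sketch: it assumes your default cluster profile is the local one that ships with the toolbox, and the pool size you actually get can also be capped by your parallel preferences.

```matlab
% Sketch: inspect the default cluster profile and open a pool from it.
% Assumes the default profile is the local cluster on this machine.
c = parcluster;                          % cluster object for the default profile
fprintf('Profile NumWorkers: %d\n', c.NumWorkers);

% Close any pool that is already open before starting a new one.
delete(gcp('nocreate'));

% Pool size is chosen from the profile and your parallel preferences.
pool = parpool(c);
fprintf('Pool started with %d workers\n', pool.NumWorkers);

delete(pool);                            % shut the pool down again
```

If the first number matches your physical core count, you are seeing the default behaviour described above.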
Parallel Computing Toolbox does not support Xeon Phi.