I found the main reason: I was using the training function "trainlm", which is not supported on the GPU, so when the GPU is used MATLAB automatically switches to "trainscg".
But that raises the next question: when I use "trainscg" on both the CPU and the GPU, the performance values are different:
trainedNet = train(net,X_',Y','useGPU','only'); --> 5000 iterations, performance 192, 232 s
trainedNet = train(net,X_',Y'); --> 5000 iterations, performance 154, 186 s
What is the reason for this?
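
One thing I could check is whether the two runs even start from the same point. As far as I understand, an unconfigured network gets fresh random weights when train is called, and the default data division is random too, so two runs can differ regardless of the device. Below is a minimal sketch of a controlled comparison. The stand-in data, the hidden layer size of 10, the seed and the choice of 'dividetrain' are my own assumptions for illustration, not part of the original setup, and the GPU call needs a supported GPU.

```matlab
% Stand-in data and network (hypothetical); replace with the real net, X_ and Y.
X_  = rand(200, 3);                % 200 samples, 3 features (rows are samples)
Y   = sum(X_, 2);                  % 200 targets
net = feedforwardnet(10);          % assumed architecture: one hidden layer, 10 neurons

net.trainFcn  = 'trainscg';        % same training function on both devices
net.divideFcn = 'dividetrain';     % assumption: train on all data, so no random split

rng(0);                            % fix the seed used for weight initialization
net = configure(net, X_', Y');     % size inputs and outputs to the data
net = init(net);                   % initialize weights once; both runs start from here

% Train two copies of the same initialized network, one per device.
netCPU = train(net, X_', Y');
netGPU = train(net, X_', Y', 'useGPU', 'only');

% Evaluate both trained networks on the same data with the same measure.
perfCPU = perform(netCPU, Y', netCPU(X_'));
perfGPU = perform(netGPU, Y', netGPU(X_'));
fprintf('CPU performance: %g\nGPU performance: %g\n', perfCPU, perfGPU);
```

If the two values are still far apart with identical starting weights and no random split, then initialization and data division are not the explanation and the difference must come from somewhere else.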