For a given number of hidden nodes, different random weight initializations AND random data divisions will yield a spread of results. The difference you state is typical.
To keep things manageable, I typically do not train more than 100 nets at a time: numH = numel(Hmin:dH:Hmax) = 10 candidate values of H, and Ntrials = 10 for each H value. I display the 100 NMSE or Rsq = 1-NMSE results in an Ntrials x numH matrix. Then I display the min, median, mean, std and max of Rsq in a 5 x numH matrix.
You would be surprised how disparate some results can be.
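The bookkeeping described above can be sketched roughly as follows. This is only an illustration, not necessarily the exact code from my posts: it assumes the Deep Learning Toolbox (fitnet, configure, train), uses the toolbox's simplefit_dataset as placeholder data, and the search limits Hmin, dH, Hmax are hypothetical values chosen to give numH = 10.

```matlab
% Sketch: Ntrials random initializations and data divisions per candidate H.
% Assumptions: Deep Learning Toolbox is installed; the data set and the
% search limits below are placeholders, not values from the original question.
[x, t] = simplefit_dataset;        % example data shipped with the toolbox
Hmin = 1; dH = 1; Hmax = 10;       % hypothetical search range
Hvec = Hmin:dH:Hmax;
numH = numel(Hvec);                % 10 candidate hidden-layer sizes
Ntrials = 10;                      % 10 trials per candidate
MSE00 = mean(var(t', 1));          % MSE of the naive constant-output model
rng(0)                             % fix the seed so the run can be repeated

Rsq = zeros(Ntrials, numH);
for j = 1:numH
    for i = 1:Ntrials
        net = fitnet(Hvec(j));     % default dividerand: new random split per training run
        net.trainParam.showWindow = false;   % suppress the training GUI
        net = configure(net, x, t);          % sets sizes and draws new random initial weights
        net = train(net, x, t);
        y = net(x);
        NMSE = mean((t(:) - y(:)).^2) / MSE00;   % normalized MSE over all data
        Rsq(i, j) = 1 - NMSE;
    end
end

disp(Rsq)                          % Ntrials x numH matrix of results
summary = [min(Rsq); median(Rsq); mean(Rsq); std(Rsq); max(Rsq)];
disp(summary)                      % 5 x numH: min, median, mean, std, max per H
```

Comparing the min and max rows of the summary for a single H shows the spread that comes from random initialization and data division alone.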
Searching the NEWSGROUP and ANSWERS using
greg Ntrials
should bring up enough examples.
Hope this helps.
Thank you for formally accepting my answer
Greg