Reproducibility of convolutional neural network training with GPU
Hello,
I am training a CNN on my local GPU (to speed up training) for a classification problem, and I would like to try different parameterizations. To avoid variability due to different data and/or weight initialization, I reset the random seeds each time before training:
% Initialize random seeds (so the same dataset and architecture should
% lead to a predictable result)
rng(0);                                 % seed the CPU random stream
% parallel.gpu.rng(0, 'CombRecursive'); % equivalent one-line GPU seeding
randStream = parallel.gpu.RandStream('CombRecursive', 'Seed', 0);
parallel.gpu.RandStream.setGlobalStream(randStream);  % seed the GPU random stream
% Train the CNN
net = trainNetwork(TR.data, TR.reference, layers, options);
The problem is that when using the GPU I get different results on each execution, even when initializing the GPU random seed to the same value. The strange thing is that if I use the CPU instead, I do get reproducible results. Am I doing something wrong with the GPU random seed initialization? Is there a known problem with this situation, or something I am missing?
Thanks beforehand.
PS: I am using MATLAB R2017b
Accepted Answer
Use of the GPU has non-deterministic behaviour. You cannot guarantee identical results when training your network, because floating-point arithmetic is not associative, i.e. (a + b) + c ~= a + (b + c), and the order in which parallel computations accumulate their results can vary between runs.
Most of our GPU algorithms are in fact deterministic, but a few, such as backward convolution, are not.
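To see why accumulation order matters, here is a small illustration (not from the thread) of non-associativity in single precision, which is the default arithmetic for GPU training:

```matlab
% Hypothetical example: single-precision addition is not associative, so
% the order in which parallel threads accumulate partial sums can change
% the result.
a = single(1e8);
b = single(-1e8);
c = single(1);
left  = (a + b) + c;   % a + b cancels exactly, then adding c gives 1
right = a + (b + c);   % b + c rounds back to -1e8, so the sum gives 0
fprintf('(a+b)+c = %g,  a+(b+c) = %g\n', left, right);
```

A parallel reduction on the GPU effectively picks a different grouping of the same additions on each run, so small differences like this accumulate over training.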
14 Comments
Very interesting and good to know! Thank you.
I am encountering the same issue, and I am very surprised and, I should say, very disappointed by MathWorks: as a MATLAB user since version 3.5, I cannot imagine that people developing software can accept their code not being reproducible. It's a joke! MathWorks has to correct this bug or propose a solution to customers: what about moving single-precision GPU code to double precision, now that this is available? (And you claim it comes from the whims of floating-point precision.)
Can you let us know what non-deterministic behaviour it is that you're experiencing, specifically? As far as I'm aware deep learning training is the only place this happens, and that particular behaviour is true across all the deep learning frameworks because they use the same underlying NVIDIA library that has this behaviour. Maybe there is some randomness in your particular application that we're missing?
Hello,
@Joss Knight (or any other MATLAB staff member), my colleague referred to this link and said that it is now possible to achieve deterministic results in TensorFlow for deep learning algorithms on the GPU.
Is this something that MATLAB will be / is able to implement in the near future?
Thanks,
Barry
I believe we have a plan to add support for deterministic training in a future release. As I say, as far as I know backward convolution and backward max-pooling are the only sources of indeterminism (other than certain kinds of parallel training) which means the problem is limited to training a deep network. If you know of other sources let me know.
@Joss Knight Repeatability and reproducibility are extremely important. How can someone even consider using MATLAB deep learning software for serious science if repeating the experiment yields slightly different results every time? I hope the plan to add deterministic behaviour to future releases happens sooner rather than later. It's unfortunate that this was not made a priority in the 2021 release.
People use TensorFlow and PyTorch all the time for serious science and they have the exact same issue, so I guess people don't consider it that bad a problem. You should only see this non-determinism during training, which is typically initialized with random numbers anyway.
@Joss Knight - Has progress been made on fixing the issue? Lack of deterministic and repeatable training is proving to be quite a problem for some applications. For example, when I make a small change to the input data or the network, I want to know whether differences in my results are due to the changes I have made and not to the vagaries of non-deterministic floating-point arithmetic. An update on this issue would be welcome, thanks.
Also, please note that you shouldn't use the term "random numbers", but rather pseudorandom numbers, since they are generated by MATLAB from a deterministic algorithm and not from a stochastic process (like nuclear decay).
We are working on a solution and will let you know when it lands!
Joss Knight: I'm looking forward to seeing it soon. Please hurry
@Joss Knight, can you perhaps link some references that say that backward convolution and backward max pooling are non-deterministic?
@Joss Knight have you found a solution?
I am also facing the same problem
More Answers (0)