NFTOOL: fitting LARGE amounts of data

Question

fkouk 2020-11-17

https://ww2.mathworks.cn/matlabcentral/answers/651238-nftool-fitting-large-amounts-of-data

Test.PNG

Hello everyone,

I have been exploring MATLAB's neural network capabilities for fitting/interpolation, using nftool.

So far I have been using data of size:

[N x 2] as input
[N x 1] as output

where N is ~40000. Training is rather fast (within about 10 minutes), and with hiddenLayerSize = [4, 10, 20] I get very good agreement when testing the network; the regression coefficient R is effectively unity (see Test.PNG).

Training parameters are:

net.divideParam.valRatio = 5/100;
net.divideParam.testRatio = 5/100;
net.divideParam.trainRatio = 1 - net.divideParam.valRatio - net.divideParam.testRatio;

using Bayesian regularisation.
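A command-line equivalent of this setup might look roughly like the sketch below. The synthetic data, the target function, the reduced N and the epoch limit are placeholders and not the poster's actual data or settings; it is also an assumption that the Bayesian regularisation option in nftool corresponds to the 'trainbr' training function.

```matlab
% Sketch only: synthetic stand-in data (the original data set is not shown).
N = 2000;                          % much smaller than the ~40000 in the post
x = rand(N, 2);                    % [N x 2] input
t = sin(2*pi*x(:,1)) .* x(:,2);    % [N x 1] output (hypothetical target)

% fitnet expects one sample per column, so transpose.
X = x';                            % 2 x N
T = t';                            % 1 x N

hiddenLayerSize = [4 10 20];
net = fitnet(hiddenLayerSize, 'trainbr');   % assumed: Bayesian regularisation

% Data division as quoted in the post: 90% train, 5% validation, 5% test.
net.divideParam.valRatio   = 5/100;
net.divideParam.testRatio  = 5/100;
net.divideParam.trainRatio = 1 - net.divideParam.valRatio - net.divideParam.testRatio;

net.trainParam.epochs = 50;        % kept small so the sketch runs quickly
[net, tr] = train(net, X, T);

% Regression coefficient R on the held-out test samples.
Y = net(X(:, tr.testInd));
R = corrcoef(Y, T(tr.testInd));
fprintf('Test R = %.4f\n', R(1,2));
```

The training record tr holds the indices of the training, validation and test samples, so the test-set R can be checked independently of the regression plot.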

Now I need to repeat the same process with an [N x 3] input and again an [N x 1] output, but with N on the order of 4*10^6 (it may grow even further, with an [N x 6] input and even larger N). It is still possible to train well by increasing the number of layers (I have obtained good performance with hiddenLayerSize = [20, 5, 10, 20];), but training is much more time-consuming (around a day). Although the match is good (again R ~ 1), the regression plot shows that some points are clearly not captured, and there is some noise in the output when the trained network is used. I don't think this comes from over-fitting, as performance tends to improve with increasing layer sizes.

Is there any way to improve training performance? Can deep learning networks or convolutional networks help here? From the MATLAB examples, it is unclear to me whether they can be used for fitting applications.

Any ideas are welcome.

Thanks

Answer 1

Srivardhan Gadila 2020-11-27

https://ww2.mathworks.cn/matlabcentral/answers/651238-nftool-fitting-large-amounts-of-data#answer_557298

The following are some suggestions based on my knowledge:

In general, the dataset is split as follows: 70% for training, 15% for validation and 15% for testing. I also think that shallow neural networks are good enough for this problem, although you can try regression using deep neural networks. For shallow neural networks, refer to the documentation pages "Improve Shallow Neural Network Generalization and Avoid Overfitting", "Divide Data for Optimal Neural Network Training" and "Workflow for Neural Network Design"; for deep neural networks, see "Deep Learning Tips and Tricks" and "Deep Learning Tuning and Visualization".
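As a rough illustration of the 70/15/15 split mentioned above, the sketch below previews a random division with dividerand and then applies the same ratios to a shallow fitting network. The sample count and the single hidden layer of 10 neurons are arbitrary choices for illustration, not values taken from the question.

```matlab
% Sketch only: preview a 70/15/15 random split for a hypothetical sample count.
N = 1000;
[trainInd, valInd, testInd] = dividerand(N, 0.70, 0.15, 0.15);
fprintf('train/val/test = %d/%d/%d\n', ...
    numel(trainInd), numel(valInd), numel(testInd));

% The same ratios applied to a shallow fitting network.
net = fitnet(10);                       % 10 hidden neurons, an arbitrary choice
net.divideFcn = 'dividerand';           % random division of the samples
net.divideParam.trainRatio = 0.70;
net.divideParam.valRatio   = 0.15;
net.divideParam.testRatio  = 0.15;
```

The three index sets returned by dividerand do not overlap, so each sample ends up in exactly one of the training, validation or test subsets.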

fkouk 2020-11-29

Hello,

Thanks for the answer. I am aware of the different ways of splitting data for training/testing/validation, and I will try other splits in the future in an effort to improve performance.

I will also go through the references in detail. The one on convolutional networks is really interesting, though I am unsure whether it can be applied in my case. In the example, a CNN is applied to an image (so the input is an array of several pixels by several pixels) and the output is a scalar; I can understand how convolution is applied there, as parts of the image are "grouped". In my case, where the input is just a 2D or 3D vector, I am unsure whether convolution makes any sense.

Anyway, thanks for the references.
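On the doubt raised in this comment: a deep regression network does not have to be convolutional. For low-dimensional vector inputs, one possible sketch uses only fully connected layers trained on mini-batches. Everything below is an assumption for illustration: the synthetic data, the layer widths and the training options are hypothetical, and featureInputLayer requires Deep Learning Toolbox R2020b or later.

```matlab
% Sketch only: synthetic data standing in for the [N x 3] -> [N x 1] problem.
N = 5000;
X = rand(N, 3);                                  % one observation per row
Y = sin(2*pi*X(:,1)) .* X(:,2) + X(:,3).^2;      % hypothetical target

% Fully connected regression network; no convolution involved.
layers = [
    featureInputLayer(3, 'Normalization', 'zscore')
    fullyConnectedLayer(20)
    reluLayer
    fullyConnectedLayer(20)
    reluLayer
    fullyConnectedLayer(1)
    regressionLayer];

% Mini-batch training; the values here are illustrative, not tuned.
options = trainingOptions('adam', ...
    'MaxEpochs', 20, ...
    'MiniBatchSize', 256, ...
    'Shuffle', 'every-epoch', ...
    'Verbose', false);

net = trainNetwork(X, Y, layers, options);

% Root-mean-square error on the training data, as a quick sanity check.
Ypred = predict(net, X);
rmse = sqrt(mean((Ypred - Y).^2));
fprintf('Training RMSE = %.4f\n', rmse);
```

Because each update uses only a mini-batch of samples, this style of training may scale more gracefully to millions of samples than a full-batch method, though whether it matches the accuracy of the shallow network on the actual data would have to be tested.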
