How to select the number of samples to train a Machine Learning algorithm?

I am working with a dataset of 12,000 samples covering about 5 years of an industrial process.
It is likely that the plant changed during this period (equipment, its own gradual performance degradation, chemical products).
Is there a tool for identifying the best subset of this data? In my view, a temporal cut in the data could improve the quality of the resulting models.
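One way to test the temporal-cut idea is a simple time-ordered validation: keep the samples in time order, hold out the most recent block, and train on windows that start at progressively later dates; the start that gives the lowest validation error suggests where to cut. The Python sketch below (not a MATLAB tool) is illustrative only: it assumes a single input feature fitted by ordinary least squares, and the synthetic data, the function names (`fit_line`, `mse`, `choose_temporal_cut`) and the 20% validation fraction are all hypothetical. With the real 426-input data you would substitute your own models and keep a separate, untouched final test block for an unbiased error estimate.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (one feature, purely illustrative)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def mse(model, xs, ys):
    """Mean squared error of a fitted line on (xs, ys)."""
    a, b = model
    return mean((a + b * x - y) ** 2 for x, y in zip(xs, ys))


def choose_temporal_cut(xs, ys, val_frac=0.2, n_candidates=10):
    """Samples must be in time order. The most recent val_frac of the data is the
    validation block; for each candidate start index the model is trained on
    samples [start, split) and scored on the validation block."""
    split = int(len(xs) * (1 - val_frac))
    x_val, y_val = xs[split:], ys[split:]
    step = max(1, split // n_candidates)
    scores = {}
    for start in range(0, split - 10, step):  # every window keeps > 10 samples
        model = fit_line(xs[start:split], ys[start:split])
        scores[start] = mse(model, x_val, y_val)
    best_start = min(scores, key=scores.get)
    return best_start, scores


# Hypothetical plant data: the input/output relation changes at sample 600.
rng = random.Random(0)
xs = [rng.random() for _ in range(1200)]
ys = [(2 * x if t < 600 else 3 * x - 1) + rng.gauss(0, 0.05) for t, x in enumerate(xs)]
best_start, scores = choose_temporal_cut(xs, ys)
print(best_start, round(scores[best_start], 4), round(scores[0], 4))
```

On this synthetic data, training only on samples after the simulated change beats training on the full history, which is the effect the question anticipates; on real plant data the outcome has to be checked, since shorter windows also mean fewer training samples.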
Jose Marques 2019-1-31
Thanks for the comment!
The dataset has 426 inputs (I am also using feature selection techniques).
I am using four algorithms to create the models: Regression Tree, Bagged Trees, SVM and Neural Networks.
Greg Heath 2019-2-4
As a common-sense rule of thumb, I try to use at least 10 to 30 times as many training points as there are unknown parameters to be estimated.
In addition, I use 10 to 20 sets of random initial weights.
I assume, of course, that you have examined plots of the data to initialize your common sense.
Hope this helps
Greg
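Greg's rule of thumb can be turned into a quick bound on network size. For a single-hidden-layer network with I inputs, H hidden nodes and O outputs, the unknown weights and biases number (I+1)*H + (H+1)*O, while Ntrn training samples supply Ntrn*O equations. The Python sketch below finds the largest H that keeps the equations at least `ratio` times the unknowns; the function names are hypothetical, and the 8,400-sample training set assumes a 70% training split of the 12,000 samples with a single output.

```python
def n_unknowns(n_in, n_hidden, n_out):
    """Weights plus biases of a single-hidden-layer MLP: (I+1)*H + (H+1)*O."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out


def max_hidden_nodes(n_train, n_in, n_out, ratio):
    """Largest H such that n_train*n_out equations >= ratio * n_unknowns(n_in, H, n_out)."""
    n_eq = n_train * n_out
    # Rearranging (I+1)*H + (H+1)*O <= n_eq/ratio gives H <= (n_eq/ratio - O) / (I + O + 1).
    h = (n_eq / ratio - n_out) / (n_in + n_out + 1)
    return max(0, int(h))


# Hypothetical numbers: 70% of 12,000 samples used for training, one output.
n_train = 12000 * 7 // 10
for n_in in (426, 20):
    for ratio in (10, 30):
        print(n_in, ratio, max_hidden_nodes(n_train, n_in, 1, ratio))
# prints: 426 10 1 / 426 30 0 / 20 10 38 / 20 30 12 (one result per line)
```

With all 426 inputs, even the lenient ratio of 10 leaves room for only one hidden node, which is one reason the feature selection mentioned in the question matters for the neural-network model. Each candidate H would then be trained from the 10 to 20 random weight initializations Greg suggests.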

Answers (1)

BERGHOUT Tarek 2019-2-3
You can use deep belief networks; they are the best for feature selection and mapping. Train your network on chunks of data built by randomly choosing (input, target) pairs, and at the same time pay attention to your approximation function: training must drive the error function to a local minimum. Deep belief nets are built from a stack of layer-wise pretrained networks (restricted Boltzmann machines; stacked autoencoders are a closely related alternative), which allows all the parameters of the network to be tuned with a small amount of training data.
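Training on randomly chosen chunks of (input, target) pairs is usually done with mini-batches: shuffle the sample indices once per epoch and cut them into fixed-size batches, so that every pair is used exactly once per epoch and inputs stay matched to their targets. A minimal Python sketch, with a hypothetical function name and toy data; the network update step itself is left out.

```python
import random


def random_minibatches(inputs, targets, batch_size, seed=None):
    """Yield one epoch of mini-batches of (input, target) pairs in random order,
    sampled without replacement; the last batch may be smaller."""
    if len(inputs) != len(targets):
        raise ValueError("inputs and targets must have the same length")
    idx = list(range(len(inputs)))
    random.Random(seed).shuffle(idx)  # same permutation applied to inputs and targets
    for i in range(0, len(idx), batch_size):
        chunk = idx[i:i + batch_size]
        yield [inputs[j] for j in chunk], [targets[j] for j in chunk]


# Toy data: each target is twice its input, so pairing errors would be easy to spot.
xs = list(range(10))
ys = [2 * v for v in xs]
batches = list(random_minibatches(xs, ys, batch_size=4, seed=1))
print([len(bx) for bx, _ in batches])  # [4, 4, 2]
```

Shuffling indices rather than the two lists separately is what keeps each input aligned with its own target.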
