Machine learning and data normalization - how data should(?) be normalized.
6 次查看(过去 30 天)
显示 更早的评论
Hello, I have a general question about data normalization for classification algorithms: if I have a training set and a testing set, should I normalize them separately or join them for normalization step? And what if later I would like to use this classifier to classify a totally new portion of data? Should I keep extreme values of each feature to use them for normalization?
Second question I have: Is normalization really necessary? Does SVM need it?
Thank you in advance for any help. Cheers, Michael
0 个评论
回答(2 个)
Mostafa Nakhaei
2019-10-18
Please note that the best practice in machine learning is to keep the distribution of testing and training the same. So, if you want to normalize your data, it is good to do the normalization on whole dataset first and then separate them. thus, your testing and training will have the same distribution. The common error is to separate the data and then normalize them individually.
0 个评论
BERGHOUT Tarek
2019-2-3
1-you can normalize the eparately or together but the best way is to normalize the inside the trainig function ; if you add the normelization function inside the trainig function , you can use it for any dataset after that .
2- yes normalization alwaze necesery if and ownly if the activation fuinctions of your training model are bounded otherwise you don't have to normelize tham;
and for SVM if the kerenel function is bounded you must normelize you data.
0 个评论
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Statistics and Machine Learning Toolbox 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!