Out-of-sample normalization problem
Hi. I'm working on a binary classification system with 21 financial ratios and variables as inputs; the output is a financial criterion that is either 0 or 1. Before feeding the data to my classification model (MLP, SVM or ELM), I normalize it (min-max mapping or whitening). The financial ratios come from companies' statements, so the data covers companies of various sizes.
I'm also using 5-fold cross-validation to design my model. Once the model is designed, I want to use it on new data, so I must normalize the new data too. I found that for min-max mapping I must use the maximum and minimum of the design-phase data set, and for whitening I must use its mean and variance.
Suppose that, with (x - min)/(max - min), my new data set contains a sample whose feature value x is lower than the design-phase minimum, so the normalized feature for that specific sample is negative. Is this a problem? Is the output (1 or 0) correct for this specific sample? The same problem can also occur with the whitening method.
Thanks.
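The situation described in the question can be reproduced with a minimal sketch. It is written in Python with the standard library only, purely for illustration; the function names `fit_minmax` and `apply_minmax` and the sample numbers are hypothetical and do not come from the thread. The point it shows is that the design-phase minimum and maximum are stored once and reused unchanged, so a new value below the stored minimum maps to a negative number.

```python
def fit_minmax(train_rows):
    """Return per-feature (min, max) pairs computed from the design data only."""
    columns = list(zip(*train_rows))  # transpose: one tuple per feature
    return [(min(col), max(col)) for col in columns]


def apply_minmax(rows, stats):
    """Scale each feature with the stored design-phase min and max."""
    scaled = []
    for row in rows:
        scaled.append([
            (x - lo) / (hi - lo) if hi > lo else 0.0  # guard constant features
            for x, (lo, hi) in zip(row, stats)
        ])
    return scaled


# Hypothetical design-phase data: 3 samples (rows), 2 features (columns).
train = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]]
stats = fit_minmax(train)

# A new sample whose first feature is below the design-phase minimum of 2.0.
new = [[1.0, 20.0]]
new_scaled = apply_minmax(new, stats)
print(new_scaled)  # [[-0.25, 0.5]]
```

A scaled value outside [0, 1] only indicates that the sample lies outside the range seen during design. Whether the predicted class can be trusted for such a sample is a separate question that the scaling itself does not answer.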
Accepted Answer
Greg Heath
2014-4-3
Edited: Greg Heath
2014-4-3
Regardless of what you use in the model, I always standardize pre-modelling using zscore or mapstd to identify outliers for removal or modification.
Warning: Each dimension should be normalized separately.
P.S. If you use neural nets, the default input normalization is mapminmax to [-1,1], and the default hidden-layer transfer function, tanh, is an odd function.
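As a rough illustration of standardizing each dimension separately, the sketch below computes one mean and one standard deviation per feature from the design data and reuses them for a new sample. It is plain Python rather than MATLAB's zscore or mapstd, the helper names `fit_zscore` and `apply_zscore` and the numbers are hypothetical, and it assumes the sample standard deviation (n - 1 denominator).

```python
from statistics import mean, stdev


def fit_zscore(train_rows):
    """Return per-feature (mean, std) pairs from the design data only."""
    columns = list(zip(*train_rows))  # one tuple per feature
    return [(mean(col), stdev(col)) for col in columns]


def apply_zscore(rows, stats):
    """Standardize each feature with its own stored mean and std."""
    return [
        [(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
        for row in rows
    ]


# Hypothetical design data: two features on very different scales.
train = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
zstats = fit_zscore(train)

# New sample standardized with the design-phase statistics.
z_new = apply_zscore([[5.0, 150.0]], zstats)
print(z_new)  # [[3.0, -0.5]]
```

Because each feature has its own mean and standard deviation, the two columns end up on a comparable scale even though their raw ranges differ by a factor of 100.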
Hope this helps
Thank you for formally accepting my answer
Greg
6 Comments
Image Analyst
2014-4-7
Jack's second so-called "Answer" moved here:
Thank you again Greg.
I don't use k-means clustering after applying other outlier detection techniques. In my system, outlier detection with k-means clustering is an alternative to your proposed technique, so I can choose either of the two. With regard to the above discussion, what is your opinion of k-means clustering?
You mentioned that I can use '(x-meanx)/std > threshold of your choice', so your proposed technique does not consider all inputs (in my case, 21 variables) simultaneously, and I can only analyze one feature at a time with it. Is this true?
Thanks.
Greg Heath
2014-4-11
No. You consider all at once using matrix coding. I consider a 21-dimensional vector an outlier if one or more of its components is an outlier.
All MATLAB code is matrix based. So if you find one or more outlying components in a column of an input or target matrix, either modify or delete the column. Any target column corresponding to a deleted input must also be deleted and vice versa.
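The rule described here, where a sample is an outlier if any one of its components is, might be sketched as follows. This is an illustrative Python version, not the answerer's code. The name `outlier_mask`, the threshold of 2.0, and the data are hypothetical, and the absolute value of the standardized score is an assumption so that both tails are caught. Note also that samples are stored as rows here, whereas the MATLAB neural network convention in the thread stores them as columns.

```python
from statistics import mean, stdev


def outlier_mask(rows, threshold):
    """Flag a sample if ANY of its standardized components exceeds threshold."""
    columns = list(zip(*rows))  # one tuple per feature
    stats = [(mean(col), stdev(col)) for col in columns]
    return [
        any(s > 0 and abs(x - m) / s > threshold for x, (m, s) in zip(row, stats))
        for row in rows
    ]


# Hypothetical data: 10 samples, 2 features; sample 4 has an extreme first feature.
inputs = [[1.0, 4.0], [1.0, 6.0], [1.0, 4.0], [1.0, 6.0], [11.0, 4.0],
          [1.0, 6.0], [1.0, 4.0], [1.0, 6.0], [1.0, 4.0], [1.0, 6.0]]
targets = [0, 1, 0, 1, 1, 0, 0, 1, 0, 1]

mask = outlier_mask(inputs, threshold=2.0)

# Delete flagged samples from inputs AND targets so the two stay aligned.
kept_inputs = [row for row, bad in zip(inputs, mask) if not bad]
kept_targets = [t for t, bad in zip(targets, mask) if not bad]
print(mask.index(True), len(kept_inputs), len(kept_targets))  # 4 9 9
```

All features are checked in a single pass, and one flagged component is enough to drop the whole sample together with its target.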