Simple Linear SVM classification with normalization
I'm working on an image classification project using a linear SVM. Currently I have 10 feature vectors and I'm getting a decent level of accuracy.
To increase the accuracy, I wish to add a 'question' feature. This is simply a 'yes' or 'no' question (based on the image) that will be asked of the user, both during training and testing. This new feature has to be added to the data matrix, and it's a little tricky. Following some StackOverflow questions, here's what I did.
- Normalize the existing dataset, column-wise.
- The normalization range is [-1, 1].
- The new question feature will be 1 (yes) or -1 (no), depending on the user's answer (a rough sketch of this encoding follows the list).
- Unanswered questions could be saved as 0 (I'm not sure this is valid, so I haven't tested it yet).
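Here is roughly what I mean, as a sketch (featNorm, answers, and labels are placeholder names, and fitcsvm from the Statistics and Machine Learning Toolbox stands in for whatever SVM training call is actually used):
% featNorm : nImages-by-nFeatures matrix of the already normalized features (each column in [-1, 1])
% answers  : nImages-by-1 vector of question answers: +1 = yes, -1 = no, 0 = unanswered
% labels   : nImages-by-1 vector of class labels (e.g. 1 and 2)
dataMatrix = [featNorm, answers];      % the question answer becomes one extra column
Mdl = fitcsvm(dataMatrix, labels);     % fitcsvm uses a linear kernel by default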
Now when a user passes in a test image for classification, its feature vector is not normalized. That is, the features go in as-is, and the new question value is added as -1 or 1 according to the user's answer. As far as I know, test image feature vectors can't be normalized, can they? You need a whole dataset to normalize, and a test image gives only one row of data.
Anyway, when I run the classifier, all my images get classified into the same group, even the images I used to train the other group. Surely those should be classified into the second group, right?
I ran another training session with different data (20 images), carefully answering each question for every image. This time all my test images get sorted into the second group, including the images I used to train the first group.
Any help in understanding what's happening is greatly appreciated.
1 Comment
Manu BN
2016-4-20
Perform feature scaling and mean normalization, i.e. Normalized_Feat = (originalFeat - mean2(originalFeat(:))) / std2(originalFeat(:)). The values will then have zero mean and unit standard deviation. If you scale the training set, you have to scale the test set the same way. If everything falls into the same category, even for the training set, it means your feature set is not giving a discriminative representation of the different classes (a high-bias situation). So use a different feature set.
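As a rough sketch of that idea (trainFeat and testFeat are placeholder names for the training and test feature matrices):
% Scaling parameters come from the TRAINING features only
mu    = mean2(trainFeat);   % mean of all training feature values
sigma = std2(trainFeat);    % standard deviation of all training feature values
trainFeatNorm = (trainFeat - mu) / sigma;   % standardized training features
testFeatNorm  = (testFeat  - mu) / sigma;   % test features scaled with the SAME mu and sigma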
Answers (2)
Walter Roberson
2016-3-20
You do not need a database to normalize. You normalize according to the same calculation you used for the training images.
If your normalization calculation for your training images determined that you should subtract 518.3491 and then divide by 83175.2993 to normalize, then you should normalize your test images by subtracting 518.3491 and then dividing by 83175.2993 .
If your test images happen to take on values beyond the range seen in the training images, then allow the values to fall outside the ±1 range if that is what your normalization calculation implies. If that causes your code to fall over, that should strongly suggest you used the wrong normalization procedure. If your potential data has hard limits, normalize against the hard limits. If your training data should instead be treated as a statistically representative sample of the problem, then your normalization procedure should account for the fact that any unconstrained distribution has a finite probability of producing a sample that "truly" belongs, even at an arbitrarily large "distance" from the mean. So if your data is only "typically" -32.8 to +49.3, your normalization procedure had better not fall over the time a sample comes in at 83175.2993 .
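A small sketch of that in MATLAB, assuming trainFeat holds the training features (one row per image, one column per feature) and testFeat is a single test row; these names are placeholders rather than the asker's actual variables:
colMin = min(trainFeat, [], 1);    % per-column minimum, from TRAINING data only
colMax = max(trainFeat, [], 1);    % per-column maximum, from TRAINING data only
trainNorm = 2 * (trainFeat - colMin) ./ (colMax - colMin) - 1;   % training rows mapped to [-1, 1]
testNorm  = 2 * (testFeat  - colMin) ./ (colMax - colMin) - 1;   % SAME parameters reused for the test row
% testNorm can legitimately land outside [-1, 1] when the test image
% lies outside the range seen during training -- that is expected.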
3 Comments
Walter Roberson
2016-3-20
What is the current normalization formula?
For a column whose inputs are known to be restricted to -1, +1, and 0, there is probably no need to do any normalization at all. But that depends on which algorithm you are using: some NN processing assumes the data is in the range -1 to +1, and other NN processing assumes the range 0 to +1.
Greg Heath
2016-4-21
The "I"nput matrix of size [ I N ] = size(x) consists of N I-dimensioal input vectors that contain ALL (includes the question vector) of the information needed for classification.
The "O"utput target matix of size [ O N ] = [ c N ] = size(t) consists of N O = c-dimensional {0 1} "c"lasification unit vectors from the unit matrix eye(c).
The relationship between the target matrix and the true class indices {1:c} is
target = full(ind2vec(trueclassindices))
trueclassindices = vec2ind(target)
Use ZSCORE to standardize the inputs to have zero mean and unit standard deviation. Delete or modify outliers. These are then input to the classifier, which has a default input normalization to [-1, 1]. Do not remove this second normalization, because the extreme values are used to determine the values of the random initial input weight vectors.
After training, the output, the corresponding estimated class indices, and the errors are given by
y = net(x);
estimatedclassindices = vec2ind(y)
errors = (estimatedclassindices ~= trueclassindices)
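Putting those pieces together, here is a minimal sketch (x is assumed to be the [ I N ] input matrix, trueclassindices a 1-by-N vector of class labels, and the hidden layer size of 10 is purely illustrative):
% Standardize each input variable to zero mean, unit standard deviation.
% zscore works column-wise, so transpose the [ I N ] matrix for it.
xs = zscore(x')';
% Build the {0,1} target matrix from the true class indices
t = full(ind2vec(trueclassindices));
% Create and train a pattern recognition network
% (patternnet applies its own [-1 1] mapminmax normalization internally)
net = patternnet(10);            % 10 hidden units, purely illustrative
net = train(net, xs, t);
% Evaluate
y = net(xs);
estimatedclassindices = vec2ind(y);
errors  = (estimatedclassindices ~= trueclassindices);
errRate = mean(errors)           % fraction of misclassified examples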
For examples, search BOTH the NEWSGROUP and ANSWERS with
greg PATTERNNET.
Hope this helps.
Thank you for formally accepting my answer
Greg
0 Comments