Binary classification

Olivier Dupont 2012-5-24
Hey all!
My question: Is it possible to use classification methods to determine if an unknown sample fits the distribution of known samples?
I have a known dataset that describes the distribution of object parameters (various circles with various properties such as circularity, area, perimeter, solidity, etc.). Rows are independent samples, and columns are parameters. I need a function that determines whether a new sample is a circle or not. From what I have seen of classification, you need to specify every class; there is no "everything else" class. What would be the best way to determine whether a new object is a circle (the circle is really just an example) and to obtain an error or confidence measure for that decision?
Regards,
Olivier

Accepted Answer

Ilya 2012-5-24
You might want to start here: http://en.wikipedia.org/wiki/One-class_classification (the first reference there, a PhD thesis, gives an overview of methods).
There are no utilities in the official MATLAB release that you could use right away, but it would be fairly easy to code some of the reviewed methods. For example, in ascending order of complexity:
  • Assume that the predictors (columns) are uncorrelated and compute the distance between a new sample (row) and the mean of the training set (the set of known samples). Compare it with a reference distribution obtained by computing the distance between every row in the training set and the mean of all other rows.
  • Assume that the known samples come from a Gaussian mixture distribution. Fit this mixture using gmdistribution from the Statistics Toolbox. Compute the Mahalanobis distance between the new sample and every Gaussian component. Estimate the probability by assuming a chi-square distribution for the squared Mahalanobis distance.
  • Find the k nearest neighbors of every sample in the training set using knnsearch (excluding the sample itself). Compute the distribution of the average distance between each sample and its k nearest neighbors. Then find the k nearest neighbors of the new sample in the training set, average their distances, and compare that value with the reference distribution.
And so on. If your training set is pure (all objects are indeed circles) and your data are low-dimensional, you have plenty of methods at your disposal. If the training set is not pure, or the data are high-dimensional, the problem can become substantially harder.
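The first method in the list can be sketched as follows. This is Python rather than MATLAB, purely for illustration, and it is not code from the thread. Two choices are my own assumptions: "uncorrelated predictors" is taken to mean a Euclidean distance on columns scaled by their training standard deviation, and "compare with the reference distribution" is turned into an empirical p-value, (number of reference distances at least as large as the new one + 1) / (n + 1). All function names and the data are hypothetical.

```python
import math
from statistics import mean, pstdev


def column_scales(X):
    """Per-column standard deviation of the training rows (1.0 for constant columns)."""
    scales = []
    for col in zip(*X):
        s = pstdev(col)
        scales.append(s if s > 0 else 1.0)
    return scales


def scaled_distance(x, center, scales):
    """Euclidean distance with each column divided by its scale.

    Scaling each column separately, with no cross terms, is the
    'predictors are uncorrelated' assumption."""
    return math.sqrt(sum(((a - c) / s) ** 2 for a, c, s in zip(x, center, scales)))


def reference_distances(X, scales):
    """Distance from each training row to the mean of all *other* rows (needs n >= 2)."""
    n = len(X)
    totals = [sum(col) for col in zip(*X)]
    refs = []
    for row in X:
        loo_mean = [(t - v) / (n - 1) for t, v in zip(totals, row)]
        refs.append(scaled_distance(row, loo_mean, scales))
    return refs


def novelty_p_value(X, x_new):
    """Empirical p-value: how unusual is x_new's distance compared with the training rows?"""
    scales = column_scales(X)
    center = [mean(col) for col in zip(*X)]
    d_new = scaled_distance(x_new, center, scales)
    refs = reference_distances(X, scales)
    exceed = sum(1 for d in refs if d >= d_new)
    return (exceed + 1) / (len(refs) + 1)


# Hypothetical training rows: [circularity, solidity] of objects known to be circles.
X = [[0.95, 0.98], [0.97, 0.99], [0.93, 0.97], [0.96, 0.98], [0.94, 0.99]]
print(novelty_p_value(X, [0.95, 0.98]))  # near the training mean
print(novelty_p_value(X, [0.40, 0.60]))  # far from every known circle
```

Here the sample near the centre scores 1.0, while [0.40, 0.60] gets 1/6, the smallest value possible with five training rows. That floor of 1/(n + 1) is why this kind of confidence estimate needs a reasonably large set of known samples.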
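The probability step of the second method, written out as equations. The notation is mine: $\mathbf{x}$ is the new sample with $d$ parameters, and $\boldsymbol{\mu}_k$, $\boldsymbol{\Sigma}_k$ are the mean and covariance of mixture component $k$ out of $K$. If $\mathbf{x}$ really comes from component $k$, its squared Mahalanobis distance follows a chi-square distribution with $d$ degrees of freedom (exactly for the true parameters, approximately when they are estimated from the training set):

```latex
D_k^2(\mathbf{x}) = (\mathbf{x} - \boldsymbol{\mu}_k)^\top \boldsymbol{\Sigma}_k^{-1} (\mathbf{x} - \boldsymbol{\mu}_k), \qquad k = 1, \dots, K

p_k(\mathbf{x}) = \Pr\!\left[\chi^2_d \ge D_k^2(\mathbf{x})\right] = 1 - F_{\chi^2_d}\!\left(D_k^2(\mathbf{x})\right)

d = 2:\quad p_k(\mathbf{x}) = e^{-D_k^2(\mathbf{x})/2}, \qquad D_k^2(\mathbf{x}) = 9.21 \;\Rightarrow\; p_k(\mathbf{x}) \approx 0.01
```

A small $p_k$ for every component suggests the sample does not fit the known distribution. Taking the largest $p_k$ and comparing it with a significance level such as 0.01 is one possible decision rule; that rule is my assumption, not something the answer specifies.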
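The third method, sketched the same way (Python for illustration, hypothetical names and data, and the same assumed empirical p-value standing in for "compare to the reference distribution"). A brute-force search replaces MATLAB's knnsearch, and the columns are assumed to be on comparable scales already (for example, standardized), because the distance is plain Euclidean.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def mean_knn_distance(point, pool, k):
    """Average distance from `point` to its k nearest rows in `pool` (brute force)."""
    dists = sorted(euclidean(point, row) for row in pool)
    return sum(dists[:k]) / k


def knn_reference(X, k):
    """Reference values: each training row's mean distance to its k nearest
    *other* training rows, so a row is never counted as its own neighbour."""
    return [mean_knn_distance(row, X[:i] + X[i + 1:], k) for i, row in enumerate(X)]


def knn_p_value(X, x_new, k=2):
    """Empirical p-value of x_new's mean k-NN distance against the reference values."""
    if not 1 <= k < len(X):
        raise ValueError("k must be between 1 and len(X) - 1")
    refs = knn_reference(X, k)
    d_new = mean_knn_distance(x_new, X, k)
    exceed = sum(1 for d in refs if d >= d_new)
    return (exceed + 1) / (len(refs) + 1)


# Hypothetical, already-scaled training rows.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
print(knn_p_value(X, [0.5, 0.4]))    # inside the cloud of known samples
print(knn_p_value(X, [10.0, 10.0]))  # far outside it
```

Unlike the distance-to-the-mean test, this one does not assume the known samples form a single blob, so it can also handle training sets with several separate clusters.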
1 Comment
Olivier Dupont 2012-5-25
I'll dig into that article! I might also try to define what "is not" a circle, so that I can use a more classical approach and distinguish between two classes. When I detect circles in an image, I also detect everything else, so I may be able to use those other objects to define the second class.
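If a labelled "everything else" class is available, as suggested here, any ordinary two-class classifier applies, and one that outputs posterior probabilities also gives the confidence measure asked for in the question. The sketch below uses Gaussian naive Bayes in Python; that classifier, the toy [circularity, solidity] data, and all names are my own choices for illustration, not something proposed in the thread.

```python
import math
from statistics import mean, pvariance


def fit_gaussian_nb(samples_by_class):
    """Fit a Gaussian naive Bayes model: per class, a prior and per-column mean/variance."""
    total = sum(len(rows) for rows in samples_by_class.values())
    model = {}
    for label, rows in samples_by_class.items():
        cols = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / total,
            "mean": [mean(c) for c in cols],
            # Small floor so a constant column does not cause division by zero.
            "var": [max(pvariance(c), 1e-9) for c in cols],
        }
    return model


def log_likelihood(x, params):
    """Sum of per-column Gaussian log-densities (columns treated as independent)."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, params["mean"], params["var"])
    )


def posterior(model, x):
    """P(class | x) for every class, normalised with the log-sum-exp trick."""
    scores = {c: math.log(p["prior"]) + log_likelihood(x, p) for c, p in model.items()}
    top = max(scores.values())
    weights = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}


# Hypothetical [circularity, solidity] rows: detected circles vs. everything else.
circles = [[0.95, 0.98], [0.97, 0.99], [0.93, 0.97], [0.96, 0.98]]
others = [[0.45, 0.70], [0.60, 0.80], [0.30, 0.55], [0.50, 0.65]]
model = fit_gaussian_nb({"circle": circles, "other": others})
print(posterior(model, [0.94, 0.97]))
```

The posterior is only as good as the "everything else" examples: an object unlike anything in either class still receives a posterior that sums to 1 across the two classes.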


More Answers (1)

Walter Roberson 2012-5-24
It might be possible with some classifiers, but not with most.
Some classifiers simply divide the feature space into two half-spaces with a hyperplane and assign the class according to which side of the hyperplane a sample falls on; such a classifier always assigns one of the two classes.
Other classifiers provide a probability of belonging to each class, but those probabilities are never exactly 0. You could, of course, set an arbitrary threshold and say that a sample belongs to neither class if its probability of belonging is "small enough" for both classes.
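A sketch of that rejection rule, under my own assumptions. Posterior probabilities over two classes always sum to 1, so they cannot both be small; the quantity that can be small for both classes is each class's likelihood (density) at the sample. Here each class is modelled by an independent Gaussian per column, and the threshold on the log-density is arbitrary, as the answer says. Python, with hypothetical names and toy data.

```python
import math
from statistics import mean, pvariance


def fit_class_densities(samples_by_class):
    """Per class: per-column mean and variance of an independent-Gaussian density."""
    model = {}
    for label, rows in samples_by_class.items():
        cols = list(zip(*rows))
        means = [mean(c) for c in cols]
        variances = [max(pvariance(c), 1e-9) for c in cols]  # floor avoids division by zero
        model[label] = (means, variances)
    return model


def log_density(x, means, variances):
    """Log of the class-conditional density p(x | class), not a posterior."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, means, variances)
    )


def classify_with_reject(model, x, log_threshold):
    """Class with the highest density (equal priors assumed), or None when
    x is unlikely under every class."""
    scores = {label: log_density(x, m, v) for label, (m, v) in model.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= log_threshold else None


circles = [[0.95, 0.98], [0.97, 0.99], [0.93, 0.97], [0.96, 0.98]]
others = [[0.45, 0.70], [0.60, 0.80], [0.30, 0.55], [0.50, 0.65]]
model = fit_class_densities({"circle": circles, "other": others})
for sample in ([0.94, 0.97], [0.45, 0.68], [0.05, 0.10]):
    print(sample, classify_with_reject(model, sample, log_threshold=-10.0))
```

With this toy data the first two samples are assigned to "circle" and "other", while [0.05, 0.10] is rejected because it is improbable under both classes. In practice the threshold could be set from the log-densities of the training samples themselves, for example a low percentile of them.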

Category: Statistics and Machine Learning Toolbox