Binary classification

Question

1 vote

Hey all!

My question: Is it possible to use classification methods to determine if an unknown sample fits the distribution of known samples?

I have a dataset of known samples that describes the distribution of object parameters (various circles with properties such as circularity, area, perimeter, solidity, etc.). Rows are independent samples, and columns are the parameters. The problem is that I need a function that determines whether a new sample is a circle or not. From what I have seen of classification, you need to specify every class; there is no "everything else" class. What would be the best way to decide whether the new object is a circle (the circle is really just an example) and to get an error or confidence measure for the decision?

Regards,

Olivier

Answer 1

Ilya 2012-5-24

1 vote

You might want to start with http://en.wikipedia.org/wiki/One-class_classification (the first reference there, a PhD thesis, gives an overview of methods).

There are no utilities in the official MATLAB release you could use right away, but it would be fairly easy to code some of the reviewed methods. For example, in ascending order of complexity:

1. Assume that predictors (columns) are uncorrelated and compute the distance between a new sample (row) and the mean of the training set (the set of known samples). Compare with the reference distribution obtained by taking the distance between every row in the training set and the mean of all other rows.
2. Assume that the known samples come from a mixture of Gaussian distributions. Find this mixture using gmdistribution from Statistics Toolbox. Compute the Mahalanobis distance between the new sample and every Gaussian component. Estimate the probability assuming a chi-square distribution for the squared Mahalanobis distance (with degrees of freedom equal to the number of predictors).
3. Find the k nearest neighbors for every sample in the training set using knnsearch. Compute the distribution of the average distance between every sample and its k nearest neighbors. Find the k nearest neighbors in the training set for the new sample and take the average of their distance values. Compare to the reference distribution.
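The simplest method in the list (distance to the training mean, compared against a leave-one-out reference distribution) could be sketched roughly as follows. This is only an illustration in plain Python rather than MATLAB, using the standard library. The function names, the two-column feature values, and the choice to divide each column by its standard deviation before measuring distance are assumptions made for the example, not something specified in the answer.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def column_stds(rows):
    """Column-wise sample standard deviation (assumes no column is constant)."""
    mu = mean_vector(rows)
    n = len(rows)
    return [math.sqrt(sum((x - m) ** 2 for x in col) / (n - 1))
            for col, m in zip(zip(*rows), mu)]

def scaled_distance(row, center, scale):
    """Euclidean distance after dividing each column by its scale.

    Scaling by the column standard deviation is an assumption of this sketch;
    it treats the predictors as uncorrelated.
    """
    return math.sqrt(sum(((x - c) / s) ** 2
                         for x, c, s in zip(row, center, scale)))

def reference_distances(train):
    """Distance from each training row to the mean of all *other* rows."""
    scale = column_stds(train)
    out = []
    for i, row in enumerate(train):
        others = train[:i] + train[i + 1:]
        out.append(scaled_distance(row, mean_vector(others), scale))
    return out

def empirical_p_value(train, new_row):
    """Fraction of reference distances at least as large as the new distance."""
    ref = reference_distances(train)
    d_new = scaled_distance(new_row, mean_vector(train), column_stds(train))
    return sum(d >= d_new for d in ref) / len(ref)

# Made-up illustrative values: columns are (circularity, solidity).
train = [
    [0.98, 0.99], [0.97, 0.98], [0.99, 0.99],
    [0.96, 0.97], [0.98, 0.98], [0.97, 0.99],
]

print(empirical_p_value(train, [0.975, 0.985]))  # sample close to the known ones
print(empirical_p_value(train, [0.60, 0.80]))    # sample far from the known ones
```

A small value means the new sample lies farther from the mean than nearly all known samples do; where to put the cut-off is a judgment call that depends on how many false rejections are acceptable.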

And so on. If your training set is pure (all objects are indeed circles) and if your data are low-dimensional, you really have plenty of methods at your disposal. Without purity or in high dimensions, the problem can become substantially harder.
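For a low-dimensional, pure training set, the nearest-neighbor approach from the list above might look something like the sketch below. It is written in plain Python with a brute-force search instead of MATLAB's knnsearch, and the function names, the toy 2-D data, and k=2 are all hypothetical choices for illustration. Each training row is excluded from its own neighbor search so that the zero self-distance does not shrink the reference scores, and the columns are assumed to be on comparable scales already.

```python
import math

def knn_mean_distance(point, data, k):
    """Mean Euclidean distance from `point` to its k nearest rows in `data`."""
    dists = sorted(math.dist(point, row) for row in data)
    return sum(dists[:k]) / k

def knn_reference(train, k):
    """For each training row, mean distance to its k nearest *other* rows."""
    return [knn_mean_distance(row, train[:i] + train[i + 1:], k)
            for i, row in enumerate(train)]

def knn_p_value(train, new_row, k=3):
    """Fraction of training rows whose kNN score is at least the new score."""
    ref = knn_reference(train, k)
    score = knn_mean_distance(new_row, train, k)
    return sum(r >= score for r in ref) / len(ref)

# Toy 2-D training set (made-up values, for illustration only).
train = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5]]

print(knn_p_value(train, [0.4, 0.6], k=2))  # inside the cloud of known samples
print(knn_p_value(train, [5.0, 5.0], k=2))  # far outside it
```

The brute-force search is quadratic in the number of training rows, which is fine for small sets; a tree-based search would be the usual replacement for larger ones.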

1 comment

Olivier Dupont 2012-5-25

I'll dig into that article! Also I might try to define what "is not" a circle in order to use a more classical approach and distinguish between two states. By finding circles in an image, I also find everything else. So I may be able to use that everything else to define the second class.

Answer 2

Walter Roberson 2012-5-24

0 votes

It might be possible with some classifiers, but not for most.

Some classifiers just divide the feature space into two half-spaces with a hyperplane, and assign the class according to which side of the hyperplane a sample is on.

Other classifiers provide a probability of belonging to a particular class, but those probabilities are never 0. You could, naturally, arbitrarily say that a sample is not in either class if the probability of belonging is "small enough" for both of the classes.
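One way to read the "small enough for both classes" idea is sketched below, under a few assumptions that are not in the answer itself. Each class is modeled by a one-dimensional Gaussian fitted with Python's statistics.NormalDist, the score for a class is the class-conditional density rather than a normalized posterior (posteriors over two classes sum to 1, so they cannot both be small), and the threshold min_density is an arbitrary value picked for the example. The class labels and feature values are made up.

```python
from statistics import NormalDist

def classify_with_reject(x, models, min_density):
    """Return the best-scoring label, or "neither" if every density is tiny.

    `models` maps a label to a fitted NormalDist; `min_density` is an
    arbitrary, user-chosen threshold (an assumption of this sketch).
    """
    scores = {label: dist.pdf(x) for label, dist in models.items()}
    best = max(scores, key=scores.get)
    if scores[best] < min_density:
        return "neither"
    return best

# Hypothetical 1-D feature (say, circularity) for two known classes.
models = {
    "circle": NormalDist.from_samples([0.96, 0.97, 0.98, 0.99]),
    "square": NormalDist.from_samples([0.76, 0.78, 0.79, 0.80]),
}

for x in (0.975, 0.78, 0.50):
    print(x, classify_with_reject(x, models, min_density=1e-3))
```

The threshold trades off false rejections against false acceptances, so in practice it would be tuned on held-out samples rather than fixed by hand.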
