Questions on classification learner App

Question

Ioannis 2017-5-21

https://ww2.mathworks.cn/matlabcentral/answers/341211-questions-on-classification-learner-app

Answered: Nagarjuna Manchineni on 2017-5-26

Hello everyone.

I have started using the Classification Learner app and have some questions I would like to ask. I will use MATLAB's ovarian cancer data set as an example to illustrate my issues.

1) Suppose the response class is missing for an observation (e.g. the response comes from histology, and histology was not performed for that observation, but the predictor data is available). Is it preferable to assign the observation to an extra class (e.g. 'unknown'), or is it better not to use the observation at all?

2) When PCA is enabled to reduce the dimensionality of the observations (in the ovarian cancer data set, PCA reduces the 4000 predictors to 215 components and keeps 7 of them), can we find out which features (columns of obs in the ovarian cancer data set) PCA has kept?

3) When exporting a trained model to make predictions for new data, and PCA was used during training, what extra arguments do we need to pass when calling newPredictions = myExportedModel.predictFcn(newData) so that the function knows PCA was used when training myExportedModel?

Many thanks in advance for your help!

Regards, Ioannis

Answer 1

Nagarjuna Manchineni 2017-5-26

https://ww2.mathworks.cn/matlabcentral/answers/341211-questions-on-classification-learner-app#answer_268614

1. It depends on the classifier and on the data and application. For example, if you are using a linear classifier to predict whether or not cancer is present, creating a third category (Unknown) will not help, because the model would be trained to predict a label that reflects missing histology rather than any property of the sample. If, on the other hand, you are trying to group all the data into clusters, marking those observations as NaN or 'Unknown' can help.
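A minimal sketch of the two options discussed above, using the ovarian cancer sample data (obs holds the predictors, grp the class labels). That data set has no missing labels, so three labels are blanked out artificially; the indices 3, 10 and 25 are hypothetical stand-ins for observations without histology.

```matlab
% Load the ovarian cancer sample data: obs is 216-by-4000, grp is a 216-by-1 cell array of labels.
load ovariancancer

% Hypothetical: pretend histology is missing for three observations.
labels = grp;
labels([3 10 25]) = {''};
isMissing = cellfun(@isempty, labels);

% Option A: drop observations whose response is missing.
Xlabeled = obs(~isMissing, :);
Ylabeled = labels(~isMissing);

% Option B: keep them, assigned to an extra 'Unknown' class.
Yextended = labels;
Yextended(isMissing) = {'Unknown'};
```

For a supervised classifier, Option A is usually the safer default; Option B only makes sense if 'Unknown' is a category you genuinely want the model to predict.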

2. PCA does not select a subset of the original predictors. Instead, it generates a new set of variables, called principal components, each of which is a linear combination of the original variables. The principal components are orthogonal to each other, so there is no redundant information, and together they form an orthogonal basis for the space of the data. For example, in the cancer data set, if you start with x predictors, MATLAB's PCA reduces them to y (y <= x) components. These components are not columns of your original data; they are derived from the predictors. To see the coefficients of the 7 components kept by the trained classifier, use the following command:

>> trainedClassifier.PCACoefficients
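PCACoefficients has one row per original predictor and one column per kept component. The sketch below reproduces this directly with pca, assuming the app's default of keeping enough components to explain 95% of the variance, and lists the columns of obs with the largest absolute loading in each kept component. With an exported model you could use trainedClassifier.PCACoefficients in place of coeffKept; numTop is an arbitrary illustrative choice.

```matlab
load ovariancancer                      % obs: 216-by-4000 predictors

% pca centers the data by default; each column of coeff is one principal component.
[coeff, ~, ~, ~, explained] = pca(obs);

% Assumption: keep enough components to explain 95% of the variance.
numKeep = find(cumsum(explained) >= 95, 1);
coeffKept = coeff(:, 1:numKeep);        % numPredictors-by-numKeep loadings

% The components are orthonormal, so this error should be close to zero.
orthError = norm(coeffKept' * coeffKept - eye(numKeep));

% Every component mixes all 4000 predictors; report the columns of obs that
% carry the largest absolute loading in each kept component.
numTop = 5;
topPredictors = zeros(numKeep, numTop);
for k = 1:numKeep
    [~, order] = sort(abs(coeffKept(:, k)), 'descend');
    topPredictors(k, :) = order(1:numTop)';
    fprintf('PC%d (%.1f%% of variance): columns %s\n', ...
        k, explained(k), mat2str(topPredictors(k, :)));
end
```

Large loadings only show which predictors dominate a component. The classifier still sees all 4000 predictors through the projection, so this is not feature selection.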

To see how to use the trained classifier, run the following command. It describes how this particular model should be used and how to predict the response from new input data:

>> trainedClassifier.HowToPredict

3. The exported model records whether PCA was used and applies the same transformation internally, so you only need to pass the observations you want to classify. If you want to confirm whether the trained classifier used PCA, check the HowToPredict field shown above.
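To illustrate why no extra arguments are needed, here is a hand-built imitation of such a pipeline, not the app's generated code: the PCA centering and projection learned on the training data are captured inside the prediction function, so callers pass raw predictors only. The 10% hold-out split, the linear SVM, the 95% variance threshold and the name myPredictFcn are all illustrative assumptions.

```matlab
load ovariancancer                          % obs: predictors, grp: class labels
obs = double(obs);                          % work in double precision

% Hold out 10% of the rows to act as "new data" (stratified by class).
rng(0);                                      % reproducible split
cv = cvpartition(grp, 'HoldOut', 0.1);
Xtrain = obs(training(cv), :);
Ytrain = grp(training(cv));
Xnew   = obs(test(cv), :);

% Fit PCA on the training predictors only; mu holds the training column means.
[coeff, scores, ~, ~, explained, mu] = pca(Xtrain);
numKeep = find(cumsum(explained) >= 95, 1);  % assumed 95% variance threshold
coeff = coeff(:, 1:numKeep);

% Train a classifier on the PCA scores (a linear SVM, chosen only for illustration).
mdl = fitcsvm(scores(:, 1:numKeep), Ytrain);

% Bundle the transform with the model: raw predictors in, class labels out.
% New rows are centered with the TRAINING means, then projected onto the kept components.
myPredictFcn = @(X) predict(mdl, bsxfun(@minus, X, mu) * coeff);

% The caller passes raw predictors, just as with an exported model's predictFcn.
newPredictions = myPredictFcn(Xnew);
```

Capturing mu and coeff in the function handle is what makes the call signature identical with or without PCA: the caller never has to know the transformation exists.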

The following documentation page explains PCA in more detail:

https://www.mathworks.com/help/stats/principal-component-analysis-pca.html
