How to interpret Anomaly Scores for One Class Support Vector Machines

Question

NCA 2024-8-25

https://ww2.mathworks.cn/matlabcentral/answers/2147829-how-to-interpret-anomaly-scores-for-one-class-support-vector-machines

Commented: Umar 2024-8-27

FOR MATLAB ANSWERS.png

I am using One Class Support Vector Machines for anomaly detection. Here is the anomaly scores histogram (attached) for the model trained with 274 samples and tested with 31 samples. How do I determine the true/false prediction rates from the anomaly scores histogram?

Thank You

Answer 1

Kaustab Pal 2024-8-26

https://ww2.mathworks.cn/matlabcentral/answers/2147829-how-to-interpret-anomaly-scores-for-one-class-support-vector-machines#answer_1505204

Hi @NCA,

To determine true and false prediction rates, it's crucial to set an appropriate threshold on the anomaly scores. Samples with scores above this threshold are classified as anomalies, while those below are considered normal. By examining the distribution of anomaly scores in the histogram, you can identify natural separations or clusters that suggest a reasonable threshold.
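When no labeled anomalies are available, one hedged way to turn the histogram into a concrete threshold is to take a percentile of the scores from training data that is assumed to be mostly normal. The sketch below uses made-up scores and a hypothetical 5% target false-alarm rate. `trainScores` and `targetFalseAlarmRate` are illustrative names, not part of the original post, and the sketch follows this answer's convention that higher scores are more anomalous.

```matlab
% Illustrative only: synthetic "training" anomaly scores (higher = more anomalous).
rng(0);                                  % reproducible fake data
trainScores = randn(274, 1);             % stand-in for the scores of 274 normal samples

% Assumed target: flag roughly 5% of the normal training samples.
targetFalseAlarmRate = 0.05;

% Empirical percentile without toolbox functions: sort and index.
sortedScores = sort(trainScores, 'ascend');
n = numel(sortedScores);
cutIndex = ceil((1 - targetFalseAlarmRate) * n);   % position of the 95th percentile
threshold = sortedScores(cutIndex);

% Fraction of training samples that this threshold would flag.
flaggedFraction = mean(trainScores > threshold);
fprintf('Threshold: %.3f, flagged fraction: %.3f\n', threshold, flaggedFraction);
```

A threshold chosen this way only controls the false-alarm rate on the training data; whether it also catches the real anomalies still has to be checked against labeled test samples.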

Once the threshold is set, you can calculate the following metrics:

True Positives (TP): The number of actual anomalies correctly identified.
False Positives (FP): The number of normal samples incorrectly classified as anomalies.
True Negatives (TN): The number of normal samples correctly identified.
False Negatives (FN): The number of actual anomalies incorrectly classified as normal.

Using these values, you can compute performance metrics such as precision, recall, F1 score, and accuracy to evaluate your model.

Please find below a short code snippet:

% Sample anomaly scores and ground truth labels
anomalyScores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.6, 0.2, 0.9, 0.3, 0.5];
groundTruth = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]; % 1 for anomaly, 0 for normal
% Set an appropriate threshold
threshold = 0.5;
% Initialize counters
TP = 0;
FP = 0;
TN = 0;
FN = 0;
% Evaluate predictions based on the threshold
for i = 1:length(anomalyScores)
    if anomalyScores(i) > threshold
        if groundTruth(i) == 1
            TP = TP + 1; % True Positive
        else
            FP = FP + 1; % False Positive
        end
    else
        if groundTruth(i) == 0
            TN = TN + 1; % True Negative
        else
            FN = FN + 1; % False Negative
        end
    end
end
% Calculate metrics
precision = TP / (TP + FP);
recall = TP / (TP + FN);
f1Score = 2 * (precision * recall) / (precision + recall);
accuracy = (TP + TN) / (TP + FP + TN + FN);
% Display results
fprintf('Precision: %.2f\n', precision);
fprintf('Recall: %.2f\n', recall);
fprintf('F1 Score: %.2f\n', f1Score);
fprintf('Accuracy: %.2f\n', accuracy);
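
The same counts can also be obtained without a loop. This is a minimal vectorized sketch using the same sample data as above; `predictedAnomaly` is a name introduced here for illustration.

```matlab
% Same sample data as the loop-based snippet above.
anomalyScores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.6, 0.2, 0.9, 0.3, 0.5];
groundTruth   = logical([0, 0, 0, 1, 1, 1, 0, 1, 0, 1]); % true = anomaly
threshold = 0.5;

predictedAnomaly = anomalyScores > threshold;  % logical vector of predictions

% Count each outcome with logical masks instead of a loop.
TP = sum( predictedAnomaly &  groundTruth);
FP = sum( predictedAnomaly & ~groundTruth);
TN = sum(~predictedAnomaly & ~groundTruth);
FN = sum(~predictedAnomaly &  groundTruth);

fprintf('TP=%d FP=%d TN=%d FN=%d\n', TP, FP, TN, FN);
```

For this sample data the counts are TP = 4, FP = 0, TN = 5, FN = 1, which gives a precision of 1.00 and a recall of 0.80. The sample with a score of exactly 0.5 ends up as a false negative because the comparison is strict.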

I hope this answers your query!

2 Comments

NCA 2024-8-26

FOR MATLAB ANSWERS.png

Thanks Kaustab for the detailed explanation. I would like to know why you set 1 as the anomaly and 0 as the normal. Can I do the reverse? I was following OCSVM from MATLAB, where negative scores are assigned as anomalies and positive scores as normal based on a threshold of 0. Please see the attached anomaly histogram with the title "FOR MATLAB ANSWERS".

Secondly, I am assuming you created "groundTruth", so in my case I need to create a file termed "groundTruth" with a value of 0 or 1 against each of my "anomalyScores" for my 31 test samples?

Thanks

Umar 2024-8-27

Hi @NCA,

In anomaly detection, whether a sample is classified as an anomaly or as normal depends on the threshold set on the anomaly scores. You can indeed reverse the labeling of anomalies and normal samples; the key is to be consistent. If your model, such as OCSVM, designates negative scores as anomalies, flag samples whose score falls below the threshold and make sure your ground truth uses the same encoding.

Regarding the groundTruth variable, you need one label for each anomaly score. For your 31 test samples, create a binary array where each entry states whether the sample is an anomaly (1) or normal (0). This lets you compute True Positives, False Positives, and the other counts needed to evaluate your model. Here's a brief code snippet to illustrate how you might set up your groundTruth:

% Example ground truth for 31 test samples (placeholder values; replace
% them with the real labels of your data: 1 = anomaly, 0 = normal)
groundTruth = [0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, ...
    1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0];

With a label vector like this in place, you can evaluate your anomaly detection model. So, to answer your second question: yes, you need a groundTruth value of 0 or 1 for each of the anomalyScores of your 31 test samples. It can simply be a variable in your workspace rather than a separate file. Please let us know if you have any further questions.
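
As a sketch of the reversed convention NCA describes (negative scores are anomalies, threshold of 0), the comparison flips to `<` while the counting logic stays the same. The scores and labels below are invented for illustration, and `isAnomalyTrue` is a hypothetical variable name.

```matlab
% Made-up scores: negative = anomaly, positive = normal (threshold 0).
scores = [0.8, -0.3, 0.5, -1.2, 0.1, -0.05, 0.9, 0.4];
% Made-up ground truth: true where the sample really is an anomaly.
isAnomalyTrue = logical([0, 1, 0, 1, 1, 0, 0, 0]);

threshold = 0;
predictedAnomaly = scores < threshold;   % note the flipped comparison

TP = sum( predictedAnomaly &  isAnomalyTrue);   % anomalies caught
FP = sum( predictedAnomaly & ~isAnomalyTrue);   % normal samples flagged
TN = sum(~predictedAnomaly & ~isAnomalyTrue);   % normal samples passed
FN = sum(~predictedAnomaly &  isAnomalyTrue);   % anomalies missed

truePositiveRate  = TP / (TP + FN);
falsePositiveRate = FP / (FP + TN);
fprintf('TPR: %.2f, FPR: %.2f\n', truePositiveRate, falsePositiveRate);
```

Keeping the ground truth as a logical "is anomaly" vector means the label encoding (0/1 or -1/1) no longer matters; only the direction of the score comparison changes.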

Answer 2

Umar 2024-8-26

https://ww2.mathworks.cn/matlabcentral/answers/2147829-how-to-interpret-anomaly-scores-for-one-class-support-vector-machines#answer_1504849

Hi @NCA,

Regarding your question about how to determine the true/false prediction rates from the anomaly scores histogram of a model trained with 274 samples and tested with 31 samples, please see my response below.

First, I generated synthetic data. The rng(1) command sets the random number generator seed to 1 so that the results can be reproduced. trainData holds 274 two-dimensional training samples drawn from a standard normal distribution (mean = 0, variance = 1), and testData holds 31 normal samples followed by 10 anomalies whose coordinates are both shifted by 5 units.

rng(1); % For reproducibility
trainData = randn(274, 2); % 274 samples for training
testData = [randn(31, 2); randn(10, 2) + 5]; % 31 normal samples and 10 anomalies

Then I created a label vector for the training data in which every entry is 1, indicating that all training samples are considered normal.

trainLabels = ones(size(trainData, 1), 1);

Next, the fitcsvm function trains the SVM model using the training data and labels. Because all training labels belong to a single class, fitcsvm performs one-class learning. 'KernelFunction', 'gaussian' selects a Gaussian kernel, 'Standardize', true standardizes the predictors to zero mean and unit variance, and 'ClassNames', [1; -1] defines the class labels for the model.

ocsvmModel = fitcsvm(trainData, trainLabels, 'KernelFunction', 'gaussian', ...
    'Standardize', true, 'ClassNames', [1; -1]);

Afterwards, the trained SVM model predicts labels and scores for the test data. The score variable contains the classification scores, whose sign indicates on which side of the learned boundary each sample falls.

[predictedLabels, score] = predict(ocsvmModel, testData);

Then I created a figure with two subplots. The first subplot displays a histogram of the anomaly scores for the test data, while the second shows the histogram of the anomaly scores for the training data.

figure;

% predict returns labels first and scores second. score(:, 1) is the score
% for the normal class (1), so its negative serves as the anomaly score
% (larger = more anomalous).

% Subplot for test data
subplot(2, 1, 1);
histogram(-score(:, 1), 30, 'FaceColor', 'b', 'FaceAlpha', 0.5);
title('Anomaly Scores Histogram - Test Data');
xlabel('Anomaly Score');
ylabel('Frequency');

% Subplot for training data
subplot(2, 1, 2);
[~, trainScores] = predict(ocsvmModel, trainData); % second output holds the scores
trainAnomalyScores = -trainScores(:, 1); % Anomaly scores for training data
histogram(trainAnomalyScores, 30, 'FaceColor', 'r', 'FaceAlpha', 0.5);
title('Anomaly Scores Histogram - Training Data');
xlabel('Anomaly Score');
ylabel('Frequency');

Afterwards, to determine the true/false prediction rates, a threshold of 0 is applied: samples whose anomaly score is greater than this threshold are classified as anomalies. The trueLabels vector represents the actual labels of the test data.

threshold = 0; % Set threshold for anomaly detection
predictions = -score(:, 1) > threshold; % True if the negated normal-class score indicates an anomaly

% True labels: 1 for normal, -1 for anomaly
trueLabels = [ones(31, 1); -ones(10, 1)];

Then I counted the true positives, false positives, true negatives, and false negatives from the predictions and the true labels. Here an anomaly (label -1) is the positive class.

TP = sum(predictions(trueLabels == -1)); % True Positives
FP = sum(predictions(trueLabels == 1));  % False Positives
TN = sum(~predictions(trueLabels == 1)); % True Negatives
FN = sum(~predictions(trueLabels == -1));% False Negatives

The true positive rate (sensitivity) and false positive rate are calculated to evaluate the model's performance.

truePositiveRate = TP / (TP + FN);
falsePositiveRate = FP / (FP + TN);

Finally, the true positive and false positive rates are printed to the console, providing insight into the model's effectiveness in detecting anomalies.

fprintf('True Positive Rate: %.2f\n', truePositiveRate);
fprintf('False Positive Rate: %.2f\n', falsePositiveRate);
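
Since both rates depend on the chosen threshold, it can help to sweep the threshold and watch how they change together. The following is a minimal sketch with synthetic scores (higher = more anomalous, matching the "greater than threshold" rule used above). The data and variable names such as `candidateThresholds` are assumptions for illustration, not output from the model above.

```matlab
rng(1);                                        % reproducible synthetic data
normalScores  = randn(31, 1) - 1;              % fake scores for 31 normal samples
anomalyScores = randn(10, 1) + 2;              % fake scores for 10 anomalies
scores = [normalScores; anomalyScores];
isAnomaly = [false(31, 1); true(10, 1)];

candidateThresholds = linspace(min(scores), max(scores), 50);
tpr = zeros(size(candidateThresholds));
fpr = zeros(size(candidateThresholds));

for k = 1:numel(candidateThresholds)
    predicted = scores > candidateThresholds(k);
    tpr(k) = sum(predicted &  isAnomaly) / sum(isAnomaly);
    fpr(k) = sum(predicted & ~isAnomaly) / sum(~isAnomaly);
end

% ROC-style curve: each point corresponds to one threshold.
plot(fpr, tpr, '-o');
xlabel('False Positive Rate');
ylabel('True Positive Rate');
title('TPR vs. FPR across thresholds (synthetic scores)');
```

Raising the threshold can only lower both rates, so the curve makes the trade-off between missed anomalies and false alarms visible before a single threshold is fixed.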

Please see attached.

Please let me know if this helped resolve your problem. Please let me know if you have any further questions.
