How can I perform speaker verification with x-vectors based on the ivectorSystem documentation?
I am building a basic voice-based attendance system as a beginner project in biometric security, using MathWorks' implementation of x-vector systems. Following the x-vector speaker verification example at https://www.mathworks.com/help/audio/ug/speaker-recognition-using-x-vectors.html, I have already trained the TDNN, the x-vector system, and the PLDA scoring model. I have also obtained thresholds for PLDA and cosine similarity scoring (CSS) from the detection error tradeoff (DET) figure, using the x-axis values at the equal error rate (EER).
That example states that i-vector and x-vector systems share the same classifier backend ("The x-vector system backend, or classifier, is the same as developed for i-vector systems. For details on the backend, see Speaker Verification Using i-vectors and ivectorSystem."). Given that, how would I adapt the verify() function of ivectorSystem, as used in the speaker verification using i-vectors example, to work with x-vectors instead? See https://www.mathworks.com/help/audio/ref/ivectorsystem.html. Presumably the helper functions in the x-vector speaker recognition example are wrappers around the x-vector pipeline.
Accepted Answer
Brian Hemmat
2024-5-6
I don't think you can reuse the verify method for your purpose, but here are the general steps you need to take:
To perform speaker verification, you need a reference (template) embedding for each known speaker. It can be an i-vector, an x-vector, etc. If you've already trained the x-vector model using the recipe in the example, you'll want to perform preprocessing and prediction using the same pipeline. Speaker Diarization Using x-vectors uses the x-vector model and walks through the preprocessing steps. Here is just a sketch of what it would look like:
x = knownspeechsignal;
features = (extract(afe,x)-globalPrecomputedMean)./globalPrecomputedSTD;
embeddingTemplate = predict(model,dlarray(features,'TCB'),Outputs="fc_1");
When you have unknown speech, you perform the same steps.
x = unknownspeechsignal;
features = (extract(afe,x)-globalPrecomputedMean)./globalPrecomputedSTD;
embeddingUnknown = predict(model,dlarray(features,'TCB'),Outputs="fc_1");
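To perform speaker verification, you score the two embeddings using either PLDA or cosine similarity scoring (CSS). Here's an example of CSS. A higher score means the embeddings are more similar, so the speaker is accepted when the score meets the threshold:
css = dot(embeddingTemplate,embeddingUnknown)/(norm(embeddingTemplate)*norm(embeddingUnknown));
speakerisverified = css >= threshold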
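As a sanity check on the scoring step, here is a minimal sketch with toy vectors. The values of a, b, c, and threshold are made up for illustration and are not real x-vectors or a tuned threshold. It shows that both norms belong in the denominator and that a higher score means a closer match.

```matlab
% Toy "embeddings" (hypothetical values, not real x-vectors).
a = [1; 2; 2];
b = [2; 4; 4];    % same direction as a, twice the length
c = [2; -1; 0];   % orthogonal to a

% Cosine similarity: divide by the product of both norms.
cosineScore = @(u,v) dot(u,v)/(norm(u)*norm(v));

scoreSame = cosineScore(a,b);   % vectors point the same way
scoreOrth = cosineScore(a,c);   % vectors are orthogonal

% Accept when the score meets the threshold (higher = more similar).
threshold = 0.5;                % placeholder; use your own EER threshold
isVerified = scoreSame >= threshold;
```

Because the score depends only on direction, scaling an embedding does not change it.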
You'll need to maintain a list of template embeddings to look up when attempting to perform speaker verification.
Here's a sketch of it all together.
% Create templates for known speakers
x = knownspeechsignal_1;
features = (extract(afe,x)-globalPrecomputedMean)./globalPrecomputedSTD;
embeddingTemplate_1 = predict(model,dlarray(features,'TCB'),Outputs="fc_1");
x = knownspeechsignal_2;
features = (extract(afe,x)-globalPrecomputedMean)./globalPrecomputedSTD;
embeddingTemplate_2 = predict(model,dlarray(features,'TCB'),Outputs="fc_1");
% Create an enrollment list. Each embedding is an array, so wrap it in a
% cell to store one value per key.
enrolledSpeakers = dictionary(["speaker 1","speaker 2"],{embeddingTemplate_1,embeddingTemplate_2});
% Extract embedding from unknown speaker
x = unknownspeechsignal;
features = (extract(afe,x)-globalPrecomputedMean)./globalPrecomputedSTD;
embeddingUnknown = predict(model,dlarray(features,'TCB'),Outputs="fc_1");
% Unknown speaker purports to be speaker 1, verify that:
claimedidentity = "speaker 1";
entry = enrolledSpeakers(claimedidentity); % 1-by-1 cell holding the template
embeddingTemplate = entry{1};
css = dot(embeddingTemplate,embeddingUnknown)/(norm(embeddingTemplate)*norm(embeddingUnknown));
% A higher score means more similar, so accept when the score meets the threshold.
speakerisverified = css >= threshold
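If you would rather not depend on dictionary, one alternative is to keep the speaker names in a string array and the templates as columns of a matrix. The sketch below uses random stand-in data; speakerNames, templates, and the threshold value are all hypothetical, and in practice the columns would come from the predict calls above.

```matlab
% Hypothetical enrollment data: one template embedding per column.
rng(0)
speakerNames = ["speaker 1","speaker 2","speaker 3"];
templates = randn(8,3);              % stand-in for real x-vectors

% Stand-in for an unknown embedding: speaker 2's template plus small noise.
embeddingUnknown = templates(:,2) + 0.01*randn(8,1);

% Verification: compare against the claimed speaker's template only.
claimedIdentity = "speaker 2";
t = templates(:,speakerNames == claimedIdentity);
score = dot(t,embeddingUnknown)/(norm(t)*norm(embeddingUnknown));

threshold = 0.9;                     % placeholder, not a tuned value
speakerIsVerified = score >= threshold;
```

Keeping templates in a matrix also makes it easy to score the unknown embedding against every enrolled speaker at once if you later want identification rather than verification.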
The PLDA model is not currently offered standalone. You can use the internal version that ivectorSystem has at your own risk (it is not intended to be user-facing and may change at any time). To see an example of using it, step through either the x-vector training example or the diarization example. Alternatively, this example walks through the nitty-gritty of the entire i-vector system, including the G-PLDA scoring: Speaker Verification Using i-vectors.
Also, depending on the difficulty of your speaker verification task, you might consider using the speakerRecognition function to return a pretrained i-vector system.
Please ask any clarifying questions. I'm hoping to add some examples where the detection error tradeoff, identification, and verification steps are each broken out into components.
2 Comments
Brian Hemmat
2024-5-16
Edited: Brian Hemmat
2024-5-16
I don't follow the first question. I would say try it, and if it doesn't work, provide some code that led to the error.
Regarding the second question, about a general way to obtain a DET plot and calculate the FAR, FRR, and EER, that's also done explicitly here: Speaker Verification Using i-vectors. There are different ways to calculate the DET in terms of what data you use. Often there are explicit pairs you want to score against each other (at least, that's how competitions on the subject usually work).
I've found that just exhaustively pairing all embeddings gives about the same results. Below is a sketch of that.
Assume we have a matrix of embedding vectors output from your model.
embeddingLength = 200;
numEmbeddings = 20*30;
embeddings = rand(embeddingLength,numEmbeddings);
Each embedding vector has a corresponding label, so the elements of labels correspond to the columns of embeddings.
labels = categorical(repelem(1:30,20));
Calculate scores for all pairs of embeddings--we'll throw away the repetitions later.
scoresmat = css(embeddings,embeddings);
Create a matrix that says whether the labels belong to the same or different speakers.
uniqueLabels = unique(labels);
class_matrix = labels'==labels;
Isolate the scores that correspond to matched pairs and the scores that correspond to unmatched pairs.
n = size(scoresmat,1);
lower_triangular_logical = tril(ones(n, n), -1) == 1;
scoresmat(~lower_triangular_logical) = nan;
scoreLike = scoresmat(class_matrix);
scoreUnlike = scoresmat(~class_matrix);
scoreLike(isnan(scoreLike)) = [];
scoreUnlike(isnan(scoreUnlike)) = [];
Define a range of thresholds to test
numThresholdsInSweep = 1000;
Thresholds = linspace(min(scoreUnlike),max(scoreLike),numThresholdsInSweep);
Calculate the false reject rate for each threshold in the sweep.
FRR = mean(scoreLike(:)<Thresholds(:)',1);
Calculate the false acceptance rate for each threshold in the sweep.
FAR = mean(scoreUnlike(:)>=Thresholds(:)',1);
Get the threshold where the FRR and FAR intersect (a better version of this would interpolate the points before and after).
[~,EERThresholdIdx] = min(abs(FRR-FAR));
EERThreshold = Thresholds(EERThresholdIdx);
Calculate the EER.
EER = mean([FAR(EERThresholdIdx),FRR(EERThresholdIdx)]);
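The interpolated version mentioned above could look something like the following sketch. It uses toy FRR and FAR curves (hypothetical values, chosen so the crossing is easy to check) and assumes the two curves cross somewhere within the sweep. With real data you would reuse the Thresholds, FRR, and FAR computed above.

```matlab
% Toy FRR/FAR curves on a threshold sweep (hypothetical values).
Thresholds = linspace(0,1,11);
FRR = Thresholds.^2;          % rises with the threshold
FAR = 1 - Thresholds;         % falls with the threshold

% Find the first sweep point where FRR is at or above FAR.
d = FRR - FAR;
k = find(d >= 0,1);
assert(~isempty(k),"FRR and FAR do not cross within the sweep.")

if k == 1
    % The curves already meet at the first threshold.
    EERThreshold = Thresholds(1);
    EER = FRR(1);
else
    % Linearly interpolate the zero crossing between points k-1 and k.
    w = d(k-1)/(d(k-1) - d(k));
    EERThreshold = Thresholds(k-1) + w*(Thresholds(k) - Thresholds(k-1));
    EER = FRR(k-1) + w*(FRR(k) - FRR(k-1));
end
```

Linear interpolation is only one option. A finer threshold sweep reduces the gap between the nearest-point estimate and the interpolated one.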
Plot the results.
figure
plot(Thresholds,FRR,"k"), hold on
plot(Thresholds,FAR,"b")
plot(EERThreshold,EER,"ro",MarkerFaceColor="r")
title(["Equal Error Rate = " + round(EER,4),"Threshold = " + round(EERThreshold,4)])
xlabel('Threshold')
ylabel('Error Rate')
legend('FRR','FAR','Equal Error Rate (EER)')
grid on
axis tight
hold off
Supporting Functions
function y = css(w1,wt)
% This calculates the css of all pairs in w1 and wt in a vectorized way.
% Add this to your path to use.
y = squeeze(sum(w1.*reshape(wt,size(wt,1),1,[]),1)./(vecnorm(w1).*reshape(vecnorm(wt),1,1,[])));
end