sisnr

Scale-invariant signal-to-noise ratio

Since R2024b

    Description

    metric = sisnr(proc,ref) returns the scale-invariant signal-to-noise ratio (SI-SNR), in dB, of the processed signal proc relative to the reference signal ref.

    metric = sisnr(proc,ref,SubtractMean=tf) specifies whether to subtract the individual signal means before computing the SI-SNR.

    Examples

    Read in an audio signal containing speech.

    ref = audioread("SpeechDFT-16-8-mono-5secs.wav");

    Generate a signal containing pink noise and add it to the speech to create a noisy signal.

    proc = ref + pinknoise(size(ref));

    Use sisnr to calculate the SI-SNR between the noisy and clean signals. Because the metric is scale invariant, scaling the processed signal by alpha does not change the result.

    alpha = 20;
    sisnr(alpha*proc,ref)
    ans = 
    11.8860
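    To see why scaling has no effect, the following minimal sketch recomputes the metric directly from the SI-SNR formula in the Algorithms section, using random stand-in signals so it runs in base MATLAB. It is not the sisnr implementation: the helper names projScale and siSnrSketch are hypothetical, and mean subtraction is omitted because it does not affect scale invariance. Scaling proc or ref leaves the value unchanged up to rounding, because the scale factor alpha absorbs the change.

```matlab
% Minimal sketch of SI-SNR for column vectors, without mean subtraction.
% projScale and siSnrSketch are hypothetical helpers, not Audio Toolbox functions.
projScale = @(p, r) (p.' * r) / (r.' * r);   % alpha = (p' * r) / ||r||^2
siSnrSketch = @(p, r) 10*log10(sum((projScale(p, r)*r).^2) / sum((projScale(p, r)*r - p).^2));

% Random stand-ins for the clean reference and the noisy processed signal
r = randn(16000, 1);
p = r + 0.3*randn(16000, 1);

base      = siSnrSketch(p, r);
scaled    = siSnrSketch(20*p, r);    % scale the processed signal, as in alpha*proc
refScaled = siSnrSketch(p, 0.5*r);   % scale the reference signal instead
fprintf("%.4f %.4f %.4f\n", base, scaled, refScaled)
```

All three printed values agree to the displayed precision.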
    

    Read in an audio signal containing speech. Generate a signal containing pink noise and add it to the speech to create a noisy signal.

    ref = audioread("SpeechDFT-16-8-mono-5secs.wav");
    proc = ref + pinknoise(size(ref));

    Add a series of constant offsets to the noisy signal and measure the SI-SNR of each offset signal against the reference signal. Because offset is a row vector and proc is a column vector, proc+offset is a matrix with one offset signal per column. By default, sisnr subtracts the mean of each individual signal before computing the SI-SNR. Call sisnr again with SubtractMean set to false; the resulting metric is commonly referred to as the SI-SDR.

    offset = 0:0.1:1;
    snr = sisnr(proc+offset,ref);
    sdr = sisnr(proc+offset,ref,SubtractMean=false);

    Plot the SI-SNR and SI-SDR as functions of the offset. The SI-SNR does not change with the offset because of the zero-centering, whereas the SI-SDR does.

    plot(offset,snr,"r-",offset,sdr,"b-", ...
        offset,snr,"ro",offset,sdr,"bo")
    legend("SI-SNR","SI-SDR")
    xlabel("Offset")
    ylabel("Metric")

    Figure: SI-SNR (red) and SI-SDR (blue) versus Offset, with Metric on the y-axis.
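    The flat SI-SNR curve follows from the centering step: subtracting each signal's mean removes any constant offset before the projection onto the reference. The sketch below is a hypothetical re-creation with random stand-in signals, not the sisnr source. The helper names and the subtractMean flag are illustrative assumptions that mirror the documented SubtractMean option.

```matlab
% Hypothetical sketch: SI-SNR (centered) versus SI-SDR (not centered).
% center, siCore, and siMetric are illustrative names, not toolbox functions.
center   = @(x, subtractMean) x - subtractMean*mean(x);
siCore   = @(p, r) 10*log10(sum(((p.'*r)/(r.'*r)*r).^2) / sum(((p.'*r)/(r.'*r)*r - p).^2));
siMetric = @(p, r, subtractMean) siCore(center(p, subtractMean), center(r, subtractMean));

% Random stand-ins for the reference and noisy signals
r = randn(16000, 1);
p = r + 0.3*randn(16000, 1);

offsets = 0:0.1:1;
snrVals = zeros(size(offsets));
sdrVals = zeros(size(offsets));
for k = 1:numel(offsets)
    snrVals(k) = siMetric(p + offsets(k), r, true);   % like the default SubtractMean=true
    sdrVals(k) = siMetric(p + offsets(k), r, false);  % like SubtractMean=false
end
disp([offsets; snrVals; sdrVals])
```

In this sketch the SI-SNR row stays constant, while the SI-SDR row ends well below its zero-offset value because the unremoved offset counts as error energy.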

    Create an audio signal that combines the speech of two speakers. Scale the second speech signal by one half before summing them, and then normalize the mixture to a peak amplitude of 1.

    [s,fs] = audioread("MultipleSpeakers-16-8-4channel-5secs.flac");
    s = s(:,1:2).*[1,0.5];
    x = sum(s,2);
    x = x./max(abs(x));

    Use separateSpeakers to perform speaker separation on the mixed signal. Call the function again with no output arguments to plot the separated signals.

    y = separateSpeakers(x,fs,NumSpeakers=2);
    
    separateSpeakers(x,fs,NumSpeakers=2)

    Figure: three panels showing the mixture (Input and Reconstruction), Speaker 1, and Speaker 2, plotted against Time (s).

    Measure the SI-SNR to evaluate the speaker separation. The separated signals are not guaranteed to be in the same order as the ground-truth signals, so call sisnr with both possible orderings of the ground truth.

    snr1 = mean(sisnr(y,s))
    snr1 = single
    
    -39.8843
    
    snr2 = mean(sisnr(y,fliplr(s)))
    snr2 = single
    
    21.1212
    

    Use permutationInvariantSISNR to measure the SI-SNR of the best permutation aligning the separated signals with the ground truth.

    pi_snr = permutationInvariantSISNR(y,s)
    pi_snr = single
    
    21.1212
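    With only two speakers there are two orderings, so the best-permutation idea can be sketched as a brute-force search: compute the mean SI-SNR for every ordering of the reference columns and keep the largest. This is a hypothetical sketch under that assumption, with random stand-in signals and illustrative helper names; it is not necessarily how permutationInvariantSISNR is implemented. The search costs N! evaluations for N sources, so it is practical only for small N.

```matlab
% Hypothetical brute-force permutation search (not the toolbox implementation).
% Each column is one source; means are subtracted as in the sisnr default.
center   = @(x) x - mean(x, 1);
colSiSnr = @(p, r) 10*log10(sum((sum(p.*r, 1)./sum(r.^2, 1).*r).^2, 1) ./ sum((sum(p.*r, 1)./sum(r.^2, 1).*r - p).^2, 1));

T = 16000;
refSig  = randn(T, 2);                        % stand-in ground truth, one source per column
procSig = fliplr(refSig) + 0.1*randn(T, 2);   % estimates returned in swapped order

P = perms(1:size(refSig, 2));                 % every ordering of the reference columns
scores = zeros(size(P, 1), 1);
for k = 1:size(P, 1)
    scores(k) = mean(colSiSnr(center(procSig), center(refSig(:, P(k, :)))));
end
[best, idx] = max(scores);
bestOrder = P(idx, :);
fprintf("Best order: [%d %d], mean SI-SNR: %.2f dB\n", bestOrder, best)
```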
    

    Input Arguments

    proc

    Processed signal, specified as a column vector or a matrix in which the rows correspond to the time dimension and the columns are individual signals.

    The proc and ref inputs must have the same number of rows. For matrix inputs, the inputs must either have the same number of columns or one of the inputs must be a column vector.

    Data Types: single | double

    ref

    Reference signal, specified as a column vector or a matrix in which the rows correspond to the time dimension and the columns are individual signals.

    The proc and ref inputs must have the same number of rows. For matrix inputs, the inputs must either have the same number of columns or one of the inputs must be a column vector.

    Data Types: single | double

    tf

    Option to center each individual signal by subtracting its mean before computing the SI-SNR, specified as true (default) or false.

    Data Types: logical

    Output Arguments

    metric

    SI-SNR in dB comparing the proc and ref signals, returned as a scalar if both inputs are column vectors.

    If the inputs are matrices, the function computes the SI-SNR for each pair of columns in proc and ref and returns a row vector of length N, where N is the number of columns. If one of the inputs is a column vector and the other is a matrix, sisnr implicitly expands the column vector.
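    The column-wise behavior, including implicit expansion of a column-vector input, can be sketched with element-wise operations: compute one scale factor alpha per column, then let MATLAB implicit expansion broadcast the single reference column against every processed column. This is a minimal sketch assuming the default mean subtraction, with random stand-in signals and hypothetical helper names; it is not the sisnr source.

```matlab
% Hypothetical vectorized sketch: one SI-SNR value per column pair.
% center and siSnrCols are illustrative names, not toolbox functions.
center    = @(x) x - mean(x, 1);   % subtract each column's mean
siSnrCols = @(p, r) 10*log10(sum((sum(p.*r, 1)./sum(r.^2, 1).*r).^2, 1) ./ sum((sum(p.*r, 1)./sum(r.^2, 1).*r - p).^2, 1));

T = 8000;
ref  = randn(T, 1);                      % single reference column
proc = ref + [0.1 0.3 1].*randn(T, 3);   % three processed signals, increasing noise
metric = siSnrCols(center(proc), center(ref));   % 1-by-3 row vector
disp(metric)
```

The same expression also works when proc is the column vector and ref is the matrix, because alpha is then computed per reference column.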

    If you set SubtractMean to false, the resulting metric is commonly referred to as the scale-invariant signal-to-distortion ratio (SI-SDR).

    Algorithms

    The scale-invariant signal-to-noise ratio (SI-SNR) measures the level of distortion or noise in a processed signal by comparing it to a reference signal in a way that is invariant to the scaling of the signals. This metric is useful for evaluating speech enhancement and source separation systems.

    The sisnr function calculates the SI-SNR according to the following formula, where s is the reference signal and ŝ is the processed signal.

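    $$\text{SI-SNR} = 10\log_{10}\left(\frac{\lVert \alpha s \rVert^2}{\lVert \alpha s - \hat{s} \rVert^2}\right), \quad \text{where } \alpha = \frac{\hat{s}^T s}{\lVert s \rVert^2}$$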
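    As a small worked example, take a two-sample reference s = [1, 0]ᵀ and processed signal ŝ = [2, 1]ᵀ, and apply the formula directly, as when SubtractMean is false. (Centering these particular vectors would make them identical, so the default SI-SNR would be infinite.)

```latex
\alpha = \frac{\hat{s}^T s}{\lVert s \rVert^2} = \frac{2\cdot 1 + 1\cdot 0}{1^2 + 0^2} = 2, \qquad
\alpha s = \begin{bmatrix} 2 \\ 0 \end{bmatrix}, \qquad
\alpha s - \hat{s} = \begin{bmatrix} 0 \\ -1 \end{bmatrix}

\text{SI-SDR} = 10\log_{10}\left(\frac{\lVert \alpha s \rVert^2}{\lVert \alpha s - \hat{s} \rVert^2}\right)
             = 10\log_{10}\left(\frac{4}{1}\right) \approx 6.02\ \text{dB}
```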

    By default, the sisnr function subtracts the mean of each signal to zero-center it before calculating the SI-SNR. You can skip this step by setting SubtractMean to false; the resulting metric is commonly referred to as the scale-invariant signal-to-distortion ratio (SI-SDR) [1].

    References

    [1] Le Roux, Jonathan, Scott Wisdom, Hakan Erdogan, and John R. Hershey. “SDR – Half-Baked or Well Done?” In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 626–30. Brighton, United Kingdom: IEEE, 2019. https://doi.org/10.1109/ICASSP.2019.8683855.

    Extended Capabilities

    GPU Arrays
    Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

    Version History

    Introduced in R2024b