sisnr

Scale-invariant signal-to-noise ratio

Since R2024b

Syntax

metric = sisnr(proc,ref)

metric = sisnr(proc,ref,SubtractMean=tf)

Description

metric = sisnr(proc,ref) returns the scale-invariant signal-to-noise ratio (SI-SNR) of the processed signal proc with respect to the reference signal ref.

metric = sisnr(proc,ref,SubtractMean=tf) specifies whether to subtract the individual signal means before computing the SI-SNR.

Examples

Measure SI-SNR

Read in an audio signal containing speech.

ref = audioread("SpeechDFT-16-8-mono-5secs.wav");

Generate a signal containing pink noise and add it to the speech to create a noisy signal.

proc = ref + pinknoise(size(ref));

Use sisnr to calculate the SI-SNR with the noisy and clean signals. See how modifying the scale of the signals does not change the resulting metric.

alpha = 20;
sisnr(alpha*proc,ref)

ans = 
11.8860
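To make the scale invariance easier to see, the measurement can be repeated for several scale factors. The following is a minimal sketch that assumes Audio Toolbox is installed; the scale factors in alphas and the variable names vals and spread are arbitrary choices for illustration, and the SI-SNR value itself varies from run to run because the pink noise is random.

```matlab
% Sketch: repeat the SI-SNR measurement for a few arbitrary scale factors.
ref = audioread("SpeechDFT-16-8-mono-5secs.wav");
proc = ref + pinknoise(size(ref));

alphas = [0.1 1 20];              % illustrative scale factors (assumed values)
vals = zeros(size(alphas));
for k = 1:numel(alphas)
    vals(k) = sisnr(alphas(k)*proc,ref);
end

% If the metric is scale invariant, all entries of vals agree up to round-off.
spread = max(vals) - min(vals)
```

The spread is expected to be negligible compared with the metric itself.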

Compare SI-SNR and SI-SDR

Read in an audio signal containing speech. Generate a signal containing pink noise and add it to the speech to create a noisy signal.

ref = audioread("SpeechDFT-16-8-mono-5secs.wav");
proc = ref + pinknoise(size(ref));

Alter the noisy signal by adding a series of offsets and measure the SI-SNR of each altered signal against the reference signal. Adding the row vector offset to the column vector proc uses implicit expansion, so each column of proc+offset is the noisy signal shifted by one offset value. By default, the sisnr function subtracts the mean of each individual signal before computing the SI-SNR. Call sisnr again with the same signals and with SubtractMean set to false. The resulting metric is commonly referred to as the SI-SDR.

offset = 0:0.1:1;
snr = sisnr(proc+offset,ref);
sdr = sisnr(proc+offset,ref,SubtractMean=false);

Plot the SI-SNR and SI-SDR as they change given the different offsets of the signal. See how the SI-SNR does not change with the offsets because of the zero centering.

plot(offset,snr,"r-",offset,sdr,"b-", ...
    offset,snr,"ro",offset,sdr,"bo")
legend("SI-SNR","SI-SDR")
xlabel("Offset")
ylabel("Metric")

Figure: SI-SNR and SI-SDR plotted against Offset (x-axis: Offset, y-axis: Metric).

Evaluate Speaker Separation with SI-SNR

Create an audio signal that combines the speech of two speakers. Scale one of the speech signals by one half before summing them.

[s,fs] = audioread("MultipleSpeakers-16-8-4channel-5secs.flac");
s = s(:,1:2).*[1,0.5];
x = sum(s,2);
x = x./max(abs(x));

Use separateSpeakers to perform speaker separation on the mixed signal. Call the function again with no output arguments to plot the separated signals.

y = separateSpeakers(x,fs,NumSpeakers=2);

separateSpeakers(x,fs,NumSpeakers=2)

Figure: three stacked plots against Time (s) showing the mix (Input and Reconstruction), Speaker 1, and Speaker 2.

Measure the SI-SNR to evaluate the speaker separation. Call sisnr comparing the separated signals with both possible permutations of the ground truth signals.

snr1 = mean(sisnr(y,s))

snr1 = single
    -39.8843

snr2 = mean(sisnr(y,fliplr(s)))

snr2 = single
    21.1212

Use permutationInvariantSISNR to measure the SI-SNR of the best permutation aligning the separated signals with the ground truth.

pi_snr = permutationInvariantSISNR(y,s)

pi_snr = single
    21.1212
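The idea of searching over permutations can be sketched by hand. The following example is only an illustration on synthetic data, assuming Audio Toolbox is installed; it is not necessarily how permutationInvariantSISNR is implemented. The signals s and y here are random stand-ins, and the names P, candidate, and best are hypothetical.

```matlab
% Sketch: brute-force search for the best source ordering (synthetic data).
rng(0)                                   % make the random signals repeatable
s = randn(8000,2);                       % two synthetic ground truth sources
y = fliplr(s) + 0.05*randn(8000,2);      % estimates returned in swapped order

P = perms(1:size(s,2));                  % every ordering of the source columns
best = -Inf;
for k = 1:size(P,1)
    % Mean SI-SNR across sources for this ordering of the ground truth.
    candidate = mean(sisnr(y,s(:,P(k,:))));
    best = max(best,candidate);
end
best
```

A brute-force search like this grows factorially with the number of sources, so it is practical only for a small number of speakers.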

Input Arguments

`proc` — Processed signal
column vector | matrix

Processed signal, specified as a column vector or a matrix where the rows correspond to the time dimension and the columns are individual signals.

The proc and ref inputs must have the same number of rows. For matrix inputs, the inputs must either have the same number of columns or one of the inputs must be a column vector.

Data Types: single | double

`ref` — Reference signal
column vector | matrix

Reference signal, specified as a column vector or a matrix where the rows correspond to the time dimension and the columns are individual signals.

The proc and ref inputs must have the same number of rows. For matrix inputs, the inputs must either have the same number of columns or one of the inputs must be a column vector.

Data Types: single | double

`tf` — Subtract mean from each signal
`true` (default) | `false`

Option to center each individual signal by subtracting its mean before computing the SI-SNR, specified as true or false.

Data Types: logical

Output Arguments

`metric` — SI-SNR metric
scalar | row vector

SI-SNR metric in dB comparing the proc and ref signals, returned as a scalar if the inputs are column vectors.

If the inputs are matrices, the function computes the SI-SNR for each pair of columns in proc and ref and returns a row vector of length N, where N is the number of columns. If one of the inputs is a column vector and the other is a matrix, sisnr implicitly expands the column vector.
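As a quick check of the output shape, the following sketch pairs one reference column with a three-column processed matrix. It assumes Audio Toolbox is installed; the signal length and noise level are arbitrary illustrative values.

```matlab
% Sketch: one reference signal compared against three processed signals.
ref  = randn(1000,1);                 % single reference (column vector)
proc = ref + 0.1*randn(1000,3);       % three noisy variants, one per column

metric = sisnr(proc,ref);             % ref is expanded across the columns of proc
sz = size(metric)                     % one SI-SNR value per column of proc
```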

If you set SubtractMean to false, the resulting metric is commonly referred to as the scale-invariant signal-to-distortion ratio (SI-SDR).

Algorithms

The scale-invariant signal-to-noise ratio (SI-SNR) measures the level of distortion or noise in a processed signal by comparing it to a reference signal in a way that is invariant to the scaling of the signals. This metric is useful for evaluating speech enhancement and source separation systems.

The sisnr function calculates the SI-SNR according to the following formula, where s is the reference signal and ŝ is the processed signal.

$\text{SI-SNR} = 10 \log_{10} \left( \frac{\lVert \alpha s \rVert^{2}}{\lVert \alpha s - \hat{s} \rVert^{2}} \right)$, where $\alpha = \frac{\hat{s}^{T} s}{\lVert s \rVert^{2}}$

By default, the sisnr function subtracts the mean of each signal to zero-center it before calculating the SI-SNR. You can skip this step by setting SubtractMean to false, in which case the resulting metric is commonly referred to as the scale-invariant signal-to-distortion ratio (SI-SDR).
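The formula can be followed step by step in base MATLAB. The following is a minimal sketch for column-vector inputs, not the shipped sisnr code; the helper names alphaFcn and manualMetric, the 8 kHz sample rate, and the test signals are assumptions made for illustration.

```matlab
% Sketch: evaluate the SI-SNR formula by hand (base MATLAB only).
rng(0)                                  % make the noise repeatable
t = (0:7999)'/8000;                     % 1 second at an assumed 8 kHz rate
s = sin(2*pi*440*t);                    % reference signal
shat = s + 0.1*randn(size(s)) + 0.5;    % processed signal: noise plus a DC offset

% Hypothetical helpers: x is the processed signal, r is the reference.
alphaFcn = @(x,r) (x'*r)/(r'*r);        % optimal scaling of the reference
manualMetric = @(x,r) 10*log10( sum((alphaFcn(x,r)*r).^2) / ...
                                sum((alphaFcn(x,r)*r - x).^2) );

siSDR = manualMetric(shat,s);                          % no mean subtraction
siSNR = manualMetric(shat - mean(shat),s - mean(s));   % zero-center first

% Rescaling the processed signal leaves the value unchanged up to round-off.
siSNRscaled = manualMetric(20*(shat - mean(shat)),s - mean(s));
```

In this sketch the DC offset counts as distortion when the means are kept, so siSDR comes out lower than siSNR.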

References

[1] Le Roux, Jonathan, Scott Wisdom, Hakan Erdogan, and John R. Hershey. “SDR – Half-Baked or Well Done?” In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 626–30. Brighton, United Kingdom: IEEE, 2019. https://doi.org/10.1109/ICASSP.2019.8683855.

Extended Capabilities

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

Version History

Introduced in R2024b
