How to filter breath noise in audio?

Question

wei sun 2022-7-12

Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/1757870-how-to-filter-breath-noise-in-audio

Commented: Mathieu NOE 2022-7-15

audio.zip

The attachment contains the original audio files and the MATLAB filter files I used. I tried low-pass filtering and band-pass filtering, but neither has an obvious effect. The noise is mainly a heavy breathing sound. How can I filter out this breathing sound while fully preserving the speech (Chinese or English)?

5 comments

Jonas 2022-7-13

do you want to remove it only in this sound or do you want to do this automatically for multiple files?

wei sun 2022-7-13

remove or attenuate this noise.


Answer 1

Mathieu NOE 2022-7-13

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/1757870-how-to-filter-breath-noise-in-audio#answer_1006045

Edited: Mathieu NOE 2022-7-13

Hello,

I opted for a strategy based on the spectrogram content. I noticed that the "breathing" sections are characterized by a strong spectrogram level below 100 Hz (the red dots in the plot), which is not the case for the "speaking" sections.

I worked on channel 1 because channel 2 is clipped (distorted).

So I simply reduced the volume (here by 30 dB) for each segment that runs from the local minimum just before a red dot to the local minimum just after it.

(You can also set those segments directly to zero if you prefer; see the options in the code.)

   
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% options 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% spectrogram dB scale
spectrogram_dB_scale = 80;  % dB range scale (means , the lowest displayed level is XX dB below the max level)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% load signal
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[signal,Fs] = audioread('original.wav');
dt = 1/Fs;
[samples,channels] = size(signal);
% select channel (if needed)
channels = 1;
signal = signal(:,channels);
signal_filtered = signal;
% time vector 
time = (0:samples-1)*dt;
%% decimate (if needed)
% NB : decim = 1 will do nothing (output = input)
decim = 40;
if decim>1
    signal_decim = decimate(signal,decim);
    Fs_decim = Fs/decim;
else
    % decim = 1 : keep the signal as is
    signal_decim = signal;
    Fs_decim = Fs;
end
samples_decim = length(signal_decim);
time_decim = (0:samples_decim-1)*1/Fs_decim;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% FFT parameters
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
NFFT = 512;    % 
OVERLAP = 0.75;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% display : time / frequency analysis : spectrogram 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    [sg,fsg,tsg] = specgram(signal_decim,NFFT,Fs_decim,hanning(NFFT),floor(NFFT*OVERLAP));  
    % FFT normalisation and conversion amplitude from linear to dB (peak)
    sg_dBpeak = 20*log10(abs(sg))+20*log10(2/length(fsg));     % NB : X=fft(x.*hanning(N))*4/N; % hanning only
     % saturation of the dB range : 
    min_disp_dB = round(max(max(sg_dBpeak))) - spectrogram_dB_scale;
    sg_dBpeak(sg_dBpeak<min_disp_dB) = min_disp_dB;
    % plots spectrogram
    figure(2);
    imagesc(tsg,fsg,sg_dBpeak);colormap('jet');
    axis('xy');colorbar('vert');grid on
    df = fsg(2)-fsg(1); % freq resolution 
    title(['Spectrogram / Fs = ' num2str(Fs) ' Hz / Delta f = ' num2str(df,3) ' Hz ']);
    xlabel('Time (s)');ylabel('Frequency (Hz)');
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% extract SG (dB) values from 0 to 100 Hz (a loud level in this freq range
% indicates breath sound)
ind = find(fsg<=100);
fsg_breath = fsg(ind);
sg_dB_breath = sg_dBpeak(ind,:);
    max_dB = max(sg_dB_breath,[],1);
    max_dB = max_dB-min(max_dB); % shift the dB values so they are all >= 0 (convenient for islocalmax)
    % select peaks with a prominence of at least 25 dB and the neighboring local mins
    % find local maxima
    [tf, P] = islocalmax(max_dB,'MinProminence',25);
    x_peak = tsg(tf);
    y_peak = max_dB(tf);
    % find local minima
    [tm, P] = islocalmin(max_dB);
    x_min = tsg(tm);
    y_min  = max_dB(tm);
    figure(3);plot(tsg,max_dB,x_peak,y_peak,'dr',x_min,y_min,'dk');
    title('Spectrogram max dB value vs Time');
    xlabel('Time (s)');ylabel('Max dB value');
    
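    % attenuate (or zero) the data between the local mins just before
    % and after each high peak
    gain = 10^(-30/20);  % linear gain for 30 dB attenuation (about 0.0316)
    
    for ck = 1:numel(x_peak)
        % search x_min just before the peak (fall back to start of signal)
        dist = x_min - x_peak(ck);
        ind_bef = find(dist<0,1,'last');
        if isempty(ind_bef), x_min_bef = time(1); else, x_min_bef = x_min(ind_bef); end
        % search x_min just after the peak (fall back to end of signal)
        ind_aft = find(dist>0,1,'first');
        if isempty(ind_aft), x_min_aft = time(end); else, x_min_aft = x_min(ind_aft); end
        
        % samples of the original-rate signal between these two times
        ind = find(time>=x_min_bef & time<=x_min_aft);
        % signal_filtered(ind) = 0;  % option 1 : zero 
        signal_filtered(ind) = signal(ind)*gain;  % option 2 : 30 dB attenuation (applied once per sample)
    end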
    
    
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% display : time domain plot
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
figure(1),
subplot(2,1,1),plot(time,signal,'b');grid on
title(['Time plot  / Fs = ' num2str(Fs) ' Hz / raw data ']);
xlabel('Time (s)');ylabel('Amplitude');
subplot(2,1,2),plot(time,signal_filtered,'b');grid on
title(['Time plot  / Fs = ' num2str(Fs) ' Hz / filtered data ']);
xlabel('Time (s)');ylabel('Amplitude');
    
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% export signal
audiowrite('filtered.wav',signal_filtered,Fs); % audiowrite(filename,y,Fs,varargin)
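
A side note on the attenuation option: a gain expressed in dB converts to a linear amplitude factor via 10^(dB/20). The short sketch below uses a toy segment and illustrative variable names (`atten_dB`, `gain`, `x`, `y` are assumptions, not taken from the attached files). It also shows that dividing samples by 30, for instance, corresponds to about 29.5 dB rather than exactly 30 dB.

```matlab
% Minimal sketch: convert a dB attenuation to a linear amplitude gain.
% atten_dB, gain, x and y are illustrative names and toy values.
atten_dB = 30;                      % desired attenuation in dB
gain = 10^(-atten_dB/20);           % linear amplitude factor (about 0.0316)

x = [1; -0.5; 0.25];                % toy signal segment
y = gain*x;                         % attenuated segment

% Check: level difference between x and y, in dB
level_drop_dB = 20*log10(max(abs(x))/max(abs(y)));

% For comparison, a plain division by 30 attenuates slightly less
div30_dB = 20*log10(30);            % about 29.5 dB
```

Changing `atten_dB` is then enough to try a milder or stronger reduction of the breathing segments.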

8 comments

Mathieu NOE 2022-7-13

OK, so this is a new attempt for the phone (first) case.

Again, I tested the first channel only.

Here the logic is reversed, because the speaker's voice segments contain more energy below 400 Hz than the breathing sound does.

Also, as the speaker starts right away, I padded some random noise at the beginning so that the code can detect the first voice segment.

Hope it helps!

clc
clearvars
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% FFT parameters
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
NFFT = 512;    % 
OVERLAP = 0.75;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% options 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% spectrogram dB scale
spectrogram_dB_scale = 80;  % dB range scale (means , the lowest displayed level is XX dB below the max level)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% load signal
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[signal,Fs] = audioread('phone.wav');
dt = 1/Fs;
[samples,channels] = size(signal);
% select channel (if needed)
channels = 1;
signal = signal(:,channels);
signal = signal(:); % make sure it's a col vector
% pad some dummy noise at the beginning to make spectrogram nicer (and get
% the first peak of fft data)
signal = [0.01*randn(100*NFFT,1);signal];
samples = numel(signal);
signal_filtered = zeros(size(signal));
% time vector 
time = (0:samples-1)*dt;
%% decimate (if needed)
% NB : decim = 1 will do nothing (output = input)
decim = 40;
if decim>1
    signal_decim = decimate(signal,decim);
    Fs_decim = Fs/decim;
elseif decim ==1 
    signal_decim = signal;
    Fs_decim = Fs;
end
samples_decim = length(signal_decim);
time_decim = (0:samples_decim-1)*1/Fs_decim;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% display 3 : time / frequency analysis : spectrogram demo
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    [sg,fsg,tsg] = specgram(signal_decim,NFFT,Fs_decim,hanning(NFFT),floor(NFFT*OVERLAP));  
    % NB specgram time is offset so we must compensate for that 
    tsg = tsg + NFFT/(2*Fs_decim);
    
    % FFT normalisation and conversion amplitude from linear to dB (peak)
    sg_dBpeak = 20*log10(abs(sg))+20*log10(2/length(fsg));     % NB : X=fft(x.*hanning(N))*4/N; % hanning only
     % saturation of the dB range : 
    min_disp_dB = round(max(max(sg_dBpeak))) - spectrogram_dB_scale;
    sg_dBpeak(sg_dBpeak<min_disp_dB) = min_disp_dB;
    % plots spectrogram
    figure(2);
    subplot(2,1,1),plot(time,signal,'b');grid on
    title(['Time plot  / Fs = ' num2str(Fs) ' Hz / raw data ']);
    xlabel('Time (s)');ylabel('Amplitude');
    xlim([min(time) max(time)]);
   subplot(2,1,2),imagesc(tsg,fsg,sg_dBpeak);colormap('jet');grid on
    xlim([min(time) max(time)]);
    axis('xy');
    df = fsg(2)-fsg(1); % freq resolution 
    title(['Spectrogram / Fs = ' num2str(Fs) ' Hz / Delta f = ' num2str(df,3) ' Hz ']);
    xlabel('Time (s)');ylabel('Frequency (Hz)');
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% extract SG (dB) values from 0 to 400 Hz (a loud level in this freq range
% indicates speaker sound)
ind = find(fsg<=400);
fsg_speak = fsg(ind);
sg_dB_speak = sg_dBpeak(ind,:);
    max_dB = max(sg_dB_speak,[],1);
    max_dB = max_dB-min(max_dB); % shift the dB values to positive values for good working islocalmax
    % select peaks above +25 dB
    % find local maxima
    [tf, P] = islocalmax(max_dB);
    x_peak = tsg(tf);
    y_peak = max_dB(tf);
    ii = (y_peak>25);
    x_peak = x_peak(ii);
    y_peak = y_peak(ii);    
    % find local minima
    [tm, P] = islocalmin(max_dB);
    x_min = tsg(tm);
    y_min  = max_dB(tm);
    figure(3);plot(tsg,max_dB,x_peak,y_peak,'dr',x_min,y_min,'dk');
    title('Spectrogram max dB value vs Time');
    xlabel('Time (s)');ylabel('Max dB value');
    
    % KEEP the data that are defined by the local mins just before
    % and after the high peaks
    for ck = 1:numel(x_peak)
        % search x_min just before the peak (fall back to start of signal)
        dist = x_min - x_peak(ck);
        ind_bef = find(dist<0,1,'last');
        if isempty(ind_bef), x_min_bef = time(1); else, x_min_bef = x_min(ind_bef); end
        % search x_min just after the peak (fall back to end of signal)
        ind_aft = find(dist>0,1,'first');
        if isempty(ind_aft), x_min_aft = time(end); else, x_min_aft = x_min(ind_aft); end
        
        % keep the original signal between these two times 
        ind = find(time>=x_min_bef & time<=x_min_aft);
        signal_filtered(ind) = signal(ind) ;  % keep that portion of signal 
    end
    
    
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% display 1 : time domain plot
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
figure(1),
subplot(2,1,1),plot(time,signal,'b');grid on
title(['Time plot  / Fs = ' num2str(Fs) ' Hz / raw data ']);
xlabel('Time (s)');ylabel('Amplitude');
subplot(2,1,2),plot(time,signal_filtered,'b');grid on
title(['Time plot  / Fs = ' num2str(Fs) ' Hz / filtered data ']);
xlabel('Time (s)');ylabel('Amplitude');
    
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% export signal
audiowrite('phone_filtered.wav',signal_filtered,Fs); % audiowrite(filename,y,Fs,varargin)

Mathieu NOE 2022-7-15

hello Wei

1/ As I said, in the "phone" wav file there is no silence or breath sound before the speaker starts to speak, so the spectrogram has energy in the speaker frequency range (used for detection later in the code) right from the start. When I compute the max of the spectrogram in the low frequency range (=> max_dB), there is no initial minimum followed by an increase and then a peak, so islocalmax would not detect that first peak. If I don't pad with this random noise, max_dB starts at a high value and then decreases, so we lose that first voice segment.

2/ The decimated data is used only for the spectrogram computation and for detecting the time blocks to be kept (speaker). The decision between speaker sound and something else is based on a low frequency threshold (400 Hz), so it is computationally more efficient to decimate the data and compute short FFTs than to keep the original sampling rate, compute long FFTs and use only the very low end of the result.

The data that gets filtered is at the original sampling frequency; that's why we have two data sets, the original sampling rate data (signal) and the decimated one (signal_decim).

The output wav file therefore has the same sampling rate as the original file, so there is no distortion. You can hear that the voice segments are not altered by the code.
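
To illustrate point 1/ with toy numbers (these values are made up, not taken from the wav files): in the sketch below the level curve starts at its highest value, so the first peak sits on the very first sample and has no lower neighbour on its left. Prepending a couple of low values, which stands in for the noise padding, turns it into an ordinary interior peak that `islocalmax` flags. The names `level`, `pad` and `peak_idx` are illustrative.

```matlab
% Minimal sketch with toy numbers (not data from the wav files).
% A level curve that starts high, as when speech begins at sample 1:
level = [5 4 1 3 1];
tf_raw = islocalmax(level);          % the leading 5 is an endpoint with no
                                     % lower neighbour on its left

% Prepend a few low values, mimicking the low-level noise padding:
pad = [0.1 0.2];
level_padded = [pad level];
tf_padded = islocalmax(level_padded);

% Indices of the detected peaks, shifted back to the original indexing
peak_idx = find(tf_padded) - numel(pad);   % the leading peak is now included
```

Comparing `find(tf_raw)` with `peak_idx` shows which peaks are gained by the padding.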

wei sun 2022-7-15

OK, thank you, I have learned something. Taking the FFT of the entire segment does use a lot of computing power, and it brings in a lot of irrelevant information.

Mathieu NOE 2022-7-15

The saving in computation is proportional to the applied decimation factor (here 40), so I don't think it's negligible, especially if you want to apply the code to longer wav files.

but of course you can remove the decimation operation if you feel bad about it
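
A rough sketch of the arithmetic behind the decimation choice. The sampling rate `Fs = 48000` is an assumption for illustration only (the real value comes from `audioread`). The two-stage call at the end is an optional variant, not what the scripts above do: the `decimate` documentation suggests splitting factors larger than 13 into smaller ones, and 8 x 5 is just one possible split of 40.

```matlab
% Minimal sketch: effect of the decimation factor on the spectrogram grid.
% Fs = 48000 Hz is an assumed sampling rate, used for illustration only.
Fs = 48000;
decim = 40;
NFFT = 512;

Fs_decim = Fs/decim;                 % 1200 Hz
f_nyquist = Fs_decim/2;              % 600 Hz, still above the 400 Hz detection band
df = Fs_decim/NFFT;                  % frequency resolution, about 2.34 Hz

% The same resolution without decimation would need a much longer FFT
NFFT_full = NFFT*decim;              % 20480 points per frame

% Optional variant: decimate in two stages (8 x 5 = 40) on a toy signal.
% Requires the Signal Processing Toolbox, like the scripts above.
x = randn(Fs,1);                     % one second of noise
x_decim = decimate(decimate(x,8),5);
```

With a different `Fs`, it is worth checking that `f_nyquist` stays above the detection threshold (100 Hz or 400 Hz in the scripts above) before picking `decim`.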
