- Improve the recording setup to increase signal amplitude and reduce background noise.
- Edit the audio file to extract the exact segments that contain the singing which you want to analyze.
Extracting Audio File Frequency
Hello there,
I need to find the frequency of the audio file for specific segments. In my code, I find the segments of talking, take the FFT of these portions, and find the frequencies. The problem arises at the frequency step: I expect to find different frequencies for the segments but get exactly the same values. Could you please help?
Thanks in advance.
Accepted Answer
William Rose
2022-4-15
I have listened to file A1.wav. The instances of singing are not at 15-second intervals, even though the code expects this. Therefore the segments analyzed do not always contain singing. The amplitude of the singing is small. There are significant unrelated background noises. The pitch being sung sounds like the E flat above middle C (E flat 4). Therefore the detected dominant frequency should be around 311.1 Hz.
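The expected frequency for a pitch can be checked with the equal-temperament formula. The following is a minimal sketch, assuming A4 = 440 Hz and MIDI note numbering (A4 = 69, middle C = 60); `noteFreq` is a hypothetical helper name and is not part of rmscalculation.m.

```matlab
% Equal-temperament pitch frequency, assuming A4 = 440 Hz (MIDI note 69).
% noteFreq is a hypothetical helper, not part of rmscalculation.m.
noteFreq = @(midi) 440 * 2.^((midi - 69) / 12);

fC4  = noteFreq(60);   % middle C
fEb4 = noteFreq(63);   % E flat above middle C, three semitones up
fprintf('C4 = %.1f Hz, Eb4 = %.1f Hz\n', fC4, fEb4);
% prints: C4 = 261.6 Hz, Eb4 = 311.1 Hz
```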
Approximate times of vocalization, in seconds: 1-5, 22-27, 42-46, 61-66, 82-87, 102-107.
There is background talking during 61-66. There is coughing or some other background sound in 82-87.
Conclusion: The frequency analysis of file A1.wav by rmscalculation.m is affected by background noises and incorrect timing. The signal-to-noise ratio is poor.
Recommendations:
I have looked at your code: rmscalculation.m.
Analysis of the script:
rmscalculation.m has three nested loops.
The outer loop is: for k=1:number of participants.
The middle loop is: for l=1:number of tests. This loop reads in a different audio file on each pass. It computes the envelope of the signal as the moving average (with width 1000 points = 1/44 of a second) of the absolute value of the signal. The time at which the moving average crosses a threshold is deemed to be the time when talking starts.
The inner loop is: for i=1:6. Each pass extracts a segment of the signal. The segment start times are 15 seconds apart. The segments are 4.9 seconds long. The power spectrum of the segment is determined. The frequency that has max. power, within the frequency range 236 to 367 Hz, is determined for each segment.
Does that sound correct?
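For reference, the processing described above can be sketched as follows. This is not the original rmscalculation.m: it runs on a synthetic signal instead of an audio file, and the sampling rate (44100 Hz), the threshold value, the 300 Hz test tone, and all variable names are assumptions made for illustration.

```matlab
% Synthetic stand-in for one recording (assumed Fs = 44100 Hz).
Fs = 44100;
t  = (0:10*Fs-1)/Fs;                       % 10 s of signal
x  = 0.002*sin(2*pi*50*t);                 % faint 50 Hz hum as background
on = t >= 2 & t < 7;                       % a 300 Hz tone from 2 s to 7 s
x(on) = x(on) + 0.1*sin(2*pi*300*t(on));

% Envelope: moving average (1000 points) of the absolute value.
% filter() is causal, so the envelope lags the signal by up to win samples.
win = 1000;
env = filter(ones(1,win)/win, 1, abs(x));

% Onset: first sample where the envelope exceeds a threshold (assumed value).
thresh = 0.02;
iStart = find(env > thresh, 1);

% Extract a 4.9 s segment starting at the onset.
nSeg = round(4.9*Fs);
seg  = x(iStart : iStart+nSeg-1);

% One-sided power spectrum and its frequency axis.
X = fft(seg);
P = abs(X(1:floor(nSeg/2)+1)).^2;
f = (0:floor(nSeg/2)) * Fs/nSeg;

% Frequency of maximum power within 90% to 140% of middle C.
fC4   = 261.63;
band  = f >= 0.9*fC4 & f <= 1.4*fC4;
fBand = f(band);
[~, k] = max(P(band));
segFreq = fBand(k);
fprintf('onset = %.2f s, peak = %.1f Hz\n', (iStart-1)/Fs, segFreq);
```

Because the peak search is restricted to the band, a segment whose strongest component lies outside 236 to 367 Hz still returns a value inside the band.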
The script rmscalculation.m does not run. It gives the error
Error using xlsread (line 136)
Unable to open file 'F4_A1'.
File 'F4_A1' not found.
Error in rmscalculation (line 11)
a = xlsread(fname1); % comand to read excel/ particle count file
I commented out the lines related to file F4_A1. Then the script ran without error. It does not display any results.
To see the results:
>> disp(seg_Freq')
261.9312
239.8933
261.9312
261.9312
255.0339
262.0995
The frequency range of 90% to 140% of the middle C frequency will allow detection of frequencies corresponding to pitches from just below B3 to just above F4.
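To relate detected frequencies to pitches, each value can be converted to the nearest equal-temperament note. The sketch below assumes A4 = 440 Hz and MIDI-style note numbering. The variable names and the choice of flat or sharp spellings are illustrative, and the frequency values are copied from the output printed above.

```matlab
% Map frequencies to the nearest equal-temperament pitch (A4 = 440 Hz assumed).
names = {'C','C#','D','Eb','E','F','F#','G','Ab','A','Bb','B'};
fC4   = 440 * 2^(-9/12);                  % middle C, about 261.63 Hz
band  = [0.9 1.4] * fC4;                  % search band, about 235.5 to 366.3 Hz
segFreq = [261.9312 239.8933 261.9312 261.9312 255.0339 262.0995];

midi  = 69 + 12*log2(segFreq/440);        % fractional MIDI note number
near  = round(midi);                      % nearest semitone
cents = 100*(midi - near);                % deviation from that semitone
for k = 1:numel(segFreq)
    nm  = names{mod(near(k),12) + 1};     % pitch class name
    oct = floor(near(k)/12) - 1;          % scientific pitch notation octave
    fprintf('%8.2f Hz -> %s%d %+.0f cents\n', segFreq(k), nm, oct, cents(k));
end
```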
More Answers (3)
William Rose
2022-4-12
[moved my answer from a comment to an answer]
The Google Drive link you provided requires access permission. You may attach the audio file if you zip it first.
You probably know this already, but I will mention this just in case you do not know this:
When you compute the FFT or power spectrum of a segment of the signal, the frequencies of the FFT or power spectrum will be the same for each different segment (assuming the segment lengths are the same). The amplitude or power at each frequency will vary from segment to segment. You can compute the mean frequency for a segment, or you can compute the frequency with maximum power in each segment, etc. The script below does both, for an 8-second signal with gradually increasing frequency, divided into 0.5 second long segments. It plots the results. It appears that the max power frequency is better behaved than the mean frequency, in this example.
%% constants
Fs=8000; %sampling rate (Hz)
T=8; %signal duration (s)
wi=220*2*pi; %initial frequency (radians/s)
wf=880*2*pi; %final frequency (radians/s)
Tseg=0.5; %segment duration (s)
%% compute the signal
dt=1/Fs; %sampling interval
N=Fs*T; %signal duration (samples)
t=dt*(0:N-1); %vector of time values
phase=wi*t+(wf-wi)*t.*t/(2*T); %phase for signal with changing frequency
x=cos(phase); %signal amplitude
%% compute FFT of each segment
N1=Fs*Tseg; %segment duration (samples)
Nseg=T/Tseg; %number of segments
fmax=zeros(1,Nseg); %allocate array for max.power frequency of each segment
fmean=zeros(1,Nseg); %allocate array for mean frequency of each segment
df=1/Tseg; %frequency interval
f=(0:N1/2)*df; %vector of frequencies, up to Nyquist frequency
Nf=length(f); %number of frequencies in one-sided FFT
Y=zeros(Nf,Nseg); %allocate array for FFTs
for i=1:Nseg
X=fft(x((i-1)*N1+1:i*N1));
Y(:,i)=abs(X(1:Nf)); %magnitude of one-sided FFT
[~,indmax]=max(Y(:,i)); %index of largest element of Y
fmax(i)=f(indmax); %frequency with maximum power
fmean(i)=sum(f'.*Y(:,i))/sum(Y(:,i)); %mean frequency (amplitude-weighted)
end
%% plot results
figure;
subplot(211), plot(1:Nseg,fmax,'rx',1:Nseg,fmean,'bo');
xlabel('Segment'); ylabel('Frequency (Hz)');
legend('Max.Freq.','Mean Freq.'); grid on
title('Max & Mean Frequency vs. Segment')
subplot(212)
colorspec=[1,0,0;1,.33,0;1,.67,0;
1,1,0;.67,1,0;.33,1,0;
0,1,0;0,1,.33;0,1,.67;
0,1,1;0,.67,1;0,.33,1;
0,0,1;.5,0,1;
1,0,1;1,0,.5];
for i=1:Nseg
plot(f,Y(:,i),'Color',colorspec(i,:));
hold on;
end
xlabel('Frequency (Hz)'); ylabel('Amplitude'); xlim([0,1200])
grid on; title('Amplitude Spectra for Segments')
Try it. Good luck.
William Rose
2022-4-13
Middle C! The frequency sweep in my code goes from A3 to A5.
2 Comments
William Rose
2022-4-13
I was able to see the file on google drive, which I could not do before. However, when I click "download" to put it on my drive - which I need to do in order to open it in Matlab - nothing happens. The Help for google drive says
"If you can't download a file: If you can't download a file, the owner may have disabled options to print, download, or copy for people with only comment or view access."
I suspect that's what is happening here. I can't help more since the file is impossible to access. Post a shorter file that fits within the zip limit.
William Rose
2022-4-16
I wrote a script that extracts 3 seconds of sound from each vocalization. As I said before, the times of note-singing are approximately: 1-5, 22-27, 42-46, 61-66, 82-87, 102-107 seconds.
Therefore I extract sound from 2-5, 23-26, 43-46, 62-65, 83-86, 103-106 seconds.
I measure the mean frequency and the frequency of maximum power in each segment.
The max.power frequencies are about 620-630 Hz, consistent with the subjects singing E flat 5, also known as the E flat above treble C. The expected frequency of this pitch is 622 Hz, with A440 equal temperament tuning.
The script plots the max frequency for each segment and the power spectrum for each segment.
You confined the frequency search to 0.9 - 1.4 times middle C. This singing signal has very little power in that frequency range. Most of the power is around 630 Hz. I initially thought these children were singing in octave 4 (using scientific pitch notation). Now I think they are singing an octave higher, in octave 5. It is not always easy to decide.
My code also creates a file, A1sel.wav, which is the selected audio segments, plus 1 second of silence after each segment. The graphical output from the script is below.
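The script itself is not reproduced in this thread. A minimal sketch of the steps described above might look like the following. It runs on a synthetic signal rather than A1.wav, so the sampling rate (8000 Hz), the 622 Hz test tone, the power-weighted definition of mean frequency, and the output file name are all assumptions.

```matlab
% Synthetic stand-in for A1.wav (assumed Fs = 8000 Hz, 622 Hz tone while "singing").
Fs = 8000;
t  = (0:110*Fs-1)'/Fs;                       % 110 s, column vector
tSing  = [1 22 42 61 82 102];                % approximate vocalization onsets (s)
tStart = [2 23 43 62 83 103];                % segment start times (s)
x = 0.001*sin(2*pi*50*t);                    % faint hum as background
for k = 1:numel(tSing)
    idx = t >= tSing(k) & t < tSing(k)+4;
    x(idx) = x(idx) + 0.2*sin(2*pi*622*t(idx));
end

segDur = 3;                                  % seconds extracted per vocalization
nSeg   = segDur*Fs;                          % samples per segment
f      = (0:nSeg/2)'*Fs/nSeg;                % one-sided frequency axis
fMax   = zeros(1,numel(tStart));
fMean  = zeros(1,numel(tStart));
xSel   = [];                                 % selected audio plus silence
for k = 1:numel(tStart)
    i1  = round(tStart(k)*Fs) + 1;
    seg = x(i1 : i1+nSeg-1);
    P   = abs(fft(seg)).^2;
    P   = P(1:nSeg/2+1);                     % one-sided power spectrum
    [~, iMax] = max(P);
    fMax(k)  = f(iMax);                      % frequency of maximum power
    fMean(k) = sum(f.*P)/sum(P);             % power-weighted mean frequency
    xSel = [xSel; seg; zeros(Fs,1)];         % append 1 s of silence
end
disp(fMax)
% audiowrite('A1sel_demo.wav', xSel, Fs);    % hypothetical output file name
```

Unlike the band-limited search in rmscalculation.m, this sketch searches the whole spectrum for the maximum, so a strong component outside 236 to 367 Hz is reported directly.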