How pwelch computes confidence intervals

Question

Stephen 2020-4-30

1
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/522047-how-pwelch-computes-confidence-intervals

评论： William Rose 2023-11-26

Can Mathworks please point to exactly which formulae are being used to compute confidence intervals in pwelch? Perhaps state the formulae being used in the code? This would include how exacty the number of degrees of freedom are being computed.

3 个评论
显示 1更早的评论隐藏 1更早的评论

William Rose 2021-2-15

I read one of the two books which is cited in document pwelch, which was recommended by Kiran. The book I read is Statistical digital signal processing and modeling by M.H. Hayes, 1996. The relevant portion is seciton 8.2.5, pp.415-419 (which are pages 435-439 in the electronic version). It does not give formulas for the confidence interval for the Welch periodogram. Therefore it does not answer the original question: how is the confidence interval determined.

Stephen 2021-2-15

在 MATLAB Online 中打开

There are several good books on this - the original Jenkins and Watt book is excellent (although out of print and not evailble in pdf form) as is a book aimed at oceanographers by Thomson and Emery (Data Analysis Methods in Physical Oceanography). The reason I originally asked this question was because I suspect that the confidence intervals given by pwelch are not quite calculated correctly. One can check this by doing spectra for a random process that has a known spectrum - Jenkins and Watt give an excellent example of such a process in their book. I wrote a script - below - that generates a time series using the auto-regressive process J&W use for their examples. With apologies for the poor coding technique (I first learned programming in Fortran 66), you will see that if one inverts the CIs that Matlab computes and swap the results between the upper and lower CIs, one gets something that looks pretty good whereas if one sticks with the CIs pwelch gives, the lower and upper bounds are both high, meaning that the lower bound is not low enough and the upper bound is too high. So, this is what makes me think there is something wrong in pwelch.

%
% Examine spectral confidence intervals using a random process with a known spectrum
% The time series used is generated via an AR1 process detailed in Jenkins and Watts on p 269
% S. Monismith
% Stanford University
% 2/15/21
%
clear
M=16;
seriesize=2^M;          % length of time series
t=0:seriesize-1;        % times
X=ones(seriesize,1);    % 
Z=randn(seriesize,1);
factor=0.5;
for j=3:seriesize
    X(j)=X(j-1)-factor*X(j-2)+Z(j);
end
X(1)=X(seriesize)-factor*X(seriesize-1)+Z(1);
X(2)=X(1)-factor*X(seriesize)+Z(2);
idothese=seriesize;
%
% Key adjustment - how many segments
%
nsegs=2;
%
% Now set up and do pwelch estimation
%
windowlength=idothese/nsegs; 
nfft=windowlength;
[PXX,f,PXXc] = pwelch(X,windowlength,0.5*windowlength,nfft,1,'ConfidenceLevel',0.95);
 
%
% Now  compute bounds - note that they are multiplicative factors on known spectrum
% First, we will use what pwelch gives
%
    factor1=median(PXXc(:,1)./PXX);
    factor2=median(PXXc(:,2)./PXX);
 
subplot(1,2,1)
semilogy(f,PXX,'ko','markeredgecolor',0.7*[1,1,1],...
    'markerfacecolor',0.7*[1,1,1],'markersize',4)
hold on
Pxxtheory=var(X)*(0.834)./(2.25-3*cos(2*pi*f)+cos(4*pi*f));
hold on
semilogy(f,Pxxtheory,'r','linewidth',2)
semilogy(f,Pxxtheory*factor1,'b--','linewidth',2)
semilogy(f,Pxxtheory*factor2,'r--','linewidth',2)
hold off
ylim([1e-1 1e2])
grid
set(gca,'fontsize',20)
ylabel('\Phi_{XX}','fontsize',20)
xlabel('f','fontsize',20)
title('CIs as given by pwelch','fontsize',20)
%
% Now re-do the confidence intervals with a suggested correction
%
    factor1=median(PXX./PXXc(:,1));   
    factor2=median(PXX./PXXc(:,2));
%
subplot(1,2,2)
semilogy(f,PXX,'ko','markeredgecolor',0.7*[1,1,1],...
    'markerfacecolor',0.7*[1,1,1],'markersize',4)
hold on
Pxxtheory=var(X)*(0.834)./(2.25-3*cos(2*pi*f)+cos(4*pi*f));
hold on
semilogy(f,Pxxtheory,'r','linewidth',2)
semilogy(f,Pxxtheory*factor1,'b--','linewidth',2)
semilogy(f,Pxxtheory*factor2,'r--','linewidth',2)
hold off
ylim([1e-1 1e2])
grid
set(gca,'fontsize',20)
ylabel('\Phi_{XX}','fontsize',20)
xlabel('f','fontsize',20)
title('CIs inverted and swapped','fontsize',20)

请先登录，再进行评论。

请先登录，再回答此问题。

Answer 1

Kiran Felix Robert 2021-2-5

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/522047-how-pwelch-computes-confidence-intervals#answer_615402

Hi Stephen,

Refer the pwelchfunction documentation for the details.

2 个评论
显示无隐藏无

William Rose 2021-2-14

Kiran, thak you for this answer but plese see my request above for more info. Thanks.

William Rose 2021-2-15

Kiran,

I read the source code for pwelch.m (C:\Program Files\MATLAB\R2018b\toolbox\signal\signal\pwelch.m). It does not reveal how the confidence interval is computed. pwelch.m calls welch.m (C:\Program Files\MATLAB\R2018b\toolbox\signal\signal\+spectrum\welch.m), which I inspected. It calls another version of pwelch() which I oculd not find, probably because it is in a compiled library. Therefore I would appreciate an actual answer to Stephen's question. Thank you.

请先登录，再进行评论。

Answer 2

William Rose 2021-2-26

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/522047-how-pwelch-computes-confidence-intervals#answer_634189

The 2-sided confidence interval (C.I.) with probability p on the power spectral denisty (p.s.d.), which is returned by pwelch(), is given by

Pxxhat(f)*k/chi2((1+p)/2,k) < Pxx(f) < Pxxhat(f)*k/chi2((1-p)/2,k)

where Pxxhat(f) is the experimental estimate of the p.s.d. at frequency f, and Pxx(f) is the true, but unknown, value of the p.s.d., and k is the degrees of freedom. This is analagous to a confidence interval for a variance. The degrees of freedom is given by

k = 2*K

where upper case K is the number of segments or windows used when esitmating the p.s.d.

If the window length and offset are chosen so that the windows fit the signal without any leftover bits, then

K=(N-L)/D+1

where N=sinal length, L=window length, D=offset. (Don't confuse overlap and offset. Overlap=L-D. pwelch() takes overlap as an argument, but Welch (1967) uses offset=D in his formulas.

Example: Signal length=1024, window=Hamming, with length 256, half-overlapped. Then N=1024, L=256, and D=offset=128. Therefore K=7, by the formula above, and we can verify that 7 half overlapped windows cover the signal exactly. Then k=2*K = 14.

Prob{Pxxhat*14/chi2(.975,14) < Pxx < Pxxhat*14/chi2(.025,14)} = 0.95

=> Prob{Pxxhat*0.536 < Pxx < Pxxhat*2.487} = 0.95

I have checked the above formulas for various combinations of N, L, D, and probability p. The formuals correctly reproduce the confidence intervals reported by pwelch().

The formulas above are not correct from a statistical point of view, when the overlap is more than zero, but they are the formulas that pwelch() uses. The error is that k, the degrees of freedom, should be somewhat less than 2*K when the windows overlap. The exact formula is given by Welch, IEEE Trans Audio Electroacoustics AU-15:70-73, 1967, https://ieeexplore.ieee.org/document/1161901. It is complicated so I will not reproduce it here. It has a varible number of terms, depending on window shape and amount of overlap. The C.I. from pwelch() does not take window shape or overlap into account. pwelch()'s confidence intervals are correct when the windows do not overlap. The error in the confidence interval reported by pwelch() is relatively small when using Hamming or Hann or similar windows at half-overlap. In general, the C.I. reported by pwelch() is narrower than it should be. See table below:

The errors in the confidence intervals reported by pwelch() become more severe if the overlap is greater. By more severe I mean that the C.I. from pwelch() is significantly more narrow that than the true C.I.

5 个评论
显示 3更早的评论隐藏 3更早的评论

William Rose 2023-11-25

@Felicien Bonnefoy,

No problem.

The estimate of the power spectrum at a specific frequency, when scaled appropriately, has an approximately chi-squared distribution.* Therefore the formula that says the 95% C.I. is approximately twice the standard deviation does not apply. That formula is for estimates that have a normal distribution - such as the mean of a set of measurements, which is approximately normal, due to the central limit theorem.

* I say "approximately" because, if overlapped segments are used to compute the spectrum, as is usually done, then the different components of the estimate are not all independent - due to the partial overlap. A true chi-squared random variable is the sum of the squares of independent standard normal random variables.

William Rose 2023-11-26

@Felicien Bonnefoy,

You said you are trying to understand the CIs given by pwelch(). I too am interested in this topic.

You may wonder if there is a simple formula, comparable to

which relates μ and σ for a group of power spectrum estimates to the upper 95% C.I. of the power spectrum. I do not think such a formula exists. The C.I. returned by pwelch() uses the formula I gave in an earlier comment. I have done millions of Monte Carlo simulations of power spectrum estimates. I conclude that the formula which pwelch() uses for confidence intervals is accurate when there is no overlap of segments in the spectrum estimate. The CI returned by pwelch() is somewhat narrower than it should be, when overlapping segments are used to estimate the spectrum. In other words, the actual 95% C.I. is slightly wider than the C.I. returned by pwelch().

If you wish to discuss this further, please email me separately by clicking on the envelope at the upper right corner of the box that pops up when you click on my name.

请先登录，再进行评论。

How pwelch computes confidence intervals

3 个评论
显示 1更早的评论隐藏 1更早的评论

回答（2 个）

2 个评论
显示无隐藏无

5 个评论
显示 3更早的评论隐藏 3更早的评论

另请参阅

类别

标签

产品

版本

Community Treasure Hunt

How pwelch computes confidence intervals

3 个评论 显示 1更早的评论隐藏 1更早的评论

回答（2 个）

2 个评论 显示 无隐藏 无

5 个评论 显示 3更早的评论隐藏 3更早的评论

另请参阅

类别

标签

产品

版本

Community Treasure Hunt

3 个评论
显示 1更早的评论隐藏 1更早的评论

2 个评论
显示无隐藏无

5 个评论
显示 3更早的评论隐藏 3更早的评论