How to alter given code to prevent/remove 'NaN' output from corrcoef function
I am currently writing a function; the code is shown below:
function PlaceHolder = prcc(varargin)
for i = 2:nargin
if length(varargin{i})==length(varargin{i-1})
else error('all input arrays must be the same length')
end
if nargin == 0
error('No input arguments')
end
if nargin == 1
error('Not enough input arguments. Inputs must include at least two vectors')
end
if nargin > 1
for i=1:nargin
Intermediate{:,i}=[varargin{i}];
Intermediate{:,i} = sort([Intermediate{:,i}], 'descend');
end
Parameters=Intermediate(:,1:end-1);
END = Intermediate(:,end);
Intermediate={};
PRCCMatrix = [];
for i=1:nargin-1
Individuals{1,i} = Parameters(:,i);
Absences{1,i} = Parameters(:,i);
[~,~, ResidualsAbsences{1,i}] = regress(END{1,1}, cell2mat(Absences{1,i}));
[~,~, ResidualsIndividuals{1,i}] = regress(cell2mat(Individuals{1,i}), cell2mat(Absences{1,i}));
PlaceHolder{1,i} = corrcoef(ResidualsIndividuals{1,i}, ResidualsAbsences{1,i});
end
bar(diag(PRCCMatrix), 'stacked')
end
end
I am attempting to remove all instances in which corrcoef outputs NaN. I understand that one reason for these outputs is two data points from ResidualsIndividuals and ResidualsAbsences having the same value, though I do not believe that this is the case here. I have already verified that all inputs into corrcoef are numbers rather than NaN, and I am unsure where to go from here.
For reference, the data used to reproduce this error is given below:
for i=1:300
a(i,1) = unifrnd(0,1)
b(i,1) = unifrnd(0,1)
c(i,1) = unifrnd(0,1)
d(i,1) = unifrnd(0,1)
f(i,1) = unifrnd(0,1)
end
out = 2*a+3*b+4*c+5*d+6*f
prcc(a,b,c,d,f,out)
10 comments
function PlaceHolder = prcc(varargin)
assert(nargin>1,'Not enough input arguments. Inputs must include at least two vectors')
varargin=cellfun(@(c)c(:),varargin,'uniform',0); % turn to column if not
L=cellfun(@numel,varargin); % get length of each
assert(all(L==L(1)),'All input arrays must be the same length')
% finished input verification, start calculations...
I=sort(cell2mat(varargin),'descend'); % turn into array and sort
for i = 2:nargin
Parameters=Intermediate(:,1:end-1);
END = Intermediate(:,end);
Intermediate={};
PRCCMatrix = [];
for i=1:nargin-1
Individuals{1,i} = Parameters(:,i);
Absences{1,i} = Parameters(:,i);
[~,~, ResidualsAbsences{1,i}] = regress(END{1,1}, cell2mat(Absences{1,i}));
[~,~, ResidualsIndividuals{1,i}] = regress(cell2mat(Individuals{1,i}), cell2mat(Absences{1,i}));
PlaceHolder{1,i} = corrcoef(ResidualsIndividuals{1,i}, ResidualsAbsences{1,i});
end
bar(diag(PRCCMatrix), 'stacked')
end
Well, I didn't have too much to do so thought I'd try to straighten this out -- but there's a real problem when you get this far --
You've got two nested loops that both use the same loop variable, i, independently. That's easily enough fixed by making the inner one loop over j, say.
But when that loop is done, the variables Individuals and Absences are identically the same thing: the content of Parameters.
This would be easier if you would just describe what your input vectors are and what you're trying to do with them, instead of us trying to decipher nonworking code with no comments regarding its intent.
The above code is hopelessly complex and bound to be incorrect; trying to just patch it is also hopeless.
Explain what it's supposed to be doing and we can probably provide a much cleaner and working version.
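As a minimal sketch of the loop-index point above: giving the inner loop its own index keeps the two counters independent and readable. The names nOuter, nInner and visits are hypothetical and only serve the illustration; they are not part of the original function.

```matlab
% Sketch only: two nested loops with distinct index variables.
nOuter = 2;                       % hypothetical outer trip count
nInner = 3;                       % hypothetical inner trip count
visits = zeros(nOuter, nInner);   % bookkeeping array for the illustration

for i = 1:nOuter
    for j = 1:nInner              % inner loop uses j, not i
        visits(i,j) = visits(i,j) + 1;
    end
end
```

Each (i,j) pair is visited exactly once, which is easy to check when the two indices have different names.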
As written
N=20;
a=unifrnd(0,1,N,1);
b=unifrnd(0,1,N,1);
c=unifrnd(0,1,N,1);
d=unifrnd(0,1,N,1);
f=unifrnd(0,1,N,1);
out = 2*a+3*b+4*c+5*d+6*f;
prcc(a,b,c,d,f,out);
function PlaceHolder = prcc(varargin)
for i = 2:nargin
disp("Outer "+i)
if nargin > 1
for i=1:nargin
disp("Inner "+i)
Intermediate{:,i}=[varargin{i}];
Intermediate{:,i} = sort([Intermediate{:,i}], 'descend');
end
Parameters=Intermediate(:,1:end-1);
END = Intermediate(:,end);
Intermediate={};
PRCCMatrix = [];
for i=1:nargin-1
Individuals{1,i} = Parameters(:,i);
Absences{1,i} = Parameters(:,i);
[~,~, ResidualsAbsences{1,i}] = regress(END{1,1}, cell2mat(Absences{1,i}));
[~,~, ResidualsIndividuals{1,i}] = regress(cell2mat(Individuals{1,i}), cell2mat(Absences{1,i}));
PlaceHolder{1,i} = corrcoef(ResidualsIndividuals{1,i}, ResidualsAbsences{1,i});
end
%bar(diag(PRCCMatrix), 'stacked')
end
end
end
crashes and burns...
corrcoef will NOT return a NaN unless a variable contains NaN or has zero variance (is constant). Here the code assigns the identical right-hand side to Individuals and Absences, so the two have got to be the same; regressing one on the other then leaves residuals that are all zero (or nearly so), and the correlation of a constant vector is undefined.
Again, explain what it is you're trying to compute here and we can probably write efficient code to do whatever that is.
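One way to see where a NaN comes from is to test each input before calling corrcoef. The following is only a sketch: the helper isDegenerate and the tolerance tol are assumptions made for illustration, not anything from the original function, and the tolerance would need tuning to the scale of the real data.

```matlab
% Hypothetical pre-check: flag inputs for which corrcoef cannot give a finite value.
tol = 1e-12;   % assumed relative tolerance; tune to the scale of the data
isDegenerate = @(v) any(isnan(v)) || std(v) <= tol*max(1, max(abs(v)));

x = ones(20,1);          % constant vector, zero variance
y = (1:20)';             % ordinary non-constant vector

if isDegenerate(x) || isDegenerate(y)
    r = NaN;             % correlation is undefined in this case
    disp('corrcoef skipped: an input is (nearly) constant or contains NaN')
else
    R = corrcoef(x, y);
    r = R(1,2);
end
```

Running the same check on the residual vectors just before the corrcoef call would show which of them has collapsed to a constant.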
user
2022-10-9
corrcoef(rand(20,1),rand(20,1))
corrcoef(ones(20,1),ones(20,1))
corrcoef(ones(20,1),2*ones(20,1))
corrcoef(ones(20,1)+rand(20,1)/1E13,ones(20,1)+rand(20,1)/1E13)
user
2022-10-9
That should just be
x=[a,b,c,d,f,out];
[r,p]=partialcorr(x);
accounting for all crosses/effects.
z=[a c f];
[r,p]=partialcorr(x,z);
accounts for only those in z.
There's also the three-input variation that does pairwise between x and y, controlling for z
[r,p]=partialcorr(x,y,z);
See partialcorr and/or partialcorri depending upon just what it is you're actually trying to estimate; that's not at all clear from the above code alone, sorry.
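For reference, a partial correlation can also be sketched by hand in base MATLAB as the correlation of two sets of residuals, which appears to be what the original function was attempting. Everything named here (X, y, k, Z, the weights and the noise level) is a made-up illustration; noise is added on purpose so that y is not an exact linear combination of the factors. The result should agree with what partialcorr gives for the same factor, response and controls.

```matlab
rng(0);                              % fixed seed, only so the sketch is repeatable
N = 300;
X = rand(N,3);                       % three hypothetical input factors
y = X*[2;3;4] + 0.5*randn(N,1);      % response with noise (not an exact combination)

k = 1;                               % factor of interest
others = setdiff(1:size(X,2), k);    % the factors to control for
Z = [ones(N,1), X(:,others)];        % controls, with an intercept column

rx = X(:,k) - Z*(Z\X(:,k));          % residuals of factor k given the controls
ry = y      - Z*(Z\y);               % residuals of the response given the controls

R  = corrcoef(rx, ry);
pc = R(1,2);                         % partial correlation of factor k with y
```

Note that the factor of interest is left out of the control set; controlling for a variable with itself is what drives its residuals to zero. For a rank-based version, partialcorr accepts 'Type','Spearman'.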
NOTA BENE:
To illustrate how the two compare/differ, we'll take an example from one and then get the same result(s) the other way:
load carsmall
tCar=table(Displacement,Horsepower,Weight,MPG,Acceleration);
head(tCar)
tCar=tCar(~any(ismissing(tCar),2),:);
partialcorri([tCar.MPG,tCar.Acceleration],[tCar.Displacement,tCar.Horsepower,tCar.Weight])
partialcorr([tCar.Acceleration],[tCar.Weight],[tCar.Displacement,tCar.Horsepower])
Note that the latter, Acceleration vs. Weight controlling for Displacement+HP in partialcorr(), produces the same result as the corresponding element from partialcorri(), so you can pick just which it is you're actually after, but you should be able to simply pull the desired elements from the array directly.
After checking the inputs as my revised code above does, it'll be easier coding to then simply do a cell2mat and refer to the variables by column index, or you could build an internal table and build column names sequentially from the number of columns input. Could also pass in the vector of coefficients for the linear combination and build it internally as well.
N=300;
a=unifrnd(0,1,N,1);
b=unifrnd(0,1,N,1);
c=unifrnd(0,1,N,1);
d=unifrnd(0,1,N,1);
f=unifrnd(0,1,N,1);
out = 2*a+3*b+4*c+5*d+6*f;
r=partialcorr([a b c d f out])
But, as that shows, the out vector is defined as a linear combination of the others, so it is always going to be identically correlated when computing the partial coefficients, and hence the diagonal is also always going to be NaN.
OTOH, you CAN compute a simple correlation between each factor and the combination that is nonzero and finite given the random nature of the independent sampling between terms.
corr(a,out)
corr(b,out)
corr(c,out)
corr(d,out)
corr(f,out)
The latter would be easier to code if you used arrays instead of sequentially-named variables, of course.
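A minimal sketch of the array version, assuming the five factors are stored as the columns of one matrix. The names X, w and r are hypothetical; rand and corrcoef are used in place of unifrnd and corr so that the sketch needs only base MATLAB.

```matlab
rng(1);                       % fixed seed, only so the sketch is repeatable
N = 300;
X = rand(N,5);                % columns play the role of a, b, c, d, f
w = [2;3;4;5;6];              % weights of the linear combination from the question
out = X*w;                    % same construction as 2*a+3*b+4*c+5*d+6*f

r = zeros(1, size(X,2));      % simple correlation of each factor with out
for k = 1:size(X,2)
    R = corrcoef(X(:,k), out);
    r(k) = R(1,2);
end
disp(r)
```

With the data in one array, adding or dropping a factor changes only the number of columns and the length of w.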
Adam Danz
2022-10-10
I haven't read the other comments here, so my apologies if this is redundant. Another reason that NaNs could be in correlation values is when there are NaNs in the original data.
Related threads:
- https://www.mathworks.com/matlabcentral/answers/506464-getting-a-nan-in-correlation-coefficient#answer_416400
- https://www.mathworks.com/matlabcentral/answers/285584-getting-nan-when-computing-partialcorr-no-nans-in-data#answer_691965
dpb
2022-10-10
Same issue going on here as in the latter of the above two -- OP is creating a result "variable" that is, by definition, a linear combination of the other variables, and so when one computes the partial correlations controlling for all the other variables, his system is also rank deficient and the result is the correct and expected one for the case; it simply cannot be otherwise.
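The rank deficiency is easy to confirm directly. This sketch rebuilds data of the same shape as in the question (the names X, out and M are hypothetical, and rand stands in for unifrnd) and compares the rank of the combined matrix with its number of columns.

```matlab
rng(2);                       % fixed seed, only so the sketch is repeatable
X   = rand(300,5);            % five independent factors
out = X*[2;3;4;5;6];          % exact linear combination of the factors
M   = [X out];                % the matrix one would pass to partialcorr

nCols   = size(M,2);          % number of variables
nIndep  = rank(M);            % number of linearly independent columns
fprintf('columns: %d, rank: %d\n', nCols, nIndep);
```

A rank smaller than the column count means at least one variable is fully determined by the others, which is the situation described above.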
Answers (0)