How to alter given code to prevent/remove 'NaN' output from corrcoef function

I am currently writing a function; the code is shown below:
function PlaceHolder = prcc(varargin)
for i = 2:nargin
if length(varargin{i})==length(varargin{i-1})
else error('all input arrays must be the same length')
end
if nargin == 0
error('No input arguments')
end
if nargin == 1
error('Not enough input arguments. Inputs must include at least two vectors')
end
if nargin > 1
for i=1:nargin
Intermediate{:,i}=[varargin{i}];
Intermediate{:,i} = sort([Intermediate{:,i}], 'descend');
end
Parameters=Intermediate(:,1:end-1);
END = Intermediate(:,end);
Intermediate={};
PRCCMatrix = [];
for i=1:nargin-1
Individuals{1,i} = Parameters(:,i);
Absences{1,i} = Parameters(:,i);
[~,~, ResidualsAbsences{1,i}] = regress(END{1,1}, cell2mat(Absences{1,i}));
[~,~, ResidualsIndividuals{1,i}] = regress(cell2mat(Individuals{1,i}), cell2mat(Absences{1,i}));
PlaceHolder{1,i} = corrcoef(ResidualsIndividuals{1,i}, ResidualsAbsences{1,i});
end
bar(diag(PRCCMatrix), 'stacked')
end
end
I am attempting to remove all instances in which corrcoef outputs NaN. I understand that one reason for these outputs is two data points from ResidualsIndividuals and ResidualsAbsences having the same value, though I do not believe that this is the case here. I have already verified that all inputs into corrcoef are numbers rather than NaN, and I am unsure where to go from here.
For reference, the data used to reproduce this error is given below:
for i=1:300
a(i,1) = unifrnd(0,1)
b(i,1) = unifrnd(0,1)
c(i,1) = unifrnd(0,1)
d(i,1) = unifrnd(0,1)
f(i,1) = unifrnd(0,1)
end
out = 2*a+3*b+4*c+5*d+6*f
prcc(a,b,c,d,f,out)

10 Comments

function PlaceHolder = prcc(varargin)
assert(nargin>1,'Not enough input arguments. Inputs must include at least two vectors')
varargin=cellfun(@(c)c(:),varargin,'uniform',0); % turn to column if not
L=cellfun(@numel,varargin); % get length of each
assert(all(L==L(1)),'All input arrays must be the same length')
% finished input verification, start calculations...
I=sort(cell2mat(varargin),'descend'); % turn into array and sort
for i = 2:nargin
Parameters=Intermediate(:,1:end-1);
END = Intermediate(:,end);
Intermediate={};
PRCCMatrix = [];
for i=1:nargin-1
Individuals{1,i} = Parameters(:,i);
Absences{1,i} = Parameters(:,i);
[~,~, ResidualsAbsences{1,i}] = regress(END{1,1}, cell2mat(Absences{1,i}));
[~,~, ResidualsIndividuals{1,i}] = regress(cell2mat(Individuals{1,i}), cell2mat(Absences{1,i}));
PlaceHolder{1,i} = corrcoef(ResidualsIndividuals{1,i}, ResidualsAbsences{1,i});
end
bar(diag(PRCCMatrix), 'stacked')
end
Well, I didn't have too much to do, so I thought I'd try to straighten this out -- but there's a real problem once you get this far --
You've got two nested loops that both use the same loop variable, i, independently. That's easily enough fixed by turning the inner one into a loop over j, say.
But when that loop is done, the variables Individuals and Absences are identically the same thing: the content of Parameters.
This would be easier if you would just describe what your input vectors are and what you're trying to do with them, instead of leaving us to decipher nonworking code with no comments regarding its intent.
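To see what a shared loop variable does in practice, here is a minimal sketch (the variable name trace is made up for illustration and is not part of the code above). In MATLAB the outer for loop takes its next value from its own range on every pass, so the inner loop does not derail the outer iteration, but anything that uses i after the inner loop sees the inner loop's last value:

```matlab
trace = [];                     % hypothetical log of the values i takes
for i = 2:3                     % outer loop
    for i = 1:2                 % inner loop reuses (and overwrites) i
        trace(end+1) = i;
    end
    trace(end+1) = 10*i;        % i is 2 here, the inner loop's last value
end
disp(trace)                     % 1 2 20 1 2 20
```

Renaming the inner variable to j removes the ambiguity.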
Apologies for any lack of clarity. I am attempting to run the code above with the function input given at the bottom. Upon execution, however, the function's output, PlaceHolder, contains 'NaN' in some of the entries. I am hoping to fix this. Individuals and Absences will not be the same thing. Please let me know if you have any more questions. The majority of the function is not directly relevant to the problem and is included only to allow reproduction of my issue. This NaN output is occurring as a result of the corrcoef function, but I do not know how to remedy it.
for i=1:300
a(i,1) = unifrnd(0,1)
b(i,1) = unifrnd(0,1)
c(i,1) = unifrnd(0,1)
d(i,1) = unifrnd(0,1)
f(i,1) = unifrnd(0,1)
end
out = 2*a+3*b+4*c+5*d+6*f
prcc(a,b,c,d,f,out)
The above code is hopelessly complex and bound to be incorrect; trying to just patch it is also hopeless.
Explain what it's supposed to be doing and we can probably provide a much cleaner and working version.
As written
N=20;
a=unifrnd(0,1,N,1);
b=unifrnd(0,1,N,1);
c=unifrnd(0,1,N,1);
d=unifrnd(0,1,N,1);
f=unifrnd(0,1,N,1);
out = 2*a+3*b+4*c+5*d+6*f;
prcc(a,b,c,d,f,out);
Outer 2
Inner 1
Inner 2
Inner 3
Inner 4
Inner 5
Inner 6
Outer 3
Inner 1
Assigning to 0 elements using a simple assignment statement is not supported. Consider using comma-separated list assignment.

Error in solution>prcc (line 17)
Intermediate{:,i}=[varargin{i}];
function PlaceHolder = prcc(varargin)
for i = 2:nargin
disp("Outer "+i)
if nargin > 1
for i=1:nargin
disp("Inner "+i)
Intermediate{:,i}=[varargin{i}];
Intermediate{:,i} = sort([Intermediate{:,i}], 'descend');
end
Parameters=Intermediate(:,1:end-1);
END = Intermediate(:,end);
Intermediate={};
PRCCMatrix = [];
for i=1:nargin-1
Individuals{1,i} = Parameters(:,i);
Absences{1,i} = Parameters(:,i);
[~,~, ResidualsAbsences{1,i}] = regress(END{1,1}, cell2mat(Absences{1,i}));
[~,~, ResidualsIndividuals{1,i}] = regress(cell2mat(Individuals{1,i}), cell2mat(Absences{1,i}));
PlaceHolder{1,i} = corrcoef(ResidualsIndividuals{1,i}, ResidualsAbsences{1,i});
end
%bar(diag(PRCCMatrix), 'stacked')
end
end
end
crashes and burns...
corrcoef will NOT return a NaN unless a variable contains NaN or one of the variables has zero variance (is constant), as the examples below show. The code assigning Individuals and Absences has identically the same right-hand side for both, so they've got to be the same; regressing one on the other then leaves residuals that are all zero or nearly so, which is where the NaN comes from.
Again, explain what it is you're trying to compute here and we can probably write efficient code to do whatever that is.
corrcoef(rand(20,1),rand(20,1))
ans = 2×2
    1.0000    0.1436
    0.1436    1.0000
corrcoef(ones(20,1),ones(20,1))
ans = 2×2
   NaN   NaN
   NaN   NaN
corrcoef(ones(20,1),2*ones(20,1))
ans = 2×2
   NaN   NaN
   NaN   NaN
corrcoef(ones(20,1)+rand(20,1)/1E13,ones(20,1)+rand(20,1)/1E13)
ans = 2×2
    1.0000    0.0166
    0.0166    1.0000
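If the goal is to catch these cases before they reach corrcoef, one option is to test each input for (near-)zero spread first. This is only a sketch: the names tol and isFlat and the tolerance value are my own assumptions, not anything built into corrcoef.

```matlab
% Hypothetical guard: flag (near-)constant inputs before calling corrcoef.
x = ones(20,1);                 % zero-variance input, as in the examples above
y = rand(20,1);
tol = 1e-12;                    % assumed tolerance; tune to the scale of your data
scale = max(1, max(abs([x; y])));
isFlat = [std(x) std(y)] <= tol * scale;
if any(isFlat)
    r = NaN;                    % correlation is undefined for a constant input
else
    R = corrcoef(x, y);
    r = R(1,2);
end
```

Note that with this tolerance a vector such as ones(20,1)+rand(20,1)/1E13 would also be flagged as flat, even though corrcoef itself returns a finite number for it. Whether that is desirable depends on how much you trust differences at rounding-error level.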
I am attempting to write a function which returns the partial rank correlation coefficient for each parameter (column) of the input matrix, where its last column, out, is the result of the equation 2*a+3*b+4*c+5*d+6*f, to be used in the calculation of this PRCC.
That should just be
x=[a,b,c,d,f,out];
[r,p]=partialcorr(x);
accounting for all crosses/effects.
z=[a c f];
[r,p]=partialcorr(x,z);
accounts for only those in z.
There's also the three-input variation that does pairwise between x and y, controlling for z
[r,p]=partialcorr(x,y,z);
See partialcorr and/or partialcorri depending upon just what it is you're actually trying to estimate; that's not at all clear from the above code alone, sorry.
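Since the stated goal is a partial rank correlation, it may help to see the residual-based definition written out on ranks. The following is a rough sketch in base MATLAB, not what partialcorr does internally. The assumptions are mine: rand stands in for unifrnd(0,1), the data have no ties, an intercept is included in each regression, and the names data, ranks and prccVals are made up for illustration. (partialcorr also has a 'Type','Spearman' option that works on ranks, which may be the simpler route.)

```matlab
% Sketch: partial rank correlation of each parameter with the output,
% controlling for the remaining parameters. Base MATLAB only.
N = 300;
X = rand(N, 5);                  % five parameters, uniform on (0,1)
out = X * [2; 3; 4; 5; 6];       % same linear combination as in the question

% Rank-transform every column (assumes no ties, true for continuous draws).
data = [X out];
ranks = zeros(size(data));
for k = 1:size(data, 2)
    [~, order] = sort(data(:, k));
    ranks(order, k) = (1:N)';
end

nPar = size(X, 2);
prccVals = zeros(1, nPar);
for k = 1:nPar
    others = ranks(:, setdiff(1:nPar, k));             % controls: other parameters
    Z = [ones(N, 1) others];                           % regressors with intercept
    resPar = ranks(:, k)   - Z * (Z \ ranks(:, k));    % parameter residuals
    resOut = ranks(:, end) - Z * (Z \ ranks(:, end));  % output residuals
    R = corrcoef(resPar, resOut);
    prccVals(k) = R(1, 2);
end
disp(prccVals)
```

Because the ranks of out are not an exact linear combination of the ranks of the parameters, the residuals here keep some spread and the coefficients come out finite.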
NOTA BENE:
To illustrate how the two compare/differ, we'll take an example from one and then get the same result(s) with the other.
load carsmall
tCar=table(Displacement,Horsepower,Weight,MPG,Acceleration);
head(tCar)
    Displacement    Horsepower    Weight    MPG    Acceleration
    ____________    __________    ______    ___    ____________
        307            130         3504     18         12
        350            165         3693     15         11.5
        318            150         3436     18         11
        304            150         3433     16         12
        302            140         3449     17         10.5
        429            198         4341     15         10
        454            220         4354     14          9
        440            215         4312     14          8.5
tCar=tCar(~any(ismissing(tCar),2),:);
partialcorri([tCar.MPG,tCar.Acceleration],[tCar.Displacement,tCar.Horsepower,tCar.Weight])
ans = 2×3
   -0.0537   -0.1520   -0.4856
   -0.3994   -0.4008    0.4912
partialcorr([tCar.Acceleration],[tCar.Weight],[tCar.Displacement,tCar.Horsepower])
ans = 0.4912
Note that the latter, Acceleration vs Weight controlling for Displacement and Horsepower in partialcorr(), produces the same result as the corresponding element from partialcorri(), so you can pick whichever it is you're actually after; you should be able to simply pull the desired elements from the array directly.
After checking the inputs as my revised code above does, it'll be easier coding to then simply do a cell2mat and refer to the variables by column index, or you could build an internal table and build column names sequentially from the number of columns input. Could also pass in the vector of coefficients for the linear combination and build it internally as well.
N=300;
a=unifrnd(0,1,N,1);
b=unifrnd(0,1,N,1);
c=unifrnd(0,1,N,1);
d=unifrnd(0,1,N,1);
f=unifrnd(0,1,N,1);
out = 2*a+3*b+4*c+5*d+6*f;
r=partialcorr([a b c d f out])
r = 6×6
       NaN   -1.0000   -1.0000   -1.0000   -1.0000    1.0000
   -1.0000       NaN   -1.0000   -1.0000   -1.0000    1.0000
   -1.0000   -1.0000       NaN   -1.0000   -1.0000    1.0000
   -1.0000   -1.0000   -1.0000       NaN   -1.0000    1.0000
   -1.0000   -1.0000   -1.0000   -1.0000       NaN    1.0000
    1.0000    1.0000    1.0000    1.0000    1.0000       NaN
But, as this shows, the out vector is defined as a linear combination of the others, so it is always going to be identically correlated when computing the partial coefficients, and hence the diagonal is also always going to be NaN.
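The linear dependence can be checked directly with rank. A minimal sketch, assuming rand as a stand-in for unifrnd(0,1) and using the illustrative names X and M:

```matlab
N = 300;
X = rand(N, 5);                 % stand-ins for a, b, c, d, f
out = X * [2; 3; 4; 5; 6];      % exact linear combination of the columns
M = [X out];
disp(rank(X))                   % 5: the parameters alone are full rank
disp(rank(M))                   % still 5, not 6: the out column is redundant
```

Once every other column is controlled for, nothing independent is left in out, which is why the partial coefficients collapse to +/-1 and NaN.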
OTOH, you CAN compute a simple correlation between each factor and the combination that is nonzero and finite given the random nature of the independent sampling between terms.
corr(a,out)
ans = 0.2843
corr(b,out)
ans = 0.3001
corr(c,out)
ans = 0.4313
corr(d,out)
ans = 0.5070
corr(f,out)
ans = 0.6443
The latter would be easier to code if you used arrays instead of sequentially-named variables, of course.
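For instance, with the factors stored as columns of one array, the simple correlations with the combination come out of a single corrcoef call. This is a sketch under my own assumptions: rand replaces unifrnd(0,1), and X, w and rOut are illustrative names.

```matlab
N = 300;
X = rand(N, 5);                 % columns play the roles of a, b, c, d, f
w = [2; 3; 4; 5; 6];            % coefficients of the linear combination
out = X * w;
R = corrcoef([X out]);          % full 6-by-6 correlation matrix
rOut = R(1:end-1, end);         % correlation of each column with out
disp(rOut')
```

Passing w in as an argument would also let the function build out internally.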
Same issue going on here as in the latter of the above two -- OP is creating a result "variable" that is, by definition, a linear combination of the other variables, and so when one computes the partial correlations controlling for all the other variables, his system is also rank deficient and the result is the correct and expected one for the case; it simply cannot be otherwise.


Answers (0)

Release

R2022a

Asked:

2022-10-9

Commented:

dpb
2022-10-10
