How to alter given code to prevent/remove 'NaN' output from corrcoef function

Question

I am currently writing a function; the code is below:

function PlaceHolder = prcc(varargin)
for i = 2:nargin
    if length(varargin{i})==length(varargin{i-1})
    else error('all input arrays must be the same length')
    end
    if nargin == 0
        error('No input arguments')
    end
   
    if nargin == 1
        error('Not enough input arguments. Inputs must include at least two vectors')
    end
    if nargin > 1
        for i=1:nargin
            Intermediate{:,i}=[varargin{i}];
            Intermediate{:,i} = sort([Intermediate{:,i}], 'descend');
        end   
        Parameters=Intermediate(:,1:end-1);
        END = Intermediate(:,end);
        Intermediate={};
        PRCCMatrix = [];
            for i=1:nargin-1
                Individuals{1,i} = Parameters(:,i);
                Absences{1,i} = Parameters(:,i);
                [~,~, ResidualsAbsences{1,i}] = regress(END{1,1}, cell2mat(Absences{1,i}));
                [~,~, ResidualsIndividuals{1,i}] = regress(cell2mat(Individuals{1,i}), cell2mat(Absences{1,i}));
                PlaceHolder{1,i} = corrcoef(ResidualsIndividuals{1,i}, ResidualsAbsences{1,i});
            end
    bar(diag(PRCCMatrix), 'stacked')
    end
end

I am attempting to remove all instances in which corrcoef outputs NaN. I understand that one reason for these outputs is two data points from ResidualsIndividuals and ResidualsAbsences having the same value, though I do not believe that this is the case here. I have already verified that all inputs into corrcoef are numbers rather than NaN, and I am unsure where to go from here.

For reference, the data used to reproduce this error is given below:

for i=1:300
    a(i,1) = unifrnd(0,1)
    b(i,1) = unifrnd(0,1)
    c(i,1) = unifrnd(0,1)
    d(i,1) = unifrnd(0,1) 
    f(i,1) = unifrnd(0,1)
end
out = 2*a+3*b+4*c+5*d+6*f
prcc(a,b,c,d,f,out)

10 comments

dpb 2022-10-9

Edited: dpb 2022-10-9

function PlaceHolder = prcc(varargin)
  assert(nargin>1,'Not enough input arguments. Inputs must include at least two vectors')
  varargin=cellfun(@(c)c(:),varargin,'uniform',0);      % turn to column if not
  L=cellfun(@numel,varargin);                           % get length of each
  assert(all(L==L(1)),'All input arrays must be the same length')
  % finished input verification, start calculations...
  I=sort(cell2mat(varargin),'descend');                 % turn into array and sort
  
for i = 2:nargin
        Parameters=Intermediate(:,1:end-1);
        END = Intermediate(:,end);
        Intermediate={};
        PRCCMatrix = [];
        
        for i=1:nargin-1
            Individuals{1,i} = Parameters(:,i);
            Absences{1,i} = Parameters(:,i);
            [~,~, ResidualsAbsences{1,i}] = regress(END{1,1}, cell2mat(Absences{1,i}));
            [~,~, ResidualsIndividuals{1,i}] = regress(cell2mat(Individuals{1,i}), cell2mat(Absences{1,i}));
            PlaceHolder{1,i} = corrcoef(ResidualsIndividuals{1,i}, ResidualsAbsences{1,i});
        end
        bar(diag(PRCCMatrix), 'stacked')
end

Well, I didn't have too much to do, so I thought I'd try to straighten this out, but there's a real problem once you get this far:

You've got two nested loops that both use the same loop variable, i, independently. That's easily enough fixed by turning the inner one into a loop over j, say.

But when that loop is done, the variables Individuals and Absences are identically the same thing: the content of Parameters.

This would be easier if you would just describe what it is your input vectors are and what you're trying to do with them instead of us trying to decipher nonworking code with no comments regarding its intent.
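To see why identical Individuals and Absences matter, here is a minimal sketch (the vector x is made-up test data, not the OP's) that regresses a vector on itself with the same regress call the function uses. The residuals should come out as zero, up to round-off at most, so nothing is left to correlate.

```matlab
% Sketch with made-up data: regress a vector on an identical copy of itself.
rng(0);                        % fixed seed so the run is repeatable
x = rand(10,1);                % stand-in for one column of Parameters
[~,~,res] = regress(x, x);     % same call pattern as in prcc, with y == X
maxAbsRes = max(abs(res));     % size of the largest residual
disp(maxAbsRes)                % expected to be zero or round-off sized
```

That is the situation the Individuals/Absences pair creates on every pass through the inner loop.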

dpb 2022-10-9

Edited: dpb 2022-10-9

The above code is hopelessly complex and bound to be incorrect; trying to just patch it is also hopeless.

Explain what it's supposed to be doing and we can probably provide a much cleaner and working version.

As written

N=20;
a=unifrnd(0,1,N,1);
b=unifrnd(0,1,N,1);
c=unifrnd(0,1,N,1);
d=unifrnd(0,1,N,1);
f=unifrnd(0,1,N,1);
out = 2*a+3*b+4*c+5*d+6*f;
prcc(a,b,c,d,f,out);
Outer 2
Inner 1
Inner 2
Inner 3
Inner 4
Inner 5
Inner 6
Outer 3
Inner 1
Assigning to 0 elements using a simple assignment statement is not supported. Consider using comma-separated list assignment.

Error in solution>prcc (line 17)
            Intermediate{:,i}=[varargin{i}];
function PlaceHolder = prcc(varargin)
for i = 2:nargin
    disp("Outer "+i)
    if nargin > 1
        for i=1:nargin
            disp("Inner "+i)
            Intermediate{:,i}=[varargin{i}];
            Intermediate{:,i} = sort([Intermediate{:,i}], 'descend');
        end   
        Parameters=Intermediate(:,1:end-1);
        END = Intermediate(:,end);
        Intermediate={};
        PRCCMatrix = [];
            for i=1:nargin-1
                Individuals{1,i} = Parameters(:,i);
                Absences{1,i} = Parameters(:,i);
                [~,~, ResidualsAbsences{1,i}] = regress(END{1,1}, cell2mat(Absences{1,i}));
                [~,~, ResidualsIndividuals{1,i}] = regress(cell2mat(Individuals{1,i}), cell2mat(Absences{1,i}));
                PlaceHolder{1,i} = corrcoef(ResidualsIndividuals{1,i}, ResidualsAbsences{1,i});
            end
    %bar(diag(PRCCMatrix), 'stacked')
    end
end
end

crashes and burns...

corrcoef will NOT return NaN unless a variable contains NaN or has no variation at all (is constant). Regressing a variable on an identical copy of itself leaves residuals with no variation in them, which is exactly that case. Hence the conclusion that the two variables are identical, which is supported by the fact that the code assigns the same right-hand side to both; they've got to be the same.
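A quick way to check which situations actually produce NaN is to feed corrcoef a few deliberately constructed pairs. The vectors below are made up for illustration: one constant vector and one exact linear function of x.

```matlab
% Sketch with made-up vectors: which inputs make corrcoef return NaN?
rng(0);
x = rand(20,1);

Rconst = corrcoef(x, zeros(20,1));   % second variable has zero variance
Rlin   = corrcoef(x, 3*x + 2);       % second variable is an exact linear function of x

disp(Rconst(1,2))                    % NaN: the zero variance gives 0/0
disp(Rlin(1,2))                      % 1, up to round-off
```

So an all-zero residual vector is enough to give NaN, even when none of the inputs contains NaN.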

Again, explain what it is you're trying to compute here and we can probably write efficient code to do whatever that is.

dpb 2022-10-9

Edited: dpb 2022-10-9

That should just be

x=[a,b,c,d,f,out];
[r,p]=partialcorr(x);

accounting for all crosses/effects.

z=[a c f];
[r,p]=partialcorr(x,z);

accounts for only those in z.

There's also the three-input variation that does pairwise between x and y, controlling for z

[r,p]=partialcorr(x,y,z);

See partialcorr and/or partialcorri depending upon just what it is you're actually trying to estimate; that's not at all clear from the above code alone, sorry.

NOTA BENE:

To illustrate how the two compare/differ, we'll take an example from one and compare the other way to get same result(s)

load carsmall
tCar=table(Displacement,Horsepower,Weight,MPG,Acceleration);
head(tCar)
    Displacement    Horsepower    Weight    MPG    Acceleration
    ____________    __________    ______    ___    ____________

        307            130         3504     18           12    
        350            165         3693     15         11.5    
        318            150         3436     18           11    
        304            150         3433     16           12    
        302            140         3449     17         10.5    
        429            198         4341     15           10    
        454            220         4354     14            9    
        440            215         4312     14          8.5    
tCar=tCar(~any(ismissing(tCar),2),:);
partialcorri([tCar.MPG,tCar.Acceleration],[tCar.Displacement,tCar.Horsepower,tCar.Weight])
ans = 2×3
   -0.0537   -0.1520   -0.4856
   -0.3994   -0.4008    0.4912
partialcorr([tCar.Acceleration],[tCar.Weight],[tCar.Displacement,tCar.Horsepower])
ans = 0.4912

Note that the latter, Acceleration vs. Weight controlling for Displacement+HP in partialcorr(), produces the same result as the corresponding element from partialcorri(), so you can pick whichever it is you're actually after, but you should be able to simply pull the desired elements from the array directly.

After checking the inputs as my revised code above does, it'll be easier coding to then simply do a cell2mat and refer to the variables by column index, or you could build an internal table and build column names sequentially from the number of columns input. Could also pass in the vector of coefficients for the linear combination and build it internally as well.
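A rough sketch of the cell2mat route follows. It assumes the last input is the response and that what is wanted is its partial correlation with each of the other inputs, controlling for the rest. The name prccSketch is hypothetical, and this is only a guess at the intent, not a replacement for the OP's algorithm.

```matlab
function r = prccSketch(varargin)
% PRCCSKETCH  Hypothetical helper: partial correlation of the last input
% with each earlier input, controlling for the remaining earlier inputs.
% Example call (made-up data):
%   a = rand(50,1); b = rand(50,1); y = 2*a + 3*b + 0.1*randn(50,1);
%   r = prccSketch(a, b, y)
  assert(nargin > 1, 'Inputs must include at least two vectors')
  cols = cellfun(@(c) c(:), varargin, 'UniformOutput', false);  % force columns
  L = cellfun(@numel, cols);
  assert(all(L == L(1)), 'All input arrays must be the same length')

  M = cell2mat(cols);        % N-by-nargin array, one column per input
  X = M(:, 1:end-1);         % predictors, referenced by column index
  y = M(:, end);             % assumed response: the last input
  r = partialcorri(y, X);    % 1-by-(nargin-1) row of partial correlations
end
```

If the response is an exact linear combination of the predictors, expect the values to saturate at +/-1 rather than say anything about relative importance.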

dpb 2022-10-9

N=300;
a=unifrnd(0,1,N,1);
b=unifrnd(0,1,N,1);
c=unifrnd(0,1,N,1);
d=unifrnd(0,1,N,1);
f=unifrnd(0,1,N,1);
out = 2*a+3*b+4*c+5*d+6*f;
r=partialcorr([a b c d f out])
r = 6×6
       NaN   -1.0000   -1.0000   -1.0000   -1.0000    1.0000
   -1.0000       NaN   -1.0000   -1.0000   -1.0000    1.0000
   -1.0000   -1.0000       NaN   -1.0000   -1.0000    1.0000
   -1.0000   -1.0000   -1.0000       NaN   -1.0000    1.0000
   -1.0000   -1.0000   -1.0000   -1.0000       NaN    1.0000
    1.0000    1.0000    1.0000    1.0000    1.0000       NaN

But, as this shows, the out vector is defined as a linear combination of the others, so it is always going to be perfectly correlated when computing the partial coefficients, and hence the diagonal is also always going to be NaN.
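The +1 and -1 pattern can be worked out by hand. The notation here is assumed for illustration and is not from the thread: x_1,...,x_5 stand for a, b, c, d, f; y is out; the beta_j are the positive weights 2,...,6; and M_Z denotes taking residuals after regressing on the variables in Z plus an intercept.

```latex
% Assumed notation: y = out, x_1..x_5 = a,b,c,d,f, \beta_j > 0,
% M_Z = residual-maker for regression on Z (with intercept), so M_Z z = 0 for z in Z.
\[
  y = \sum_{j} \beta_j x_j
\]
% Input x_i versus y, controlling for Z = \{x_j : j \neq i\}:
\[
  M_Z y = \beta_i \, M_Z x_i
  \quad\Longrightarrow\quad
  \rho_{x_i y \cdot Z} = \operatorname{sign}(\beta_i) = +1
\]
% Inputs x_i versus x_k, controlling for Z = \{y\} \cup \{x_j : j \neq i,k\}:
\[
  0 = M_Z y = \beta_i \, M_Z x_i + \beta_k \, M_Z x_k
  \quad\Longrightarrow\quad
  \rho_{x_i x_k \cdot Z} = -\operatorname{sign}(\beta_i \beta_k) = -1
\]
```

In both cases one residual vector is an exact multiple of the other, so the correlation can only be +1 or -1, whatever the random draws were.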

OTOH, you CAN compute a simple correlation between each factor and the combination that is nonzero and finite given the random nature of the independent sampling between terms.

corr(a,out)
ans = 0.2843
corr(b,out)
ans = 0.3001
corr(c,out)
ans = 0.4313
corr(d,out)
ans = 0.5070
corr(f,out)
ans = 0.6443

The latter would be easier to code if you used arrays instead of sequentially named variables, of course.
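For instance, a sketch of the array version, where X and beta are names made up here to hold the five inputs and their weights:

```matlab
% Sketch: same experiment with one array instead of a, b, c, d, f.
N    = 300;
X    = unifrnd(0,1,N,5);     % one column per input variable
beta = [2; 3; 4; 5; 6];      % weights of the linear combination
out  = X*beta;               % same as 2*a+3*b+4*c+5*d+6*f
r    = corr(X, out);         % 5-by-1: simple correlation of each column with out
disp(r.')
```

A single corr call then replaces the five separate ones, and adding a sixth input only means adding a column and a weight.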

Adam Danz 2022-10-10

I haven't read the other comments here, so my apologies if this is redundant. Another reason that NaNs could be in correlation values is when there are NaNs in the original data.

Related threads:

dpb 2022-10-10

The same issue is going on here as in the latter of the above two. The OP is creating a result "variable" that is, by definition, a linear combination of the other variables, so when he computes the partial correlations controlling for all the other variables, his system is also rank deficient. The result is the correct and expected one for the case; it simply cannot be otherwise.
