How to calculate singular values in collintest to detect multicollinearity

I am a bit confused about how to find the singular values, and therefore the condition indices, used to detect multicollinearity in multiple linear regression analysis. Some mathematicians say the singular values are the square roots of the eigenvalues of the correlation matrix of the model's predictors, while others say to use the covariance matrix instead. Still other publications say the singular values are the square roots of the eigenvalues of the square matrix X'X of the multiple linear regression model. I tried all three methods, but when I cross-checked against the MATLAB results from collintest, none of my calculations matched. The documentation does not explain how the output is obtained. Can someone explain it to me?
  5 comments
Umar 2024-6-30
Hi Nafisa, glad to help. To answer your question about the Belsley collinearity diagnostics: when you compare the singular values obtained from SVD (singular value decomposition) with those reported by the Belsley collinearity diagnostics in MATLAB, differences may arise from the nature of the two methods. SVD directly computes the singular values of a matrix, while collinearity diagnostics such as collintest in MATLAB focus on assessing multicollinearity in regression models rather than directly computing singular values. It is essential to understand the specific purpose and methodology of each approach to interpret the results correctly. If you need singular values, SVD is the appropriate method, whereas collinearity diagnostics are better suited to assessing multicollinearity in regression analysis.
Hope this helps resolve your problem.
N/A 2024-6-30
Below I have attached a dataset of Boston house prices. Using your code, I tried calculating the condition index for each singular value by dividing the largest singular value by each of the singular values individually. However, I am getting a different answer from the real one. I have also attached the example page: https://stataiml.com/posts/42_condition_index_r/#input-dataset


Accepted Answer

Umar 2024-6-30
Hi Nafisa,
To help you further, could you share the MATLAB code that is giving the unexpected result? I need to review it before I can share my detailed thoughts. I hope that is not a problem.
  1 comment
N/A 2024-6-30
Absolutely,
data = readtable('boston house prices.xlsx', 'VariableNamingRule','preserve');
x1 = data.CRIM;
x2 = data.ZN;
x3 = data.INDUS;
x4 = data.CHAS;
x5 = data.NOX;
x6 = data.RM;
x7 = data.AGE;
x8 = data.DIS;
x9 = data.RAD;
x10 = data.TAX;
x11 = data.PTRATIO;
x12 = data.B;
x13 = data.LSTAT;
Predictors = [x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13];
[U, S, V] = svd(Predictors); % Singular value decomposition
singular_values = diag(S) % Extract singular values
condition_index = (max(singular_values) ./ singular_values)


More Answers (4)

Umar 2024-6-30
Hi Nafisa,
The issue concerns how the condition indices are calculated. Each condition index is the ratio of the largest singular value to one individual singular value; the ratio of the largest to the smallest singular value alone is the condition number. To calculate the condition indices, write the calculation as follows:
condition_indices = singular_values(1) ./ singular_values;
This divides the largest singular value by each singular value individually, which gives the condition indices for the predictors.
Hope this will help resolve your problem.
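As a quick check of the formula above, here is a minimal sketch on a small made-up matrix (the values of X are hypothetical, chosen only for illustration). It relies on svd returning singular values in descending order, so singular_values(1) and max(singular_values) are the same number, and the last condition index equals the condition number returned by cond.

```matlab
% Hypothetical 4x3 predictor matrix, for illustration only
X = [1 2 3; 4 5 7; 7 8 8; 2 1 4];

s = svd(X);              % singular values, largest first

idxMax   = max(s) ./ s;  % form used in the question
idxFirst = s(1)   ./ s;  % form suggested in this answer

% Both forms are the same calculation
sameIndices = isequal(idxMax, idxFirst);

% The last (largest) condition index is the condition number of X
matchesCond = abs(idxFirst(end) - cond(X)) < 1e-10 * cond(X);

disp(idxFirst.');
fprintf('same indices: %d, last index equals cond(X): %d\n', sameIndices, matchesCond);
```

Since the two forms agree, a mismatch with collintest most likely comes from how the input matrix is prepared rather than from this division.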

Paul 2024-6-30
Edited: Paul 2024-6-30
Hi NAFISA,
Using the example from the collintest documentation:
load Data_Canada
Output from collintest
[sValues,condInx] = collintest(Data);
Variance Decomposition

 sValue   condIdx   Var1     Var2     Var3     Var4     Var5
 ---------------------------------------------------------
 2.1748   1         0.0012   0.0018   0.0003   0.0000   0.0001
 0.4789   4.5413    0.0261   0.0806   0.0035   0.0006   0.0012
 0.1602   13.5795   0.3386   0.3802   0.0811   0.0011   0.0137
 0.1211   17.9617   0.6138   0.5276   0.1918   0.0004   0.0193
 0.0248   87.8245   0.0202   0.0099   0.7233   0.9979   0.9658
The same sValue and condIdx output can be reproduced as follows (not including various error checks)...
Scale Data so that each column has unit magnitude
sData = Data./vecnorm(Data,2,1);
Take the svd of the scaled data.
[~,S,V] = svd(sData);
S = diag(S);
Compute the indices
idx = max(S)./S;
table(S,idx)
ans = 5x2 table
       S         idx
    ________    ______
      2.1748         1
     0.47889    4.5413
     0.16015    13.579
     0.12108    17.962
    0.024763    87.825
Hopefully that makes sense in the context of what collintest is supposed to do. I have zero knowledge of that function.
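To see why the scaling step matters, here is a minimal sketch on a made-up matrix whose columns sit on very different scales (the values are hypothetical, not the Canadian or Boston data). It compares condition indices from the raw matrix with those from the matrix after each column is scaled to unit length. The column lengths are computed with sqrt(sum(X.^2, 1)), which gives the same result as vecnorm(X, 2, 1) but also runs in releases that predate vecnorm.

```matlab
% Hypothetical data with columns on very different scales, for illustration only
X = [1 200 0.03;
     2 180 0.07;
     3 260 0.02;
     4 210 0.09;
     5 150 0.05];

% Condition indices from the raw matrix (no scaling)
sRaw   = svd(X);
idxRaw = sRaw(1) ./ sRaw;

% Scale each column to unit Euclidean length, then repeat
colLen    = sqrt(sum(X.^2, 1));   % 1x3 row of column lengths
Xs        = X ./ colLen;          % implicit expansion (R2016b or later)
sScaled   = svd(Xs);
idxScaled = sScaled(1) ./ sScaled;

disp('condition indices: raw (left) vs. column-scaled (right)');
disp([idxRaw, idxScaled]);
```

The two columns of output differ, so an SVD of unscaled predictors would not be expected to reproduce the scaled result shown above. One sanity check on the scaled case: the squared singular values sum to the number of columns, because every column has unit length.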
  7 comments
N/A 2024-7-11
Hello Paul, I need to understand theoretically how it is calculated. I did not quite understand when you said doc age



Umar 2024-7-3
Hi Nafisa,
You asked how to calculate, theoretically, the singular values from collintest and from the svd function.
To answer this question: for the collinearity test, you can assess the singular values by examining the condition number of a matrix. The condition number is the ratio of the largest to the smallest singular value, and higher condition numbers indicate a higher degree of collinearity. With the singular value decomposition (svd) function in MATLAB, you can compute the singular values of a matrix directly.
  1 comment
N/A 2024-7-3
Hello Umar, I did not quite understand what you meant by examining the condition number of a matrix. It would be better if you could provide a formula or something. You also said that with the Singular Value Decomposition (SVD) function in MATLAB I can directly compute the singular values of a matrix. To compute the singular values I need to have a square matrix, so I tried finding the eigenvalues of X'X, where X is the design matrix, and when I did that I got different eigenvalues.



Umar 2024-7-3
Hi Nafisa,
Glad to hear back from you. Please see my answers to your comments below.
Comment #1: I did not quite understand when you said by examining the condition number of a matrix. It would be better if you can provide a formula or something.
Answer: The condition number of a matrix measures how sensitive results computed with the matrix are to changes in the input values; a high condition number indicates potential numerical instability. In MATLAB, you can compute the condition number of a matrix A with the cond() function: cond(A). The example below creates a 2x2 matrix A, uses cond() to calculate its condition number, and displays the result with disp().
% Define a matrix A
A = [1, 2; 3, 4];
% Calculate the condition number of matrix A
condition_number = cond(A);
% Display the result
disp(['The condition number of matrix A is: ', num2str(condition_number)]);
This information is valuable in assessing the stability and accuracy of numerical computations involving the matrix A.
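Since a formula was requested, here is the standard definition written out, as a sketch in my own notation (the symbols are assumptions, not taken from the collintest documentation). For an n-by-p matrix X with singular values ordered from largest to smallest, the condition number is the ratio of the first to the last, and the condition indices are the ratios of the first to each one in turn:

```latex
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_p > 0, \qquad
\kappa(X) = \frac{\sigma_1}{\sigma_p}, \qquad
\eta_j = \frac{\sigma_1}{\sigma_j}, \quad j = 1, \dots, p .
```

With this notation the largest condition index, \eta_p, equals \kappa(X), which is the value cond() returns by default (the 2-norm condition number).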
Comment #2: Also you said that While using the Singular Value Decomposition (SVD) function in Matlab, I can directly compute the singular values of a matrix
Answer: Yes, you can directly compute the singular values of a matrix A using the svd() function: [U, S, V] = svd(A). A does not have to be square; svd accepts any m-by-n matrix, so the design matrix can be passed in directly.
Comment #3: To compute the singular values I need to have a square matrix so I tried finding the eigenvalues of X'X where X is the design matrix and when I did that I am getting different eigenvalues.
Answer: X'X is always square, so you can compute its eigenvalues with the eig() function: eigenvalues = eig(X' * X). These eigenvalues are the squares of the singular values of X, so take their square roots before comparing them with the output of svd(X), and make sure X is the same matrix in both calculations.
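To make the link between eig(X' * X) and svd(X) concrete, here is a minimal sketch on a made-up rectangular design matrix (the values are hypothetical). It assumes X has full column rank, so that all eigenvalues of X'X are positive. The eigenvalues are sorted explicitly because eig does not promise descending order, while svd does.

```matlab
% Hypothetical 5x3 design matrix (more rows than columns), for illustration only
X = [1 2 0;
     1 3 1;
     1 5 1;
     1 7 2;
     1 8 4];

% svd works directly on the rectangular matrix
s = svd(X);

% Eigenvalue route: X'X is 3x3 and symmetric
G = X' * X;
G = (G + G') / 2;                    % remove any round-off asymmetry
lambda   = sort(eig(G), 'descend');  % match the ordering used by svd
sFromEig = sqrt(lambda);

% Largest disagreement between the two routes
gap = max(abs(s - sFromEig));
fprintf('largest difference between svd and sqrt(eig): %g\n', gap);
```

If the eigenvalues of X'X look different from the singular values, the usual reasons are a missing square root, a different ordering, or a different X (for example scaled in one calculation and unscaled in the other).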
