How to find the correlation of rows across 2 tables?

13 次查看(过去 30 天)
I have an excel with 2 tables — d1 and d2. Each has the first column as State and the subsequent columns as chronological months (January, February, March, ... December) from 2000–2020 (so the months repeat). I want to find the correlations of each month (i.e., all Januarys from 2000–2020, all Februarys from 2000–2020, all Marchs from 2000–2020, ..., all Decembers from 2000–2020) for each state across d1 and d2.
My initial idea was to iterate through the first row of d1 and d2 and then concatenate all the January values (using column looping), find the January correlation across d1 and d2; then concatenate all the February values, find the February correlation; then concatenate all the March values, find the March correlation and so on with the other months for that first row. The same would be applied for the rest of the 10 rows. Given how disjointed the month columns are and that I have to work with 2 tables, I'm having trouble coming up with a code that does what I'm looking for. Any idea how to work through this? Thank you!

采纳的回答

Dave B
Dave B 2021-11-14
I think you're saying you want a correlation for each state-month? Here's a simple approach that uses a nested loop. You could do something a little fancier with splitapply, but I think it would just be more cryptic.
Note that you have a lot of all-zero-values in your data, and taking correlations where one variable has no variance will produce NaN. In fact there wasn't much variation past Jan 2000...?
d1=readcell('https://www.mathworks.com/matlabcentral/answers/uploaded_files/799494/Book2.xlsx','Sheet','d1');
d2=readcell('https://www.mathworks.com/matlabcentral/answers/uploaded_files/799484/Book2.xlsx','Sheet','d2');
states=string(d1(2:end,1));
dates=cellfun(@(x)x,d1(1,2:end)); % hack for the fact that this can't just be cell2mat'ed.
d1=cell2mat(d1(2:end,2:end)); % trim off d1
d2=cell2mat(d2(2:end,2:end)); % trim off d2. This assumes that states and dates will match up exactly
corrs=nan(numel(states),12);
for i = 1:numel(states)
for j = 1:12
corrs(i,j)=corr(d1(states==states(i),dates.Month==j)',d2(states==states(i),dates.Month==j)');
end
end
[m,s]=meshgrid(month(datetime(2000,1:12,1),'name'),states);
t=table(s(:),string(m(:)),corrs(:),'VariableNames',["State" "Month" "Rho"])
t = 132×3 table
State Month Rho _____ __________ _________ "ME" "January" 0.99136 "VT" "January" NaN "NH" "January" 0.12738 "NY" "January" NaN "MA" "January" 0.15173 "RI" "January" -0.022711 "CT" "January" -0.049358 "PA" "January" 0.042177 "NJ" "January" 0.22731 "MD" "January" -0.050183 "DE" "January" -0.06616 "ME" "February" NaN "VT" "February" NaN "NH" "February" NaN "NY" "February" NaN "MA" "February" 1
  4 个评论
Rabeca Mohammed
Rabeca Mohammed 2021-11-14
Can't thank you enough!! Could you explain the loop and what each line of code in it does? Also, what is [m,s] and why does t have so many parameters?
I'm a beginner at MATLAB so all this seems quite overwhelming. Thank you!
Dave B
Dave B 2021-11-14
@Rabeca Mohammed - I'm glad this was helpful! Here's a walkthrough for this code.
Learning to write code can be scary at first, but remember that things that are totally overwhelming when you first start will be totally trivial once you've got some experience. Stick with it, you can do it!
When you're in doubt, leave a semi-colon off the end of a line to see what's in there, break a complicated statement out into its parts, or use disp to see what's in a variable. A nice thing about programming in MATLAB is (as long as you're not doing things like writing files) it's hard to do any serious damage, so experiment!
Here was my thought process:
First thought: can I easily transform the data so that it's a matrix with a column for each state/month? If so, I could just call corr once on that matrix, as corr works quite well with matrices (like most MATLAB functions). I could see how to do that, but not without the same loop we have here or a pretty advanced maneuver. And I suspected you didn't want an advanced maneuver. So I said to myself "we're going to want to do an action for each state and each month. I better import the data in a way that I have all those states and months."
d1=readcell('https://www.mathworks.com/matlabcentral/answers/uploaded_files/799494/Book2.xlsx','Sheet','d1');
d2=readcell('https://www.mathworks.com/matlabcentral/answers/uploaded_files/799484/Book2.xlsx','Sheet','d2');
When I import data from excel, I normally use readtable. But that would to be awkward here because we have rows and columns both serving as indices. readmatrix would work well for the numeric data, but it wouldn't do so well for the state names (readmatrix means everything has to be numeric). I think readcell is the last option in the "read" family.
states=string(d1(2:end,1));
This one is pretty easy, it grabs the first column (skipping the first row), and converts the "cell array of character vectors" (often called cellstrs) to strings. cellstrs were an old way of working with text and strings are much better, so if I'm going to do anything at all with text I almost always try to switch to string.
dates=cellfun(@(x)x,d1(1,2:end)); % hack for the fact that this can't just be cell2mat'ed.
This line is a total pain. It comes from the fact that each date is in a little container called a cell, and I wanted to convert it to a regular non-cell vector. It runs a function that says for each cell take the value and pull it out of the cell.
d1=cell2mat(d1(2:end,2:end)); % trim off d1
d2=cell2mat(d2(2:end,2:end)); % trim off d2. This assumes that states and dates will match up exactly
These next two lines just take the remaining data and plop it into a matrix.
Now the thought process is: I'm going to get a value for each state/month combination. How do I want to organize the output? I could have done a row for each state/month (like we ended up with for the output), but it's natural for me when there are two dimensions to think of them as rows and columns (not unlike the organization of the spreadsheet).
corrs = nan(numel(states),12);
This line initializes a matrix with rows and columns that are the number of states and number of months. It puts a placeholder 'NaN' (or not-a-number) in each location in the matrix.
for i = 1:numel(states)
The outer loop will iterate over states (techinically it will iterate from the number 1 to the the number of states)
for j = 1:12
The inner loop will iterate over months.
When you run the code, the first case inside the inner loop i is 1 and j is 1, then i is 1 and j is 2, etc. until j reaches 12. When j reaches 12, MATLAB exits the inner loop and i increments to 2. So i is 2 and j is 1, then i is 2 and j is 2, etc. To visualize this process in action, stick disp([i j]) into the inner loop.
corrs(i,j)=corr(d1(states==states(i),dates.Month==j)',d2(states==states(i),dates.Month==j)');
Here's the action of course.
corrs(i,j) says store the result in the matrix corrs, row i column j.
We're going to correlate something in d1 and something in d2. We want a subset of the data: the row with the current state and the columns with the current month. So it's d1, for the rows where the states vector is equal to states(i) and the columns where the dates vector's month is equal to the number j. And then d2 with the same. To visualize this, look at what happens if you set i=1 and then run states==states(i). MATLAB makes a vector of 0s and 1s which we call a logical index, it's a sort-of mask which gives us a specific row (or set of rows in the more general case).
Last bit in this line: there's a little ' on the end. That's because we'll end up with a row, and the corr function will perform one correlation for each column, so the ' (transpose) will turn a row into a column.
[m,s]=meshgrid(month(datetime(2000,1:12,1),'name'),states);
Now the data part is done, all the correlations have been computed. But while a matrix is natural for me, it's not for everyone. So I figured I want to stick it in a table with each row referring to a separate correlation. That's a pretty arbitrary step. meshgrid is basically going to make a matrix, that's the same shape as corrs, with the labels for each month and state. If you leave off the semicolon you'll see how this works. The first argument where I put month(datetime(2000,1:12,1) just takes advantage of the month function in MATLAB to get the names of months. We could have done ["January" "February" ... instead.
Last line!
t=table(s(:),string(m(:)),corrs(:),'VariableNames',["State" "Month" "Rho"])
The colon operator (:) will turn a matrix into a column. Because all three of these matrices are organized the same way, they all end up with matching columns. I added a string to m, because month() returns a cellstr and (see above) I like strings better.

请先登录,再进行评论。

更多回答(0 个)

类别

Help CenterFile Exchange 中查找有关 Data Type Conversion 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by