- Currently, you have one row in your table for each code/fruit pair.
- Instead, you one row for each code
- The columns are "1st fruit", "2nd fruit", "3rd fruit"
- In any row, you want the list of the fruits for that code
- If the code doesn't have 3 fruits, have an empty entry in the table
use unstack to reshape table with dummy variable (edited: alternative crosstab method)
3 次查看(过去 30 天)
显示 更早的评论
After I sort table according to a group variable, I like to put the group value as the 'header' column and put its members in the other columns in the same row.
Data for the sake of demostration:
C = {'q1', 'q1', 'q2', 'q3', 'q3', 'q3';
'apple', 'appl', 'banana', 'orange', 'orang', 'orange'}';
T = cell2table(C, 'VariableNames', {'code', 'fruit'});
[GroupsID, Groups] = findgroups(T.code);
unique_groupID = unique(GroupsID);
gT = table('Size', [10,4], 'VariableTypes', {'string', 'string', 'string', 'string'});
Method 1. (edited.) brutal for-loop that I don't like, and its result needs more processing to remove redundancy in each row.
for k=1:size(unique_groupID)
% extract group elements from 'fruit'
tmp = T.fruit(GroupsID==unique_groupID(k));
l = size(tmp',2);
gT(k,1) = {Groups(k)};
gT(k,2:l+1) = tmp';
end
rmmissing(gT, "MinNumMissing", 3)
Method 2 using unstack
I created a dummy variable for this method in order to use unstack( ). The code is shorter but doesn't give me the result I want.
D = {'dm1', 'dm2', 'dm3', 'dm4', 'dm5', 'dm6';
'q1', 'q1', 'q2', 'q3', 'q3', 'q3';
'apple', 'appl', 'banana', 'orange', 'orang', 'orange'}';
T = cell2table(D, 'VariableNames', {'dummy', 'code', 'fruit'});
unstack(T, "fruit", "dummy")
Edited. Method 3 using crosstab. This method works nicely, but I wish I don't have to use a for-loop. The result from this method is exactly what I want.
[tb,~,~,lbs] = crosstab(T.code, T.fruit);
For-loop to create the intended table:
m = size(tb,1);
header = lbs(1:m,1);
fruits = lbs(:,2);
gT = table('Size',[6,4], 'VariableTypes', {'string','string','string','string'});
for i=1:m
tmp = fruits(tb(i,:)>0)';
l = size(tmp,2);
gT(i,"Var1") = header(i);
gT(i, 2:l+1) = tmp;
end
rmmissing(gT, "MinNumMissing",4)
Edited. After I post the above code, I thought about that Method 3 may be made leaner.
ix = find(tb>0);
[rows,cols]=ind2sub([3,4],ix);
% then for-loop through rows and cols to populate the final table.
% I still can't avoid for-loop.
2 个评论
the cyclist
2023-4-1
I'm confused about the organizing principle of the output you want. Is the following correct?
Also, your Solution 1 and Solution 3 are different, but you say they are both correct. That's confusing to me.
If this is all correct, I'd be curious what your downstream step is. Your data are currently stored in what is known as tidy format, and that is almost always better for analysis.
采纳的回答
Stephen23
2023-4-1
编辑:Stephen23
2023-4-27
Using UNSTACK is quite a neat solution because it will automatically pad different-length data to the same number of columns, adding "missing" values as required. This is otherwise fiddly to replicate. But to use UNSTACK, we need to add a variable that tells UNSTACK which columns to move the data into:
C = {'q1','q1','q2','q3','q3','q3';'apple','appl','banana','orange','orang','orange'}.';
T = cell2table(C, 'VariableNames', {'code','fruit'})
U = unique(T,'rows');
G = findgroups(U.code); % note1
F = @(n)(1:nnz(n==G)).'; % note1
U.count = cell2mat(arrayfun(F,unique(G),'uni',0)) % note1
U = unstack(U,"fruit","count", "VariableNamingRule","modify")
note1: This just gives a unique index to each element of a group. Astonishingly there does not seem to be an easy inbuilt way to achieve this... does anyone have any tips?: e.g. [1,1,1,2,2,1] -> [1,2,3,1,2,4] .
EDIT: I found a neater way:
G = findgroups(U.code);
U.count = grouptransform(ones(size(G)),G,@cumsum);
11 个评论
Stephen23
2023-4-27
编辑:Stephen23
2023-4-27
I thought of another approach based on GROUPTRANSFORM:
As mentioned in my answer, the desired transformation is [1,1,1,2,2,1] -> [1,2,3,1,2,4] .
G = [1;1,;1;2;2;1]; % must be column vector
Y = grouptransform(ones(size(G)),G,@cumsum)
Nice, it seems to work as we want. However in this case G luckily consists of integers 1..N. In all other cases we need to use e.g. FINDGROUPS first. Lets try it with the fake data that I used in my answer:
C = {'q1','q1','q2','q3','q3','q3';'apple','appl','banana','orange','orang','orange'}.';
T = cell2table(C, 'VariableNames', {'code','fruit'})
U = unique(T,'rows');
G = findgroups(U.code);
U.count = grouptransform(ones(size(G)),G,@cumsum)
U = unstack(U,"fruit","count", "VariableNamingRule","modify")
更多回答(1 个)
Peter Perkins
2023-4-5
I'm having trouble understanding the desireed output, but others have created what is essentially a crosstabulation of counts, so, new in R2023a
>> T = cell2table(C, 'VariableNames', {'code', 'fruit'});
>> pivot(T,Rows="code",Columns="fruit")
ans =
3×7 table
code appl apple banana orang orange oranges
______ ____ _____ ______ _____ ______ _______
{'q1'} 1 2 0 0 0 0
{'q2'} 0 0 1 0 0 0
{'q3'} 0 0 0 1 2 1
As cyclist points out, there are a bunch of empty bins, so the original "tidy" format may be more useful. To me, this looks like a case of "fruit ought to be categorical, and you ought to apply mergecats to clean up those typos/different spellings".
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Data Distribution Plots 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!