How to group data within a column by specific text within that column
I have a dataset of about 260,000 data points. One of the columns, "species_name", contains various species names. How can I group this data by specific species names (and therefore group the data in the other columns of the dataset, size for example, by those species names)?
2 Comments
Adam Danz
2021-2-7
Are you just trying to index the table?
load fisheriris
T = table(categorical(species), meas(:,1),meas(:,2),meas(:,3),meas(:,4));
T.Properties.VariableNames{1} = 'Species'
T(T.Species=='virginica',:)
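If more than one species is wanted at once, the same indexing idea extends with a logical mask. A minimal sketch, assuming a made-up table with hypothetical `Species` and `Size` variables standing in for the real dataset:

```matlab
% Made-up table (hypothetical values) standing in for the real dataset
T = table(categorical({'bat';'crab';'star';'bat';'crab'}), ...
          [1.2; 0.8; 2.5; 1.4; 0.9], ...
          'VariableNames', {'Species','Size'});

keep = ismember(T.Species, {'bat','star'});  % logical mask, one element per row
sub  = T(keep, :);                           % all variables, selected species only

batSizes = T.Size(T.Species == 'bat');       % a single variable for a single species
```

The mask only selects rows; the original table is left untouched.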
Answers (2)
dpb
2021-2-6
A sample dataset always helps, but it would probably be good to convert species to a categorical variable first (although that's not mandatory).
Then use grouping variables -- see
doc findgroups
doc splitapply
if keeping the data in an array, or look at
doc rowfun
for a table or timetable.
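For the array route, the two functions work as a pair: findgroups turns the species labels into group numbers, and splitapply applies a function to each group. A minimal sketch with made-up data (the variable names `species`, `sz`, and the values are assumptions, not the poster's dataset):

```matlab
% Made-up data for illustration only
species = categorical({'bat';'crab';'bat';'star';'crab';'star'});
sz      = [1; 2; 3; 4; 6; 8];

[G, names] = findgroups(species);       % G: group number per row; names: one label per group
groupMean  = splitapply(@mean, sz, G);  % apply mean to the sizes within each group

result = table(names, groupMean, 'VariableNames', {'Species','GroupMean'})
```

Any function that reduces a group to one value (@max, @std, @numel, ...) can be swapped in for @mean.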
2 Comments
dpb
2021-2-7
Well, w/o something to work with, it's harder to guess...attach the table or .mat file with the data, or a short text listing of enough to illustrate.
Then, give us a precise definition of the problem to be solved.
Also, show us what you have tried and where you had a problem.
As I've pointed out in several related questions recently, you rarely need to actually split the data out into separate arrays; instead of duplicating data you already have, use grouping variables and process as wanted.
dpb
2021-2-7
Illustration with faked data...
tmp=categorical({'star','bat','crab'}); % the categorical variable categories
t=table(tmp(randi(3,[20,1])).',randn(20,1),'VariableNames',{'Species','Size'}); % make up some data
>> head(t) % show what first little bit looks like...
ans =
  8×2 table
    Species      Size
    _______    ________
    bat        -0.65863
    crab        -1.2834
    crab        0.23872
    bat          1.5475
    star         0.1869
    star        -1.8809
    crab        0.40569
    bat         0.64618
>> summary(t) % summary statistics on the table
Variables:
    Species: 20×1 categorical
        Values:
            bat      6
            crab     9
            star     5
    Size: 20×1 double
        Values:
            Min       -1.8809
            Median    0.21281
            Max        1.5967
>> rowfun(@mean,t,'GroupingVariables','Species', ...
          'InputVariables','Size','OutputVariableNames','GroupMean') % group means
ans =
  3×3 table
    Species    GroupCount    GroupMean
    _______    __________    _________
    bat            6           0.42427
    crab           9           0.10477
    star           5          -0.46693
>>
Can do whatever wanted...
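As one more illustration of the same idea, several statistics per group can be had in one call with groupsummary. This is a sketch that assumes R2018a or later (when groupsummary was introduced) and uses a small made-up table rather than the random one above, so the numbers are reproducible:

```matlab
% Small made-up table (hypothetical values)
t = table(categorical({'bat';'crab';'bat';'star';'crab';'star'}), ...
          [1; 2; 3; 4; 6; 8], ...
          'VariableNames', {'Species','Size'});

% Mean and max of Size for each Species; output variables are named
% GroupCount, mean_Size and max_Size
stats = groupsummary(t, 'Species', {'mean','max'}, 'Size')
```

As with rowfun, the table itself is never split up; the grouping variable does the work.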