How can I create new variables based on groups?
Hello everyone,
I want to create new variables in order to perform a t-test based on the group membership of my subjects. I have this code here:
clearvars
close all
filepath = ['filepath'];
T =readtable('filename');
G = findgroups(T(:,1))
if G == 1
X = T(:,:)
else G == 2
Y = T(:,:)
end
The problem is that it does not work: Y just contains the whole table T again, rather than what I want, which is two entirely separate tables depending on whether a subject is in group 1 or 2. Any help or tips would be appreciated.
Thank you
18 comments
Rik
2020-4-27
If you set a breakpoint you will see what is happening: only one of the branches will be executed.
It is a common mistake that people make: if you use an array as the conditional in an if-statement, it may not do what you expect. Either use a loop or an array operation.
If you want specific help: share your data or write code that will generate plausible data.
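To illustrate the point about array conditions: an if-statement runs its branch only when every element of the condition is nonzero. Note also that else takes no condition, so a line such as "else G == 2" merely evaluates G == 2 as an ordinary statement. A minimal sketch with a made-up group vector (the variable names are illustrative, not taken from the original data):

```matlab
% Hypothetical group vector, for illustration only
G = [1; 1; 2; 2];

% An array condition counts as true only if ALL elements are nonzero.
if G == 1
    branch = "if";      % not reached: G == 1 is [1;1;0;0]
else
    branch = "else";    % reached, because not every element is true
end

% Array operation instead: a logical mask marks the matching rows.
isGroup1 = (G == 1);            % [true; true; false; false]
rowsGroup1 = find(isGroup1);    % row numbers belonging to group 1
```

The mask isGroup1 can be used directly as a row index, which is usually what is wanted in place of the if/else.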
Hannah_Mad
2020-4-27
Thank you Rik,
Please see below an excerpt from my data.
1 '0,1188' '0,1103' '1,4' '1,3' '-13,00950292' '-1,000894239' '3,728322672' '12,81289888' '0,468820547' '1,169608552'
1 '0,1103' '0,2376' '1,3' '2,8' '-11,8' '-2' '3,6' '13,4' '-0,9' '2,9'
1 '0,1313' '0,1717' '1,3' '1,7' '-13,28540783' '-3,043789654' '1,401630356' '13,32603837' '-2,987182197' '0,545827005'
1 '0,0971' '0,0883' '1,1' '1' '-15,71450602' '-3,962745391' '3,050642807' '13,45261762' '-1,497263892' '3,083489585'
2 '0,295' '0,295' '2,8' '2,8' '-14,5881751' '-2,603528618' '3,518819139' '14,33740562' '-1,870682366' '3,525744346'
2 '0,0883' '0,0883' '1' '1' '-12,86394769' '-5,766465114' '3,120227299' '13,97601291' '-4,209455419' '3,276772679'
2 '0,2191' '0,402' '2' '3,3' '' '' '' '' '' ''
2 '0,1424' '0,1442' '1,6' '1,5' '-17,17220026' '2,691067249' '6,865599728' '14,59057189' '4,206039042' '5,34181054'
2 '' '' '' '' '-13,1' '-4,9' '1,5' '12,7' '-2,7' '3,1'
If I try and use a loop:
clearvars
close all
filepath = ['C:\Hannah\ET\Statistik'];
T =readtable('ET2mat.csv');
G = findgroups(T(:,1))
for k = 1:44
if T(:,1) == 1
x = T(:,:)
else T(:,1) == 2
y = T(:,:)
end
end
I will get the following error message: Undefined operator '==' for input arguments of type 'table'.
So what do I need to do? Make it an array? I understand that maybe this will not work because the grouping variable is not a vector but part of the table.
Thank you
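The error about '==' comes from the indexing style: parentheses return a table, while braces or dot notation return the contents. A minimal sketch, assuming a small made-up table with hypothetical variable names:

```matlab
% Made-up table (variable names are hypothetical)
T = table([1;1;2], [0.1;0.2;0.3], 'VariableNames', {'Group','Value'});

sub  = T(:,1);     % parentheses: still a table, so == is not defined for it
col  = T{:,1};     % braces: the contents as a numeric column vector
same = T.Group;    % dot notation: the same contents, addressed by name

isG1 = (col == 1); % the comparison works on the numeric vector
```

A loop over the rows is not needed for this; the comparison already covers the whole column at once.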
Stephen23
2020-4-27
Using if is a red herring and rather unsuitable. The MATLAB way would be to use logical indexing, e.g.:
G = findgroups(T(:,1))
X = T(G==1,:);
Y = T(G==2,:);
But note that splitting up your table into separate variables is unlikely to be required, nor is it a good approach. The recommended approach is to use the Split-Apply-Combine Workflow on one table.
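Both ideas can be sketched on a small made-up table. The data and the variable names Group and CV_left below are assumptions for illustration, and mean stands in for whatever per-group function is needed:

```matlab
% Made-up data standing in for the real table (names are hypothetical)
T = table([1;1;1;2;2;2], [0.12;0.11;0.13;0.30;0.09;0.22], ...
    'VariableNames', {'Group','CV_left'});

% Logical indexing: split the rows by group without any if/else
G = findgroups(T.Group);
X = T(G == 1, :);
Y = T(G == 2, :);

% Split-apply-combine: one result per group, the table stays in one piece
groupMeans = splitapply(@mean, T.CV_left, G);
```

With the second form there is no need to keep track of separate X and Y variables; the group numbers in G do that bookkeeping.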
Hannah_Mad
2020-4-27
I use splitapply for most things, such as mean, standard deviation etc., however, it does not work for the t-test - do you have another suggestion for this perhaps? Thank you.
Hannah_Mad
2020-4-27
So this is my code then:
G = findgroups(T(:,1))
splitapply(ttest,(T(:,2)), G)
Which will result in this error message:
Not enough input arguments.
Error in ttest (line 124)
dim = find(size(x) ~= 1, 1);
Error in test (line 7)
splitapply(ttest,(T(:,2)), G)
>>
Stephen23
2020-4-27
You called ttest with no input arguments, thus the error. You forgot to use @ to create a function handle:
splitapply(@ttest,...)
% ^ you forgot this
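The difference between calling a function and passing a handle to it can be seen in a short sketch. It uses mean instead of ttest so that it runs without the Statistics and Machine Learning Toolbox; the data are made up:

```matlab
% Hypothetical data, for illustration only
x = [2; 4; 6; 8];
G = [1; 1; 2; 2];

f = @mean;                              % handle: refers to mean, does not call it
isHandle = isa(f, 'function_handle');

% splitapply calls the handle once per group
m = splitapply(f, x, G);

% Writing splitapply(mean, x, G) would instead call mean with no inputs
% before splitapply even starts, which fails with
% "Not enough input arguments."
```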
Hannah_Mad
2020-4-27
Thank you very much!
However, I still get the following error:
Error using splitapply (line 132)
Applying the function 'ttest' to the 1st group of data generated the following error:
Undefined function 'isnan' for input arguments of type 'cell'.
Error in test (line 7)
splitapply(@ttest,(T(:,11)), G)
Stephen23
2020-4-27
Hannah_Mad's "Answer" moved here:
Well. I keep getting error messages, different ones though.
clearvars
close all
filepath = ['C:\Hannah\ET\Statistik'];
T = readtable('ET2mat.csv');
F = rmmissing(T)
[row col] = size(F)
G = findgroups(F(:,1))
for n = 2:col
fprintf('This is column %d. \n' , n)
splitapply(@ttest,F,G)
end
Will result in:
Error using splitapply (line 132)
Applying the function 'ttest' to the 1st group of data generated the following error:
Undefined function 'minus' for input arguments of type 'cell'.
Error in test (line 12)
splitapply(@ttest,F,G)
So what can I do from here - I do in fact have negative values in my table. Is that the reason?
Stephen23
2020-4-27
"I do in fact have negative values in my table. Is that the reason?"
The actual reason is your data file, which is imported as character, not as numeric. The reasons are:
- The file is typical of regions which use a decimal comma, namely tab-separated values (and a misleading .CSV file extension). Whilst readtable can cope with the tab delimiter, it cannot parse decimal commas.
- single quotes around all "numeric" values. I cannot imagine what badly written application does that.
Because of these, readtable imports that data (which you think is numeric) as character vectors in cell vectors, complete with single quotes. You can check this quite easily (because you did not upload a sample file I had to create it myself based on your earlier comment, attached, including column headers):
>> T = readtable('test.txt','delimiter','\t')
T =
AA BB CC DD EE FF GG HH II JJ KK
__ __________ __________ _______ _______ ________________ ________________ _______________ _______________ ________________ _______________
1 ''0,1188'' ''0,1103'' ''1,4'' ''1,3'' ''-13,00950292'' ''-1,000894239'' ''3,728322672'' ''12,81289888'' ''0,468820547'' ''1,169608552''
1 ''0,1103'' ''0,2376'' ''1,3'' ''2,8'' ''-11,8'' ''-2'' ''3,6'' ''13,4'' ''-0,9'' ''2,9''
1 ''0,1313'' ''0,1717'' ''1,3'' ''1,7'' ''-13,28540783'' ''-3,043789654'' ''1,401630356'' ''13,32603837'' ''-2,987182197'' ''0,545827005''
1 ''0,0971'' ''0,0883'' ''1,1'' ''1'' ''-15,71450602'' ''-3,962745391'' ''3,050642807'' ''13,45261762'' ''-1,497263892'' ''3,083489585''
2 ''0,295'' ''0,295'' ''2,8'' ''2,8'' ''-14,5881751'' ''-2,603528618'' ''3,518819139'' ''14,33740562'' ''-1,870682366'' ''3,525744346''
2 ''0,0883'' ''0,0883'' ''1'' ''1'' ''-12,86394769'' ''-5,766465114'' ''3,120227299'' ''13,97601291'' ''-4,209455419'' ''3,276772679''
2 ''0,2191'' ''0,402'' ''2'' ''3,3'' '''' '''' '''' '''' '''' ''''
2 ''0,1424'' ''0,1442'' ''1,6'' ''1,5'' ''-17,17220026'' ''2,691067249'' ''6,865599728'' ''14,59057189'' ''4,206039042'' ''5,34181054''
2 '''' '''' '''' '''' ''-13,1'' ''-4,9'' ''1,5'' ''12,7'' ''-2,7'' ''3,1''
>> cellfun(@class,T.BB,'uni',0)
ans =
'char'
'char'
'char'
'char'
'char'
'char'
'char'
'char'
'char'
>> +T.BB{1} % first and last characters are single-quotes.
ans =
39 48 44 49 49 56 56 39
Essentially you have two choices:
- write or edit the file so that all numeric data are written without single quotes and using decimal points, then efficiently import the whole file in one step using readtable, or
- parse those character vectors inside of MATLAB, replacing the decimal commas with decimal points and then converting to numeric. Not particularly efficient, but it can work with your existing data files, e.g.:
T.KKnum = str2double(strrep(strrep(T.KK,'''',''),',','.'));
You can then apply numeric functions to that numeric data. I recommend that you use the variable names to refer to the data columns, rather than indexing.
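The same conversion can be applied to every text column in a loop. This is a minimal sketch on a made-up table that mimics the import (quoted text with decimal commas); the variable names are assumptions. Note that strrep takes exactly three inputs, so each column is handled on its own:

```matlab
% Made-up table mimicking the import: quoted text with decimal commas
% (variable names are hypothetical)
T = table([1;1;2;2], ...
    {'''0,1188'''; '''0,1103'''; '''0,295'''; ''''''}, ...
    {'''-13,0'''; '''-11,8'''; '''-14,5'''; '''-13,1'''}, ...
    'VariableNames', {'Gruppe','CV_left','x_left'});

names = T.Properties.VariableNames;
for k = 1:numel(names)
    col = T.(names{k});
    if iscellstr(col)                     % only touch the text columns
        col = strrep(col, '''', '');      % strip the single quotes
        col = strrep(col, ',', '.');      % decimal comma -> decimal point
        T.(names{k}) = str2double(col);   % empty text becomes NaN
    end
end
```

Columns that are already numeric, such as the group column, are left untouched, and missing entries end up as NaN.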
Hannah_Mad
2020-4-28
So, unfortunately it is still not working. Dataset will be provided.
This is my code:
clearvars
close all
filepath = ['C:\Hannah\ET\Statistik'];
N = readtable('ET2mat.txt','delimiter','\t')
NNUM = str2double(strrep(strrep(N.Gruppe, N.CV_left, N.CV_right, N.Amplitude_left, N.Amplitude_right, N.x_left, N.y_left, N.z_left, N.x_right, N.y_right, N.z_right),',','.'));
F = rmmissing(NNUM)
[row col] = size(F)
N = F(:,1:col)
G = findgroups(N(:,1))
splitapply(@ttest,N,G)
for n = 2:col
fprintf('This is column %d. \n' , n)
splitapply(@ttest,F,G)
end
The error will always be
Error using strrep
Too many input arguments.
Any ideas on that?
Also: how can I choose any of your answers and rate them? I heard I am supposed to do that but it won't work here.
Thank you for your help. I know I am a beginner to MATLAB but it is quite tedious.
Hannah_Mad
2020-4-28
I have now changed the commas to dots in Excel. So far everything seems fine, but I am getting a different error message.
This is my code now:
clearvars
close all
filepath = ['C:\Hannah\ET\Statistik'];
N = readtable('ET2mat.txt','delimiter','\t')
F = rmmissing(N)
[row col] = size(F)
G = findgroups(F{:,1})
for n = 2:col
fprintf('This is column %d. \n' , n)
[h, p ] = splitapply(@ttest,F,G,'Alpha',0.05 )
end
The error I get is:
Error using splitapply (line 87)
Group numbers must be a vector of positive integers, and cannot be a sparse vector.
Error in test (line 14)
[h, p ] = splitapply(@ttest,F,G,'Alpha',0.05 )
Tommy
2020-4-28
Edited: Tommy
2020-4-28
Have you also gotten rid of the quotes in your text file?
This line:
[h, p ] = splitapply(@ttest,F,G)
would pass every column within F to ttest at once, as separate arguments. If you want to consider each column individually, you could use
for n = 2:col
fprintf('This is column %d. \n' , n)
[h, p ] = splitapply(@ttest,F{:,n},G)
end
(although this does use indexing rather than variable names.)
Then, the last argument to splitapply must be G, so you cannot have
[h, p ] = splitapply(@ttest,F{:,n},G,'Alpha',0.05 )
because of the 'Alpha' and 0.05. splitapply thinks the 0.05 specifies the group numbers, which is not allowed because the group numbers need to be positive integers. If you want, you could use this syntax:
[h, p ] = splitapply(@(x,y) ttest(x,y,'Alpha',0.05),F{:,n},?,G)
or this syntax:
[h, p ] = splitapply(@(x,m) ttest(x,m,'Alpha',0.05),F{:,n},?,G)
both of which are explained in the documentation for ttest, but this would require you to pass a y or m to ttest, perhaps in place of the ?s above. However, the default alpha value is 0.05, so you shouldn't need to provide it anyway.
(edit) You can only choose and vote for answers, but so far everything here is a comment.
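As a concrete version of the anonymous-function syntax, here is a minimal sketch of a one-sample test per group with a non-default alpha. The data, the hypothesized mean of 0, and the alpha of 0.01 are assumptions chosen for illustration; ttest requires the Statistics and Machine Learning Toolbox:

```matlab
% Hypothetical data; ttest requires the Statistics and Machine Learning Toolbox
x = [5.1; 4.9; 5.3; 5.0; 0.2; -0.1; 0.1; -0.3];
G = [1; 1; 1; 1; 2; 2; 2; 2];

% The anonymous function fixes the extra arguments, so splitapply only
% sees a one-input function and G remains the last argument.
oneSample = @(v) ttest(v, 0, 'Alpha', 0.01);

% One h and one p per group (each group is tested against mean 0)
[h, p] = splitapply(oneSample, x, G);
```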
Walter Roberson
2020-4-28
Group numbers must be a vector of positive integers, and cannot be a sparse vector.
You could get that if your G is empty. Check whether F is empty.
Hannah_Mad
2020-4-28
Thank you very much for your kind explanations and detailed information. However I am not entirely sure that this script does what I believe it does: compare the means of two groups (1 and 2, hence the splitapply approach) - as I get two h-values for each column. Shouldn't it be only one value? As there are two groups being compared per column. Do you have any idea about that?
Again, I can only apologize for my basic questions.
Thank you!
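One possible explanation for the two h-values: splitapply(@ttest, ...) runs a separate one-sample t-test inside each group (by default against a mean of zero), so it returns one result per group and never compares the groups with each other. A comparison of group 1 against group 2 would typically use the two-sample test ttest2 instead, which returns a single h and p per column. A minimal sketch, assuming made-up data, hypothetical variable names, and that an unpaired test with the default settings suits the data; ttest2 requires the Statistics and Machine Learning Toolbox:

```matlab
% Made-up table (names hypothetical); ttest2 requires the Statistics
% and Machine Learning Toolbox
F = table([1;1;1;1;2;2;2;2], ...
    [0.10;0.12;0.11;0.13;0.30;0.28;0.31;0.29], ...
    [1.4;1.3;1.3;1.1;1.2;1.4;1.3;1.2], ...
    'VariableNames', {'Gruppe','CV_left','Amplitude_left'});

isG1 = (F.Gruppe == 1);
vars = setdiff(F.Properties.VariableNames, {'Gruppe'}, 'stable');

h = zeros(numel(vars), 1);
p = zeros(numel(vars), 1);
for k = 1:numel(vars)
    x = F.(vars{k})(isG1);          % values of group 1
    y = F.(vars{k})(~isG1);         % values of group 2
    [h(k), p(k)] = ttest2(x, y);    % one h and one p per variable
end

% Collect the results in one table, addressed by variable name
results = table(vars(:), h, p, 'VariableNames', {'Variable','h','p'});
```

ttest2 treats NaN entries as missing values, so removing whole rows beforehand may not be necessary.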
Hannah_Mad
2020-4-29
Hello Walter,
I got the following:
class(F{:,1}) : double
size(F{:,1}) 38 1
size(G) 38 1
I think that is alright, isn't it?
Thank you,
Hannah
Answers (0)