Deleted table rows stuck in memory? Cannot fit a linear model.

I have a data table with both continuous and categorical variables, and I want to fit a linear model to it with fitlm. In a loop, I pick a different subset of rows and fit a model to each subset.
However, it appears that slicing does not work for categorical variables: fitlm still sees every category of the full table and warns "Warning: Regression design matrix is rank deficient to within machine precision.". The categories that no longer have any rows also appear in the model. Even creating a new table from a cell array does not help!
Here is an example. I don't understand why one category (condition 3) won't go away.
```matlab
% data: column 1 = Y, column 2 = condition (3 categories)
data = [...
1.9,1;
5.7,2;
0.7,1;
2.2,2;
0,1;
1.9,2;
-0.2,1;
1.6,2;
-0.7,1;
2.3,2;
1,3];
% create table
data_table = array2table(data,'VariableNames',{'Y','Condition'});
% make Condition categorical
data_table.Condition=categorical(data_table.Condition);
% fit linear model (basically a t-test)
model1 = fitlm(data_table,'Y ~ 1 + Condition');
% this works, but condition 3 is basically useless with only 1 sample
% Let's remove the final row, and with it condition 3
data_table = data_table(1:end-1,:);
% repeat with the sliced table (only 2 categories remain)
model2 = fitlm(data_table,'Y ~ 1 + Condition');
% We get a warning. Condition 3 is still there with no data.
% Create a new table from a cell array
mat = table2cell(data_table);
new_data_table = cell2table(mat,'VariableNames',{'Y','Condition'});
new_data_table.Condition=categorical(new_data_table.Condition);
% there should be no category 3 in the new table
model3 = fitlm(new_data_table,'Y ~ 1 + Condition');
% still the same warning, even though condition 3 should not exist in this table
% OK, let's clear the old tables and start from the cell array
clear data_table new_data_table data;
new_new_data_table = cell2table(mat,'VariableNames',{'Y','Condition'});
new_new_data_table.Condition=categorical(new_new_data_table.Condition);
% again, there should be no category 3 in the new table
model4 = fitlm(new_new_data_table,'Y ~ 1 + Condition');
% still the same warning, condition 3 remains
```
ADDITION:
In the latest version of MATLAB I could probably use removecats to delete unused categories. However, this function does not seem to be available in R2017b.
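One way to confirm what is happening is to compare the category list of the sliced column with the number of rows in each category. The sketch below is a minimal illustration, not part of the original post: it rebuilds the question's data with table() instead of array2table and uses only base-MATLAB categorical functions (categories, countcats), so it does not need fitlm.

```matlab
% Rebuild the question's data directly as a table (same values as above)
Y = [1.9; 5.7; 0.7; 2.2; 0; 1.9; -0.2; 1.6; -0.7; 2.3; 1];
Condition = categorical([1; 2; 1; 2; 1; 2; 1; 2; 1; 2; 3]);
data_table = table(Y, Condition);

% Drop the only row that belongs to condition 3
data_table = data_table(1:end-1, :);

% The category list still contains '1', '2' and '3' ...
disp(categories(data_table.Condition))
% ... but countcats shows that no row uses category '3' (5, 5 and 0 rows)
disp(countcats(data_table.Condition))
```

A category with a zero count is exactly what produces an all-zero dummy column in the fitlm design matrix, and hence the rank-deficiency warning.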

Accepted Answer

Hang Yu 2018-6-28
Hi Janne,
Your approach of using removecats is correct, and removecats is part of base MATLAB, so it should be available in R2017b. After you truncate the table, you can do
>> data_table.Condition = removecats(data_table.Condition);
and there is no need to convert the table at all.
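Applied to the loop described in the question, the idea is to call removecats on each subset just before fitting. This is a minimal sketch under stated assumptions: fitlm requires the Statistics and Machine Learning Toolbox, and the row selections in `subsets` are hypothetical examples standing in for whatever subsets the real loop picks.

```matlab
% Same data as in the question
Y = [1.9; 5.7; 0.7; 2.2; 0; 1.9; -0.2; 1.6; -0.7; 2.3; 1];
Condition = categorical([1; 2; 1; 2; 1; 2; 1; 2; 1; 2; 3]);
data_table = table(Y, Condition);

subsets = {1:10, 1:8, 3:10};   % hypothetical row selections
models = cell(size(subsets));
for k = 1:numel(subsets)
    sub = data_table(subsets{k}, :);
    % Drop categories with no rows in this subset, so fitlm does not
    % build an all-zero dummy column for them
    sub.Condition = removecats(sub.Condition);
    models{k} = fitlm(sub, 'Y ~ 1 + Condition');
end
```

Each model then has only an intercept and one Condition term, because every hypothetical subset here contains conditions 1 and 2 only.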
If you still can't find the removecats function for some reason, convert the categorical array into a double array with
>> data_table.Condition = double(data_table.Condition);
and then call categorical on the result again.
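The fallback works because categorical() builds its category list only from the values it is given. A caveat worth knowing: double() on a categorical array returns each element's position in the category list, not its label. In this example the categories are '1', '2', '3', so the positions happen to equal the original values; with other labels the round trip would change them. A minimal sketch on the question's data (the intermediate variable `codes` is a hypothetical name):

```matlab
% Same data as in the question, sliced so that condition 3 has no rows
Y = [1.9; 5.7; 0.7; 2.2; 0; 1.9; -0.2; 1.6; -0.7; 2.3; 1];
Condition = categorical([1; 2; 1; 2; 1; 2; 1; 2; 1; 2; 3]);
data_table = table(Y, Condition);
data_table = data_table(1:end-1, :);

% double() gives indices into categories(); here they match the values
codes = double(data_table.Condition);
% categorical() on plain numbers creates only the categories present
data_table.Condition = categorical(codes);
disp(categories(data_table.Condition))   % now only '1' and '2'
```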
Janne Kauttonen 2018-6-29
I'll accept this answer now, thank you. However, it would be great if someone could explain this table behavior more in-depth (why and how).
meng lei 2021-3-3
The category list is stored with the categorical variable inside the table, and it is inherited from the previous table.
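To expand on this: every categorical array, including each 1x1 categorical scalar, carries its own full category list. Row slicing keeps that list, table2cell stores categorical scalars (not numbers) in the Condition cells, and clearing the original tables does not affect `mat`. The sketch below traces this on the question's data. It concatenates the cells explicitly with vertcat rather than going through cell2table; that cell2table behaves the same way is an assumption, although it is consistent with the warning the question reports for model3 and model4. The names `sliced`, `scalarCats`, `rebuiltCondition`, `rebuiltCats` and `trimmedCats` are hypothetical.

```matlab
% Same data as in the question
Y = [1.9; 5.7; 0.7; 2.2; 0; 1.9; -0.2; 1.6; -0.7; 2.3; 1];
Condition = categorical([1; 2; 1; 2; 1; 2; 1; 2; 1; 2; 3]);
data_table = table(Y, Condition);
sliced = data_table(1:end-1, :);   % condition 3 has no rows left

% Each Condition cell from table2cell is a 1x1 categorical that still
% knows about category '3'
mat = table2cell(sliced);
scalarCats = categories(mat{1, 2});

% Concatenating the scalars gives a categorical column with the same
% list, and categorical() on a categorical array keeps that list
rebuiltCondition = categorical(vertcat(mat{:, 2}));
rebuiltCats = categories(rebuiltCondition);

% removecats is what actually drops the unused category
trimmedCats = categories(removecats(rebuiltCondition));
```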


Release

R2017b
