How to convert categorical data to numeric in separate columns?

% Hi! I have a dataset 'data5' with a column 'Location' that contains the values Asia, US, and Africa.
% I want to convert it into 3 separate columns, one per location, each containing 1 if the row is from that location and 0 otherwise.
% This is the function I have created:
function data = categorical_values(data, var)
    uniques = unique(var);
    for i = 1:length(uniques)
        values(:, i) = double(ismember(var, uniques(i)));
    end
    t = table;
    [rows, cols] = size(values);
    for i = 1:cols
        t1 = table(values(:, i));
        t1.Properties.VariableNames = uniques(i);
        t = [t t1];
    end
    data = [t data];
end
% And this is the code I have been running, in a file called prep.m:
new = categorical_values(data5, data5.Location);
new.Location = []; % delete the old Location column
% I have been getting this error:
Error using categorical_values (line 11)
The VariableNames property is a cell array of character vectors. To
assign multiple variable names, specify names in a string array or a cell
array of character vectors.
Error in prep (line 16)
new = categorical_values(data5, data5.Location);
% Can anyone help? Thanks!

Answers (1)

Adam Danz, 2020-8-10 (edited by Adam Danz, 2020-10-26)
Here's a more efficient solution.
% Create demo data
location = categorical({'Asia','US','Asia','Africa','Africa','US','US','Asia'}');
unqCountries = unique(location(:)')
unqCountries = 1×3 categorical array
Africa Asia US
% Create a matrix of 1s and 0s.
% Each column corresponds to one element of unqCountries.
countryIdx = location(:) == unqCountries
countryIdx = 8x3 logical array
   0   1   0
   0   0   1
   0   1   0
   1   0   0
   1   0   0
   0   0   1
   0   0   1
   0   1   0
% If you want to turn it into a table
T = array2table(countryIdx, 'VariableNames', string(unqCountries))
T = 8x3 table
    Africa    Asia     US
    ______    _____    _____
    false     true     false
    false     false    true
    false     true     false
    true      false    false
    true      false    false
    false     false    true
    false     false    true
    false     true     false
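To apply the same idea to a table and replace its Location column, one possible sketch follows. It assumes a small hypothetical table named data5 with a categorical Location column and an extra Value column (the real contents of data5 are not shown), and that your release supports the implicit expansion used by `==` in the example above.

```matlab
% Hypothetical stand-in for data5; the real columns are unknown
data5 = table(categorical({'Asia'; 'US'; 'Asia'; 'Africa'}), [10; 20; 30; 40], ...
    'VariableNames', {'Location', 'Value'});

% One indicator column per unique location
unqLoc = unique(data5.Location)';        % 1-by-k categorical row (sorted)
isLoc  = data5.Location == unqLoc;        % n-by-k logical matrix

% cellstr turns the categorical values into text that VariableNames accepts
locTable = array2table(double(isLoc), 'VariableNames', cellstr(unqLoc));

% Put the indicator columns first, then delete the original Location column
new = [locTable, data5];
new.Location = [];
disp(new)
```

Using `double` gives the 1/0 values asked for in the question; drop it if logical true/false columns are acceptable.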
The error occurs because you are assigning a categorical value as a table variable name, and variable names must be character vectors or strings. Convert the value to a string first:
t1.Properties.VariableNames = string(uniques(i));
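If you prefer to keep the loop from the original function, a sketch with the naming problem fixed might look like the following. The input var is a hypothetical stand-in for data5.Location; values is preallocated, and the names are converted with cellstr so that array2table receives text rather than categorical values.

```matlab
% Hypothetical stand-in for data5.Location
var = categorical({'Asia'; 'US'; 'Asia'; 'Africa'});

uniques = unique(var);                          % sorted categories: Africa, Asia, US
values = zeros(numel(var), numel(uniques));     % preallocate the n-by-k 0/1 matrix
for i = 1:numel(uniques)
    % 1 where the row belongs to category i, 0 otherwise
    values(:, i) = double(ismember(var, uniques(i)));
end

% Variable names must be text, not categorical, so convert them with cellstr
t = array2table(values, 'VariableNames', cellstr(uniques)');
disp(t)
```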
4 comments
Adam Danz, 2020-10-26
"Is this same as dummy coding or One Hot Encoding?"
The columns of T can be used as dummy variables: each contains binary values (true|false), similar to the dummy variables used in regression.
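A small sketch of the distinction, reusing the demo location data from the answer: one-hot encoding keeps one column per category, while reference-level dummy coding drops one column. Because every one-hot row sums to 1, the full set of columns is collinear with an intercept term in a regression model. Which category to drop (here the first, Africa) is an arbitrary choice made for illustration.

```matlab
% Demo data from the answer above
location = categorical({'Asia'; 'US'; 'Asia'; 'Africa'; 'Africa'; 'US'; 'US'; 'Asia'});
unqCountries = unique(location)';             % 1-by-3: Africa, Asia, US

% One-hot encoding: one 0/1 column per category
oneHot = double(location == unqCountries);    % 8-by-3

% Every row has exactly one 1, so the columns add up to a column of ones
rowSums = sum(oneHot, 2);

% Reference-level dummy coding: drop the first category (Africa)
dummies = oneHot(:, 2:end);                   % 8-by-2: Asia, US
disp(dummies)
```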
