Convert Text in Table Variables to Categorical
This example shows how to convert variables in a table from text to categorical arrays. The same workflow applies to table variables that are string arrays and to variables that are cell arrays of character vectors.
Load Sample Data and Create a Table
Load sample data gathered from 100 patients.
load patients
Store the patient data from Age, Height, Weight, Smoker, SelfAssessedHealthStatus, and Location in a table. Use the unique identifiers in the variable LastName as row names. To convert variables that are cell arrays of character vectors to string arrays, use the convertvars function.
T = table(Age,Height,Weight,Smoker,...
    SelfAssessedHealthStatus,Location,...
    'RowNames',LastName);
T = convertvars(T,@iscellstr,"string")
T=100×6 table
                Age    Height    Weight    Smoker    SelfAssessedHealthStatus    Location
                ___    ______    ______    ______    ________________________    ___________________________
    Smith       38     71        176       true      "Excellent"                 "County General Hospital"
    Johnson     43     69        163       false     "Fair"                      "VA Hospital"
    Williams    38     64        131       false     "Good"                      "St. Mary's Medical Center"
    Jones       40     67        133       false     "Fair"                      "VA Hospital"
    Brown       49     64        119       false     "Good"                      "County General Hospital"
    Davis       46     68        142       false     "Good"                      "St. Mary's Medical Center"
    Miller      33     64        142       true      "Good"                      "VA Hospital"
    Wilson      40     68        180       false     "Good"                      "VA Hospital"
    Moore       28     68        183       false     "Excellent"                 "St. Mary's Medical Center"
    Taylor      31     66        132       false     "Excellent"                 "County General Hospital"
    Anderson    45     68        128       false     "Excellent"                 "County General Hospital"
    Thomas      42     66        137       false     "Poor"                      "St. Mary's Medical Center"
    Jackson     25     71        174       false     "Poor"                      "VA Hospital"
    White       39     72        202       true      "Excellent"                 "VA Hospital"
    Harris      36     65        129       false     "Good"                      "St. Mary's Medical Center"
    Martin      48     71        181       true      "Good"                      "VA Hospital"
      ⋮
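The convertvars step can be tried on a much smaller table. The following is a minimal sketch; the table S, its variable names, and its values are hypothetical and are not part of the patients data set.

```matlab
% Hypothetical three-row table; names and values are made up for illustration.
Age = [38;43;29];
Status = {'Good';'Fair';'Good'};               % cell array of character vectors
Site = {'Clinic A';'Clinic B';'Clinic A'};     % cell array of character vectors
S = table(Age,Status,Site,'RowNames',{'Ann','Bob','Cy'});

% Convert every variable for which iscellstr returns true to a string array.
S = convertvars(S,@iscellstr,"string");

% Numeric variables are left unchanged; only the text variables are converted.
statusClass = class(S.Status);
ageClass = class(S.Age);
```

Passing a function handle such as @iscellstr selects variables by type, so the call does not need to list variable names explicitly.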
Convert Table Variables from Text to Categorical Arrays
The variables Location and SelfAssessedHealthStatus contain discrete sets of unique values. When a variable contains a set of values that can be thought of as categories, such as locations or statuses, consider converting it to a categorical variable.
Convert Location to a categorical array.
T.Location = categorical(T.Location);
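As a small illustration of what this conversion produces, the sketch below converts a short string array and lists the resulting categories. The variable names loc, locCat, and cats are assumptions chosen for this example.

```matlab
% Short string array standing in for a text variable (illustrative values).
loc = ["VA Hospital";"County General Hospital"; ...
       "VA Hospital";"St. Mary's Medical Center"];

% With no value set specified, the categories are the sorted unique values.
locCat = categorical(loc);

% categories returns the category names as a cell array of character vectors.
cats = categories(locCat);
numCats = numel(cats);
```

Here the four text values collapse to three categories, because "VA Hospital" appears twice.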
The variable SelfAssessedHealthStatus contains four unique values: Excellent, Fair, Good, and Poor.
Convert SelfAssessedHealthStatus to an ordinal categorical array, such that the categories have the mathematical ordering Poor < Fair < Good < Excellent.
T.SelfAssessedHealthStatus = categorical(T.SelfAssessedHealthStatus,...
    {'Poor','Fair','Good','Excellent'},'Ordinal',true);
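To check that the category order follows the value set rather than alphabetical order, a sketch along the following lines can help. The variables status, valueset, and statusCat are hypothetical names used only here.

```matlab
% Illustrative text values in no particular order.
status = ["Good";"Poor";"Excellent";"Fair";"Good"];

% The value set defines both the category names and their order.
valueset = {'Poor','Fair','Good','Excellent'};
statusCat = categorical(status,valueset,'Ordinal',true);

tf = isordinal(statusCat);        % true for an ordinal categorical array
cats = categories(statusCat);     % categories in the order given by valueset
lowest = min(statusCat);          % min is defined because the array is ordinal
```

Without the value set, the categories would be sorted alphabetically (Excellent, Fair, Good, Poor), which would not match the intended ordering.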
Print a Summary
View the data type, description, units, and other descriptive statistics for each variable by using the summary function to summarize the table.
format compact
summary(T)
T: 100x6 table
Variables:
    Age: double
    Height: double
    Weight: double
    Smoker: logical (34 true)
    SelfAssessedHealthStatus: ordinal categorical (4 categories)
    Location: categorical (3 categories)
Statistics for applicable variables:
                                NumMissing    Min     Median      Max          Mean       Std
    Age                         0             25      39          50           38.2800    7.2154
    Height                      0             60      67          72           67.0700    2.8365
    Weight                      0             111     142.5000    202          154        26.5714
    SelfAssessedHealthStatus    0             Poor    Good        Excellent
    Location                    0
The table variables SelfAssessedHealthStatus and Location are categorical arrays. The summary reports the number of categories in each and, because SelfAssessedHealthStatus is ordinal, its minimum, median, and maximum categories. In this data set, 11 of the 100 patients assess their own health as poor and 34 assess their health as excellent.
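Per-category counts can also be computed directly from a categorical array. The sketch below uses countcats on a small hypothetical array; the names status, counts, and countTable are assumptions for illustration, and the values are not taken from the patients data.

```matlab
% Small ordinal categorical array with made-up values.
status = categorical(["Good";"Poor";"Excellent";"Good";"Fair";"Good"], ...
    {'Poor','Fair','Good','Excellent'},'Ordinal',true);

% countcats returns one count per category, in category order.
counts = countcats(status);

% Pair each category name with its count for easier reading.
countTable = table(categories(status),counts, ...
    'VariableNames',{'Category','Count'});
```

Applied to T.SelfAssessedHealthStatus, the same call would give the number of patients in each of the four health status categories.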
Select Data Based on Categories
Create a subtable, T1, containing the age, height, and weight of all patients who were observed at County General Hospital and assessed their own health as excellent. You can create a logical vector based on the values in the categorical arrays Location and SelfAssessedHealthStatus.
rows = T.Location=='County General Hospital' & T.SelfAssessedHealthStatus=='Excellent';
rows is a 100-by-1 logical vector with logical true (1) for the table rows where the location is County General Hospital and the patients assessed their health as excellent.
Define the subset of variables.
vars = ["Age","Height","Weight"];
Use parentheses to create the subtable, T1.
T1 = T(rows,vars)
T1=13×3 table
                  Age    Height    Weight
                  ___    ______    ______
    Smith         38     71        176
    Taylor        31     66        132
    Anderson      45     68        128
    King          30     67        186
    Edwards       42     70        158
    Rivera        29     63        130
    Richardson    30     67        141
    Torres        45     70        137
    Peterson      32     60        136
    Ramirez       48     64        137
    Barnes        42     66        194
    Butler        38     68        184
    Bryant        48     66        134
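The row selection used for T1 combines two equality tests with a logical AND. A minimal sketch of the same pattern on a hypothetical four-row table follows; the table S and its contents are made up for illustration.

```matlab
% Hypothetical table with two categorical variables.
Age = [30;41;52;27];
Site = categorical(["A";"B";"A";"A"]);
Status = categorical(["Excellent";"Excellent";"Good";"Excellent"]);
S = table(Age,Site,Status);

% Each comparison returns a logical column vector; & combines them elementwise.
rows = S.Site=='A' & S.Status=='Excellent';

% Parentheses return a table that contains only the selected rows and variables.
S1 = S(rows,"Age");
```

Only the first and fourth rows satisfy both conditions, so S1 has two rows.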
Since ordinal categorical arrays have a mathematical ordering for their categories, you can perform elementwise comparisons of them with relational operations, such as greater than and less than.
Create a subtable, T2, of the age, height, and weight of all patients who assessed their health status as poor or fair.
First, define the subset of rows to include in table T2.
rows = T.SelfAssessedHealthStatus<='Fair';
Then, define the subset of variables to include in table T2.
vars = ["Age","Height","Weight"];
Use parentheses to create the subtable T2.
T2 = T(rows,vars)
T2=26×3 table
                 Age    Height    Weight
                 ___    ______    ______
    Johnson      43     69        163
    Jones        40     67        133
    Thomas       42     66        137
    Jackson      25     71        174
    Garcia       27     69        131
    Rodriguez    39     64        117
    Lewis        41     62        137
    Lee          44     66        146
    Hall         25     70        189
    Hernandez    36     68        166
    Lopez        40     66        137
    Gonzalez     35     66        118
    Mitchell     39     71        164
    Campbell     37     65        135
    Parker       30     68        182
    Stewart      49     68        170
      ⋮
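The relational comparison used for T2 depends on the category order, not on alphabetical order. The following sketch shows this on a small hypothetical array; status and isLow are names assumed for this example.

```matlab
% Ordinal categorical array with the ordering Poor < Fair < Good < Excellent.
status = categorical(["Good";"Poor";"Excellent";"Fair"], ...
    {'Poor','Fair','Good','Excellent'},'Ordinal',true);

% True where the element is Fair or any category ordered below Fair.
isLow = status <= 'Fair';

% Count how many elements fall at or below Fair.
numLow = nnz(isLow);
```

Alphabetically, Excellent sorts before Fair, but the comparison excludes it because the ordinal ordering places Excellent last.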
Related Examples
- Create Tables and Assign Data to Them
- Create Categorical Arrays
- Access Data in Tables
- Access Data Using Categorical Arrays