
Convert Text in Table Variables to Categorical

This example shows how to convert variables in a table from text to categorical arrays. The same workflow applies to table variables that are string arrays and to table variables that are cell arrays of character vectors.
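To check that claim on a small scale, here is a minimal sketch that uses made-up labels rather than the patients data. It stores the same values once as a string array and once as a cell array of character vectors, then converts both with categorical. Both calls should produce the same categories and the same elements.

```matlab
% Made-up labels stored two ways (not from the patients data set)
s = ["Good" "Fair" "Good" "Poor"];   % string array
c = {'Good','Fair','Good','Poor'};   % cell array of character vectors

cs = categorical(s);
cc = categorical(c);

% By default, the categories are the sorted unique values: Fair, Good, Poor
disp(categories(cs))
% The two categorical arrays are equal element by element
disp(isequal(cs,cc))
```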

Load Sample Data and Create a Table

Load sample data gathered from 100 patients.

load patients

Store the patient data from Age, Height, Weight, Smoker, SelfAssessedHealthStatus, and Location in a table, and use the unique identifiers in the variable LastName as row names. Then use the convertvars function to convert the variables that are cell arrays of character vectors to string arrays.

T = table(Age,Height,Weight,Smoker,...
          SelfAssessedHealthStatus,Location,...
          'RowNames',LastName);
T = convertvars(T,@iscellstr,"string")
T=100×6 table
                Age    Height    Weight    Smoker    SelfAssessedHealthStatus             Location          
                ___    ______    ______    ______    ________________________    ___________________________

    Smith       38       71       176      true            "Excellent"           "County General Hospital"  
    Johnson     43       69       163      false           "Fair"                "VA Hospital"              
    Williams    38       64       131      false           "Good"                "St. Mary's Medical Center"
    Jones       40       67       133      false           "Fair"                "VA Hospital"              
    Brown       49       64       119      false           "Good"                "County General Hospital"  
    Davis       46       68       142      false           "Good"                "St. Mary's Medical Center"
    Miller      33       64       142      true            "Good"                "VA Hospital"              
    Wilson      40       68       180      false           "Good"                "VA Hospital"              
    Moore       28       68       183      false           "Excellent"           "St. Mary's Medical Center"
    Taylor      31       66       132      false           "Excellent"           "County General Hospital"  
    Anderson    45       68       128      false           "Excellent"           "County General Hospital"  
    Thomas      42       66       137      false           "Poor"                "St. Mary's Medical Center"
    Jackson     25       71       174      false           "Poor"                "VA Hospital"              
    White       39       72       202      true            "Excellent"           "VA Hospital"              
    Harris      36       65       129      false           "Good"                "St. Mary's Medical Center"
    Martin      48       71       181      true            "Good"                "VA Hospital"              
      ⋮
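convertvars selects the variables to convert by using a predicate, here iscellstr, which it applies to each table variable. The following minimal sketch uses a hypothetical three-row table with illustrative values. Only the variable that is a cell array of character vectors changes type, and the numeric variable passes through untouched.

```matlab
% Hypothetical three-row table (values are illustrative only)
Age    = [38; 43; 38];
Status = {'Excellent'; 'Fair'; 'Good'};   % cell array of character vectors
Names  = {'Smith'; 'Johnson'; 'Williams'};
T = table(Age,Status,'RowNames',Names);

% Convert every variable for which iscellstr returns true
T = convertvars(T,@iscellstr,"string");

disp(class(T.Status))   % string
disp(class(T.Age))      % double, because iscellstr(Age) is false
```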

Convert Table Variables from Text to Categorical Arrays

The variables Location and SelfAssessedHealthStatus each contain a discrete set of unique values. When a variable contains values that can be thought of as categories, such as locations or statuses, consider converting it to a categorical variable.

Convert Location to a categorical array.

T.Location = categorical(T.Location);
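To see what categorical created, you can list the categories and count the elements in each one. This minimal sketch uses four made-up location values as a stand-in for T.Location. For a categorical array that is not ordinal, the categories come out in sorted order.

```matlab
% Made-up location values (a small stand-in for T.Location)
loc = ["VA Hospital"; "County General Hospital"; ...
       "VA Hospital"; "St. Mary's Medical Center"];
loc = categorical(loc);

% Categories are the sorted unique values
cats = categories(loc)
% Number of elements in each category, in the same order as cats:
% County General Hospital: 1, St. Mary's Medical Center: 1, VA Hospital: 2
counts = countcats(loc)
```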

The variable SelfAssessedHealthStatus contains four unique values: Excellent, Fair, Good, and Poor.

Convert SelfAssessedHealthStatus to an ordinal categorical array, such that the categories have the mathematical ordering Poor < Fair < Good < Excellent.

T.SelfAssessedHealthStatus = categorical(T.SelfAssessedHealthStatus,...
    {'Poor','Fair','Good','Excellent'},'Ordinal',true);
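With 'Ordinal' set to true, the categories keep the order in which you list them rather than alphabetical order. Functions such as min, max, and sort then respect that order. Here is a short sketch with made-up values, not the patients data:

```matlab
% Made-up self-assessments (not the patients data)
h = categorical(["Good";"Poor";"Excellent";"Fair"], ...
    {'Poor','Fair','Good','Excellent'},'Ordinal',true);

% Categories keep the listed order instead of alphabetical order
categories(h)
isordinal(h)

% Order-aware functions use Poor < Fair < Good < Excellent
lowest  = min(h)
highest = max(h)
sorted  = sort(h)
```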

Print a Summary

Use summary to view the data type, description, units, and other descriptive statistics for each variable in the table.

format compact

summary(T)
T: 100x6 table
Variables:
    Age: double
    Height: double
    Weight: double
    Smoker: logical (34 true)
    SelfAssessedHealthStatus: ordinal categorical (4 categories)
    Location: categorical (3 categories)
Statistics for applicable variables:
                                NumMissing      Min          Median            Max            Mean            Std    
    Age                             0             25              39               50        38.2800         7.2154  
    Height                          0             60              67               72        67.0700         2.8365  
    Weight                          0            111        142.5000              202            154        26.5714  
    SelfAssessedHealthStatus        0           Poor        Good            Excellent                                
    Location                        0                                                                                

The table variables SelfAssessedHealthStatus and Location are now categorical arrays. For categorical variables, the summary also reports the number of elements in each category. For example, the summary indicates that 11 of the 100 patients assess their own health as poor and 34 assess it as excellent.
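If you need the per-category counts as numbers instead of reading them from the display, countcats returns one count per category, in the order given by categories. Calling summary on a categorical array by itself also displays the count for each category. This minimal sketch uses seven made-up statuses as a stand-in for the full variable.

```matlab
% Made-up ordinal health statuses (a small stand-in for the full variable)
h = categorical(["Good";"Poor";"Excellent";"Good";"Fair";"Excellent";"Good"], ...
    {'Poor','Fair','Good','Excellent'},'Ordinal',true);

% countcats returns one count per category, in the order of categories(h)
cats   = categories(h);
counts = countcats(h);
tally  = table(cats,counts,'VariableNames',{'Category','Count'})
```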

Select Data Based on Categories

Create a subtable, T1, containing the age, height, and weight of all patients who were observed at County General Hospital and who assess their own health as excellent. You can create a logical vector directly from the values in the categorical arrays Location and SelfAssessedHealthStatus.

rows = T.Location=='County General Hospital' & T.SelfAssessedHealthStatus=='Excellent';

rows is a 100-by-1 logical vector with logical true (1) for the table rows where the location is County General Hospital and the patients assessed their health as excellent.
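The same pattern works on any table with categorical variables. This minimal sketch uses a hypothetical five-patient table with illustrative values and a shortened variable name, Health. The == operator compares each element with a category name, and & combines the two conditions elementwise.

```matlab
% Hypothetical five-patient table (values and names are illustrative only)
Location = categorical(["County General Hospital"; "VA Hospital"; ...
    "County General Hospital"; "St. Mary's Medical Center"; "County General Hospital"]);
Health = categorical(["Excellent"; "Excellent"; "Good"; "Excellent"; "Excellent"], ...
    {'Poor','Fair','Good','Excellent'},'Ordinal',true);
Age = [38; 43; 38; 40; 31];
T = table(Age,Location,Health);

% == compares each element with a category name; & combines the conditions
rows = T.Location=='County General Hospital' & T.Health=='Excellent';
T(rows,:)
```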

Define the subset of variables.

vars = ["Age","Height","Weight"];

Use parentheses to create the subtable, T1.

T1 = T(rows,vars)
T1=13×3 table
                  Age    Height    Weight
                  ___    ______    ______
    Smith         38       71       176  
    Taylor        31       66       132  
    Anderson      45       68       128  
    King          30       67       186  
    Edwards       42       70       158  
    Rivera        29       63       130  
    Richardson    30       67       141  
    Torres        45       70       137  
    Peterson      32       60       136  
    Ramirez       48       64       137  
    Barnes        42       66       194  
    Butler        38       68       184  
    Bryant        48       66       134  
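Parentheses return a subtable. Curly braces, by contrast, extract the selected contents and concatenate them into an ordinary array, which is often what you want for numeric computation. This minimal sketch uses a hypothetical two-row table with illustrative values.

```matlab
% Hypothetical two-row table (values are illustrative only)
Age    = [38; 31];
Height = [71; 66];
Weight = [176; 132];
T1 = table(Age,Height,Weight,'RowNames',{'Smith';'Taylor'});

sub = T1(:,["Age","Height"]);   % parentheses: result is still a table
M   = T1{:,["Age","Height"]};   % braces: contents concatenated into a matrix

class(sub)   % table
class(M)     % double
size(M)      % 2-by-2
```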

Because ordinal categorical arrays have a mathematical ordering for their categories, you can compare them elementwise using relational operators such as greater than (>) and less than (<).
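This short sketch with made-up values compares an ordinal array in both directions. It also shows that the same operators raise an error for a categorical array that is not ordinal. The sketch only records whether an error occurred and does not reproduce the error message.

```matlab
% Made-up ordinal statuses (not the patients data)
h = categorical(["Good";"Poor";"Excellent";"Fair"], ...
    {'Poor','Fair','Good','Excellent'},'Ordinal',true);

atMostFair = h <= 'Fair'   % true for Poor and Fair
aboveFair  = h >  'Fair'   % true for Good and Excellent

% Relational operators other than == and ~= require an ordinal array
plain = categorical(["Good";"Poor"]);
threw = false;
try
    tf = plain < 'Good'; %#ok<NASGU>
catch
    threw = true;
end
disp(threw)
```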

Create a subtable, T2, of the age, height, and weight of all patients who assessed their health status as poor or fair.

First, define the subset of rows to include in table T2.

rows = T.SelfAssessedHealthStatus<='Fair';

Then, define the subset of variables to include in table T2.

vars = ["Age","Height","Weight"];

Use parentheses to create the subtable T2.

T2 = T(rows,vars)
T2=26×3 table
                 Age    Height    Weight
                 ___    ______    ______
    Johnson      43       69       163  
    Jones        40       67       133  
    Thomas       42       66       137  
    Jackson      25       71       174  
    Garcia       27       69       131  
    Rodriguez    39       64       117  
    Lewis        41       62       137  
    Lee          44       66       146  
    Hall         25       70       189  
    Hernandez    36       68       166  
    Lopez        40       66       137  
    Gonzalez     35       66       118  
    Mitchell     39       71       164  
    Campbell     37       65       135  
    Parker       30       68       182  
    Stewart      49       68       170  
      ⋮
