Create Categorical Arrays

This example shows how to create categorical arrays from various types of input data and modify their elements. The categorical data type stores values from a finite set of discrete categories. You can create a categorical array from a numeric array, logical array, string array, or cell array of character vectors. The unique values from the input array become the categories of the categorical array. A categorical array provides efficient storage and convenient manipulation of data while also maintaining meaningful names for the values.

By default, the categories of a categorical array do not have a mathematical ordering. For example, the discrete set of pet categories ["dog" "cat" "bird"] has no meaningful mathematical ordering, so MATLAB® uses the alphabetical ordering ["bird" "cat" "dog"]. But you can also create ordinal categorical arrays, in which the categories do have meaningful mathematical orderings. For example, the discrete set of size categories ["small" "medium" "large"] can have the mathematical ordering of small < medium < large. Ordinal categorical arrays enable you to make comparisons between their elements.

Create Categorical Array from Input Array

To create a categorical array from an input array, use the categorical function.

For example, create a string array whose elements are all states from New England. Notice that some of the strings have leading and trailing spaces.

statesNE = ["MA" "ME" " CT" "VT" " ME " "NH" "VT" "MA" "NH" "CT" "RI "]

statesNE = 1x11 string
    "MA"    "ME"    " CT"    "VT"    " ME "    "NH"    "VT"    "MA"    "NH"    "CT"    "RI "

Convert the string array to a categorical array. When you create categorical arrays from string arrays (or cell arrays of character vectors), leading and trailing spaces are removed.

statesNE = categorical(statesNE)

statesNE = 1x11 categorical
     MA      ME      CT      VT      ME      NH      VT      MA      NH      CT      RI

List the categories of statesNE by using the categories function. Every element of statesNE belongs to one of these categories. Because statesNE has six unique states, there are six categories. The categories are listed in alphabetical order because the state abbreviations have no mathematical ordering.

categories(statesNE)

ans = 6x1 cell
    {'CT'}
    {'MA'}
    {'ME'}
    {'NH'}
    {'RI'}
    {'VT'}
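
The categorical function accepts the other input types mentioned earlier in the same way. The following is a minimal sketch, not part of the original example: the variable names (codes, pets, levels) are illustrative, and the three-argument call that maps numeric values to descriptive category names is shown only as one possible labeling scheme.

```matlab
% Numeric input: the unique values become the category names, stored as text.
codes = [3 1 2 1 3];
codesCat = categorical(codes);
numericCats = categories(codesCat);      % {'1'; '2'; '3'}

% Cell array of character vectors: behaves like a string array.
pets = {'dog','cat','bird','cat'};
petsCat = categorical(pets);
petCats = categories(petsCat);           % {'bird'; 'cat'; 'dog'}

% Optional: map the numeric values 1, 2, 3 to descriptive names.
% The names "low", "mid", and "high" are hypothetical labels.
levels = categorical(codes,[1 2 3],["low" "mid" "high"]);
levelCats = categories(levels);          % {'low'; 'mid'; 'high'}
```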

Add and Modify Elements

To add one element to a categorical array, you can assign text that represents a category name. For example, add a state to statesNE.

statesNE(12) = "ME"

statesNE = 1x12 categorical
     MA      ME      CT      VT      ME      NH      VT      MA      NH      CT      RI      ME

To add or modify multiple elements, you must assign a categorical array.

statesNE(1:3) = categorical(["RI" "VT" "MA"])

statesNE = 1x12 categorical
     RI      VT      MA      VT      ME      NH      VT      MA      NH      CT      RI      ME
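
If the text you assign does not match an existing category, MATLAB adds it as a new category, provided that the array is not protected (ordinal categorical arrays are protected by default). The sketch below illustrates this with a small hypothetical array, states, rather than statesNE.

```matlab
% A small, unprotected categorical array with three categories.
states = categorical(["MA" "ME" "VT"]);
wasCategory = iscategory(states,"NY");   % false: "NY" is not yet a category

% Assigning text that is not a category name adds the category.
states(4) = "NY";
isCategoryNow = iscategory(states,"NY"); % true
numCats = numel(categories(states));     % 4
```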

Add Missing Values as Undefined Elements

You can assign missing values as undefined elements of a categorical array. An undefined categorical value does not belong to any category, similar to NaN (Not-a-Number) in numeric arrays.

To assign missing values, use the missing function. For example, modify the first element of the categorical array to be a missing value.

statesNE(1) = missing

statesNE = 1x12 categorical
     <undefined>      VT      MA      VT      ME      NH      VT      MA      NH      CT      RI      ME

Assign two missing values at the end of the categorical array.

statesNE(12:13) = [missing missing]

statesNE = 1x13 categorical
     <undefined>      VT      MA      VT      ME      NH      VT      MA      NH      CT      RI      <undefined>      <undefined>

If you convert a string array to a categorical array, then missing strings and empty strings become undefined elements in the categorical array. If you convert a numeric array, then NaNs become undefined elements. Therefore, assigning missing strings, "", '', or NaNs to elements of a categorical array converts them to undefined categorical values.

statesNE(2) = ""

statesNE = 1x13 categorical
     <undefined>      <undefined>      MA      VT      ME      NH      VT      MA      NH      CT      RI      <undefined>      <undefined>
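
To locate undefined elements programmatically, you can use the isundefined function. The following sketch uses a short hypothetical array, states, to show how empty and missing strings map to undefined values and how to count or exclude them.

```matlab
% The empty string and the missing string both become undefined elements.
states = categorical(["MA" "" "VT" missing "NH"]);

tf = isundefined(states);        % logical mask of undefined elements
numUndefined = nnz(tf);          % number of undefined elements
definedOnly = states(~tf);       % keep only the defined elements
```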

Create Ordinal Categorical Array from String Array

In an ordinal categorical array, the order of the categories defines a mathematical order that enables comparisons. Because of this mathematical order, you can compare elements of an ordinal categorical array using relational operators such as < and >=. For categorical arrays that are not ordinal, you can test elements for equality using == and ~=, but you cannot compare them using <, <=, >, or >=.

For example, create a string array that contains the sizes of eight objects.

AllSizes = ["medium" "large" "small" "small" "medium"  ...
            "large" "medium" "small"];

The string array has three unique values: "large", "medium", and "small". A string array has no convenient way to indicate that small < medium < large.

Convert the string array to an ordinal categorical array. Define the categories as small, medium, and large, in that order. For an ordinal categorical array, the first category specified is the smallest and the last category is the largest.

valueset = ["small" "medium" "large"];
sizeOrd = categorical(AllSizes,valueset,"Ordinal",true)

sizeOrd = 1x8 categorical
     medium      large      small      small      medium      large      medium      small

The order of the values in the categorical array, sizeOrd, remains unchanged.

List the discrete categories in sizeOrd. The order of the categories matches their mathematical ordering small < medium < large.

categories(sizeOrd)

ans = 3x1 cell
    {'small' }
    {'medium'}
    {'large' }

Create Ordinal Categorical Array by Binning Numeric Data

If you have an array with continuous numeric data, specifying numeric ranges as categories can be useful. In such cases, bin the data using the discretize function. Assign category names to the bins.

For example, create a vector of 100 random numbers between 0 and 50.

x = rand(100,1)*50;

Use discretize to create a categorical array by binning the values of x. Put all the values between 0 and 15 in the first bin, all the values between 15 and 35 in the second bin, and all the values between 35 and 50 in the third bin. Each bin includes its left endpoint but not its right endpoint, except for the last bin, which includes both endpoints.

catnames = ["small" "medium" "large"];
binnedData = discretize(x,[0 15 35 50],"categorical",catnames)

binnedData = 100x1 categorical
     large 
     large 
     small 
     large 
     medium 
     small 
     small 
     medium 
     large 
     large 
     small 
     large 
     large 
     medium 
     large 
     small 
     medium 
     large 
     large 
     large 
     medium 
     small 
     large 
     large 
     medium 
     large 
     large 
     medium 
     medium 
     small 
      ⋮

binnedData is an ordinal categorical array with three categories, such that small < medium < large.
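
Because x is random, the output above differs from run to run. To check how values on the bin edges are assigned, a small deterministic sketch can help. The vector samples is hypothetical and chosen to sit on or near the edges.

```matlab
edges = [0 15 35 50];
catnames = ["small" "medium" "large"];

% Values on or near the bin edges.
samples = [0; 14.9; 15; 34.9; 35; 50];
binned = discretize(samples,edges,"categorical",catnames);

% 15 falls in the second bin and 35 in the third, because each bin
% includes its left edge. 50 falls in the last bin, which also
% includes its right edge.
binnedNames = string(binned);
```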

To display the number of elements in each category, use the summary function.

summary(binnedData)

binnedData: 100x1 ordinal categorical

     small            30 
     medium           35 
     large            35 
     <undefined>       0 

Additional statistics:

    Min         small   
    Median      medium  
    Max         large
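
The summary function displays the counts. To work with the counts as a numeric array, one option is the countcats function. The following sketch uses a short hypothetical ordinal array, sizes, so that the counts are reproducible.

```matlab
% A small ordinal categorical column vector with known contents.
sizes = categorical(["small"; "large"; "medium"; "small"; "large"; "small"], ...
    ["small" "medium" "large"],"Ordinal",true);

% Counts are returned in the same order as categories(sizes).
names = categories(sizes);
counts = countcats(sizes);
```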

You can make various kinds of charts of the binned data. For example, make a pie chart of binnedData.

pie(binnedData)

Figure: pie chart of binnedData with slices labeled small, medium, and large.

Preallocate Categorical Array

You can preallocate a categorical array of any size by creating an array of NaNs and converting it to a categorical array. After you preallocate the array, you can initialize its categories by adding the category names to the array.

For example, create a 2-by-4 array of NaNs.

A = NaN(2,4)

A = 2×4

   NaN   NaN   NaN   NaN
   NaN   NaN   NaN   NaN

Then convert the array of NaNs to a categorical array of undefined categorical values.

A = categorical(A)

A = 2x4 categorical
     <undefined>      <undefined>      <undefined>      <undefined> 
     <undefined>      <undefined>      <undefined>      <undefined>

At this point, A has no categories.

categories(A)

ans =

  0x0 empty cell array

Add small, medium, and large categories to A by using the addcats function.

A = addcats(A,["small" "medium" "large"])

A = 2x4 categorical
     <undefined>      <undefined>      <undefined>      <undefined> 
     <undefined>      <undefined>      <undefined>      <undefined>

While the elements of A are still undefined values, the categories of A are defined.

categories(A)

ans = 3x1 cell
    {'small' }
    {'medium'}
    {'large' }

Now that A has categories, you can assign defined categorical values as elements of A.

A(1) = "medium";
A(8) = "small";
A(3:5) = "large"

A = 2x4 categorical
     medium           large      large            <undefined> 
     <undefined>      large      <undefined>      small
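
As an alternative to calling addcats after the conversion, you might define the categories during the conversion itself. The sketch below assumes that passing a value set and category names to categorical is acceptable for your workflow. Because NaN does not match any value in the hypothetical value set 1:3, every element starts out undefined. The array name B is illustrative.

```matlab
% Preallocate a 2-by-4 array of undefined values and define its
% categories in one call. The value set 1:3 is arbitrary; it only
% needs one value per category name.
B = categorical(NaN(2,4),1:3,["small" "medium" "large"]);

allUndefined = all(isundefined(B(:)));   % true: no element has a category yet
catsB = categories(B);                   % {'small'; 'medium'; 'large'}

% Assign defined values using linear (column-major) indexing.
B(1) = "medium";
B(8) = "small";
```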