Create Categorical Arrays
This example shows how to create categorical arrays from various types of input data and modify their elements.
The categorical data type stores values from a finite set of discrete categories. You can create a categorical array from a numeric array, logical array, string array, or cell array of character vectors. The unique values from the input array become the categories of the categorical array. A categorical array provides efficient storage and convenient manipulation of data while also maintaining meaningful names for the values.
By default, the categories of a categorical array do not have a mathematical ordering. For example, the discrete set of pet categories ["dog" "cat" "bird"] has no meaningful mathematical ordering, so MATLAB® uses the alphabetical ordering ["bird" "cat" "dog"]. But you can also create ordinal categorical arrays, in which the categories do have meaningful mathematical orderings. For example, the discrete set of size categories ["small" "medium" "large"] can have the mathematical ordering of small < medium < large. Ordinal categorical arrays enable you to make comparisons between their elements.
Create Categorical Array from Input Array
To create a categorical array from an input array, use the categorical function.
For example, create a string array whose elements are all states from New England. Notice that some of the strings have leading and trailing spaces.
statesNE = ["MA" "ME" " CT" "VT" " ME " "NH" "VT" "MA" "NH" "CT" "RI "]
statesNE = 1x11 string
"MA" "ME" " CT" "VT" " ME " "NH" "VT" "MA" "NH" "CT" "RI "
Convert the string array to a categorical array. When you create categorical arrays from string arrays (or cell arrays of character vectors), leading and trailing spaces are removed.
statesNE = categorical(statesNE)
statesNE = 1x11 categorical
MA ME CT VT ME NH VT MA NH CT RI
List the categories of statesNE by using the categories function. Every element of statesNE belongs to one of these categories. Because statesNE has six unique states, there are six categories. The categories are listed in alphabetical order because the state abbreviations have no mathematical ordering.
categories(statesNE)
ans = 6x1 cell
{'CT'}
{'MA'}
{'ME'}
{'NH'}
{'RI'}
{'VT'}
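The same conversion applies to the other input types mentioned in the introduction. The following is a minimal sketch, not part of the original example; the variable names ratings, pets, and levels and the category names "low", "mid", and "high" are made up for illustration.

```matlab
% Numeric input: each unique value becomes a category, named by its text form.
ratings = categorical([3 1 2 1 3]);
ratingNames = categories(ratings);

% Cell array of character vectors: surrounding spaces are removed,
% just as they are for string arrays.
pets = categorical({'dog', ' cat', 'bird ', 'dog'});
petNames = categories(pets);

% Optional second and third inputs: the values to treat as categories
% (valueset) and the names to give them (catnames).
levels = categorical([1 2 3 1], 1:3, ["low" "mid" "high"]);
levelNames = categories(levels);
```

Supplying a valueset and category names is useful when the raw data uses numeric codes but the categories should carry readable labels.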
Add and Modify Elements
To add one element to a categorical array, you can assign text that represents a category name. For example, add a state to statesNE.
statesNE(12) = "ME"
statesNE = 1x12 categorical
MA ME CT VT ME NH VT MA NH CT RI ME
To add or modify multiple elements, you must assign a categorical array.
statesNE(1:3) = categorical(["RI" "VT" "MA"])
statesNE = 1x12 categorical
RI VT MA VT ME NH VT MA NH CT RI ME
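Assigned text does not have to name an existing category. The sketch below assumes a categorical array that is not ordinal and not protected, which is the default for arrays created as shown above. The variable names and the "NY" value are illustrative only.

```matlab
states = categorical(["MA" "ME" "CT"]);

% Assigning a name that is not yet a category adds that category,
% because nonordinal categorical arrays are unprotected by default.
states(4) = "NY";
hasNY = iscategory(states, "NY");

% Count how many elements fall into each category.
counts = countcats(states);
```

After the assignment, calling categories(states) lists the new name along with the original ones.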
Add Missing Values as Undefined Elements
You can assign missing values as undefined elements of a categorical array. An undefined categorical value does not belong to any category, similar to NaN (Not-a-Number) in numeric arrays.
To assign missing values, use the missing function. For example, modify the first element of the categorical array to be a missing value.
statesNE(1) = missing
statesNE = 1x12 categorical
<undefined> VT MA VT ME NH VT MA NH CT RI ME
Assign two missing values at the end of the categorical array.
statesNE(12:13) = [missing missing]
statesNE = 1x13 categorical
<undefined> VT MA VT ME NH VT MA NH CT RI <undefined> <undefined>
If you convert a string array to a categorical array, then missing strings and empty strings become undefined elements in the categorical array. If you convert a numeric array, then NaNs become undefined elements. Therefore, assigning missing strings, "", '', or NaNs to elements of a categorical array converts them to undefined categorical values.
statesNE(2) = ""
statesNE = 1x13 categorical
<undefined> <undefined> MA VT ME NH VT MA NH CT RI <undefined> <undefined>
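Once an array contains undefined elements, it is often useful to locate or remove them. The following is a small sketch using the isundefined function on a made-up array; the variable names are illustrative.

```matlab
c = categorical(["RI" "" "MA"]);   % the empty string becomes <undefined>
c(4) = missing;                    % append one more undefined element

mask = isundefined(c);             % logical mask of the undefined elements
nUndefined = nnz(mask);

% Undefined values belong to no category, so only MA and RI are listed.
definedNames = categories(c);

% Keep only the defined elements.
cDefined = c(~mask);
```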
Create Ordinal Categorical Array from String Array
In an ordinal categorical array, the order of the categories defines a mathematical order that enables comparisons. Because of this mathematical order, you can compare elements of an ordinal categorical array using relational operators such as < and >=. For categorical arrays that are not ordinal, only the equality tests == and ~= are supported.
For example, create a string array that contains the sizes of eight objects.
AllSizes = ["medium" "large" "small" "small" "medium" ...
            "large" "medium" "small"];
The string array has three unique values: "large", "medium", and "small". A string array has no convenient way to indicate that small < medium < large.
Convert the string array to an ordinal categorical array. Define the categories as small, medium, and large, in that order. For an ordinal categorical array, the first category specified is the smallest and the last category is the largest.
valueset = ["small" "medium" "large"];
sizeOrd = categorical(AllSizes,valueset,"Ordinal",true)
sizeOrd = 1x8 categorical
medium large small small medium large medium small
The order of the values in the categorical array, sizeOrd, remains unchanged.
List the discrete categories in sizeOrd. The order of the categories matches their mathematical ordering small < medium < large.
categories(sizeOrd)
ans = 3x1 cell
{'small' }
{'medium'}
{'large' }
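The comparisons that the category order enables can be sketched as follows. This is an illustrative example on a shorter, made-up array (sizes), not output from the original page.

```matlab
valueset = ["small" "medium" "large"];
sizes = categorical(["medium" "large" "small" "small"], valueset, "Ordinal", true);

isBig = sizes > "small";            % compare every element to a category name
firstLess = sizes(1) < sizes(2);    % medium < large
largest = max(sizes);               % uses the category order
sorted = sort(sizes);               % sorts by category order, not alphabetically
tf = isordinal(sizes);
```

Because sorting follows the category order, small sorts before medium and large even though it comes last alphabetically.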
Create Ordinal Categorical Array by Binning Numeric Data
If you have an array with continuous numeric data, specifying numeric ranges as categories can be useful. In such cases, bin the data using the discretize function. Assign category names to the bins.
For example, create a vector of 100 random numbers between 0 and 50.
x = rand(100,1)*50
x = 100×1
40.7362
45.2896
6.3493
45.6688
31.6180
4.8770
13.9249
27.3441
47.8753
48.2444
⋮
Use discretize to create a categorical array by binning the values of x. Put all the values between 0 and 15 in the first bin, all the values between 15 and 35 in the second bin, and all the values between 35 and 50 in the third bin. Each bin includes its left endpoint but not its right endpoint, except the last bin, which includes both endpoints.
catnames = ["small" "medium" "large"];
binnedData = discretize(x,[0 15 35 50],"categorical",catnames)
binnedData = 100x1 categorical
large
large
small
large
medium
small
small
medium
large
large
small
large
large
medium
large
small
medium
large
large
large
medium
small
large
large
medium
large
large
medium
medium
small
⋮
binnedData is an ordinal categorical array with three categories, such that small < medium < large.
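Because x is random, the bin edge behavior is easier to see on a few hand-picked values. The sketch below uses a made-up probe vector with the same edges and category names. It assumes the default edge handling of discretize, and the value 60 is included only to show what happens outside the bin range.

```matlab
edges = [0 15 35 50];
catnames = ["small" "medium" "large"];
probe = [0 14.9 15 35 50 60];      % values chosen to sit on or near the edges

bins = discretize(probe, edges, "categorical", catnames);

% 0 and 14.9 fall in small, 15 in medium, 35 and 50 in large.
% 60 lies outside every bin, so it becomes an undefined element.
outOfRange = isundefined(bins);
```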
To display the number of elements in each category, use the summary function.
summary(binnedData)
binnedData: 100x1 ordinal categorical
     small          30
     medium         35
     large          35
     <undefined>     0
Additional statistics:
     Min       small
     Median    medium
     Max       large
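If you need the per-category counts as data rather than as displayed text, the countcats function returns them in category order. The following is a minimal sketch on a small, made-up vector; the table T and its variable names are illustrative.

```matlab
x = [2 7 20 30 40 49];
binned = discretize(x, [0 15 35 50], "categorical", ["small" "medium" "large"]);

counts = countcats(binned);    % one count per category, in category order
names = categories(binned);

% Pair each category name with its count.
T = table(names(:), counts(:), 'VariableNames', {'Category', 'Count'});
```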
You can make various kinds of charts of the binned data. For example, make a pie chart of binnedData.
pie(binnedData)
Preallocate Categorical Array
You can preallocate a categorical array of any size by creating an array of NaNs and converting it to a categorical array. After you preallocate the array, you can initialize its categories by adding the category names to the array.
For example, create a 2-by-4 array of NaNs.
A = NaN(2,4)
A = 2×4
NaN NaN NaN NaN
NaN NaN NaN NaN
Then convert the array of NaNs to a categorical array of undefined categorical values.
A = categorical(A)
A = 2x4 categorical
<undefined> <undefined> <undefined> <undefined>
<undefined> <undefined> <undefined> <undefined>
At this point, A has no categories.
categories(A)
ans = 0x0 empty cell array
Add small, medium, and large categories to A by using the addcats function.
A = addcats(A,["small" "medium" "large"])
A = 2x4 categorical
<undefined> <undefined> <undefined> <undefined>
<undefined> <undefined> <undefined> <undefined>
While the elements of A are still undefined values, the categories of A are defined.
categories(A)
ans = 3x1 cell
{'small' }
{'medium'}
{'large' }
Now that A has categories, you can assign defined categorical values as elements of A.
A(1) = "medium";
A(8) = "small";
A(3:5) = "large"
A = 2x4 categorical
medium large large <undefined>
<undefined> large <undefined> small
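As an alternative sketch, the preallocation and the category definition can be combined into a single call by passing a value set and category names to categorical. This is not the approach shown above. The value set 1:3 is an arbitrary placeholder whose only role is to define three categories, and B is an illustrative variable name.

```matlab
% NaN matches none of the values in 1:3, so every element starts undefined,
% while the three category names are defined immediately.
B = categorical(NaN(2,4), 1:3, ["small" "medium" "large"]);

allUndefined = all(isundefined(B(:)));
names = categories(B);

% Elements can now be assigned by category name.
B(1) = "medium";
```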
See Also
categorical | categories | discretize | summary | addcats | missing