Advantages of Using Categorical Arrays
Natural Representation of Categorical Data
categorical is a data type to store data with values from a finite set of
discrete categories. One common alternative to using categorical arrays is to use string
arrays. However, while string arrays store text, you cannot use them to define categories.
The other common alternative to using categorical arrays is to store categorical data
using integers in numeric arrays. Using numeric arrays loses all the useful descriptive
information from the category names, and also tends to suggest that the integer values
have their usual numeric meaning, which, for categorical data, they do not.
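As a minimal sketch of this difference, the snippet below stores the same observations once as integer codes and once as a categorical array. The variable names (codes, sizeNames, sizes) and the sample values are hypothetical and chosen only for illustration.

```matlab
% Hypothetical sample data: integer codes standing in for three sizes.
codes = [1 3 2 3 1];
sizeNames = ["small" "medium" "large"];

% A numeric array permits arithmetic that has no meaning for categories.
meaninglessMean = mean(codes);

% Map each integer code to a descriptive category name.
sizes = categorical(codes, 1:3, sizeNames);

% The category names travel with the data.
categories(sizes)

% Count the elements in each category, in category order.
counts = countcats(sizes);
```

The categorical array keeps the descriptive names and does not invite arithmetic on the underlying codes.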
Mathematical Ordering for Categories
Categorical arrays are convenient and memory-efficient containers for nonnumeric data with
values from a finite set of discrete categories. They are especially useful when the
categories have a meaningful mathematical ordering, such as an array with entries from the
discrete set of categories ["small" "medium" "large"], where small < medium < large.
The only ordering that string arrays provide is alphanumeric order. If you use a categorical array, then you can specify any ordering that makes sense for your set of categories. You can use relational operations to test for equality and perform elementwise comparisons that have a meaningful mathematical ordering.
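The following sketch illustrates one way to specify such an ordering by creating an ordinal categorical array with the 'Ordinal' option of the categorical function. The variable names and sample values are assumptions made for this illustration.

```matlab
% Hypothetical sample data with a natural ordering small < medium < large.
sizeOrder = ["small" "medium" "large"];
labels = ["medium" "large" "small" "medium"];

% Sorting the string array gives alphanumeric order: large, medium, small.
sortedStrings = sort(labels);

% Create an ordinal categorical array whose category order is sizeOrder.
sizes = categorical(labels, sizeOrder, 'Ordinal', true);

% Relational operators now follow the specified category order.
isBelowLarge = sizes < "large";

% Sorting follows the category order as well: small, medium, medium, large.
sortedSizes = sort(sizes);
```

Because the array is ordinal, operators such as < and >= compare elements by their position in the category order rather than by their text.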
Reduce Memory Requirements
This example shows how to compare the memory required to store data as a string array to the memory required for a categorical array. String arrays must store each element even when they have many repeated values. Categorical arrays store only one copy of each category name, often reducing the amount of memory required to store an array when it has many repeated values.
Create a sample string array.
state = [repmat("MA",25,1);repmat("NY",25,1); ...
    repmat("CA",50,1); ...
    repmat("MA",25,1);repmat("NY",25,1)];
Display information about the variable state.
whos state
  Name         Size            Bytes  Class     Attributes

  state      150x1              8212  string
Convert state to a categorical array.
stateCats = categorical(state);
Display the discrete categories in the variable stateCats.
categories(stateCats)
ans = 3x1 cell
{'CA'}
{'MA'}
{'NY'}
stateCats contains 150 elements, but only three distinct categories.
Display information about the two variables. There is a significant reduction in the memory required to store the categorical array.
whos state stateCats
  Name             Size            Bytes  Class          Attributes

  state          150x1              8212  string
  stateCats      150x1               524  categorical
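To make the same comparison programmatically, one option is to capture the output of whos in a structure and read its bytes field. The sketch below rebuilds the sample data so that it runs on its own; the names stringInfo, catInfo, and savingsRatio are hypothetical, and the exact byte counts can vary between MATLAB releases.

```matlab
% Rebuild the sample data so that this snippet is self-contained.
state = [repmat("MA",25,1); repmat("NY",25,1); ...
    repmat("CA",50,1); ...
    repmat("MA",25,1); repmat("NY",25,1)];
stateCats = categorical(state);

% When called with an output argument, whos returns a structure
% describing the variable, including its size in bytes.
stringInfo = whos('state');
catInfo = whos('stateCats');

% Ratio of string storage to categorical storage for the same data.
savingsRatio = stringInfo.bytes / catInfo.bytes;
fprintf('string: %d bytes, categorical: %d bytes\n', ...
    stringInfo.bytes, catInfo.bytes);
```

The ratio grows with the number of repeated values, because the categorical array stores each category name only once.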
See Also
Related Examples
- Create Categorical Arrays
- Convert Text in Table Variables to Categorical
- Compare Categorical Array Elements
- Access Data Using Categorical Arrays