Advantages of Using Categorical Arrays

Natural Representation of Categorical Data

The categorical data type stores values drawn from a finite set of discrete categories. One common alternative to categorical arrays is string arrays. However, string arrays only store text; you cannot use them to define a set of categories. The other common alternative is to encode the categories as integers in a numeric array. Doing so discards the descriptive information carried by the category names, and suggests that the integer values have their usual numeric meaning, which they do not for categorical data.
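As a minimal sketch of the three representations described above, the following code stores the same small data set as a string array, as integer codes, and as a categorical array. The data and the variable names (colorNames, colorCodes, colors) are made up for illustration and are not part of the original example.

```matlab
% Illustrative data; the variable names are hypothetical.
colorNames = ["red"; "green"; "red"; "blue"];   % text only, no defined set of categories
colorCodes = [1; 2; 1; 3];                      % integer codes; the category names are lost

% Arithmetic on the codes runs without error but has no categorical meaning.
meanCode = mean(colorCodes);

% A categorical array keeps the names and tracks the set of categories.
colors = categorical(colorNames);
cats   = categories(colors);    % category names, sorted alphabetically by default
counts = countcats(colors);     % number of elements in each category
```

Here the categorical array can report its categories and their counts directly, while the numeric version needs a separate lookup to recover what each code means.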

Mathematical Ordering for Categories

Categorical arrays are convenient and memory-efficient containers for nonnumeric data with values from a finite set of discrete categories. They are especially useful when the categories have a meaningful mathematical ordering, such as the set of categories ["small" "medium" "large"], where small < medium < large.

The only ordering that string arrays provide is alphanumeric order. With a categorical array, you can specify any ordering that makes sense for your set of categories. All categorical arrays support tests for equality (== and ~=). When you create the array as ordinal, you can also use relational operators such as < and >= to perform elementwise comparisons that respect the order you specified.
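The following sketch illustrates one way to set up such an ordering, assuming the small/medium/large categories mentioned above. The sample data and the variable names (sizes, isLessThanLarge, sorted) are hypothetical.

```matlab
% Hypothetical sample data using the categories named in the text.
data     = ["medium" "small" "large" "small"];
valueset = ["small" "medium" "large"];   % listed in the intended order

% Create an ordinal categorical array so that small < medium < large.
sizes = categorical(data, valueset, 'Ordinal', true);

% Elementwise comparison against one category.
isLessThanLarge = sizes < 'large';

% Sorting follows the category order, not alphabetical order.
sorted = sort(sizes);
```

Sorting the same values as a string array would place "large" first, because string arrays sort alphanumerically.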

Reduce Memory Requirements

This example shows how to compare the memory required to store data as a string array with the memory required to store the same data as a categorical array. A string array stores the text of every element, even when many elements repeat the same value. A categorical array stores only one copy of each category name, which often reduces the memory required when the array has many repeated values.

Create a sample string array.

state = [repmat("MA",25,1);repmat("NY",25,1); ...
         repmat("CA",50,1); ...
         repmat("MA",25,1);repmat("NY",25,1)];

Display information about the variable state.

whos state
  Name         Size            Bytes  Class     Attributes

  state      150x1              8212  string              

Convert state to a categorical array.

stateCats = categorical(state);

Display the discrete categories in the variable stateCats.

categories(stateCats)
ans = 3x1 cell
    {'CA'}
    {'MA'}
    {'NY'}

stateCats contains 150 elements, but only three distinct categories.

Display information about the two variables. The categorical array requires 524 bytes, compared with 8212 bytes for the string array, which is about 6% of the original memory.

whos state stateCats
  Name             Size            Bytes  Class          Attributes

  state          150x1              8212  string                   
  stateCats      150x1               524  categorical              
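As a rough sketch, the same comparison can be made programmatically by capturing the output of whos in a structure. The variable names strInfo, catInfo, and roundTrip are assumptions introduced here, and the exact byte counts can differ between releases and platforms.

```matlab
% Recreate the sample data from this example.
state = [repmat("MA",25,1); repmat("NY",25,1); ...
         repmat("CA",50,1); ...
         repmat("MA",25,1); repmat("NY",25,1)];
stateCats = categorical(state);

% Capture the memory information instead of printing it.
strInfo = whos('state');
catInfo = whos('stateCats');
ratio   = catInfo.bytes / strInfo.bytes;   % fraction of the string array's memory

% Converting back to a string array recovers the original values.
roundTrip = string(stateCats);
```

The conversion back to a string array shows that no information is lost by storing the data as categorical.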
