MATLAB String Arrays

Use string arrays to represent text efficiently in MATLAB®, Simulink®, and Stateflow®. New task-oriented functions let you easily manipulate and compare text.

Represent Text

Introduced in R2016b, string arrays are a new way to represent text in MATLAB. They are designed and optimized specifically for working with and manipulating text. You can index into, reshape, and concatenate string arrays using standard array operations.

Starting in R2017a, you can create string arrays with double quotes. As of R2018b, you can use string arrays for data, properties, and name-value pair arguments nearly everywhere in MathWorks products.
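As a minimal sketch of the array operations described above (assuming R2017a or later for the double-quote syntax; the variable names and sample values are illustrative only):

```matlab
% Create a string array with double quotes.
names = ["alpha" "beta" "gamma" "delta"];

% Index with parentheses to get a new string array.
firstTwo = names(1:2);               % 1-by-2 string array

% Reshape like any other MATLAB array.
names2x2 = reshape(names, 2, 2);     % 2-by-2 string array

% Concatenate with square brackets; each string stays a separate element.
more = [names "epsilon"];            % 1-by-5 string array

% The plus operator appends text to every element.
labeled = names + "_v1";             % "alpha_v1" "beta_v1" ...

% strlength counts characters per element; numel counts elements.
lens = strlength(names);             % [5 4 5 5]
```

Note that numel(names) is 4 here, because each element holds a whole piece of text rather than a single character.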

Storing text in string arrays.

Manipulate Text

Text manipulation functions including startsWith, contains, and insertAfter are concise, descriptive, and task-focused. You can write more efficient, readable, and maintainable code using over a dozen convenient operations. These functions also work with character vectors (char) and cell arrays of character vectors (cell).
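The three functions named above can be tried on a small example. This is only a sketch; the file names are invented for illustration.

```matlab
% Hypothetical file names stored in a string array.
files = ["run1_raw.csv" "run2_raw.csv" "notes.txt"];

% Test each element against a pattern.
isRun = startsWith(files, "run");          % [true true false]
isCsv = contains(files, ".csv");           % [true true false]

% Insert text after each occurrence of "run".
tagged = insertAfter(files, "run", "_");   % "run_1_raw.csv" "run_2_raw.csv" "notes.txt"

% The same functions accept character vectors and cell arrays of
% character vectors.
taggedChar = insertAfter('run1_raw.csv', 'run', '_');   % 'run_1_raw.csv'
isRunCell  = startsWith({'run1' 'notes'}, 'run');       % [true false]
```

The logical results can be used directly as masks, for example files(isRun & isCsv).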

Using text manipulation functions to work with text.

Analyze and Model Text

Text Analytics Toolbox™ builds on string arrays with algorithms and visualizations for preprocessing, analyzing, and modeling text data. Models created with the toolbox can be used in applications such as sentiment analysis, predictive maintenance, and topic modeling. You can extract text from popular file formats such as PDF and Microsoft® Word® files, preprocess raw text, extract individual words, and build statistical models.
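A rough sketch of the preprocessing steps mentioned above, assuming Text Analytics Toolbox is installed. The sample sentences are made up, and a real workflow would typically read text from files (for example with extractFileText) rather than typing it in.

```matlab
% Requires Text Analytics Toolbox. The reviews below are invented sample data.
reviews = [
    "The pump vibrates loudly after startup."
    "Loud vibration from the pump, then it stopped."
    "Motor runs smoothly and quietly."];

% Preprocess the raw text; the result is still a string array.
cleaned = lower(erasePunctuation(reviews));

% Split each document into individual words.
docs = tokenizedDocument(cleaned);

% Drop common words such as "the" and "and".
docs = removeStopWords(docs);

% Build a simple word-count model and list its most frequent words.
bag = bagOfWords(docs);
top = topkwords(bag, 3);     % table of words and their counts
```

A bag-of-words model is only one possible starting point; which model suits a given application (sentiment analysis, topic modeling, and so on) depends on the data.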

Analyzing text with Text Analytics Toolbox.

Code Compatibility

MathWorks is taking the following steps to address code compatibility from release to release:

  • The use of char and cell will be supported indefinitely.

    The use of character vectors (char) and cell arrays of character vectors (cell) will continue to be supported in both MATLAB and Simulink. Functions and properties continue to accept and return them where they have in the past.

  • Functions and properties will continue to return the same text type.

    Functions and properties introduced prior to R2018b will continue to return the text type they always have (either character vectors or cell arrays of character vectors). Newer text manipulation functions such as replaceBetween and join are exceptions: they return the same text type as their input, so you can use them regardless of the text type you are working with (string arrays, character vectors, or cell arrays of character vectors).

  • String arrays can be indexed using curly braces.

    String arrays index in the same way as other MATLAB arrays. When you index a string array with parentheses () you get back a new string array. When you index a cell array with parentheses () you get back a new cell array. However, to access character vectors in a cell array, most code uses curly brace indexing. Curly brace indexing has been added to string arrays, and it is designed to return character vectors in order to be compatible with this cell array behavior.
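The return-type and indexing behavior described in the list above can be checked with a short sketch. The sample text and variable names are illustrative assumptions.

```matlab
% Text manipulation functions return the same text type as their input.
s = "Hello <name>!";                              % string scalar
c = 'Hello <name>!';                              % character vector
outS = replaceBetween(s, "<", ">", "World");      % string: "Hello <World>!"
outC = replaceBetween(c, '<', '>', 'World');      % char:   'Hello <World>!'

joinedS = join(["a" "b"], "-");                   % string in, string out
joinedC = join({'a' 'b'}, '-');                   % cell in, cell out

% Parentheses return the same container type; curly braces return
% the character vector inside, for both string and cell arrays.
names     = ["Ann" "Bob"];
cellNames = {'Ann' 'Bob'};

p = names(1);        % 1-by-1 string array
q = names{1};        % character vector 'Ann'
r = cellNames(1);    % 1-by-1 cell array
t = cellNames{1};    % character vector 'Ann'
```

Because names{1} and cellNames{1} both yield a character vector, code written with curly brace indexing for cell arrays can often accept a string array without changes.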

If you maintain code for others, you should update your code to accept strings as well. Learn how from the following blog post: Accept string inputs in your code.

When Not to Use String Arrays

You can use string arrays for text data nearly everywhere in MATLAB code, but they are not intended to be used for:

  • Cell Arrays of Only Scalar Strings

    As of R2018b, we recommend that you use string arrays instead of cell arrays of character vectors to represent text. If you do keep text in a cell array and pass it to functions such as lower, store it as character vectors, which is what these functions expect. Avoid passing cell arrays of strings to these functions.

    Use:          >> lower(["Run1" "Run2" "Run3"])
    Or use:       >> lower({'Run1' 'Run2' 'Run3'})
    Do not use:   >> lower({"Run1" "Run2" "Run3"})

  • Command Form

    When used in command form, functions such as save, cd, and addpath continue to parse double quotes as text rather than as a delimiter. This behavior has been maintained to prevent code incompatibilities with commands such as mex.

    Use:          >> load 'my data.csv'
    Do not use:   >> load "my data.csv"
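If you already have a cell array of scalar strings, or want to pass a double-quoted file name to a command, the following sketch shows one way to handle each case. It assumes R2018b or later; the temporary file name is generated only for this illustration.

```matlab
% A cell array of scalar strings (the discouraged form).
c = {"Run1" "Run2" "Run3"};
tf = iscellstr(c);            % false: the cells hold strings, not char vectors

% Concatenate the cell contents into a string array before processing.
s = [c{:}];                   % 1-by-3 string array
lowS = lower(s);              % "run1" "run2" "run3"

% Or convert to a cell array of character vectors.
lowC = lower(cellstr(s));     % {'run1' 'run2' 'run3'}

% Function syntax (with parentheses) accepts a double-quoted file name,
% including one that contains a space.
fname = string(tempname) + " data.mat";   % temporary file, illustration only
x = 42;
save(fname, "x");
S = load(fname);              % S.x holds the saved value
delete(fname);
```

Function syntax sidesteps the command-form parsing described above because the file name is passed as an ordinary argument.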

See the string arrays documentation to learn more about the key differences between string arrays and character arrays and how to troubleshoot unexpected results.