splitSentences

Split text into sentences

Description

newStr = splitSentences(str) splits str into an array of sentences.

newDocuments = splitSentences(document) splits a single tokenizedDocument object into a tokenizedDocument array of sentences.

Examples

Read the text from the example file sonnets.txt and split it into sentences.

filename = "sonnets.txt";
str = extractFileText(filename);
sentences = splitSentences(str);

View the first few sentences.

sentences(1:10)
ans = 10x1 string
    "THE SONNETS"
    "by William Shakespeare"
    "I"
    "From fairest creatures we desire increase,..."
    "II"
    "When forty winters shall besiege thy brow,..."
    "How much more praise deserv'd thy beauty's use,..."
    "This were to be new made when thou art old,..."
    "III"
    "Look in thy glass and tell the face thou viewest..."
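The tokenizedDocument syntax can be sketched in the same way. The example below is a minimal illustration, assuming Text Analytics Toolbox is installed; the sample text and the variable numSentences are made up for this sketch and do not come from the original page.

```matlab
% Minimal sketch of the tokenizedDocument syntax (sample text is illustrative).
str = "The quick brown fox jumped. The lazy dog slept.";

% Create a scalar tokenizedDocument; splitSentences requires a single document.
document = tokenizedDocument(str);

% Split the document into a tokenizedDocument array, one element per sentence.
newDocuments = splitSentences(document);

% Count the resulting sentence documents.
numSentences = numel(newDocuments)
```

Because the input must be a single tokenizedDocument, an array of documents would need to be split one element at a time.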

Input Arguments

str: Input text, specified as a string scalar, a character vector, or a scalar cell array containing a character vector.

Data Types: string | char | cell

document: Input document, specified as a scalar tokenizedDocument object.

Output Arguments

newStr: Output text, returned as a string array or cell array of character vectors.

If str is a string scalar, then newStr is a string array. Otherwise (str is a character vector or a cell array), newStr is a cell array of character vectors.

Data Types: string | cell
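The relationship between input and output types described above can be checked with a short sketch. The sample text and variable names here are illustrative assumptions, not part of the original page.

```matlab
% Same text supplied as a string scalar and as a character vector.
strIn = "First sentence. Second sentence.";
chrIn = 'First sentence. Second sentence.';

% String input is documented to return a string array.
outFromString = splitSentences(strIn);

% Character vector input is documented to return a cell array of character vectors.
outFromChar = splitSentences(chrIn);

% Inspect the output types.
class(outFromString)
class(outFromChar)
```

If downstream code expects a string array regardless of input type, converting the input with string(chrIn) before calling splitSentences is one way to keep the output type uniform.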

newDocuments: Output documents, returned as a tokenizedDocument array.

Algorithms

If emoticons or emoji characters appear after a terminating punctuation character, then the function splits the sentence after the emoticons and emoji.
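A small experiment can show this behavior. The sample text below is a made-up illustration; per the rule above, the emoticon would be expected to stay with the first sentence, but the exact output is not reproduced here and should be checked by running the code.

```matlab
% Illustrative text with an emoticon after the terminating punctuation.
str = "I love this toolbox! :) It saves me hours.";

% Split into sentences; the split point should fall after the emoticon.
sentences = splitSentences(str);

% Display the result to inspect where the split occurred.
disp(sentences)
```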

Version History

Introduced in R2018a