
splitParagraphs

Split text into paragraphs

Since R2023a

    Description

    newStr = splitParagraphs(str) splits str into an array of paragraphs.


    newDocuments = splitParagraphs(document) splits a single tokenizedDocument object into a tokenizedDocument array of paragraphs.

    Examples


    Extract the text from the file exampleParagraphs.txt.

    str = extractFileText("exampleParagraphs.txt")
    str = 
        "This example file contains three paragraphs. The first paragraph contains three sentences. The third sentence is short.
         
         The second paragraph contains one sentence only.
         
         The third (and final) paragraph has seventeen words in total. The final sentence concludes the example file.
         "
    
    

    Split the text into paragraphs.

    paragraphs = splitParagraphs(str)
    paragraphs = 3x1 string
        "This example file contains three paragraphs. The first paragraph contains three sentences. The third sentence is short."
        "The second paragraph contains one sentence only."
        "The third (and final) paragraph has seventeen words in total. The final sentence concludes the example file...."
    
    

    Extract the text from the file exampleParagraphs.txt and tokenize it.

    str = extractFileText("exampleParagraphs.txt");
    document = tokenizedDocument(str)
    document = 
      tokenizedDocument:
    
       49 tokens: This example file contains three paragraphs . The first paragraph contains three sentences . The third sentence is short . The second paragraph contains one sentence only . The third ( and final ) paragraph has seventeen words in total . The final sentence concludes the example file .
    
    

    Split the document into paragraphs.

    paragraphs = splitParagraphs(document)
    paragraphs = 
      3x1 tokenizedDocument:
    
        20 tokens: This example file contains three paragraphs . The first paragraph contains three sentences . The third sentence is short .
         8 tokens: The second paragraph contains one sentence only .
        21 tokens: The third ( and final ) paragraph has seventeen words in total . The final sentence concludes the example file .
    
    

    Input Arguments


    str - Input text

    Input text, specified as a string scalar, a character vector, or a scalar cell array containing a character vector.

    Data Types: string | char | cell

    document - Input document

    Input document, specified as a scalar tokenizedDocument object.
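    Because document must be a scalar tokenizedDocument object, splitting several documents needs one call per element. The following is a minimal sketch, not an official pattern: the input text is made up for illustration, it requires Text Analytics Toolbox, and it assumes that tokenizedDocument arrays can be concatenated with vertcat. The hypothetical variable sourceIdx records which input document each paragraph came from.

```matlab
% Made-up input: two texts, each with two paragraphs separated by a blank line.
sep = string(newline) + newline;
str = ["Alpha paragraph one." + sep + "Alpha paragraph two."; ...
       "Beta paragraph one." + sep + "Beta paragraph two."];
documents = tokenizedDocument(str);

% splitParagraphs accepts only a scalar tokenizedDocument, so loop over the array.
parts = cell(numel(documents), 1);
for i = 1:numel(documents)
    parts{i} = splitParagraphs(documents(i));
end

% Concatenate the per-document results into one column array
% (assumes tokenizedDocument supports vertcat).
paragraphs = vertcat(parts{:});

% Index of the source document for each paragraph.
sourceIdx = repelem((1:numel(documents))', cellfun(@numel, parts));
```

    Collecting the results in a cell array first avoids growing a tokenizedDocument array inside the loop.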

    Output Arguments


    newStr - Output text

    Output text, returned as a string array or a cell array of character vectors.

    If str is a string scalar, then newStr is a string array. Otherwise (str is a character vector or a scalar cell array), newStr is a cell array of character vectors.

    Data Types: string | cell
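    The output type follows the input type. As a minimal sketch with made-up text (Text Analytics Toolbox required), the same two-paragraph input passed in each of the three accepted forms should, per the rule above, return a string array for the string scalar and a cell array of character vectors for the other two. The variable names are hypothetical.

```matlab
% Made-up text with two paragraphs separated by a blank line.
txt = "First paragraph. It has two sentences." + newline + newline + ...
    "Second paragraph.";

% String scalar input: newStr is a string array.
newStr1 = splitParagraphs(txt);

% Character vector input: newStr is a cell array of character vectors.
newStr2 = splitParagraphs(char(txt));

% Scalar cell array input: newStr is also a cell array of character vectors.
newStr3 = splitParagraphs({char(txt)});

% Inspect the output types.
disp(class(newStr1))
disp(class(newStr2))
disp(class(newStr3))
```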

    newDocuments - Output documents

    Output documents, returned as a tokenizedDocument array.

    Version History

    Introduced in R2023a