splitParagraphs
Description
newStr = splitParagraphs(str) splits the text in str into an array of paragraphs.
newDocuments = splitParagraphs(document) splits a single tokenizedDocument object into a tokenizedDocument array of paragraphs.
Examples
Split String into Paragraphs
Extract the text from the file exampleParagraphs.txt.
str = extractFileText("exampleParagraphs.txt")
str = "This example file contains three paragraphs. The first paragraph contains three sentences. The third sentence is short. The second paragraph contains one sentence only. The third (and final) paragraph has seventeen words in total. The final sentence concludes the example file. "
Split the text into paragraphs.
paragraphs = splitParagraphs(str)
paragraphs = 3x1 string
"This example file contains three paragraphs. The first paragraph contains three sentences. The third sentence is short."
"The second paragraph contains one sentence only."
"The third (and final) paragraph has seventeen words in total. The final sentence concludes the example file...."
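Because the result is an ordinary string array, each paragraph can be indexed like any other string element. The following is a minimal sketch, assuming Text Analytics Toolbox is installed and exampleParagraphs.txt is on the path; the loop and the printed message format are illustrative only and not part of the original example.

```matlab
% Sketch: iterate over the paragraphs returned for string input.
% Assumes exampleParagraphs.txt is available, as in the example above.
str = extractFileText("exampleParagraphs.txt");
paragraphs = splitParagraphs(str);

% Report the character count of each paragraph.
for k = 1:numel(paragraphs)
    fprintf("Paragraph %d: %d characters\n", k, strlength(paragraphs(k)));
end
```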
Split Document into Paragraphs
Extract the text from the file exampleParagraphs.txt and tokenize it.
str = extractFileText("exampleParagraphs.txt");
document = tokenizedDocument(str)
document = tokenizedDocument:
49 tokens: This example file contains three paragraphs . The first paragraph contains three sentences . The third sentence is short . The second paragraph contains one sentence only . The third ( and final ) paragraph has seventeen words in total . The final sentence concludes the example file .
Split the document into paragraphs.
paragraphs = splitParagraphs(document)
paragraphs = 3x1 tokenizedDocument:
20 tokens: This example file contains three paragraphs . The first paragraph contains three sentences . The third sentence is short .
8 tokens: The second paragraph contains one sentence only .
21 tokens: The third ( and final ) paragraph has seventeen words in total . The final sentence concludes the example file .
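One way to sanity-check the split is to compare token counts before and after. The sketch below assumes Text Analytics Toolbox and the same example file, and uses doclength to count the tokens in each document; the variable names tokensPerParagraph and totalTokens are chosen here for illustration.

```matlab
% Sketch: verify that splitting preserves the total number of tokens.
str = extractFileText("exampleParagraphs.txt");
document = tokenizedDocument(str);
paragraphs = splitParagraphs(document);

% doclength returns the number of tokens in each document of the array.
tokensPerParagraph = doclength(paragraphs)

% The per-paragraph counts should add up to the length of the original document.
totalTokens = sum(tokensPerParagraph)
```

For the output shown in this example, the per-paragraph counts are 20, 8, and 21 tokens, which sum to the 49 tokens of the original document.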
Input Arguments
str — Input text
string scalar | character vector | scalar cell array containing a character vector
Input text, specified as a string scalar, a character vector, or a scalar cell array containing a character vector.
Data Types: string | char | cell
document — Input document
scalar tokenizedDocument object
Input document, specified as a scalar tokenizedDocument object.
Output Arguments
newStr — Output text
string array | cell array of character vectors
Output text, returned as a string array or cell array of character vectors.
If str is a string, then newStr is a string. Otherwise, newStr is a cell array of character vectors.
Data Types: string | cell
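The dependence of the output type on the input type can be checked directly. This is a minimal sketch under stated assumptions: the input text txt is made up for illustration, and it assumes that paragraphs are delimited by newline characters, which this page does not state explicitly. Only the output types are checked, since those follow from the rule above regardless of how many paragraphs are found.

```matlab
% Sketch: the output type follows the input type.
% Hypothetical input; assumes paragraphs are separated by newline characters.
txt = "First paragraph." + newline + newline + "Second paragraph.";

fromString = splitParagraphs(txt);          % string input
fromChar = splitParagraphs(char(txt));      % character vector input
fromCell = splitParagraphs({char(txt)});    % scalar cell array input

% String input gives a string array; the other inputs give a cell array
% of character vectors.
isstring(fromString)
iscellstr(fromChar)
iscellstr(fromCell)
```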
newDocuments — Output documents
tokenizedDocument array
Output documents, returned as a tokenizedDocument array.
Version History
Introduced in R2023a