formatTextChunks
Description
str = formatTextChunks(chunkTable) creates Markdown-formatted text str from a table of text chunks chunkTable.
Examples
Split an HTML document into sections using the splitHTMLSections function. Specify the HTML code as a string.
str = "<html><body><head><title>Title</title></head>" + ...
    "<h1>Chapter 1</h1><p>Introductory paragraph of chapter 1.</p>" + ...
    "<h2>Section 1</h2><p>Content of section 1.</p>" + ...
    "<h2>Section 2</h2><p>Content of section 2.</p></body></html>";
chunkTable = splitHTMLSections(str)
chunkTable = 4×3 table
    Text                                               H1             H2
    _______________________________________________    ___________    ___________
    "<html><body><head><title>Title</title></head>"    <missing>      <missing>
    "<p>Introductory paragraph of chapter 1.</p>"      "Chapter 1"    <missing>
    "<p>Content of section 1.</p>"                     "Chapter 1"    "Section 1"
    "<p>Content of section 2.</p></body></html>"       "Chapter 1"    "Section 2"
Specify the text chunk indices.
idx = [3 4];
Create Markdown-formatted text from the text chunks at the specified indices using the formatTextChunks function.
str = formatTextChunks(chunkTable(idx,:))
str =
"# Chapter 1
## Section 1
<p>Content of section 1.</p>
## Section 2
<p>Content of section 2.</p></body></html>"
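The output above suggests that a heading shared by consecutive chunks is written once rather than repeated. The following is a minimal base-MATLAB sketch of that layout, assuming a hand-built table with Text, H1, and H2 variables. It is an illustration only, not the toolbox implementation, and the exact line spacing that formatTextChunks produces may differ.

```matlab
% Illustrative only: mimic the heading layout seen in the example output.
% This table is built by hand; it is not produced by splitHTMLSections.
Text = ["<p>Content of section 1.</p>"; "<p>Content of section 2.</p>"];
H1 = ["Chapter 1"; "Chapter 1"];
H2 = ["Section 1"; "Section 2"];
T = table(Text,H1,H2);

mdLines = strings(0,1);
prevH1 = string(missing);
prevH2 = string(missing);
for k = 1:height(T)
    % Emit a heading only when it differs from the previous chunk's heading.
    if ~ismissing(T.H1(k)) && ~isequal(T.H1(k),prevH1)
        mdLines(end+1,1) = "# " + T.H1(k);
        prevH2 = string(missing);   % a new chapter resets the section
    end
    if ~ismissing(T.H2(k)) && ~isequal(T.H2(k),prevH2)
        mdLines(end+1,1) = "## " + T.H2(k);
    end
    mdLines(end+1,1) = T.Text(k);
    prevH1 = T.H1(k);
    prevH2 = T.H2(k);
end

% Join the lines into a single string, one line per heading or chunk.
md = join(mdLines,newline)
```

Tracking the previous heading at each level is what keeps "# Chapter 1" from appearing twice when both chunks belong to the same chapter.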
Load the example data. The file sonnets.txt contains Shakespeare's sonnets in plain text. Extract the text from sonnets.txt using the extractFileText function.
str = extractFileText("sonnets.txt");
Split str into text chunks using the splitTextChunks function. Specify the target length as 100.
chunkTable = splitTextChunks(str,TargetLength=100);
Specify the index of the text chunk from which to create an LLM prompt.
idx = 10;
Surrounding text chunks can provide useful context. For this example, find the indices of context text chunks for the target chunk at index 10 using the findTextChunkContext function. Specify a target length of 300 characters. The function returns the indices of the target chunk and surrounding context chunks, such that the combined length of the target and context chunks is 300 characters or less.
idxContext = findTextChunkContext(chunkTable,idx,TargetLength=300)
idxContext = 1×4
8 9 10 11
Create a Markdown-formatted prompt from the text chunks using the formatTextChunks function.
prompt = formatTextChunks(chunkTable(idxContext,:));
prompt = "The retrieved information is: " + prompt
prompt =
"The retrieved information is: by the grave and thee.
II
When forty winters shall besiege thy brow, And dig deep trenches in thy beauty's field, Thy youth's
proud livery so gazed on now, Will be a tatter'd weed of small worth held: Then being asked, where"
You can then provide this prompt to an LLM to generate text. To connect to LLM APIs from MATLAB, use the Large Language Models (LLMs) with MATLAB add-on.
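Before sending the prompt to an LLM, you would typically also include the user's question. A minimal sketch using only string concatenation follows; the context text, the question, and the instruction wording are all hypothetical placeholders, not part of the documented example.

```matlab
% Hypothetical inputs: in practice, context is the output of formatTextChunks.
context = "## Section 1" + newline + "Content of section 1.";
question = "What does section 1 contain?";   % hypothetical user query

% Place the retrieved context first, then the question it should answer.
prompt = "The retrieved information is: " + newline + context + newline + ...
    newline + "Using only the information above, answer: " + question;
disp(prompt)
```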
Input Arguments
chunkTable: Input table of text chunks. chunkTable must contain a variable named Text that contains the text chunks as strings.
Create a table of text chunks from a document or table of documents by using the
splitTextChunks, splitHTMLSections, splitMarkdownSections, or
splitCustomSections function.
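If you build or modify a chunk table yourself, it can help to confirm that it has the required Text variable before formatting. A minimal base-MATLAB sketch follows, assuming a hand-built table. Whether formatTextChunks accepts tables that were not created by one of the splitting functions is not stated here, so treat this as a sanity check only.

```matlab
% Hand-built table for illustration; normally a splitting function creates it.
chunkTable = table(["First chunk."; "Second chunk."],VariableNames="Text");

% Check for the required Text variable and that it holds strings.
hasText = ismember("Text",chunkTable.Properties.VariableNames);
isStr = hasText && isstring(chunkTable.Text);
assert(isStr,"chunkTable must contain a string variable named Text.")
```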
Output Arguments
str: Markdown-formatted text chunks, returned as a string.
More About
Many analysis tools, including large language models (LLMs), perform better on small chunks of text than on large documents. Text Analytics Toolbox™ includes a range of functions that allow you to split large documents into semantically meaningful chunks.
The splitTextChunks function splits a document recursively into text chunks
of a given target length. The function first splits a document into paragraphs. If any
of the paragraphs are longer than the target length, then the function splits those
paragraphs into sentences, and so on.
chunks = splitTextChunks(str);
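The recursive idea described above can be sketched in base MATLAB. This is a simplified illustration, not the splitTextChunks algorithm: it stops at sentence level, uses a naive sentence delimiter, and the sample text and target length are hypothetical.

```matlab
% Hypothetical sample text with two paragraphs separated by a blank line.
str = "First paragraph. It has two sentences." + newline + newline + "Short one.";
targetLength = 25;   % hypothetical target length, in characters

% Level 1: split into paragraphs at blank lines.
paragraphs = split(str,[newline newline]);

chunks = strings(0,1);
for k = 1:numel(paragraphs)
    p = paragraphs(k);
    if strlength(p) <= targetLength
        chunks(end+1,1) = p;
    else
        % Level 2: the paragraph is too long, so split it into sentences
        % at whitespace that follows ., !, or ?.
        sentences = regexp(p,"(?<=[.!?])\s+","split");
        chunks = [chunks; sentences(:)];
    end
end
chunks
```

Splitting on the coarsest boundary first keeps paragraphs intact whenever they already fit the target length.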
1. Split your document into sections and preserve the section metadata using one of these functions:
   - splitHTMLSections: Split an HTML-formatted document into HTML sections according to the section tags <h1>...</h1>, <h2>...</h2>, …, <h6>...</h6>.
   - splitMarkdownSections: Split a Markdown-formatted document into Markdown sections, for example according to ATX section tags #, ##, …, ######.
   - splitCustomSections: Split a document into custom sections according to custom section delimiters.
2. Split your documents or your chunks recursively into paragraphs, sentences, and tokens using the splitTextChunks function.
3. To avoid redundancy, join similar adjacent chunks using the joinSimilarTextChunks function.
4. Add overlap between adjacent text chunks using the addTextChunkOverlap function. Adding text chunk overlap avoids changing the meaning of sentences by splitting at inopportune points, for example, splitting the sentence "I would never say I love cats" into "I would never say" and "I love cats." Adding overlap in this example results in the two chunks "I would never say I love" and "never say I love cats." You can also add surrounding text to individual chunks as context by using the findTextChunkContext function.
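The overlap example above can be reproduced with a few lines of base MATLAB. This sketch assumes a word-based overlap of two words, which is a choice made for illustration. It does not call addTextChunkOverlap, and that function may measure overlap differently.

```matlab
chunks = ["I would never say"; "I love cats"];
nOverlap = 2;   % hypothetical overlap size, in words

w1 = split(chunks(1))';   % words of the first chunk, as a row
w2 = split(chunks(2))';   % words of the second chunk, as a row

% Append the first words of the next chunk to the first chunk, and
% prepend the last words of the previous chunk to the second chunk.
overlapped = [join([w1, w2(1:nOverlap)]); ...
              join([w1(end-nOverlap+1:end), w2])]
```

With the overlap in place, each chunk carries the words "never say I love" together, so neither chunk reads as a standalone claim.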
For an example showing the advanced workflow, see Split Document Into Semantically Meaningful Text Chunks.
Retrieval-augmented generation (RAG) combines the text generation capabilities of large language models (LLMs) with reliable information contained in a set of source documents. First, retrieve documents relevant to the user prompt from the set of source documents. Then, append the relevant documents to the prompt and use the LLM to generate a response.
1. To improve the quality of the generated output, split large documents into smaller, semantically meaningful chunks.
2. Use information retrieval to identify the text chunks that are relevant to the query. For more information, see Information Retrieval with Document Embeddings.
3. Create a prompt based on the most relevant chunks. To provide the LLM with additional context, you can add text from adjacent chunks within the same section by using the findTextChunkContext function, or you can add overlap between text chunks before information retrieval by using the addTextChunkOverlap function. Create a Markdown-formatted string from the text chunks using the formatTextChunks function. For an example, see Create Large Language Model (LLM) Prompt from Text Chunk.
4. Generate an answer using an LLM. To connect to large language model APIs using MATLAB, use the Large Language Models (LLMs) with MATLAB add-on.
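The steps above can be sketched end to end with the functions used in the examples on this page. The retrieval step here is a deliberately simple keyword count that stands in for embedding-based retrieval; it is an assumption made for illustration, and the query term is hypothetical. The sketch requires Text Analytics Toolbox and stops before the LLM call.

```matlab
% Step 1: split the document into chunks (same call as in the example above).
str = extractFileText("sonnets.txt");
chunkTable = splitTextChunks(str,TargetLength=100);

% Step 2: stand-in for information retrieval (an assumption, not the
% documented approach): pick the chunk that mentions the query term most often.
queryTerm = "winters";   % hypothetical query keyword
scores = count(chunkTable.Text,queryTerm,IgnoreCase=true);
[~,idx] = max(scores);

% Step 3: add surrounding context and build the Markdown-formatted prompt.
idxContext = findTextChunkContext(chunkTable,idx,TargetLength=300);
prompt = "The retrieved information is: " + ...
    formatTextChunks(chunkTable(idxContext,:));
```

In a real workflow, replace the keyword count with a similarity score between the query and each chunk, then pass prompt to the LLM in step 4.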
Version History
Introduced in R2026a