formatTextChunks

Create Markdown-formatted text from text chunks

Since R2026a

    Description

    str = formatTextChunks(chunkTable) creates Markdown-formatted text str from a table of text chunks chunkTable.

    Examples

    Split an HTML document into sections using the splitHTMLSections function. Specify the HTML code as a string.

    str = "<html><body><head><title>Title</title></head>" + ...
        "<h1>Chapter 1</h1><p>Introductory paragraph of chapter 1.</p>" + ...
        "<h2>Section 1</h2><p>Content of section 1.</p>" + ...
        "<h2>Section 2</h2><p>Content of section 2.</p></body></html>";
    chunkTable = splitHTMLSections(str)
    chunkTable=4×3 table
                             Text                              H1             H2     
        _______________________________________________    ___________    ___________
    
        "<html><body><head><title>Title</title></head>"    <missing>      <missing>  
        "<p>Introductory paragraph of chapter 1.</p>"      "Chapter 1"    <missing>  
        "<p>Content of section 1.</p>"                     "Chapter 1"    "Section 1"
        "<p>Content of section 2.</p></body></html>"       "Chapter 1"    "Section 2"
    
    

    Specify the text chunk indices.

    idx = [3 4];

    Create Markdown-formatted text from the text chunks at the specified indices using the formatTextChunks function.

    str = formatTextChunks(chunkTable(idx,:))
    str = 
        "# Chapter 1
         ## Section 1
         
         <p>Content of section 1.</p>
         ## Section 2
         
         <p>Content of section 2.</p></body></html>"
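
    The output above suggests that each header variable maps to a Markdown heading level (H1 to "#", H2 to "##") and that a heading is written only when it changes from one chunk to the next. The following is a minimal base-MATLAB sketch of that apparent behavior. It is inferred only from the example output and is not the implementation of formatTextChunks. The table chunks and the variables headerVars, prev, and out are hypothetical names used for illustration.

    ```matlab
    % Hand-built copy of rows 3 and 4 of the chunk table shown above.
    Text = ["<p>Content of section 1.</p>"; "<p>Content of section 2.</p></body></html>"];
    H1 = ["Chapter 1"; "Chapter 1"];
    H2 = ["Section 1"; "Section 2"];
    chunks = table(Text,H1,H2);

    headerVars = ["H1" "H2"];               % assumed header variables, highest level first
    prev = strings(1,numel(headerVars));    % last heading written at each level
    out = strings(0,1);                     % output lines collected so far
    for i = 1:height(chunks)
        for level = 1:numel(headerVars)
            h = chunks.(headerVars(level))(i);
            % Skip missing headings and headings that have not changed.
            if ismissing(h) || h == prev(level)
                continue
            end
            out(end+1,1) = string(repmat('#',1,level)) + " " + h;
            prev(level) = h;
            prev(level+1:end) = "";         % force lower-level headings to be rewritten
        end
        % Separate the chunk text from whatever precedes it with a blank line.
        if ~isempty(out)
            out(end+1,1) = "";
        end
        out(end+1,1) = chunks.Text(i);
    end
    sketch = join(out,newline);
    disp(sketch)
    ```

    For these two rows, the sketch yields the same seven lines as the output shown above. Behavior for other inputs, such as missing headings at intermediate levels, is an assumption and may differ from the function.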
    
    

    Load the example data. The file sonnets.txt contains Shakespeare's sonnets in plain text. Extract the text from sonnets.txt using the extractFileText function.

    str = extractFileText("sonnets.txt");

    Split str into text chunks using the splitTextChunks function. Specify the target length as 100.

    chunkTable = splitTextChunks(str,TargetLength=100);

    Specify the index of the text chunk from which to create an LLM prompt.

    idx = 10;

    Surrounding text chunks can provide useful context. For this example, find the indices of context text chunks for the target chunk at index 10 using the findTextChunkContext function. Specify a target length of 300 characters. The function returns the indices of the target chunk and surrounding context chunks, such that the combined length of the target and context chunks is 300 characters or less.

    idxContext = findTextChunkContext(chunkTable,idx,TargetLength=300)
    idxContext = 1×4
    
         8     9    10    11
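
    To check the length budget described above, you can sum the lengths of the selected chunks. The following is a minimal sketch using a small hand-built table in place of the sonnet chunks; the table contents and the names totalLength and withinBudget are hypothetical. The sum counts only the chunk text, so whether any separators count toward the target length is an assumption left open here.

    ```matlab
    % Toy stand-in for chunkTable and idxContext from the example.
    Text = ["alpha"; "beta"; "gamma"; "delta"];
    chunkTable = table(Text);
    idxContext = [2 3 4];

    % Combined number of characters in the selected chunks.
    totalLength = sum(strlength(chunkTable.Text(idxContext)));

    % Compare against the target length used in the example.
    withinBudget = totalLength <= 300;
    ```

    With the real example data, run only the last two statements, reusing the chunkTable and idxContext variables created above.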
    
    

    Create a Markdown-formatted prompt from the text chunks using the formatTextChunks function.

    prompt = formatTextChunks(chunkTable(idxContext,:));
    prompt = "The retrieved information is: " + prompt
    prompt = 
        "The retrieved information is: by the grave and thee.
         
         II
         
         When forty winters shall besiege thy brow, And dig deep trenches in thy beauty's field, Thy youth's
         
         proud livery so gazed on now, Will be a tatter'd weed of small worth held: Then being asked, where"
    
    

    You can then provide this prompt to an LLM to generate text. To connect to LLM APIs from MATLAB, use the Large Language Models (LLMs) with MATLAB add-on.
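
    As a hedged illustration of that last step, the following sketch sends a prompt to a chat model using the openAIChat and generate functions from the Large Language Models (LLMs) with MATLAB add-on. It assumes that the add-on is installed and that an API key is stored in the OPENAI_API_KEY environment variable. The system prompt and the question are hypothetical, and the prompt is a shortened stand-in for the one built above.

    ```matlab
    % Stand-in for the prompt built above; in practice, reuse that variable.
    prompt = "The retrieved information is: " + ...
        "When forty winters shall besiege thy brow";
    question = "Summarize the retrieved text in one sentence.";  % hypothetical query

    % Call the LLM only when the add-on is on the path and a key is configured.
    hasAddOn = exist("openAIChat","file") > 0;
    hasKey = strlength(getenv("OPENAI_API_KEY")) > 0;
    if hasAddOn && hasKey
        chat = openAIChat("Answer using only the retrieved information.");
        response = generate(chat,prompt + newline + newline + question);
        disp(response)
    else
        disp("Skipping the LLM call: add-on or API key not found.")
    end
    ```

    The guard keeps the script runnable on machines without the add-on; remove it if your environment is already configured.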

    Input Arguments

    chunkTable: Input table of text chunks, specified as a table. chunkTable must contain a variable named Text that holds the text chunks as strings, one chunk per row.

    Create a table of text chunks from a document or table of documents by using the splitTextChunks, splitHTMLSections, or splitMarkdownSections function.
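
    If you build the table yourself instead of using one of those functions, a quick check of the Text requirement can catch mistakes early. This is a minimal base-MATLAB sketch; the chunk contents and the names hasTextVar and textIsString are hypothetical.

    ```matlab
    % Hand-built table with the required Text variable.
    Text = ["First chunk."; "Second chunk."];
    chunkTable = table(Text);

    % Confirm that the Text variable exists and holds strings.
    hasTextVar = ismember("Text",chunkTable.Properties.VariableNames);
    textIsString = hasTextVar && isstring(chunkTable.Text);
    ```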

    Output Arguments

    str: Markdown-formatted text created from the text chunks, returned as a string.

    Version History

    Introduced in R2026a