
Correct Spelling in Documents

This example shows how to correct spelling in documents using Hunspell.

Load Text Data

Create an array of tokenized documents.

str = [
    "Use MATLAB to correct spelling of words."
    "Correctly spelled worrds are important for lemmatization."
    "Text Analytics Toolbox providesfunctions for spelling correction."];
documents = tokenizedDocument(str)
documents = 
  3x1 tokenizedDocument:

    8 tokens: Use MATLAB to correct spelling of words .
    8 tokens: Correctly spelled worrds are important for lemmatization .
    8 tokens: Text Analytics Toolbox providesfunctions for spelling correction .

Correct Spelling

Correct the spelling of the documents using the correctSpelling function.

updatedDocuments = correctSpelling(documents)
updatedDocuments = 
  3x1 tokenizedDocument:

    9 tokens: Use MAT LAB to correct spelling of words .
    8 tokens: Correctly spelled words are important for solemnization .
    9 tokens: Text Analytic Toolbox provides functions for spelling correction .

Notice that:

  • The input word "MATLAB" has been split into the two words "MAT" and "LAB".

  • The input word "worrds" has been changed to "words".

  • The input word "lemmatization" has been changed to "solemnization".

  • The input word "Analytics" has been changed to "Analytic".

  • The input word "providesfunctions" has been split into the two words "provides" and "functions".

Specify Custom Words

To prevent the software from updating particular words, you can provide a list of known words using the 'KnownWords' option of the correctSpelling function.

Correct the spelling of the documents again and specify the words "MATLAB", "Analytics", and "lemmatization" as known words.

updatedDocuments = correctSpelling(documents,'KnownWords',["MATLAB" "Analytics" "lemmatization"])
updatedDocuments = 
  3x1 tokenizedDocument:

    8 tokens: Use MATLAB to correct spelling of words .
    8 tokens: Correctly spelled words are important for lemmatization .
    9 tokens: Text Analytics Toolbox provides functions for spelling correction .

Notice that the words "MATLAB", "Analytics", and "lemmatization" now remain unchanged, while "worrds" and "providesfunctions" are still corrected.
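Instead of reading the output by eye, you can check programmatically which input words correctSpelling changed. The following is a minimal sketch that assumes a Text Analytics Toolbox release providing tokenDetails, which returns a table whose Token variable lists every token. It compares the tokens before and after correction using setdiff, so any input word that no longer appears was changed. The variable names (knownWords, correctedDefault, correctedKnown, changedDefault, changedKnown) are illustrative choices, not part of the original example.

```matlab
% Recreate the example documents.
str = [
    "Use MATLAB to correct spelling of words."
    "Correctly spelled worrds are important for lemmatization."
    "Text Analytics Toolbox providesfunctions for spelling correction."];
documents = tokenizedDocument(str);

% Correct the spelling with and without the list of known words.
knownWords = ["MATLAB" "Analytics" "lemmatization"];
correctedDefault = correctSpelling(documents);
correctedKnown = correctSpelling(documents,'KnownWords',knownWords);

% tokenDetails returns a table; its Token variable holds every token.
detailsOriginal = tokenDetails(documents);
detailsDefault = tokenDetails(correctedDefault);
detailsKnown = tokenDetails(correctedKnown);

% Tokens present in the input but absent after correction were changed.
% setdiff returns the unique, sorted differences.
changedDefault = setdiff(detailsOriginal.Token,detailsDefault.Token)
changedKnown = setdiff(detailsOriginal.Token,detailsKnown.Token)
```

Because the comparison is made over the set of tokens, a word that is corrected in one place but left unchanged elsewhere in the documents would not be reported as changed.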

See Also

correctSpelling | tokenizedDocument