Create a table of the words with highest probability of an LDA topic.
To reproduce the results, set rng
to 'default'
.
Load the example data. The file sonnetsPreprocessed.txt
contains preprocessed versions of Shakespeare's sonnets. The file contains one sonnet per line, with words separated by a space. Extract the text from sonnetsPreprocessed.txt
, split the text into documents at newline characters, and then tokenize the documents.
Create a bag-of-words model using bagOfWords
.
Fit an LDA model with 20 topics. To suppress verbose output, set 'Verbose'
to 0.
Find the top 20 words of the first topic.
tbl=20×2 table
Word Score
________ _________
"eyes" 0.11155
"beauty" 0.05777
"hath" 0.055778
"still" 0.049801
"true" 0.043825
"mine" 0.033865
"find" 0.031873
"black" 0.025897
"look" 0.023905
"tis" 0.023905
"kind" 0.021913
"seen" 0.021913
"found" 0.017929
"sin" 0.015937
"three" 0.013945
"golden" 0.0099608
⋮
Find the top 20 words of the first topic and use inverse mean scaling on the scores.
tbl=20×2 table
Word Score
________ ________
"eyes" 1.2718
"beauty" 0.59022
"hath" 0.5692
"still" 0.50269
"true" 0.43719
"mine" 0.32764
"find" 0.32544
"black" 0.25931
"tis" 0.23755
"look" 0.22519
"kind" 0.21594
"seen" 0.21594
"found" 0.17326
"sin" 0.15223
"three" 0.13143
"golden" 0.090698
⋮
Create a word cloud using the scaled scores as the size data.