Visualize LDA Topic Correlations
This example shows how to analyze correlations between topics in a latent Dirichlet allocation (LDA) topic model.
An LDA model is a topic model that discovers underlying topics in a collection of documents and infers the word probabilities in each topic. The vectors of per-topic word probabilities characterize the topics. Using these vectors, you can identify correlations between the topics.
Load LDA Model
Load the LDA model factoryReportsLDAModel, which is trained using a data set of factory reports detailing different failure events. For an example showing how to fit an LDA model to a collection of text data, see Analyze Text Data Using Topic Models.
load factoryReportsLDAModel
mdl
mdl = 
  ldaModel with properties:

                     NumTopics: 7
             WordConcentration: 1
            TopicConcentration: 0.5755
      CorpusTopicProbabilities: [0.1587 0.1573 0.1551 0.1534 0.1340 0.1322 0.1093]
    DocumentTopicProbabilities: [480×7 double]
        TopicWordProbabilities: [158×7 double]
                    Vocabulary: ["item" "occasionally" "get" "stuck" "scanner" "spool" "loud" "rattling" "sound" "come" "assembler" "piston" "cut" "power" "start" "plant" "capacitor" "mixer" … ]
                    TopicOrder: 'initial-fit-probability'
                       FitInfo: [1×1 struct]
Visualize the topics using word clouds.
numTopics = mdl.NumTopics;
figure
t = tiledlayout("flow");
title(t,"LDA Topics")
for i = 1:numTopics
    nexttile
    wordcloud(mdl,i);
    title("Topic " + i)
end
Visualize Topic Correlations
Calculate the correlations between the topics using the corrcoef function with the LDA model topic word probabilities as input.
correlation = corrcoef(mdl.TopicWordProbabilities);
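As a minimal sketch of what this call computes, the following toy example applies corrcoef to a small matrix laid out like TopicWordProbabilities, with one row per word and one column per topic. The matrix P and its values are hypothetical and chosen only for illustration; they are not taken from the factory reports model.

```matlab
% Toy 4-word-by-3-topic probability matrix (hypothetical values).
% Each column is one topic and sums to 1.
P = [0.4 0.1 0.4
     0.3 0.2 0.3
     0.2 0.3 0.2
     0.1 0.4 0.1];

% corrcoef treats each column as a variable and each row as an
% observation, so the result has one row and one column per topic.
R = corrcoef(P);

disp(size(R))   % number of topics in each dimension
disp(R)         % pairwise correlation coefficients
```

In this toy matrix, topics 1 and 3 have identical word distributions, so their coefficient is 1 up to rounding, while topic 2 ranks the words in the opposite order and has a coefficient of -1 with both.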
View the correlations in a heat map and label each topic with its top three words. To prevent the heat map from highlighting the trivial correlation between each topic and itself, subtract the identity matrix from the correlations.
For each topic, find the top three words.
numTopics = mdl.NumTopics;
for i = 1:numTopics
    top = topkwords(mdl,3,i);
    topWords(i) = join(top.Word,", ");
end
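The label-building step can be tried without the model. The sketch below assumes the top words of each topic are already available as string column vectors; the cell array topLists and its contents are hypothetical stand-ins for the Word column that topkwords returns. It also preallocates the label array with strings, which is an optional variation on the loop above.

```matlab
% Hypothetical top-word lists, one string column vector per topic.
topLists = {["mixer"; "sound"; "assembler"], ...
            ["scanner"; "agent"; "stuck"]};
numLists = numel(topLists);

% Preallocate a 1-by-N string array, then join each list into one label.
labels = strings(1,numLists);
for i = 1:numLists
    labels(i) = join(topLists{i},", ");
end

disp(labels)
```

Because labels is a row vector of strings, it can be passed directly as heat map display labels or transposed into a table column.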
Plot the correlations using the heatmap function.
figure
heatmap(correlation - eye(numTopics), ...
    XDisplayLabels=topWords, ...
    YDisplayLabels=topWords)
title("LDA Topic Correlations")
xlabel("Topic")
ylabel("Topic")
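The effect of subtracting the identity matrix can be checked on a small matrix. Without the subtraction, the ones on the diagonal are the largest values a correlation matrix can contain, so they would dominate the color scale. The matrix R below is a hypothetical correlation matrix used only for illustration.

```matlab
% Hypothetical 3-by-3 correlation matrix (illustrative values only).
R = [ 1.0  0.3 -0.2
      0.3  1.0  0.1
     -0.2  0.1  1.0];
n = size(R,1);

% Subtracting the identity matrix changes only the diagonal entries.
offDiagonal = R - eye(n);

disp(diag(offDiagonal)')               % diagonal entries after subtraction
disp(max(abs(offDiagonal),[],"all"))   % largest remaining magnitude
```

After the subtraction, the color range of a heat map of offDiagonal is set by the correlations between distinct topics rather than by the diagonal.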
For each topic, find the topic with the strongest correlation and display the pairs in a table with the corresponding correlation coefficient.
[topCorrelations,topCorrelatedTopics] = max(correlation - eye(numTopics));
tbl = table;
tbl.TopicIndex = (1:numTopics)';
tbl.Topic = topWords';
tbl.TopCorrelatedTopicIndex = topCorrelatedTopics';
tbl.TopCorrelatedTopic = topWords(topCorrelatedTopics)';
tbl.CorrelationCoefficient = topCorrelations'
tbl=7×5 table
TopicIndex    Topic                             TopCorrelatedTopicIndex    TopCorrelatedTopic                CorrelationCoefficient
__________    ______________________________    _______________________    ______________________________    ______________________
1             "mixer, sound, assembler"         5                          "mixer, fuse, coolant"            0.34304
2             "scanner, agent, stuck"           4                          "scanner, appear, spool"          0.34526
3             "sound, agent, hear"              1                          "mixer, sound, assembler"         0.26909
4             "scanner, appear, spool"          2                          "scanner, agent, stuck"           0.34526
5             "mixer, fuse, coolant"            1                          "mixer, sound, assembler"         0.34304
6             "arm, robot, smoke"               1                          "mixer, sound, assembler"         0.0042125
7             "software, sorter, controller"    7                          "software, sorter, controller"    0
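In the table, topic 7 is paired with itself with a coefficient of 0. This is what max returns when every correlation between that topic and the other topics is negative, because the subtraction leaves 0 on the diagonal. If you would rather always report the closest other topic, one possible variation (a sketch, not part of the original example) is to set the diagonal to -Inf before taking the maximum. The matrix R is hypothetical and used only for illustration.

```matlab
% Hypothetical correlation matrix in which topic 3 correlates
% negatively with every other topic (illustrative values only).
R = [ 1.0  0.3 -0.2
      0.3  1.0 -0.1
     -0.2 -0.1  1.0];
n = size(R,1);

% Approach used in the example: the diagonal becomes 0, so a topic with
% only negative correlations is matched with itself.
[c0,idx0] = max(R - eye(n));

% Variation: set the diagonal to -Inf so that it can never be the maximum.
Rmasked = R;
Rmasked(logical(eye(n))) = -Inf;
[c1,idx1] = max(Rmasked);

disp([idx0; idx1])   % row 1: original approach, row 2: masked diagonal
disp([c0; c1])
```

With the masked diagonal, topic 3 is matched with topic 2, its least negatively correlated neighbor, instead of with itself.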
See Also
tokenizedDocument | fitlda | ldaModel | wordcloud