Create Word Cloud from String Arrays
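This example shows how to create a word cloud from plain text by reading it into a string array, preprocessing it, and passing it to the wordcloud function. If you have Text Analytics Toolbox™ installed, then you can create word clouds directly from string arrays. For more information, see wordcloud (Text Analytics Toolbox).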
Read the text of Shakespeare's Sonnets using the fileread function.
sonnets = fileread('sonnets.txt');
sonnets(1:135)
ans = 'THE SONNETS by William Shakespeare I From fairest creatures we desire increase, That thereby beauty's rose might never die,'
Convert the text to a string using the string function. Then, split it at newline characters using the splitlines function.
sonnets = string(sonnets);
sonnets = splitlines(sonnets);
sonnets(10:14)
ans = 5×1 string
" From fairest creatures we desire increase,"
" That thereby beauty's rose might never die,"
" But as the riper should by time decease,"
" His tender heir might bear his memory:"
" But thou, contracted to thine own bright eyes,"
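To see what splitlines does in isolation, here is a minimal sketch on a made-up character vector. The text and the variable names txt and textLines are illustrative assumptions and are not part of sonnets.txt.

```matlab
% Hypothetical three-line text with mixed line endings (not from sonnets.txt)
txt = sprintf('first line\nsecond line\r\nthird line');

% Convert the character vector to a string scalar, then split at newlines.
% splitlines treats both \n and \r\n as line breaks.
textLines = splitlines(string(txt));

disp(textLines)
```

Each line of the input becomes one element of a column string array, which is why the sonnets can then be indexed by line, as in sonnets(10:14).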
Replace some punctuation characters with spaces.
p = ["." "?" "!" "," ";" ":"];
sonnets = replace(sonnets,p," ");
sonnets(10:14)
ans = 5×1 string
" From fairest creatures we desire increase "
" That thereby beauty's rose might never die "
" But as the riper should by time decease "
" His tender heir might bear his memory "
" But thou contracted to thine own bright eyes "
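The same replacement can be tried on a small made-up input. This is only a sketch: the strings in demo and the names demo and cleaned are hypothetical. It shows that passing a string array of patterns with a single replacement swaps every listed character for a space, so the length of each element is unchanged.

```matlab
% Hypothetical input lines (not from sonnets.txt)
demo = ["Hello, world!"; "Wait: what?"];

% Punctuation characters to replace, as in the example above
p = ["." "?" "!" "," ";" ":"];

% Every occurrence of any pattern in p is replaced by one space
cleaned = replace(demo, p, " ");

disp(cleaned)
```

Because each punctuation character becomes a space instead of being deleted, words that were separated only by punctuation stay separated in the later split step.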
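Split sonnets into a string array whose elements contain individual words. To do this, join all the string elements into a 1-by-1 string and then split it at white-space characters.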
sonnets = join(sonnets);
sonnets = split(sonnets);
sonnets(7:12)
ans = 6×1 string
"From"
"fairest"
"creatures"
"we"
"desire"
"increase"
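As a minimal sketch of the join-then-split step, the following uses two made-up lines without leading or trailing spaces. The variable names twoLines, joined, and words are illustrative assumptions.

```matlab
% Hypothetical two-line input (not read from sonnets.txt)
twoLines = ["From fairest creatures"; "we desire increase"];

% join concatenates the elements into one string, separated by spaces
joined = join(twoLines);

% split with no delimiter argument breaks the string at white space
words = split(joined);

disp(words)
```

Joining first means the result is a single column of words, instead of one row of words per line of text.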
Remove words with fewer than five characters.
sonnets(strlength(sonnets)<5) = [];
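The filter relies on logical indexing with strlength. Here is a small sketch on a made-up word list; the name words and its contents are hypothetical. Note that empty strings, which can appear after splitting text that has leading or trailing white space, have length 0 and are removed as well.

```matlab
% Hypothetical word list, including one empty string
words = ["From"; "fairest"; "creatures"; "we"; "desire"; "increase"; ""];

% strlength returns the number of characters in each element;
% the logical mask selects the short words, which are then deleted
words(strlength(words) < 5) = [];

disp(words)
```

The threshold of five characters is a simple heuristic for dropping short, common words; it also removes any short words that might have been meaningful.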
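Convert sonnets to a categorical array and then plot it using wordcloud. The function plots the unique elements of C with sizes corresponding to their frequency counts.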
C = categorical(sonnets);
figure
wordcloud(C);
title("Sonnets Word Cloud")
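To inspect the frequency counts that determine the word sizes, one option is to query the categorical array directly. This sketch uses a made-up word list; the names demoWords, demoC, names, and counts are illustrative assumptions, and it shows one way to reproduce the counts, not necessarily how wordcloud computes them internally.

```matlab
% Hypothetical word list (not from sonnets.txt)
demoWords = ["beauty"; "sweet"; "beauty"; "world"; "sweet"; "beauty"];

% Each unique word becomes one category
demoC = categorical(demoWords);

% categories lists the unique words; countcats counts each one
names = categories(demoC);
counts = countcats(demoC);

% Show words and counts side by side
disp(table(string(names), counts, 'VariableNames', {'Word', 'Count'}))
```

If the counts are already available, wordcloud also accepts them directly, for example wordcloud(names,counts).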