
Create Word Cloud from String Arrays

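This example shows how to create a word cloud from plain text by reading the text into a string array, preprocessing it, and passing it to the wordcloud function. If you have Text Analytics Toolbox™ installed, you can create word clouds directly from string arrays. For more information, see wordcloud (Text Analytics Toolbox).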

Read the text of Shakespeare's sonnets with the fileread function.

sonnets = fileread('sonnets.txt');
sonnets(1:135)
ans = 
    'THE SONNETS
     
     by William Shakespeare
     
     
     
     
       I
     
       From fairest creatures we desire increase,
       That thereby beauty's rose might never die,'

Convert the text to a string using the string function. Then split it at newline characters using the splitlines function.

sonnets = string(sonnets);
sonnets = splitlines(sonnets);
sonnets(10:14)
ans = 5×1 string
    "  From fairest creatures we desire increase,"
    "  That thereby beauty's rose might never die,"
    "  But as the riper should by time decease,"
    "  His tender heir might bear his memory:"
    "  But thou, contracted to thine own bright eyes,"

Replace some punctuation characters with spaces.

p = ["." "?" "!" "," ";" ":"];
sonnets = replace(sonnets,p," ");
sonnets(10:14)
ans = 5×1 string
    "  From fairest creatures we desire increase "
    "  That thereby beauty's rose might never die "
    "  But as the riper should by time decease "
    "  His tender heir might bear his memory "
    "  But thou  contracted to thine own bright eyes "

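Split sonnets into a string array whose elements each contain a single word. To do this, join all the string elements into a 1-by-1 string, and then split that string at whitespace characters.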

sonnets = join(sonnets);
sonnets = split(sonnets);
sonnets(7:12)
ans = 6×1 string
    "From"
    "fairest"
    "creatures"
    "we"
    "desire"
    "increase"
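
The join-then-split step can be tried in isolation on a small input. The following is a minimal sketch, assuming a hypothetical two-element string array named verseLines that is not part of the original example. With no extra arguments, join concatenates the elements with a single space between them, and split divides the result at whitespace.

```matlab
% Hypothetical input: two lines of text with punctuation already removed.
verseLines = ["From fairest creatures"; "we desire increase"];

% Join the elements into one 1-by-1 string (a space is inserted between them).
joined = join(verseLines);

% Split the joined string at whitespace into a column of single words.
words = split(joined);

disp(size(words))   % size of the resulting word array
```

Joining first means the result is a single column of words rather than one row of words per line, which is the shape the later steps expect.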

Remove words that have fewer than five characters.

sonnets(strlength(sonnets)<5) = [];
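
The length filter relies on logical indexing: strlength returns the number of characters in each element, the comparison produces a logical mask, and assigning [] deletes the masked elements. The sketch below illustrates this on a hypothetical word list (the variable names words and isShort are assumptions made for illustration).

```matlab
% Hypothetical word list taken from the first line of the sonnets.
words = ["From"; "fairest"; "creatures"; "we"; "desire"; "increase"];

% Logical mask that is true for words shorter than five characters.
isShort = strlength(words) < 5;

% Delete the short words; "From" (4 characters) and "we" (2) are removed.
words(isShort) = [];
```

Short words in English text are mostly articles, pronouns, and prepositions, so this threshold acts as a rough substitute for a stop-word list.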

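Convert sonnets to a categorical array, and then plot it using wordcloud. The function plots the unique elements of C, with sizes corresponding to their frequency counts.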

C = categorical(sonnets);
figure
wordcloud(C);
title("Sonnets Word Cloud")
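
To inspect the frequency counts that determine the word sizes, the categories and countcats functions can be applied to a categorical array. This is a minimal sketch on a hypothetical six-word array, not the method wordcloud uses internally.

```matlab
% Hypothetical word list with repeated entries.
words = ["beauty"; "sweet"; "beauty"; "heart"; "sweet"; "beauty"];

% Converting to categorical makes each unique word a category.
C = categorical(words);

% Category names (sorted) and the number of occurrences of each.
names = categories(C);
counts = countcats(C);

% Show the names next to their counts.
disp(table(names, counts))
```

Here "beauty" occurs three times, "heart" once, and "sweet" twice, so in a word cloud of C the word "beauty" would be drawn largest.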

Figure contains an object of type wordcloud. The chart of type wordcloud has title Sonnets Word Cloud.
