
vaderSentimentScores

Sentiment scores with VADER algorithm

Description

Use vaderSentimentScores to evaluate sentiment in tokenized text with the Valence Aware Dictionary and sEntiment Reasoner (VADER) algorithm [1]. By default, the function uses the VADER sentiment lexicon and modifier word lists.

The function supports English text only.

compoundScores = vaderSentimentScores(documents) returns the compound sentiment scores of the tokenized documents in documents. The function computes each compound score in three steps. First it aggregates the individual token scores. Next it adjusts them according to the VADER algorithm rules. Finally it normalizes the result to the range [-1, 1]. The function discards all single-character tokens that are not present in the sentiment lexicon.


compoundScores = vaderSentimentScores(documents,Name,Value) specifies additional options using one or more name-value arguments.

[compoundScores,positiveScores,negativeScores,neutralScores] = vaderSentimentScores(___) also returns the proportions of each document that are positive, negative, and neutral, respectively. You can use this syntax with any of the previous input syntaxes.
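The following Python sketch shows how the three ratios can be derived. It is not the MATLAB implementation. It follows the reference Python implementation of VADER, which turns per-token valences into positive, negative, and neutral proportions. The ±1 offsets are taken from that implementation, and this sketch omits its punctuation-emphasis rule. The function name sentiment_proportions is hypothetical.

```python
def sentiment_proportions(token_scores):
    """Return (positive, negative, neutral) proportions for one document.

    Simplified sketch modelled on the reference Python VADER implementation;
    punctuation emphasis and the other VADER rules are omitted.
    """
    pos_sum = 0.0
    neg_sum = 0.0
    neu_count = 0
    for s in token_scores:
        if s > 0:
            pos_sum += s + 1  # +1 offsets neutral tokens, which each count as 1
        elif s < 0:
            neg_sum += s - 1  # -1 offsets neutral tokens on the negative side
        else:
            neu_count += 1
    total = pos_sum + abs(neg_sum) + neu_count
    if total == 0:
        return 0.0, 0.0, 0.0  # empty document: no sentiment information
    return pos_sum / total, abs(neg_sum) / total, neu_count / total


# Token valences for a hypothetical three-token document
pos, neg, neu = sentiment_proportions([2.0, 0.0, -1.0])
```

For token valences 2, 0, and -1, the sketch returns 0.5, 1/3, and 1/6. For any nonempty document, the three values sum to 1.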


Examples


Create a tokenized document array.

str = [
    "The book was VERY good!!!!"
    "The book was not very good."];
documents = tokenizedDocument(str);

Evaluate the sentiment of the tokenized documents. Scores close to 1 indicate positive sentiment, scores close to -1 indicate negative sentiment, and scores close to 0 indicate neutral sentiment.

compoundScores = vaderSentimentScores(documents)
compoundScores = 2×1

    0.7264
   -0.3865
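You decide where "close to" ends. One common convention comes from the VADER authors' README, not from vaderSentimentScores. It treats a compound score of at least 0.05 as positive, at most -0.05 as negative, and anything in between as neutral. Here is a minimal Python sketch of that rule, applied to the two scores above. The helper name label is hypothetical.

```python
def label(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a coarse sentiment label.

    The 0.05 threshold follows the VADER README convention; it is not an
    option of vaderSentimentScores.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


labels = [label(s) for s in (0.7264, -0.3865)]
```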

Sentiment analysis algorithms such as VADER rely on annotated lists of words called sentiment lexicons. For example, VADER uses a sentiment lexicon with words annotated with a sentiment score ranging from -4 to 4, where scores close to 4 indicate strong positive sentiment, scores close to -4 indicate strong negative sentiment, and scores close to zero indicate neutral sentiment.

To analyze the sentiment of text using the VADER algorithm, use the vaderSentimentScores function. The default sentiment lexicon may not suit the data you are analyzing, for example, a domain-specific data set such as medical or engineering data. In that case, you can use your own custom sentiment lexicon. For an example showing how to generate a domain-specific sentiment lexicon, see Generate Domain Specific Sentiment Lexicon.

Create a tokenized document array containing the text data to analyze.

textData = [ 
    "This company is showing extremely strong growth."
    "This other company is accused of misleading consumers."];
documents = tokenizedDocument(textData);

Load the example domain-specific sentiment lexicon for finance data.

filename = "financeSentimentLexicon.csv";
tbl = readtable(filename);
head(tbl)
        Token         SentimentScore
    ______________    ______________

    {'innovative'}             4    
    {'greater'   }        3.6216    
    {'efficiency'}        3.5971    
    {'enhance'   }        3.5628    
    {'better'    }        3.5532    
    {'creative'  }        3.5358    
    {'strengthen'}        3.5161    
    {'improved'  }         3.484    

Evaluate the sentiment using the vaderSentimentScores function and specify the custom sentiment lexicon using the 'SentimentLexicon' option. Scores close to 1 indicate positive sentiment, scores close to -1 indicate negative sentiment, and scores close to 0 indicate neutral sentiment.

compoundScores = vaderSentimentScores(documents,'SentimentLexicon',tbl)
compoundScores = 2×1

    0.8762
   -0.1176

Input Arguments


documents

Input documents, specified as a tokenizedDocument array.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'Boosters',["verry";"verrry"] specifies to use the boosters "verry" and "verrry".

SentimentLexicon

Sentiment lexicon, specified as a table with these variables:

  • Token – Token, specified as a string scalar. The tokens must be lowercase.

  • SentimentScore – Sentiment score of token, specified as a numeric scalar in the range [-4, 4], where scores close to -4 indicate strong negative sentiment, scores close to 4 indicate strong positive sentiment, and scores close to 0 indicate neutral sentiment.
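You can check both requirements before you pass a table to the function. This Python sketch parses lexicon rows from CSV text and rejects uppercase tokens or out-of-range scores. It assumes a header row named Token,SentimentScore, matching the variables above. The name load_lexicon is hypothetical. The sketch does not reflect what vaderSentimentScores does internally.

```python
import csv
import io


def load_lexicon(csv_text):
    """Parse Token,SentimentScore rows into a dict, enforcing the documented rules."""
    lexicon = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        token = row["Token"]
        score = float(row["SentimentScore"])
        if token != token.lower():
            raise ValueError(f"token {token!r} must be lowercase")
        if not -4 <= score <= 4:
            raise ValueError(f"score {score} for {token!r} is outside [-4, 4]")
        lexicon[token] = score
    return lexicon


sample = "Token,SentimentScore\ninnovative,4\ngreater,3.6216\n"
lex = load_lexicon(sample)
```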

By default, when evaluating sentiment, the function ignores tokens with one character. It also replaces emojis with equivalent textual descriptions before computing the sentiment scores. For example, the function replaces instances of the emoji "😀" with the text "grinning face" and then evaluates the sentiment scores. If you provide one-character tokens or emojis with corresponding sentiment scores in SentimentLexicon, then the function does not remove or replace these tokens.
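This Python sketch mimics these preprocessing rules under two simplifying assumptions. First, an emoji is any single code point in the Unicode "Symbol, other" (So) category. Second, its lowercase Unicode character name stands in for the textual description. The name preprocess is hypothetical. The sketch does not handle emojis made of several code points, such as those with skin-tone modifiers.

```python
import unicodedata


def preprocess(tokens, lexicon):
    """Drop one-character tokens and expand emojis, unless the lexicon lists them."""
    out = []
    for tok in tokens:
        if tok in lexicon:
            out.append(tok)  # tokens with explicit lexicon entries are kept unchanged
        elif len(tok) == 1 and unicodedata.category(tok) == "So":
            # Assumed emoji: replace with its Unicode name, e.g. "grinning face"
            out.extend(unicodedata.name(tok, "").lower().split())
        elif len(tok) == 1:
            continue  # drop other one-character tokens such as "a" or "!"
        else:
            out.append(tok)
    return out


cleaned = preprocess(["the", "a", "😀", "good", "!"], lexicon={})
```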

The default sentiment lexicon is the VADER sentiment lexicon.

Data Types: table

Boosters

List of booster words or n-grams, specified as a string array.

The function uses booster n-grams to boost the sentiment of subsequent tokens. Examples of boosters include words such as "absolutely" and "amazingly".

For a list of words, the list must be a column vector. For a list of n-grams, the list has size NumNgrams-by-maxN, where NumNgrams is the number of n-grams and maxN is the length of the longest n-gram. The (i,j)th element of the list is the jth word of the ith n-gram. If the ith n-gram has fewer than maxN words, then the remaining entries of the ith row of the list are empty.
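You can build the padded layout mechanically from n-grams of different lengths. This Python sketch uses nested lists in place of a MATLAB string array. It assumes that unused trailing entries are empty strings. The helper name to_ngram_matrix and the bigram "so very" are hypothetical. A list of single words comes out as one column, which matches the column-vector rule.

```python
def to_ngram_matrix(ngrams):
    """Pad ragged n-grams into a NumNgrams-by-maxN grid of strings."""
    max_n = max((len(g) for g in ngrams), default=0)
    # Each row is one n-gram; shorter n-grams are padded with empty strings
    return [list(g) + [""] * (max_n - len(g)) for g in ngrams]


grid = to_ngram_matrix([["verry"], ["so", "very"]])
```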

The booster n-grams must be lowercase.

The default list of booster n-grams is the VADER booster list.

Data Types: string

Dampeners

List of dampener words or n-grams, specified as a string array.

The function uses dampener n-grams to dampen the sentiment of subsequent tokens. Examples of dampeners include words such as "hardly" and "somewhat".
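This page does not document the exact adjustments that vaderSentimentScores applies. As an illustration only, this Python sketch uses the constants from the reference Python implementation of VADER:

- A booster adds 0.293 to the magnitude of a following sentiment token.
- A dampener subtracts 0.293 from that magnitude.
- When the modifier is two or three tokens back, the effect is scaled by 0.95 or 0.9, respectively.

The function name apply_modifier is hypothetical.

```python
B_INCR = 0.293   # booster increment (assumed, from the reference Python VADER)
B_DECR = -0.293  # dampener decrement (assumed, from the reference Python VADER)
DECAY = {1: 1.0, 2: 0.95, 3: 0.9}  # weaker effect for modifiers further back


def apply_modifier(valence, kind, distance=1):
    """Adjust a token's valence for a booster or dampener `distance` tokens earlier."""
    if distance not in DECAY:
        return valence  # modifiers more than three tokens back have no effect
    if kind == "booster":
        scalar = B_INCR
    elif kind == "dampener":
        scalar = B_DECR
    else:
        raise ValueError(f"unknown modifier kind {kind!r}")
    if valence < 0:
        scalar = -scalar  # mirror the adjustment so it acts on the magnitude
    return valence + scalar * DECAY[distance]


boosted = apply_modifier(2.0, "booster")
dampened = apply_modifier(-2.0, "dampener")
```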

For a list of words, the list must be a column vector. For a list of n-grams, the list has size NumNgrams-by-maxN, where NumNgrams is the number of n-grams and maxN is the length of the longest n-gram. The (i,j)th element of the list is the jth word of the ith n-gram. If the ith n-gram has fewer than maxN words, then the remaining entries of the ith row of the list are empty.

The dampener n-grams must be lowercase.

The default list of dampener n-grams is the VADER dampener list.

Data Types: string

Negations

List of negation words, specified as a string array.

The function uses negation words to negate the sentiment of subsequent tokens. Examples of negation words include "not" and "isn't".
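As a simplified illustration, consider how the reference Python implementation of VADER handles negation. It multiplies a token's valence by -0.74 when a negation word appears among the three preceding tokens. This Python sketch applies that factor once if any negation word is in the window. The reference implementation instead checks each position separately and handles a few special cases, so treat the constant and window size as assumptions. The name apply_negation and the valence 1.9 are hypothetical.

```python
N_SCALAR = -0.74  # assumed negation factor, from the reference Python VADER


def apply_negation(tokens, i, valence, negations, window=3):
    """Flip and damp the valence of tokens[i] if a negation precedes it within `window` tokens."""
    start = max(0, i - window)
    if any(t in negations for t in tokens[start:i]):
        return valence * N_SCALAR
    return valence


toks = ["the", "book", "was", "not", "very", "good"]
negated = apply_negation(toks, 5, 1.9, {"not", "isn't"})
```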

The negation words must be lowercase.

The default list of negation words is the VADER negation list.

Data Types: string

Output Arguments


compoundScores

Compound sentiment scores, returned as a numeric vector. The function returns one score for each input document. The value compoundScores(i) corresponds to the compound sentiment score of documents(i).

The function determines the compound scores by aggregating the individual token scores, adjusting them according to the VADER algorithm rules, and then normalizing the result to the range [-1, 1].
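This Python sketch shows the normalization step. It assumes the formula from the reference Python implementation of VADER, which maps the summed valence x to x / sqrt(x^2 + alpha) with alpha = 15. That formula keeps the result within (-1, 1). Whether vaderSentimentScores uses the same constant is an assumption, and normalize is a hypothetical name.

```python
import math

ALPHA = 15  # assumed normalization constant, from the reference Python VADER


def normalize(score, alpha=ALPHA):
    """Map an unbounded sum of token valences into [-1, 1]."""
    norm = score / math.sqrt(score * score + alpha)
    # Clamp to guard against floating-point overshoot at very large magnitudes
    return max(-1.0, min(1.0, norm))


quarter = normalize(1)  # 1 / sqrt(16)
```

A raw sum of 1 maps to 0.25. As the magnitude of the sum grows, the normalized score approaches 1 or -1.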

positiveScores

Positive sentiment scores, returned as a numeric vector. The function returns one score for each input document. The value positiveScores(i) corresponds to the positive sentiment score of documents(i).

negativeScores

Negative sentiment scores, returned as a numeric vector. The function returns one score for each input document. The value negativeScores(i) corresponds to the negative sentiment score of documents(i).

neutralScores

Neutral sentiment scores, returned as a numeric vector. The function returns one score for each input document. The value neutralScores(i) corresponds to the neutral sentiment score of documents(i).

References

[1] Hutto, C., and Eric Gilbert. “VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text.” Proceedings of the International AAAI Conference on Web and Social Media 8, no. 1 (May 16, 2014): 216–25. https://doi.org/10.1609/icwsm.v8i1.14550.

Version History

Introduced in R2019b