
plotDependence

Plot dependence of Shapley values on predictor values

Since R2024b

    Description

    plotDependence(explainer,predictor) creates a dependence plot for the predictor specified by predictor and the Shapley values in the shapley object explainer. The plot contains Shapley values for the query points in explainer.QueryPoints.

    • If predictor specifies a categorical predictor (explainer.CategoricalPredictors), then the function displays a box plot of the corresponding Shapley values for each category. Each box plot displays the median, the lower and upper quartiles, any outliers (computed using the interquartile range), and the minimum and maximum values that are not outliers.

    • If predictor specifies a noncategorical predictor, then the function displays a scatter plot of the corresponding Shapley values.
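    The box plot statistics listed above can be sketched in a few lines. This is a minimal illustration, not the function's internal code: the vector sv holds made-up Shapley values for one category, and the 1.5 times interquartile range fence is the usual box chart convention, assumed here.

```matlab
% Hypothetical Shapley values for the query points in one category
sv = [-0.8 -0.3 -0.1 0 0.1 0.2 0.4 2.5];

% Lower quartile, median, and upper quartile
q = quantile(sv,[0.25 0.5 0.75]);
iqrWidth = q(3) - q(1);

% Assumed convention: a value is an outlier if it lies more than
% 1.5 interquartile ranges beyond the lower or upper quartile
isOut = sv < q(1) - 1.5*iqrWidth | sv > q(3) + 1.5*iqrWidth;

% Whiskers extend to the extreme values that are not outliers
whiskers = [min(sv(~isOut)) max(sv(~isOut))];

disp(table(q(1),q(2),q(3),whiskers, ...
    VariableNames=["LowerQuartile" "Median" "UpperQuartile" "Whiskers"]))
disp(sv(isOut))
```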

    If explainer.BlackboxModel is a classification model, the function displays a plot for class explainer.BlackboxModel.ClassNames(1) by default.


    plotDependence(explainer,predictor,Name=Value) specifies additional options using one or more name-value arguments. For example, specify the ColorPredictor name-value argument to use color to display a second predictor in the plot.


    plotDependence(ax,___) displays the dependence plot in the target axes ax. Specify ax as the first argument in any of the previous syntaxes.

    p = plotDependence(___) returns a BoxChart or Scatter object. Use p to query or modify the properties (BoxChart Properties or Scatter Properties) of an object after you create it.
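    The following sketch combines the target-axes syntax with the output argument. The predictor data X and the function handle f are hypothetical stand-ins for a real blackbox model, chosen only so that the code runs quickly. The property values set on p are arbitrary choices.

```matlab
rng("default") % For reproducibility

% Hypothetical predictor data and blackbox model (illustration only)
X = rand(100,3);
f = @(X) 2*X(:,1) + X(:,2).*X(:,3);

% Compute Shapley values for the first 30 observations
explainer = shapley(f,X,QueryPoints=X(1:30,:));

% Plot into a specific axes and keep the returned object
figure
ax = axes;
p = plotDependence(ax,explainer,1);   % predictor given as a column index

% The first predictor is noncategorical, so p is a Scatter object
p.SizeData = 20;
p.Marker = "diamond";
```

    For a categorical predictor, the same call would return a BoxChart object, and you would set BoxChart properties instead.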

    Examples


    Train a classification model and create a shapley object. Use the fit object function to compute the Shapley values for the specified query points. Then for each predictor, visualize the dependence of the Shapley values on the predictor values by using the plotDependence object function.

    Load the CreditRating_Historical data set. The data set contains customer IDs and their financial ratios, industry labels, and credit ratings.

    tbl = readtable("CreditRating_Historical.dat");

    Display the first three rows of the table.

    head(tbl,3)
         ID      WC_TA    RE_TA    EBIT_TA    MVE_BVTD    S_TA     Industry    Rating
        _____    _____    _____    _______    ________    _____    ________    ______
    
        62394    0.013    0.104     0.036      0.447      0.142       3        {'BB'}
        48608    0.232    0.335     0.062      1.969      0.281       8        {'A' }
        42444    0.311    0.367     0.074      1.935      0.366       1        {'A' }
    

    Train a blackbox model of credit ratings by using the fitcecoc function. Use the variables from the second through seventh columns in tbl as the predictor variables. A recommended practice is to specify the class names to set the order of the classes.

    blackbox = fitcecoc(tbl,"Rating", ...
        PredictorNames=tbl.Properties.VariableNames(2:7), ...
        CategoricalPredictors="Industry", ...
        ClassNames={'AAA','AA','A','BBB','BB','B','CCC'});

    Create a shapley object that explains the predictions for multiple query points. For faster computation, shapley subsamples 100 observations from the predictor data in blackbox to compute the Shapley values. Specify the sampled observations as the query points in the call to the fit object function.

    rng("default") % For reproducibility
    explainer = shapley(blackbox);
    queryPoints = explainer.X(explainer.SampledObservationIndices,:);
    explainer = fit(explainer,queryPoints);

    Visualize the Shapley values for a specified predictor by using the plotDependence object function.

    predictor = "MVE_BVTD";
    plotDependence(explainer,predictor)

    By default, the function shows the Shapley values for the first class, AAA. For noncategorical predictors, the function displays a scatter plot, where the x-axis corresponds to the predictor values and the y-axis corresponds to the Shapley values for the predictor.

    For class AAA, the Shapley values for the MVE_BVTD predictor tend to increase as the predictor values increase from 0 to 4. For MVE_BVTD values greater than 4, the corresponding Shapley values tend to remain constant (between 1.5 and 2).

    For categorical predictors, plotDependence displays box plots for each category in the categorical predictor. The function determines categorical predictors based on the CategoricalPredictors property of the shapley object.

    Visualize the Shapley values for the categorical predictor Industry. Specify the class.

    class = "A";
    plotDependence(explainer,"Industry",ClassName=class)

    For class A, the distribution of the Shapley values varies across different industries. For example, industry 3 has exclusively positive Shapley values, whereas industry 9 has exclusively negative Shapley values.

    Train a regression model and create a shapley object using multiple query points. Then for each predictor, visualize the dependence of the Shapley values on the predictor values. Use color to see the dependence on a second predictor.

    Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s.

    load carbig

    Create a table containing the predictor variables Acceleration, Cylinders, and so on, as well as the response variable MPG.

    tbl = table(Acceleration,Cylinders,Displacement, ...
        Horsepower,Model_Year,Weight,MPG);

    Removing missing values in a training set helps to reduce memory consumption and speed up training for the fitrkernel function. Remove missing values in tbl.

    tbl = rmmissing(tbl);

    Train a blackbox model of MPG by using the fitrkernel function. Specify the Cylinders and Model_Year variables as categorical predictors. Standardize the remaining predictors.

    mdl = fitrkernel(tbl,"MPG",CategoricalPredictors=[2 5], ...
        Standardize=true);

    Create a shapley object that explains the predictions for multiple query points. Because mdl does not contain training data, specify to compute Shapley values using the predictor data in tbl. For faster computation, specify to subsample 200 observations from tbl. Use all observations in tbl as query points.

    explainer = shapley(mdl,tbl,NumObservationsToSample=200, ...
        QueryPoints=tbl);

    Visualize the Shapley values for a specific predictor by using the plotDependence object function. Use color to display a second predictor. Note that if you want to specify a color predictor, the x-axis predictor must be a noncategorical predictor.

    predictor = "Weight";
    colorPredictor = "Horsepower";
    plotDependence(explainer,predictor,ColorPredictor=colorPredictor)

    Figure contains an axes object. The axes object with title Shapley Dependence Plot, xlabel Weight, ylabel Shapley Values for Weight contains 2 objects of type scatter, constantline.

    For Weight values between 2000 and 4000, the corresponding Shapley values tend to decrease as the Weight values increase. Based on the color of the points in the plot, Horsepower values tend to increase as Weight values increase.
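    To compare all predictors at once, one option is to draw one dependence plot per tile in a tiled layout. The sketch below repeats the setup from this example so that it runs on its own; to keep the run time short, it uses only the first 50 rows of tbl as query points, which is a choice made for this sketch.

```matlab
load carbig
tbl = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,MPG);
tbl = rmmissing(tbl);

rng("default") % For reproducibility
mdl = fitrkernel(tbl,"MPG",CategoricalPredictors=[2 5], ...
    Standardize=true);
explainer = shapley(mdl,tbl,NumObservationsToSample=200, ...
    QueryPoints=tbl(1:50,:));   % subset of rows, for speed

predictors = ["Acceleration" "Cylinders" "Displacement" ...
    "Horsepower" "Model_Year" "Weight"];

figure
tiledlayout("flow")
for name = predictors
    ax = nexttile;
    % Box plots for Cylinders and Model_Year, scatter plots otherwise
    plotDependence(ax,explainer,name)
end
```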

    Input Arguments


    explainer

    Object explaining the blackbox model, specified as a shapley object. explainer must contain Shapley values; that is, explainer.Shapley must be nonempty.

    predictor

    Predictor variable to plot, specified as a positive integer scalar, character vector, or string scalar.

    • If you specify a positive integer scalar, it must be the index value corresponding to a column in the predictor data explainer.X.

    • If you specify a character vector or string scalar, it must be the name of a predictor variable. When explainer.BlackboxModel is a machine learning model object, the name must match one of the names in the PredictorNames property of the model (explainer.BlackboxModel.PredictorNames). When explainer.BlackboxModel is a custom model specified as a function handle, the name must match one of the variable names in explainer.X.

    Example: "x1"

    Data Types: single | double | char | string

    ax

    Axes for the plot, specified as an Axes object. If you do not specify ax, then plotDependence creates the plot using the current axes. For more information on creating an Axes object, see axes.

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: plotDependence(explainer,"x1",ColorPredictor="x3",ColorMap="abyss") creates a scatter plot of Shapley values for the numeric predictor x1 and uses the x3 predictor to color the points with the abyss colormap.

    ClassName

    Class label to plot, specified as a numeric scalar, logical scalar, character vector, string scalar, or categorical scalar. The value and data type of ClassName must match one of the class names in the ClassNames property of the machine learning model in explainer (explainer.BlackboxModel.ClassNames). The software accepts character vectors, string scalars, and categorical scalars interchangeably.

    This argument is valid only when the machine learning model (BlackboxModel) in explainer is a classification model.

    Example: ClassName="AAA"

    Data Types: single | double | logical | char | string | categorical
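    As an illustration of the ClassName argument, the sketch below draws one dependence plot per class for a single predictor. The fisheriris data set and the fitctree model are assumptions chosen to keep the example small; they do not appear elsewhere on this page.

```matlab
load fisheriris

% Small classification model used only for illustration
mdl = fitctree(meas,species);

% Use all observations as query points
explainer = shapley(mdl,QueryPoints=meas);

classes = mdl.ClassNames;   % cell array of class names

figure
tiledlayout(1,numel(classes))
for k = 1:numel(classes)
    ax = nexttile;
    % Third predictor (column index 3), one class per tile
    plotDependence(ax,explainer,3,ClassName=classes{k})
    title(ax,classes{k})
end
```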

    ColorPredictor

    Predictor variable to plot using color, specified as a positive integer scalar, character vector, or string scalar.

    • If you specify a positive integer scalar, it must be the index value corresponding to a column in the predictor data explainer.X.

    • If you specify a character vector or string scalar, it must be the name of a predictor variable. When explainer.BlackboxModel is a machine learning model object, the name must match one of the names in the PredictorNames property of the model (explainer.BlackboxModel.PredictorNames). When explainer.BlackboxModel is a custom model specified as a function handle, the name must match one of the variable names in explainer.X.

    For more information on how plotDependence maps color predictor values to the colormap, see Color Assignment for Color Predictor Values.

    This argument is valid only when the variable predictor is not a categorical predictor.

    Example: ColorPredictor="x2"

    Data Types: single | double | char | string

    ColorMap

    Colormap for the plot, specified as "default", "bluered", a colormap name, or a three-column matrix of RGB triplets.

    • A value of "default" sets the colormap to the default colormap for the target axes ax, and a value of "bluered" sets the colormap to a color scale that ranges from blue to red.

    • A colormap name specifies a predefined colormap, and a three-column matrix of RGB triplets specifies a custom colormap. For more information on the available colormaps and the creation of a matrix of RGB triplets, see the map input argument of the colormap function.

    This argument is valid only when the variable predictor is not a categorical predictor, and the color predictor variable ColorPredictor is specified.

    Example: ColorMap="parula"

    Example: ColorMap="bluered"

    Data Types: single | double | char | string

    Output Arguments


    p

    Dependence plot, returned as a BoxChart or Scatter object.

    • If predictor specifies a categorical predictor, then p is a BoxChart object. For more information, see BoxChart Properties.

    • If predictor specifies a noncategorical predictor, then p is a Scatter object. For more information, see Scatter Properties.

    More About


    Shapley Values

    In game theory, the Shapley value of a player is the average marginal contribution of the player in a cooperative game. In the context of machine learning prediction, the Shapley value of a feature for a query point explains the contribution of the feature to a prediction (response for regression or score of each class for classification) at the specified query point.

    The Shapley value of a feature for a query point is the contribution of the feature to the deviation from the average prediction. For a query point, the sum of the Shapley values for all features corresponds to the total deviation of the prediction from the average. That is, the sum of the average prediction and the Shapley values for all features corresponds to the prediction for the query point.
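    The additivity property described above can be checked on a toy problem. The sketch below is not the algorithm that shapley uses; it computes exact Shapley values by averaging marginal contributions over every ordering of the features, with the value of a feature subset assumed to be the average prediction when those features are fixed at the query point and the remaining features come from the reference data. The model f, the reference data, and the query point are all hypothetical.

```matlab
% Hypothetical blackbox model, reference data, and query point
f = @(X) 2*X(:,1) + X(:,2).*X(:,3);
background = [0 1 2; 1 0 1; 2 2 0; 1 1 1];
q = [3 2 1];
M = numel(q);

% Value of a feature subset (logical mask): average prediction when the
% masked features take the query values and the rest keep the
% reference values
value = @(mask) mean(f(background.*(~mask) + q.*mask));

% Average the marginal contribution of each feature over all orderings
P = perms(1:M);
phi = zeros(1,M);
for k = 1:size(P,1)
    mask = false(1,M);
    for j = P(k,:)
        before = value(mask);
        mask(j) = true;
        phi(j) = phi(j) + value(mask) - before;
    end
end
phi = phi/size(P,1);

avgPrediction = value(false(1,M));   % average prediction
prediction = f(q);                   % prediction at the query point

% Average prediction plus the sum of the Shapley values equals
% the prediction for the query point
disp(abs(avgPrediction + sum(phi) - prediction) < 1e-12)
```

    Exhaustive enumeration is practical only for a handful of features, because the number of orderings grows factorially.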

    For more details, see Shapley Values for Machine Learning Model.

    Tips

    • Use plotDependence when explainer contains Shapley values for many query points.

    Algorithms


    Color Assignment for Color Predictor Values

    plotDependence maps color predictor values (ColorPredictor) to the colormap (ColorMap) as follows:

    • If the color predictor is numeric, the function maps the minimum and maximum values to the appropriate colormap endpoints, and maps the remaining values to the interior of the colormap range.

    • If the color predictor is nonnumeric, the function maps categories to discrete colors in the colormap.
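    The two rules above can be pictured with a short sketch. This is only one plausible reading, not the function's actual implementation: the linear scaling of interior numeric values and the even spacing of category colors are assumptions, and the values in vals and cats are made up.

```matlab
cmap = parula(256);   % any three-column colormap matrix works here

% Numeric color predictor: minimum and maximum map to the colormap
% endpoints; interior values are placed linearly in between (assumed)
vals = [10 25 40 55 70];
t = (vals - min(vals))/(max(vals) - min(vals));
idx = 1 + round(t*(size(cmap,1) - 1));
numericColors = cmap(idx,:);

% Nonnumeric color predictor: each category gets one discrete color,
% here taken at evenly spaced colormap rows (assumed)
cats = categorical(["a" "b" "a" "c"]);
[names,~,g] = unique(cats);
catIdx = round(linspace(1,size(cmap,1),numel(names)));
catColors = cmap(catIdx(g),:);
```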

    Version History

    Introduced in R2024b