boxchart
Description
boxchart(explainer) creates a box chart, or box plot, for each predictor in explainer.BlackboxModel.PredictorNames, where explainer is a shapley object. For each predictor, the function displays the Shapley values for the query points in explainer.QueryPoints. The corresponding box plot displays the following: the median, the lower and upper quartiles, any outliers (computed using the interquartile range), and the minimum and maximum values that are not outliers.
If explainer.BlackboxModel is a classification model, the function displays box plots for class explainer.BlackboxModel.ClassNames(1) by default.
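As a rough illustration of these statistics, the following base MATLAB sketch computes them for the Shapley values of one predictor. The vector s is made-up data. The sketch assumes the common rule that an outlier lies more than 1.5 times the interquartile range beyond the edges of the box. boxchart might use a different quartile convention, so treat the numbers as approximate.

```matlab
% Hypothetical Shapley values of one predictor across ten query points
s = [-1.9 -1.1 -0.6 -0.2 0.0 0.1 0.4 0.9 1.2 4.5];

q = quantile(s,[0.25 0.5 0.75]);   % lower quartile, median, upper quartile
iqrValue = q(3) - q(1);            % interquartile range

% Assumed outlier rule: more than 1.5*IQR beyond the edges of the box
isOutlier = s < q(1) - 1.5*iqrValue | s > q(3) + 1.5*iqrValue;
outliers = s(isOutlier);

% Whisker ends: the minimum and maximum values that are not outliers
whiskerLow = min(s(~isOutlier));
whiskerHigh = max(s(~isOutlier));
```

Here only 4.5 falls outside the fences, so the whiskers run from -1.9 to 1.2.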
boxchart(explainer,Name=Value) specifies additional options using one or more name-value arguments. For example, specify NumImportantPredictors=5 to create box plots for the five features with the greatest mean absolute Shapley values (explainer.MeanAbsoluteShapley).
boxchart(ax,___) displays the box plots in the target axes ax. Specify ax as the first argument in any of the previous syntaxes.
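For example, the following sketch draws box plots for two classes side by side, each in its own target axes. It does not use the data sets from the examples on this page. Instead, it assumes a multiclass ECOC model trained on the fisheriris sample data, and it uses tiledlayout and nexttile to create the axes.

```matlab
rng("default")                                  % for reproducible subsampling
% Assumed setup: multiclass ECOC model trained on the fisheriris sample data
load fisheriris
mdl = fitcecoc(meas,species);
explainer = shapley(mdl);
explainer = fit(explainer,meas(1:10:end,:));    % 15 query points

% Create two target axes and draw the box plots for one class in each
t = tiledlayout(1,2);
ax1 = nexttile(t);
boxchart(ax1,explainer,ClassName="setosa")
title(ax1,"Class setosa")
ax2 = nexttile(t);
boxchart(ax2,explainer,ClassName="virginica")
title(ax2,"Class virginica")
```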
b = boxchart(___) returns a BoxChart object using any of the input argument combinations in the previous syntaxes. Use b to query or modify the properties (BoxChart Properties) of the object after you create it.
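The following sketch captures the returned object and changes a few of its properties after creation. The fisheriris setup is an assumption for illustration. BoxFaceColor, MarkerStyle, and JitterOutliers are standard BoxChart properties. The sketch also assumes that b is a single BoxChart object, as the datatip call in the regression example suggests.

```matlab
rng("default")                     % for reproducible subsampling
% Assumed setup: multiclass ECOC model trained on the fisheriris sample data
load fisheriris
mdl = fitcecoc(meas,species);
explainer = shapley(mdl);
explainer = fit(explainer,meas(1:10:end,:));

b = boxchart(explainer);           % box plots for the first class, setosa
b.BoxFaceColor = [0.85 0.33 0.10]; % change the box color
b.MarkerStyle = "x";               % change the outlier marker symbol
b.JitterOutliers = "on";           % jitter the outliers after creation
```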
Examples
Shapley Value Box Plots for Regression Model
Train a regression model and create a shapley object. Use the fit object function to compute the Shapley values for the specified query points. Then visualize the Shapley values for multiple query points by using the boxchart object function.
Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s.
load carbig
Create a table containing the predictor variables Acceleration, Cylinders, and so on, as well as the response variable MPG.
tbl = table(Acceleration,Cylinders,Displacement, ...
Horsepower,Model_Year,Weight,MPG);
Removing missing values in a training set helps to reduce memory consumption and speed up training for the fitrkernel function. Remove missing values in tbl.
tbl = rmmissing(tbl);
Train a blackbox model of MPG by using the fitrkernel function. Specify the Cylinders and Model_Year variables as categorical predictors. Standardize the remaining predictors.
rng("default") % For reproducibility
mdl = fitrkernel(tbl,"MPG",CategoricalPredictors=[2 5], ...
    Standardize=true);
Create a shapley object. Because mdl does not contain training data, specify the data set tbl.
explainer = shapley(mdl,tbl)
explainer = 
            BlackboxModel: [1×1 RegressionKernel]
              QueryPoints: []
           BlackboxFitted: []
                  Shapley: []
                        X: [392×7 table]
    CategoricalPredictors: [2 5]
                   Method: "interventional-kernel"
                Intercept: 23.2474
               NumSubsets: 64
explainer stores the training data tbl in the X property. By default, shapley subsamples 100 observations from the data in X and stores their indices in the SampledObservationIndices property.
Compute the Shapley values for all observations in tbl. To speed up computations, the fit object function uses the sampled observations rather than all of X to compute the Shapley values. Further reduce computational time by using the UseParallel name-value argument, if you have a Parallel Computing Toolbox™ license.
explainer = fit(explainer,tbl,UseParallel=true)
explainer = 
  shapley explainer with the following mean absolute Shapley values:

       Predictor       Value 
    ______________    _______

    "Acceleration"     0.5678
    "Cylinders"       0.96799
    "Displacement"    0.79668
    "Horsepower"      0.78681
    "Model_Year"      0.86258
    "Weight"            0.987
For a regression model, fit computes Shapley values using the predicted response and stores them in the Shapley property of the shapley object. Because explainer contains Shapley values for multiple query points, the object display shows the mean absolute Shapley values by default.
Visualize the distribution of the Shapley values by using the boxchart object function.
boxchart(explainer)
For each predictor, the function displays a box plot of the Shapley values for the query points. The function determines the order of the predictors by using the mean absolute Shapley values.
The box plot for the Weight predictor indicates that the Shapley values are distributed symmetrically about the median. The minimum is slightly less than -2, the 25th percentile is approximately -1, the median is approximately 0, the 75th percentile is approximately 1, and the maximum is approximately 2.
Use a data tip to view the Shapley value metrics for the Weight predictor.
b = boxchart(explainer);
datatip(b,"DataIndex",6);
Shapley Value Box Plots for Classification Model
Train a classification model and create a shapley object. Then visualize the Shapley values for multiple query points by using the boxchart object function.
Load the CreditRating_Historical data set. The data set contains customer IDs and their financial ratios, industry labels, and credit ratings.
tbl = readtable("CreditRating_Historical.dat");
Display the first three rows of the table.
head(tbl,3)
     ID      WC_TA    RE_TA    EBIT_TA    MVE_BVTD    S_TA     Industry    Rating
    _____    _____    _____    _______    ________    _____    ________    ______

    62394    0.013    0.104     0.036      0.447      0.142       3        {'BB'}
    48608    0.232    0.335     0.062      1.969      0.281       8        {'A' }
    42444    0.311    0.367     0.074      1.935      0.366       1        {'A' }
Train a blackbox model of credit ratings by using the fitcecoc function. Use the variables from the second through seventh columns in tbl as the predictor variables. A recommended practice is to specify the class names to set the order of the classes.
blackbox = fitcecoc(tbl,"Rating", ...
    PredictorNames=tbl.Properties.VariableNames(2:7), ...
    CategoricalPredictors="Industry", ...
    ClassNames={'AAA','AA','A','BBB','BB','B','CCC'});
Create a shapley object that explains the predictions for multiple query points. For faster computation, shapley subsamples 100 observations from the predictor data in blackbox to compute the Shapley values. Specify the sampled observations as the query points in the call to the fit object function.
rng("default") % For reproducibility
explainer = shapley(blackbox);
queryPoints = explainer.X(explainer.SampledObservationIndices,:);
explainer = fit(explainer,queryPoints);
For a classification model, fit computes Shapley values using the predicted class scores and stores them in the Shapley property of the shapley object. Because explainer contains Shapley values for multiple query points, display the mean absolute Shapley values instead.
explainer.MeanAbsoluteShapley
ans=6×8 table
Predictor AAA AA A BBB BB B CCC
__________ _________ __________ _________ _________ _________ _________ _________
"WC_TA" 0.055977 0.034453 0.027338 0.023902 0.036098 0.054763 0.054931
"RE_TA" 0.12468 0.10314 0.10787 0.087013 0.090298 0.17123 0.2552
"EBIT_TA" 0.0015598 0.00095166 0.0011936 0.0010499 0.0010047 0.0018817 0.0017712
"MVE_BVTD" 0.84966 0.68785 0.66198 0.94501 1.3672 1.5715 1.2161
"S_TA" 0.025009 0.0095605 0.010606 0.014469 0.0017235 0.0075275 0.012529
"Industry" 0.076169 0.085926 0.063854 0.046528 0.053801 0.11261 0.11829
For each predictor and class, the mean absolute Shapley value is the mean of the absolute Shapley values across all query points. For class AA, the MVE_BVTD predictor has a noticeably greater mean absolute Shapley value (0.68785) than the other predictors.
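The averaging can be sketched in base MATLAB as follows. The matrix S is hypothetical, with one row per predictor and one column per query point. This layout is an assumption for illustration and is not necessarily how explainer.Shapley stores the values.

```matlab
% Hypothetical Shapley values for one class:
% rows are predictors, columns are query points
predictorNames = ["WC_TA";"RE_TA";"MVE_BVTD"];
S = [ 0.02 -0.05  0.03  0.04;
      0.10 -0.20  0.05 -0.09;
     -0.70  0.90  0.60 -0.55];

meanAbs = mean(abs(S),2);            % average of |Shapley value| over query points
[~,order] = sort(meanAbs,"descend"); % most important predictor first
rankedPredictors = predictorNames(order);
```

Positive and negative contributions do not cancel in this average. A predictor that pushes some predictions up and others down can therefore still rank highly, as MVE_BVTD does here.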
Visualize the distribution of the Shapley values for class AA by using the boxchart object function.
boxchart(explainer,ClassName={'AA'})
For each predictor, the function displays a box plot of the Shapley values for the query points. The function determines the order of the predictors by using the mean absolute Shapley values.
For class AA, some of the Shapley values for the RE_TA predictor are outliers. This result suggests that, for a few query points, the predictor greatly affects the predicted score for class AA.
Input Arguments
explainer
— Object explaining blackbox model
shapley object
Object explaining the blackbox model, specified as a shapley object. explainer must contain Shapley values; that is, explainer.Shapley must be nonempty.
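If you are not sure whether explainer already contains Shapley values, you can check before plotting, as in this sketch. The fisheriris setup and the choice of every tenth observation as a query point are assumptions for illustration.

```matlab
rng("default")                     % for reproducible subsampling
% Assumed setup: multiclass ECOC model trained on the fisheriris sample data
load fisheriris
mdl = fitcecoc(meas,species);
explainer = shapley(mdl);          % no query points yet, so Shapley is empty

if isempty(explainer.Shapley)
    % boxchart requires Shapley values, so compute them first
    explainer = fit(explainer,meas(1:10:end,:));
end
b = boxchart(explainer);
```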
ax
— Axes for plot
Axes object
Axes for the plot, specified as an Axes object. If you do not specify ax, then boxchart creates the plot using the current axes. For more information on creating an Axes object, see axes.
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: boxchart(explainer,NumImportantPredictors=5,JitterOutliers="on") creates a box plot for each of the five predictors with the greatest mean absolute Shapley values, and jitters the outliers in the box plots.
NumImportantPredictors
— Number of important predictors
min(M,10), where M is the number of predictors (default) | positive integer
Number of important predictors to plot, specified as a positive integer. The boxchart function plots the Shapley values of the specified number of predictors with the greatest mean absolute Shapley values.
Example: NumImportantPredictors=5 specifies to plot the five most important predictors. The boxchart function determines the order of importance by using the mean absolute Shapley values.
Data Types: single | double
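The selection rule can be sketched as follows, using made-up mean absolute Shapley values for 12 predictors. With the default value of min(M,10), the two predictors with the smallest values are left out.

```matlab
% Hypothetical mean absolute Shapley values for M = 12 predictors
meanAbs = [0.57 0.97 0.80 0.79 0.86 0.99 0.12 0.05 0.33 0.41 0.08 0.22];
M = numel(meanAbs);

k = min(M,10);                       % default NumImportantPredictors
[~,order] = sort(meanAbs,"descend"); % rank predictors by importance
plotted = order(1:k);                % indices of the predictors that get box plots
% For NumImportantPredictors=5, the plotted predictors would be order(1:5)
```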
ClassName
— Class label to plot
explainer.BlackboxModel.ClassNames(1) (default) | numeric scalar | logical scalar | character vector | string scalar | categorical scalar
Class label to plot, specified as a numeric scalar, logical scalar, character vector, string scalar, or categorical scalar. The value and data type of ClassName must match one of the class names in the ClassNames property of the machine learning model in explainer (explainer.BlackboxModel.ClassNames). The software accepts character vectors, string scalars, and categorical scalars interchangeably.
This argument is valid only when the machine learning model (BlackboxModel) in explainer is a classification model.
Example: ClassName="AAA"
Data Types: single | double | logical | char | string | categorical
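For example, when the ClassNames property of the model is a cell array of character vectors, a character vector and a string scalar select the same class. This sketch assumes an ECOC model trained on the fisheriris sample data, whose class names are setosa, versicolor, and virginica.

```matlab
rng("default")                         % for reproducible subsampling
% Assumed setup: ClassNames is {'setosa';'versicolor';'virginica'}
load fisheriris
mdl = fitcecoc(meas,species);
explainer = shapley(mdl);
explainer = fit(explainer,meas(1:10:end,:));

% Confirm that the requested class is one of the class names of the model
cls = "versicolor";
assert(any(strcmp(explainer.BlackboxModel.ClassNames,cls)))

% Both calls plot the Shapley values for class versicolor
tiledlayout(1,2)
nexttile
boxchart(explainer,ClassName='versicolor')   % character vector
nexttile
boxchart(explainer,ClassName=cls)            % string scalar
```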
JitterOutliers
— Outlier marker displacement
"off"
(default) | "on"
| true
or 1
| false
or 0
Outlier marker displacement, specified as "on" or "off", or as numeric or logical 1 (true) or 0 (false). A value of "on" is equivalent to true, and "off" is equivalent to false. Therefore, you can use the value of this argument as a logical value. The value is stored as an on/off logical value of type matlab.lang.OnOffSwitchState.
If you specify the JitterOutliers value as "on", then boxchart randomly displaces the outlier markers along the vertical direction to help you distinguish between outliers that have similar Shapley values.
Example: JitterOutliers="on"
Data Types: single | double | logical | char | string
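Because the stored value is a matlab.lang.OnOffSwitchState enumeration, you can test it like a logical value. This base MATLAB sketch uses the enumeration members directly, without creating a plot.

```matlab
jitter = matlab.lang.OnOffSwitchState.on;   % stored form of "on"

if jitter                                   % acts as true in a condition
    disp("Outlier markers are jittered.")
end

tf = logical(jitter);                       % convert to a plain logical value
offIsFalse = ~logical(matlab.lang.OnOffSwitchState.off);   % "off" acts as false
```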
More About
Shapley Values
In game theory, the Shapley value of a player is the average marginal contribution of the player in a cooperative game. In the context of machine learning prediction, the Shapley value of a feature for a query point explains the contribution of the feature to a prediction (response for regression or score of each class for classification) at the specified query point.
The Shapley value of a feature for a query point is the contribution of the feature to the deviation from the average prediction. For a query point, the sum of the Shapley values for all features corresponds to the total deviation of the prediction from the average. That is, the sum of the average prediction and the Shapley values for all features corresponds to the prediction for the query point.
For more details, see Shapley Values for Machine Learning Model.
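To make the additivity property concrete, the following base MATLAB sketch computes exact Shapley values by brute force for a made-up three-feature model f. It uses an interventional value function. For each coalition of features, the features in the coalition take their query-point values and the remaining features take values from a small background data set, and the resulting predictions are averaged. This enumeration only illustrates the definition; it is not the algorithm that the shapley function uses. The names f, Xbg, and x are hypothetical.

```matlab
f = @(X) 2*X(:,1) + X(:,2).*X(:,3);   % hypothetical model with an interaction term
Xbg = [0 1 2; 1 0 1; 2 2 0];          % hypothetical background data
x = [3 1 4];                          % query point
p = numel(x);

% Value of a coalition (logical row inS): features in the coalition come
% from x, the rest come from each background row; predictions are averaged
v = @(inS) mean(f(Xbg.*(~inS) + x.*inS));

phi = zeros(1,p);
for j = 1:p
    for mask = 0:2^p-1
        inS = logical(bitget(mask,1:p));   % coalition encoded by the bits of mask
        if inS(j)
            continue                       % only coalitions that exclude feature j
        end
        s = nnz(inS);
        w = factorial(s)*factorial(p-s-1)/factorial(p);   % Shapley weight
        withJ = inS;
        withJ(j) = true;
        phi(j) = phi(j) + w*(v(withJ) - v(inS));          % weighted marginal contribution
    end
end

averagePrediction = mean(f(Xbg));              % value of the empty coalition
reconstructed = averagePrediction + sum(phi);  % equals f(x)
```

Here the average prediction is 8/3 and the Shapley values are 4, 1/6, and 19/6, which sum to f(x) - 8/3 = 22/3.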
Tips
Use boxchart when explainer contains Shapley values for many query points.
Version History
Introduced in R2024a