counterfactuals
Syntax
counterExamples = counterfactuals(Mdl,observation)
counterExamples = counterfactuals(Mdl,X,observation)
counterExamples = counterfactuals(___,Name=Value)
[counterExamples,metrics] = counterfactuals(___)
Description
counterExamples = counterfactuals(Mdl,observation) uses the binary classification model Mdl to return counterfactual examples for observation. Counterfactual examples are observations that have minimally different predictor values from the values of a specified observation, but have different predicted class labels. For more information, see Counterfactual Examples.
The counterfactuals function uses Bayesian optimization to find and
evaluate counterfactual candidates. The software
displays the optimization results one iteration at a
time. For more information, see Bayesian Optimization.
counterExamples = counterfactuals(Mdl,X,observation) specifies the training data X used to train the compact model Mdl. You must specify X when the model is compact; that is, when the model does not contain the training data in its properties.
counterExamples = counterfactuals(___,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. For example, you can specify the number of counterfactual examples to return and the maximum number of predictors that can be changed by using the NumCounterfactualExamples and MaxNumModifiablePredictors name-value arguments, respectively.
[counterExamples,metrics] = counterfactuals(___) also returns metrics, which contains additional information about the counterfactual examples.
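For example, a call that combines these syntaxes might look like the following sketch, which assumes you already have a trained binary classifier Mdl and a one-row table observation containing the same predictors as the training data:

```matlab
% Sketch only: Mdl and observation are assumed to exist in the workspace.
% Return up to 5 counterfactual examples, allow at most 2 predictors to
% change, and also return the metrics output.
[counterExamples,metrics] = counterfactuals(Mdl,observation, ...
    NumCounterfactualExamples=5, ...
    MaxNumModifiablePredictors=2);
```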
Examples
Train a binary neural network classifier to predict whether a corporate customer has a "good" or "poor" credit rating. For a customer who is predicted to have a poor rating, find counterfactual examples by using the counterfactuals function. That is, determine a minimal set of changes to the customer profile that leads to a predicted credit rating of "good."
Read the sample file CreditRating_Historical.dat into a table. The predictor data consists of financial ratios and industry sector information for a list of corporate customers. The response variable consists of credit ratings assigned by a rating agency. Preview the first few rows of the data set.
creditrating = readtable("CreditRating_Historical.dat");
head(creditrating)
      ID     WC_TA    RE_TA    EBIT_TA    MVE_BVTD    S_TA    Industry    Rating
     _____   ______   ______   _______    ________    _____   ________    _______
62394 0.013 0.104 0.036 0.447 0.142 3 {'BB' }
48608 0.232 0.335 0.062 1.969 0.281 8 {'A' }
42444 0.311 0.367 0.074 1.935 0.366 1 {'A' }
48631 0.194 0.263 0.062 1.017 0.228 4 {'BBB'}
43768 0.121 0.413 0.057 3.647 0.466 12 {'AAA'}
39255 -0.117 -0.799 0.01 0.179 0.082 4 {'CCC'}
62236 0.087 0.158 0.049 0.816 0.324 2 {'BBB'}
39354 0.005 0.181 0.034 2.597 0.388 7 {'AA' }
In the Rating response variable, combine the AAA, AA, A, and BBB ratings into a category of "good" ratings, and the BB, B, and CCC ratings into a category of "poor" ratings. Also, convert the Industry variable to a categorical variable.
Rating = categorical(creditrating.Rating);
Rating = mergecats(Rating,["AAA","AA","A","BBB"],"good");
Rating = mergecats(Rating,["BB","B","CCC"],"poor");
creditrating.Rating = Rating;
creditrating.Industry = categorical(creditrating.Industry);
Train a binary neural network classifier using the creditrating data. Specify the Rating column of creditrating as the response, and the WC_TA, RE_TA, EBIT_TA, MVE_BVTD, and S_TA columns as predictors. Standardize the predictors.
predictors = ["WC_TA","RE_TA","EBIT_TA","MVE_BVTD","S_TA"];
rng(0,"twister") % For reproducibility
Mdl = fitcnet(creditrating,"Rating",PredictorNames=predictors, ...
    Standardize=true)
Mdl =
ClassificationNeuralNetwork
PredictorNames: {'WC_TA' 'RE_TA' 'EBIT_TA' 'MVE_BVTD' 'S_TA'}
ResponseName: 'Rating'
CategoricalPredictors: []
ClassNames: [good poor]
ScoreTransform: 'none'
NumObservations: 3932
LayerSizes: 10
Activations: 'relu'
OutputLayerActivation: 'softmax'
Solver: 'LBFGS'
ConvergenceInfo: [1×1 struct]
TrainingHistory: [1000×7 table]
Properties, Methods
Mdl is a ClassificationNeuralNetwork object. You can use the model object along with its predict function to predict whether a customer has a good credit rating.
Predict the credit rating label for the observations in creditrating. Display the first eight predictions.
predictions = predict(Mdl,creditrating);
head(predictions)
poor
good
good
good
good
poor
good
good
Note that the first observation has a predicted credit rating of poor.
Display the first observation in creditrating.
observation = creditrating(1,:)
observation=1×8 table
ID WC_TA RE_TA EBIT_TA MVE_BVTD S_TA Industry Rating
_____ _____ _____ _______ ________ _____ ________ ______
62394 0.013 0.104 0.036 0.447 0.142 3 poor
Generate counterfactual examples for the observation (corporate customer with ID 62394). That is, find a minimal set of changes to the predictor values that result in a predicted credit rating of good. By default, the counterfactuals function tries to find 10 counterfactual examples using a Bayesian optimization routine with 50 iterations. The optimization process can take some time.
counterExamples = counterfactuals(Mdl,observation)
|============================================================================================================================================================================================================================================|
| Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | Constraint1 | WC_TA | RE_TA | EBIT_TA | MVE_BVTD | S_TA | WC_TA_Indica-| RE_TA_Indica-| EBIT_TA_Indi-| MVE_BVTD_Ind-| S_TA_Indicat-|
| | result | | runtime | (observed) | (estim.) | violation | | | | | | tor | tor | cator | icator | or |
|============================================================================================================================================================================================================================================|
| 1 | Best | 21.441 | 0.23306 | 21.441 | 21.441 | -0.5 | -0.51929 | 0.104 | -0.57862 | 34.888 | 0.142 | 1 | 0 | 1 | 1 | 0 |
| 2 | Best | 2.0615 | 0.15413 | 2.0615 | 2.0615 | -0.5 | 0.013 | 0.104 | 0.10111 | 0.447 | 0.142 | 0 | 0 | 1 | 0 | 0 |
| 3 | Accept | 4.4134 | 0.033523 | 2.0615 | 2.0615 | -0.5 | 0.013 | 0.104 | 0.036 | 18.334 | 0.142 | 0 | 0 | 0 | 1 | 0 |
| 4 | Accept | 20.642 | 0.056542 | 2.0615 | 2.0615 | -0.5 | 0.013 | 0.104 | 0.036 | 83.407 | 0.7995 | 0 | 0 | 0 | 1 | 1 |
| 5 | Accept | 10.007 | 0.10113 | 2.0615 | 2.0615 | -0.5 | 0.013 | 0.104 | 0.20998 | 34.311 | 0.142 | 0 | 0 | 1 | 1 | 0 |
| 6 | Infeas | 0.018251 | 0.01314 | 2.0615 | 2.0615 | 1 | 0.013 | 0.11025 | 0.036 | 0.447 | 0.142 | 0 | 1 | 0 | 0 | 0 |
| 7 | Accept | 10.214 | 0.020598 | 2.0615 | 2.0615 | -0.5 | 0.013 | 0.104 | -0.28662 | 0.447 | 0.142 | 0 | 0 | 1 | 0 | 0 |
| 8 | Infeas | 2.2064 | 0.010517 | 2.0615 | 2.0615 | 1 | 0.013 | 0.104 | -0.033691 | 0.447 | 0.142 | 0 | 0 | 1 | 0 | 0 |
| 9 | Infeas | 0.08673 | 0.010805 | 2.0615 | 2.0615 | 1 | 0.013 | 0.13124 | 0.036 | 0.30677 | 0.142 | 0 | 1 | 0 | 1 | 0 |
| 10 | Accept | 2.344 | 0.031371 | 2.0615 | 2.0615 | -0.5 | 0.013 | 0.88171 | 0.054392 | 0.447 | 0.142 | 0 | 1 | 1 | 0 | 0 |
| 11 | Infeas | 5.2237 | 0.033393 | 2.0615 | 2.0615 | 1 | 0.013 | -1.677 | 0.051793 | 0.447 | 0.142 | 0 | 1 | 1 | 0 | 0 |
| 12 | Accept | 29.134 | 0.010037 | 2.0615 | 2.0615 | -0.5 | 0.013 | 0.104 | 0.036 | 118.53 | 0.142 | 0 | 0 | 0 | 1 | 0 |
| 13 | Accept | 27.914 | 0.016968 | 2.0615 | 2.0615 | -0.5 | 0.013 | 0.104 | 0.036 | 0.447 | 7.0177 | 0 | 0 | 0 | 0 | 1 |
| 14 | Accept | 3.0173 | 0.012675 | 2.0615 | 2.0615 | -0.5 | 0.013 | 0.104 | 0.036 | 0.447 | 0.88522 | 0 | 0 | 0 | 0 | 1 |
| 15 | Accept | 2.3086 | 0.010071 | 2.0615 | 2.0615 | -0.5 | -0.39127 | 0.104 | 0.036 | 0.447 | 0.032068 | 1 | 0 | 0 | 0 | 1 |
| 16 | Infeas | 0.37713 | 0.018759 | 2.0615 | 2.0615 | 1 | 0.013 | -0.020652 | 0.039125 | 0.447 | 0.142 | 0 | 1 | 1 | 0 | 0 |
| 17 | Infeas | 0.92027 | 0.016985 | 2.0615 | 2.0615 | 1 | 0.013 | -0.035013 | 0.062089 | 0.447 | 0.142 | 0 | 1 | 1 | 0 | 0 |
| 18 | Accept | 3.3613 | 0.009911 | 2.0615 | 2.0615 | -0.5 | 0.61293 | 0.104 | 0.036 | 0.447 | 0.142 | 1 | 0 | 0 | 0 | 0 |
| 19 | Infeas | 0.39162 | 0.011278 | 2.0615 | 2.0615 | 1 | 0.013 | 0.104 | 0.036 | 0.447 | 0.045536 | 0 | 0 | 0 | 0 | 1 |
| 20 | Infeas | 0.68187 | 0.010793 | 2.0615 | 2.0615 | 1 | 0.013 | -0.037075 | 0.036 | 0.447 | 0.27586 | 0 | 1 | 0 | 0 | 1 |
|============================================================================================================================================================================================================================================|
| Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | Constraint1 | WC_TA | RE_TA | EBIT_TA | MVE_BVTD | S_TA | WC_TA_Indica-| RE_TA_Indica-| EBIT_TA_Indi-| MVE_BVTD_Ind-| S_TA_Indicat-|
| | result | | runtime | (observed) | (estim.) | violation | | | | | | tor | tor | cator | icator | or |
|============================================================================================================================================================================================================================================|
| 21 | Accept | 12.653 | 0.021044 | 2.0615 | 2.0615 | -0.5 | -2.2453 | 0.104 | 0.036 | 0.447 | 0.142 | 1 | 0 | 0 | 0 | 0 |
| 22 | Best | 0.5968 | 0.013326 | 0.5968 | 0.5968 | -0.5 | 0.013 | 0.104 | 0.036 | 2.8658 | 0.142 | 0 | 0 | 0 | 1 | 0 |
| 23 | Infeas | 11.727 | 0.021362 | 0.5968 | 0.5968 | 1 | 0.013 | -3.1109 | -0.18608 | 0.447 | 0.142 | 0 | 1 | 1 | 0 | 0 |
| 24 | Infeas | 16.391 | 0.019472 | 0.5968 | 0.5968 | 1 | -2.2365 | -1.1717 | -0.27339 | 0.447 | 0.142 | 1 | 1 | 1 | 0 | 0 |
| 25 | Accept | 8.8045 | 0.021765 | 0.5968 | 0.5968 | -0.5 | -1.231 | 0.104 | 0.20592 | 0.447 | 0.142 | 1 | 0 | 1 | 0 | 0 |
| 26 | Accept | 7.1671 | 0.011044 | 0.5968 | 0.5968 | -0.5 | 0.013 | 1.6927 | 0.20859 | 0.447 | 0.142 | 0 | 1 | 1 | 0 | 0 |
| 27 | Infeas | 21.179 | 0.010482 | 0.5968 | 0.59679 | 1 | 0.83032 | -2.3077 | -0.3518 | 0.447 | 3.8554 | 1 | 1 | 1 | 0 | 1 |
| 28 | Infeas | 0 | 0.020523 | 0.5968 | 0.59679 | 1 | 0.013 | 0.104 | 0.036 | 0.447 | 0.142 | 0 | 0 | 0 | 0 | 0 |
| 29 | Infeas | 1.7044 | 0.029304 | 0.5968 | 0.5968 | 1 | 0.013 | -0.4798 | 0.036 | 0.447 | 0.142 | 0 | 1 | 0 | 0 | 0 |
| 30 | Accept | 10.901 | 0.012246 | 0.5968 | 0.5968 | -0.5 | 0.013 | -3.1503 | 0.20483 | 0.447 | 0.142 | 0 | 1 | 1 | 0 | 0 |
| 31 | Accept | 12.871 | 0.023393 | 0.5968 | 0.5968 | -0.5 | -1.1864 | -3.0852 | 0.0434 | 0.447 | 1.573 | 1 | 1 | 1 | 0 | 1 |
| 32 | Infeas | 0.08542 | 0.012217 | 0.5968 | 0.59679 | 1 | 0.013 | 0.104 | 0.036 | 0.10079 | 0.142 | 0 | 0 | 0 | 1 | 0 |
| 33 | Best | 0.44956 | 0.015796 | 0.44956 | 0.4496 | -0.5 | 0.013 | 0.104 | 0.036 | 2.2691 | 0.142 | 0 | 0 | 0 | 1 | 0 |
| 34 | Best | 0.31122 | 0.020817 | 0.31122 | 0.31132 | -0.493 | 0.013 | 0.104 | 0.036 | 1.7084 | 0.142 | 0 | 0 | 0 | 1 | 0 |
| 35 | Accept | 5.283 | 0.015832 | 0.31122 | 0.31133 | -0.5 | 0.013 | 0.104 | 0.16389 | 14.201 | 0.142 | 0 | 0 | 1 | 1 | 0 |
| 36 | Accept | 0.93106 | 0.010628 | 0.31122 | 0.3113 | -0.5 | 0.013 | 0.40825 | 0.036 | 1.578 | 0.142 | 0 | 1 | 0 | 1 | 0 |
| 37 | Accept | 3.7367 | 0.012637 | 0.31122 | 0.31132 | -0.5 | 0.013 | -0.76256 | 0.044279 | 11.542 | 0.142 | 0 | 1 | 1 | 1 | 0 |
| 38 | Best | 0.22988 | 0.011384 | 0.22988 | 0.22999 | -0.453 | 0.013 | 0.104 | 0.036 | 1.3787 | 0.142 | 0 | 0 | 0 | 1 | 0 |
| 39 | Infeas | 0.15388 | 0.015089 | 0.22988 | 0.22995 | 1 | 0.013 | 0.104 | 0.03114 | 0.447 | 0.142 | 0 | 0 | 1 | 0 | 0 |
| 40 | Accept | 7.5969 | 0.014177 | 0.22988 | 0.22995 | -0.5 | -1.1742 | 0.104 | 0.036 | 15.323 | 0.142 | 1 | 0 | 0 | 1 | 0 |
|============================================================================================================================================================================================================================================|
| Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | Constraint1 | WC_TA | RE_TA | EBIT_TA | MVE_BVTD | S_TA | WC_TA_Indica-| RE_TA_Indica-| EBIT_TA_Indi-| MVE_BVTD_Ind-| S_TA_Indicat-|
| | result | | runtime | (observed) | (estim.) | violation | | | | | | tor | tor | cator | icator | or |
|============================================================================================================================================================================================================================================|
| 41 | Best | 0.20465 | 0.017495 | 0.20465 | 0.20472 | -0.412 | 0.013 | 0.10161 | 0.036 | 1.276 | 0.142 | 0 | 1 | 0 | 1 | 0 |
| 42 | Accept | 7.1041 | 0.012196 | 0.20465 | 0.20472 | -0.5 | 0.013 | 0.104 | 0.036 | 14.647 | 1.6643 | 0 | 0 | 0 | 1 | 1 |
| 43 | Accept | 5.7281 | 0.01167 | 0.20465 | 0.20472 | -0.5 | 0.81098 | 0.104 | 0.036 | 14.96 | 0.142 | 1 | 0 | 0 | 1 | 0 |
| 44 | Best | 0.13973 | 0.010735 | 0.13973 | 0.13993 | -0.183 | 0.013 | 0.104 | 0.036 | 1.0133 | 0.142 | 0 | 0 | 0 | 1 | 0 |
| 45 | Accept | 7.4715 | 0.012216 | 0.13973 | 0.13993 | -0.5 | 0.013 | -1.4729 | 0.099694 | 22.854 | 0.142 | 0 | 1 | 1 | 1 | 0 |
| 46 | Accept | 7.2471 | 0.016786 | 0.13973 | 0.13993 | -0.5 | 0.013 | 1.0573 | 0.14325 | 23.816 | 0.142 | 0 | 1 | 1 | 1 | 0 |
| 47 | Accept | 2.291 | 0.013249 | 0.13973 | 0.13993 | -0.5 | 0.013 | -0.55986 | 0.036 | 5.3982 | 0.142 | 0 | 1 | 0 | 1 | 0 |
| 48 | Best | 0.11813 | 0.013187 | 0.11813 | 0.11897 | -0.0575 | 0.013 | 0.104 | 0.036 | 0.92578 | 0.142 | 0 | 0 | 0 | 1 | 0 |
| 49 | Best | 0.11143 | 0.014099 | 0.11143 | 0.11214 | -0.0162 | 0.013 | 0.104 | 0.036 | 0.89863 | 0.142 | 0 | 0 | 0 | 1 | 0 |
| 50 | Accept | 3.6124 | 0.009186 | 0.11143 | 0.11211 | -0.5 | 0.013 | 0.104 | -0.052742 | 9.6502 | 0.142 | 0 | 0 | 1 | 1 | 0 |
__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 50 reached.
Total function evaluations: 50
Total elapsed time: 59.8765 seconds
Total objective function evaluation time: 1.279
Best observed feasible point:
WC_TA RE_TA EBIT_TA MVE_BVTD S_TA WC_TA_Indicator RE_TA_Indicator EBIT_TA_Indicator MVE_BVTD_Indicator S_TA_Indicator
_____ _____ _______ ________ _____ _______________ _______________ _________________ __________________ ______________
0.013 0.104 0.036 0.89863 0.142 0 0 0 1 0
Observed objective function value = 0.11143
Estimated objective function value = 0.11211
Function evaluation time = 0.014099
Observed constraint violations =[ -0.016208 ]
Best estimated feasible point (according to models):
WC_TA RE_TA EBIT_TA MVE_BVTD S_TA WC_TA_Indicator RE_TA_Indicator EBIT_TA_Indicator MVE_BVTD_Indicator S_TA_Indicator
_____ _____ _______ ________ _____ _______________ _______________ _________________ __________________ ______________
0.013 0.104 0.036 0.89863 0.142 0 0 0 1 0
Estimated objective function value = 0.11211
Estimated function evaluation time = 0.013677
Estimated constraint violations =[ -0.016206 ]

counterExamples=10×5 table
WC_TA RE_TA EBIT_TA MVE_BVTD S_TA
_____ _______ _______ ________ _____
0.013 0.104 0.036 0.89863 0.142
0.013 0.104 0.036 0.92578 0.142
0.013 0.104 0.036 1.0133 0.142
0.013 0.10161 0.036 1.276 0.142
0.013 0.104 0.036 1.3787 0.142
0.013 0.104 0.036 1.7084 0.142
0.013 0.104 0.036 2.2691 0.142
0.013 0.104 0.036 2.8658 0.142
0.013 0.40825 0.036 1.578 0.142
0.013 0.104 0.10111 0.447 0.142
The counterExamples table contains the predictor data for the counterfactual examples, listed in order of proximity to observation. If the function does not find enough feasible points during the optimization process, the number of counterfactual examples can be less than the default.
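You can also quantify how minimal each counterfactual example is. The following check (not part of the original example) counts how many of the five predictor values differ from observation in each row of counterExamples:

```matlab
% Illustrative check: count changed predictors per counterfactual example.
% MATLAB implicit expansion compares the single observation row against
% each row of the counterfactual table.
origVals = observation{1,predictors};                  % 1-by-5 numeric row
numChanged = sum(counterExamples{:,:} ~= origVals,2)   % 10-by-1 change counts
```

A small count confirms that each counterfactual example stays close to the original observation.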
Compare the predictor values of observation and the first counterfactual example. Visualize the values (without jitter) using a parallel coordinates plot.
predObservation = observation(:,predictors)
predObservation=1×5 table
WC_TA RE_TA EBIT_TA MVE_BVTD S_TA
_____ _____ _______ ________ _____
0.013 0.104 0.036 0.447 0.142
predCounterEx = counterExamples(1,:)
predCounterEx=1×5 table
WC_TA RE_TA EBIT_TA MVE_BVTD S_TA
_____ _____ _______ ________ _____
0.013 0.104 0.036 0.89863 0.142
parallelplot([predObservation;predCounterEx],Jitter=0)
ylabel("Predictor Value")
The counterfactual example has a greater MVE_BVTD value than observation. This change in predictor value leads to a different credit rating prediction.
observationLabel = predict(Mdl,observation)
observationLabel = categorical
poor
counterExLabel = predict(Mdl,predCounterEx)
counterExLabel = categorical
good
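You can verify that every counterfactual example, not just the first one, yields the desired prediction. This check is a sketch that reuses the trained model and the counterExamples table from this example:

```matlab
% Predict the class label for all ten counterfactual examples; each label
% should be good.
counterExLabels = predict(Mdl,counterExamples)
```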
Train a binary neural network classifier to predict whether an individual makes over $50,000 per year. For an individual who is predicted to have a yearly salary of $50,000 or less, find counterfactual examples by using the counterfactuals function. That is, determine a minimal set of changes to the individual's profile that leads to a predicted yearly salary of over $50,000. Visually compare the changes by using a glyph plot.
Load the 1994 census data stored in census1994.mat. The data set consists of demographic information from the US Census Bureau, split into a training data set adultdata and a test data set adulttest. Preview the first few rows of the training data set.
load census1994
head(adultdata)
    age        workClass         fnlwgt      education    education_num        marital_status           occupation         relationship     race      sex      capital_gain    capital_loss    hours_per_week    native_country    salary
    ___    ________________    __________    _________    _____________    _____________________    _________________    _____________    _____    ______    ____________    ____________    ______________    ______________    ______
39 State-gov 77516 Bachelors 13 Never-married Adm-clerical Not-in-family White Male 2174 0 40 United-States <=50K
50 Self-emp-not-inc 83311 Bachelors 13 Married-civ-spouse Exec-managerial Husband White Male 0 0 13 United-States <=50K
38 Private 2.1565e+05 HS-grad 9 Divorced Handlers-cleaners Not-in-family White Male 0 0 40 United-States <=50K
53 Private 2.3472e+05 11th 7 Married-civ-spouse Handlers-cleaners Husband Black Male 0 0 40 United-States <=50K
28 Private 3.3841e+05 Bachelors 13 Married-civ-spouse Prof-specialty Wife Black Female 0 0 40 Cuba <=50K
37 Private 2.8458e+05 Masters 14 Married-civ-spouse Exec-managerial Wife White Female 0 0 40 United-States <=50K
49 Private 1.6019e+05 9th 5 Married-spouse-absent Other-service Not-in-family Black Female 0 0 16 Jamaica <=50K
52 Self-emp-not-inc 2.0964e+05 HS-grad 9 Married-civ-spouse Exec-managerial Husband White Male 0 0 45 United-States >50K
Each row contains the demographic information for one adult. The last column, salary, shows whether a person has a salary less than or equal to $50,000 per year or greater than $50,000 per year.
Delete the rows of adultdata in which the table has missing values.
adultdata = rmmissing(adultdata);
Because education and education_num contain similar information, remove the nonnumeric education variable from adultdata.
adultdata.education = [];
Train a binary neural network classifier using adultdata. Specify the salary column of adultdata as the response and the fnlwgt column as the observation weights. Standardize the numeric predictors.
rng(0,"twister")
Mdl = fitcnet(adultdata,"salary","Weights","fnlwgt", ...
    Standardize=true)
Mdl =
ClassificationNeuralNetwork
PredictorNames: {'age' 'workClass' 'education_num' 'marital_status' 'occupation' 'relationship' 'race' 'sex' 'capital_gain' 'capital_loss' 'hours_per_week' 'native_country'}
ResponseName: 'salary'
CategoricalPredictors: [2 4 5 6 7 8 12]
ClassNames: [<=50K >50K]
ScoreTransform: 'none'
NumObservations: 30162
LayerSizes: 10
Activations: 'relu'
OutputLayerActivation: 'softmax'
Solver: 'LBFGS'
ConvergenceInfo: [1×1 struct]
TrainingHistory: [1000×7 table]
Properties, Methods
Mdl is a ClassificationNeuralNetwork object. You can use the model object along with its predict function to predict whether an individual makes over $50,000 per year.
Display the first observation in adultdata. Determine the predicted salary for the individual whose demographic information is in observation.
observation = adultdata(1,:)
observation=1×14 table
age workClass fnlwgt education_num marital_status occupation relationship race sex capital_gain capital_loss hours_per_week native_country salary
___ _________ ______ _____________ ______________ ____________ _____________ _____ ____ ____________ ____________ ______________ ______________ ______
39 State-gov 77516 13 Never-married Adm-clerical Not-in-family White Male 2174 0 40 United-States <=50K
label = predict(Mdl,observation)
label = categorical
<=50K
The predicted salary for the individual is less than or equal to $50,000 per year.
Generate counterfactual examples for observation. That is, find a minimal set of changes to the predictor values of observation that result in a predicted salary of over $50,000 per year. Specify education_num, capital_gain, capital_loss, and hours_per_week as the only predictors whose values can be modified. By default, the counterfactuals function tries to find 10 counterfactual examples using a Bayesian optimization routine with 50 iterations. The optimization process can take some time.
modPredictors = ["education_num","capital_gain", ...
    "capital_loss","hours_per_week"];
counterExamples = counterfactuals(Mdl,observation, ...
    ModifiablePredictors=modPredictors)
|==============================================================================================================================================================================================================|
| Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | Constraint1 | education_num| capital_gain | capital_loss | hours_per_we-| education_nu-| capital_gain-| capital_loss-| hours_per_we-|
| | result | | runtime | (observed) | (estim.) | violation | | | | ek | m_Indicator | _Indicator | _Indicator | ek_Indicator |
|==============================================================================================================================================================================================================|
| 1 | Infeas | 0 | 0.10028 | NaN | 0 | 1 | 13 | 2174 | 0 | 40 | 0 | 0 | 0 | 0 |
| 2 | Best | 6.6149 | 0.057966 | 6.6149 | 6.6149 | -0.5 | 11 | 47761 | 926.81 | 40 | 1 | 1 | 1 | 0 |
| 3 | Accept | 12.289 | 0.015952 | 6.6149 | 6.6149 | -0.5 | 13 | 88911 | 1505.5 | 40 | 0 | 1 | 1 | 0 |
| 4 | Infeas | 0.16695 | 0.016266 | 6.6149 | 6.6149 | 1 | 13 | 2174 | 0 | 38 | 0 | 0 | 0 | 1 |
| 5 | Infeas | 5.1104 | 0.027722 | 6.6149 | 6.6149 | 1 | 2 | 497.18 | 1051.1 | 50 | 1 | 1 | 1 | 1 |
| 6 | Infeas | 4.7059 | 0.006635 | 6.6149 | 6.6149 | 1 | 1 | 2174 | 0 | 40 | 1 | 0 | 0 | 0 |
| 7 | Infeas | 4.925 | 0.007349 | 6.6149 | 6.6149 | 1 | 13 | 2174 | 0 | 99 | 0 | 0 | 0 | 1 |
| 8 | Infeas | 6.0187 | 0.022052 | 6.6149 | 6.6149 | 1 | 10 | 598.95 | 1312.5 | 99 | 1 | 1 | 1 | 1 |
| 9 | Infeas | 4.943 | 0.005777 | 6.6149 | 6.6149 | 1 | 2 | 10.236 | 968.61 | 40 | 1 | 1 | 1 | 0 |
| 10 | Infeas | 10.246 | 0.006063 | 6.6149 | 6.6149 | 1 | 13 | 167.17 | 3958.5 | 4 | 1 | 1 | 1 | 1 |
| 11 | Infeas | 0.25984 | 0.005765 | 6.6149 | 6.6149 | 1 | 13 | 1484 | 98.068 | 40 | 1 | 1 | 1 | 0 |
| 12 | Best | 5.4807 | 0.006266 | 5.4807 | 5.4807 | -0.5 | 16 | 38414 | 0 | 66 | 1 | 1 | 0 | 1 |
| 13 | Infeas | 0.8783 | 0.005884 | 5.4807 | 5.4807 | 1 | 13 | 5543.7 | 0 | 49 | 0 | 1 | 0 | 1 |
| 14 | Best | 4.8239 | 0.006624 | 4.8239 | 4.8239 | -0.5 | 15 | 30473 | 0 | 6 | 1 | 1 | 0 | 1 |
| 15 | Best | 2.7702 | 0.006359 | 2.7702 | 2.7702 | -0.5 | 14 | 22484 | 0 | 40 | 1 | 1 | 0 | 0 |
| 16 | Best | 2.2114 | 0.00582 | 2.2114 | 2.2114 | -0.492 | 11 | 17487 | 0 | 40 | 1 | 1 | 0 | 0 |
| 17 | Best | 1.6072 | 0.005665 | 1.6072 | 1.6072 | -0.483 | 14 | 13717 | 0 | 40 | 1 | 1 | 0 | 0 |
| 18 | Infeas | 1.1885 | 0.006087 | 1.6072 | 1.6072 | 1 | 16 | 925.03 | 0 | 40 | 1 | 1 | 0 | 0 |
| 19 | Infeas | 0.060194 | 0.012045 | 1.6072 | 1.6072 | 1 | 13 | 2174 | 24.336 | 40 | 0 | 0 | 1 | 0 |
| 20 | Best | 1.3798 | 0.007339 | 1.3798 | 1.3798 | -0.408 | 14 | 10863 | 237.74 | 38 | 1 | 1 | 1 | 1 |
|==============================================================================================================================================================================================================|
| Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | Constraint1 | education_num| capital_gain | capital_loss | hours_per_we-| education_nu-| capital_gain-| capital_loss-| hours_per_we-|
| | result | | runtime | (observed) | (estim.) | violation | | | | ek | m_Indicator | _Indicator | _Indicator | ek_Indicator |
|==============================================================================================================================================================================================================|
| 21 | Best | 0.95657 | 0.006261 | 0.95657 | 0.95657 | -0.0843 | 12 | 8635.9 | 0 | 40 | 1 | 1 | 0 | 0 |
| 22 | Best | 0.93429 | 0.005564 | 0.93429 | 0.93429 | -0.134 | 13 | 8394.9 | 165.4 | 40 | 1 | 1 | 1 | 0 |
| 23 | Best | 0.81141 | 0.005811 | 0.81141 | 0.81141 | -0.129 | 13 | 8183.5 | 0 | 40 | 1 | 1 | 0 | 0 |
| 24 | Best | 0.75636 | 0.005578 | 0.75636 | 0.75637 | -0.0654 | 13 | 7775.8 | 0 | 40 | 0 | 1 | 0 | 0 |
| 25 | Best | 0.72736 | 0.005687 | 0.72736 | 0.72749 | -0.0311 | 13 | 7561 | 0 | 40 | 0 | 1 | 0 | 0 |
| 26 | Infeas | 2.1275 | 0.005467 | 0.72736 | 0.7275 | 1 | 13 | 7476.4 | 0 | 16 | 0 | 1 | 0 | 1 |
| 27 | Accept | 7.6864 | 0.005939 | 0.72736 | 0.7275 | -0.5 | 3 | 35513 | 0 | 98 | 1 | 1 | 0 | 1 |
| 28 | Infeas | 0.46236 | 0.005688 | 0.72736 | 0.72717 | 1 | 13 | 5598.4 | 0 | 40 | 0 | 1 | 0 | 0 |
| 29 | Infeas | 1.6431 | 0.013693 | 0.72736 | 0.72732 | 1 | 13 | 7499.9 | 597.3 | 40 | 0 | 1 | 1 | 0 |
| 30 | Infeas | 0.25475 | 0.005695 | 0.72736 | 0.72727 | 1 | 13 | 287.28 | 0 | 40 | 0 | 1 | 0 | 1 |
| 31 | Best | 0.71077 | 0.006322 | 0.71077 | 0.71068 | -0.0113 | 13 | 7438.1 | 0 | 40 | 0 | 1 | 0 | 0 |
| 32 | Accept | 7.2724 | 0.012628 | 0.71077 | 0.71067 | -0.485 | 1 | 35854 | 0 | 2 | 1 | 1 | 0 | 1 |
| 33 | Infeas | 0.69999 | 0.005374 | 0.71077 | 0.71074 | 1 | 13 | 7358.3 | 0 | 40 | 0 | 1 | 0 | 0 |
| 34 | Infeas | 0.13715 | 0.007498 | 0.71077 | 0.71073 | 1 | 13 | 1158.2 | 0 | 40 | 0 | 1 | 0 | 0 |
| 35 | Accept | 11.134 | 0.005467 | 0.71077 | 0.71073 | -0.5 | 1 | 75161 | 0 | 14 | 1 | 1 | 0 | 1 |
| 36 | Infeas | 0.062312 | 0.005569 | 0.71077 | 0.71076 | 1 | 13 | 1712.5 | 0 | 40 | 1 | 1 | 0 | 0 |
| 37 | Accept | 0.81122 | 0.006091 | 0.71077 | 0.71074 | -0.128 | 13 | 8182.1 | 0 | 40 | 0 | 1 | 0 | 0 |
| 38 | Best | 0.71072 | 0.00572 | 0.71072 | 0.71069 | -0.0112 | 13 | 7437.7 | 0 | 40 | 0 | 1 | 0 | 0 |
| 39 | Accept | 0.7566 | 0.014097 | 0.71072 | 0.7107 | -0.0657 | 13 | 7777.6 | 0 | 40 | 1 | 1 | 0 | 0 |
| 40 | Infeas | 0.034306 | 0.008169 | 0.71072 | 0.7107 | 1 | 13 | 1919.9 | 0 | 40 | 0 | 1 | 0 | 0 |
|==============================================================================================================================================================================================================|
| Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | Constraint1 | education_num| capital_gain | capital_loss | hours_per_we-| education_nu-| capital_gain-| capital_loss-| hours_per_we-|
| | result | | runtime | (observed) | (estim.) | violation | | | | ek | m_Indicator | _Indicator | _Indicator | ek_Indicator |
|==============================================================================================================================================================================================================|
| 41 | Infeas | 0.51394 | 0.007334 | 0.71072 | 0.7107 | 1 | 13 | 1320.2 | 0 | 46 | 0 | 1 | 0 | 1 |
| 42 | Infeas | 0.081811 | 0.006322 | 0.71072 | 0.71072 | 1 | 13 | 2779.9 | 0 | 40 | 0 | 1 | 0 | 0 |
| 43 | Infeas | 0.17042 | 0.005186 | 0.71072 | 0.71072 | 1 | 13 | 2427.7 | 0 | 42 | 0 | 1 | 0 | 1 |
| 44 | Accept | 0.75616 | 0.008228 | 0.71072 | 0.71072 | -0.0652 | 13 | 7774.3 | 0 | 40 | 1 | 1 | 0 | 0 |
| 45 | Accept | 0.75692 | 0.005343 | 0.71072 | 0.71072 | -0.0661 | 13 | 7779.9 | 0 | 40 | 0 | 1 | 0 | 0 |
| 46 | Infeas | 0.0039516 | 0.009869 | 0.71072 | 0.71073 | 1 | 13 | 2144.7 | 0 | 40 | 0 | 1 | 0 | 0 |
| 47 | Infeas | 0.39251 | 0.005143 | 0.71072 | 0.71073 | 1 | 12 | 2298.1 | 0 | 40 | 1 | 1 | 0 | 0 |
| 48 | Infeas | 0.055937 | 0.009062 | 0.71072 | 0.71071 | 1 | 13 | 2588.3 | 0 | 40 | 0 | 1 | 0 | 0 |
| 49 | Infeas | 0.45148 | 0.005396 | 0.71072 | 0.71072 | 1 | 14 | 3830.8 | 0 | 40 | 1 | 1 | 0 | 0 |
| 50 | Infeas | 0.14347 | 0.012212 | 0.71072 | 0.71072 | 1 | 13 | 3236.6 | 0 | 40 | 0 | 1 | 0 | 0 |
__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 50 reached.
Total function evaluations: 50
Total elapsed time: 17.5584 seconds
Total objective function evaluation time: 0.55032
Best observed feasible point:
education_num capital_gain capital_loss hours_per_week education_num_Indicator capital_gain_Indicator capital_loss_Indicator hours_per_week_Indicator
_____________ ____________ ____________ ______________ _______________________ ______________________ ______________________ ________________________
13 7437.7 0 40 0 1 0 0
Observed objective function value = 0.71072
Estimated objective function value = 0.71072
Function evaluation time = 0.00572
Observed constraint violations =[ -0.011247 ]
Best estimated feasible point (according to models):
education_num capital_gain capital_loss hours_per_week education_num_Indicator capital_gain_Indicator capital_loss_Indicator hours_per_week_Indicator
_____________ ____________ ____________ ______________ _______________________ ______________________ ______________________ ________________________
13 7437.7 0 40 0 1 0 0
Estimated objective function value = 0.71072
Estimated function evaluation time = 0.005968
Estimated constraint violations =[ -0.011249 ]

counterExamples=10×12 table
age workClass education_num marital_status occupation relationship race sex capital_gain capital_loss hours_per_week native_country
___ _________ _____________ ______________ ____________ _____________ _____ ____ ____________ ____________ ______________ ______________
39 State-gov 13 Never-married Adm-clerical Not-in-family White Male 7437.7 0 40 United-States
39 State-gov 13 Never-married Adm-clerical Not-in-family White Male 7438.1 0 40 United-States
39 State-gov 13 Never-married Adm-clerical Not-in-family White Male 7561 0 40 United-States
39 State-gov 13 Never-married Adm-clerical Not-in-family White Male 7774.3 0 40 United-States
39 State-gov 13 Never-married Adm-clerical Not-in-family White Male 7775.8 0 40 United-States
39 State-gov 13 Never-married Adm-clerical Not-in-family White Male 7777.6 0 40 United-States
39 State-gov 13 Never-married Adm-clerical Not-in-family White Male 7779.9 0 40 United-States
39 State-gov 13 Never-married Adm-clerical Not-in-family White Male 8182.1 0 40 United-States
39 State-gov 13 Never-married Adm-clerical Not-in-family White Male 8183.5 0 40 United-States
39 State-gov 13 Never-married Adm-clerical Not-in-family White Male 8394.9 165.4 40 United-States
The counterExamples table contains the predictor data for the counterfactual examples, listed in order of proximity to observation. If the function does not find enough feasible points during the optimization process, the number of counterfactual examples can be less than the default.
Gather into one table the modifiable predictor values for observation and the observations in counterExamples.
observationPred = observation(:,modPredictors);
counterExamplesPred = counterExamples(:,modPredictors);
allObservationsPred = [observationPred;counterExamplesPred]
allObservationsPred=11×4 table
education_num capital_gain capital_loss hours_per_week
_____________ ____________ ____________ ______________
13 2174 0 40
13 7437.7 0 40
13 7438.1 0 40
13 7561 0 40
13 7774.3 0 40
13 7775.8 0 40
13 7777.6 0 40
13 7779.9 0 40
13 8182.1 0 40
13 8183.5 0 40
13 8394.9 165.4 40
The first four counterfactual examples differ from observation in only one predictor (capital_gain).
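To confirm this programmatically, you can compare the modifiable predictor values directly. This is an illustrative check, not part of the original example:

```matlab
% Logical array: true where a counterfactual example differs from
% observation in a modifiable predictor. Implicit expansion compares the
% single observation row against each counterfactual row.
changed = counterExamplesPred{:,:} ~= observationPred{:,:};
numChanged = sum(changed,2)   % changed predictors per counterfactual example
```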
Visually compare the modifiable predictor values by using a star plot (glyph plot). A star plot represents each observation as a star, in which spoke i is proportional in length to the value of predictor i for that observation. By default, glyphplot standardizes the predictors before plotting.
labels = ["observation"; ...
    repmat("counterEx",size(counterExamples,1),1)];
glyphplot(allObservationsPred{:,:},ObsLabels=labels, ...
    VarLabels=modPredictors)

The first four counterfactual examples have very similar stars (glyphs), indicating that those observations have similar predictor values. Their stars also resemble the observation star in three of the four spokes, which indicates that the counterfactual examples differ from observation mostly in one predictor. Because the spoke for capital_gain is longer for the stars of the counterfactual examples, that predictor value is greater for those observations.
Train a support vector machine (SVM) classifier to predict whether an individual makes over $50,000 per year. For an individual who is predicted to have a yearly salary of $50,000 or less, find counterfactual examples by using the counterfactuals function. That is, determine a minimal set of changes to the individual's profile that leads to a predicted yearly salary of over $50,000. Ensure that none of the counterfactual examples are anomalies by using a one-class SVM anomaly detection model.
Load the 1994 census data stored in census1994.mat. The data set consists of demographic information from the US Census Bureau, split into a training data set adultdata and a test data set adulttest. Preview the first few rows of the training data set.
load census1994
head(adultdata)
age workClass fnlwgt education education_num marital_status occupation relationship race sex capital_gain capital_loss hours_per_week native_country salary
___ ________________ __________ _________ _____________ _____________________ _________________ _____________ _____ ______ ____________ ____________ ______________ ______________ ______
39 State-gov 77516 Bachelors 13 Never-married Adm-clerical Not-in-family White Male 2174 0 40 United-States <=50K
50 Self-emp-not-inc 83311 Bachelors 13 Married-civ-spouse Exec-managerial Husband White Male 0 0 13 United-States <=50K
38 Private 2.1565e+05 HS-grad 9 Divorced Handlers-cleaners Not-in-family White Male 0 0 40 United-States <=50K
53 Private 2.3472e+05 11th 7 Married-civ-spouse Handlers-cleaners Husband Black Male 0 0 40 United-States <=50K
28 Private 3.3841e+05 Bachelors 13 Married-civ-spouse Prof-specialty Wife Black Female 0 0 40 Cuba <=50K
37 Private 2.8458e+05 Masters 14 Married-civ-spouse Exec-managerial Wife White Female 0 0 40 United-States <=50K
49 Private 1.6019e+05 9th 5 Married-spouse-absent Other-service Not-in-family Black Female 0 0 16 Jamaica <=50K
52 Self-emp-not-inc 2.0964e+05 HS-grad 9 Married-civ-spouse Exec-managerial Husband White Male 0 0 45 United-States >50K
Each row contains the demographic information for one adult. The last column, salary, shows whether a person has a salary less than or equal to $50,000 per year or greater than $50,000 per year.
Remove the rows of adultdata and adulttest that contain missing values.
adultdata = rmmissing(adultdata);
adulttest = rmmissing(adulttest);
Because education and education_num contain similar information, remove the nonnumeric education variable from adultdata and adulttest.
adultdata.education = [];
adulttest.education = [];
Train an SVM classifier using adultdata. Specify the salary column of adultdata as the response and the fnlwgt column as the observation weights. Standardize the numeric predictors.
rng(0,"twister")
Mdl = fitcsvm(adultdata,"salary","Weights","fnlwgt", ...
    Standardize=true)
Mdl =
ClassificationSVM
PredictorNames: {'age' 'workClass' 'education_num' 'marital_status' 'occupation' 'relationship' 'race' 'sex' 'capital_gain' 'capital_loss' 'hours_per_week' 'native_country'}
ResponseName: 'salary'
CategoricalPredictors: [2 4 5 6 7 8 12]
ClassNames: [<=50K >50K]
ScoreTransform: 'none'
NumObservations: 30162
Alpha: [10690×1 double]
Bias: -2.0470
KernelParameters: [1×1 struct]
Mu: [37.8785 0 0 0 0 0 0 0 10.0574 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.0937e+03 86.1783 40.7786 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
Sigma: [12.9860 1 1 1 1 1 1 1 2.5957 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 7.4580e+03 398.9456 11.7554 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
BoxConstraints: [30162×1 double]
ConvergenceInfo: [1×1 struct]
IsSupportVector: [30162×1 logical]
Solver: 'SMO'
Properties, Methods
Mdl is a ClassificationSVM object. You can use the model object along with its predict function to predict whether an individual makes over $50,000 per year.
Display the third observation in adultdata. Determine the predicted salary for the individual whose demographic information is in observation.
observation = adultdata(3,:)
observation=1×14 table
age workClass fnlwgt education_num marital_status occupation relationship race sex capital_gain capital_loss hours_per_week native_country salary
___ _________ __________ _____________ ______________ _________________ _____________ _____ ____ ____________ ____________ ______________ ______________ ______
38 Private 2.1565e+05 9 Divorced Handlers-cleaners Not-in-family White Male 0 0 40 United-States <=50K
label = predict(Mdl,observation)
label = categorical
<=50K
The predicted salary for the individual is less than or equal to $50,000 per year.
Before finding counterfactual examples for observation, train a one-class SVM model for anomaly detection. Assume that adulttest does not contain outliers. Specify StandardizeData as true to standardize the data, and set KernelScale to "auto" to let the function select an appropriate kernel scale parameter using a heuristic procedure. Use the same predictors as those used to train the classifier Mdl.
anomalyMdl = ocsvm(adulttest,StandardizeData=true, ...
    KernelScale="auto",PredictorNames=Mdl.PredictorNames)
anomalyMdl =
OneClassSVM
CategoricalPredictors: [2 4 5 6 7 8 12]
ContaminationFraction: 0
ScoreThreshold: 0.0159
PredictorNames: {'age' 'workClass' 'education_num' 'marital_status' 'occupation' 'relationship' 'race' 'sex' 'capital_gain' 'capital_loss' 'hours_per_week' 'native_country'}
KernelScale: 3.5942
Lambda: 0.1535
Properties, Methods
anomalyMdl is a OneClassSVM object. You can use the model object along with its isanomaly function to determine whether an observation is an anomaly.
Generate counterfactual examples for observation. That is, find a minimal set of changes to the predictor values of observation that result in a predicted salary of over $50,000 per year. Specify education_num, capital_gain, capital_loss, and hours_per_week as the only predictors whose values can be modified. Use the anomaly detection model anomalyMdl to ensure that none of the counterfactual examples are anomalies. By default, the counterfactuals function tries to find 10 counterfactual examples using a Bayesian optimization routine with 50 iterations. The optimization process can take some time.
modPredictors = ["education_num","capital_gain", ...
    "capital_loss","hours_per_week"];
counterExamples = counterfactuals(Mdl,observation, ...
    ModifiablePredictors=modPredictors,AnomalyModel=anomalyMdl)
|==============================================================================================================================================================================================================|
| Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | Constraint1 | education_num| capital_gain | capital_loss | hours_per_we-| education_nu-| capital_gain-| capital_loss-| hours_per_we-|
| | result | | runtime | (observed) | (estim.) | violation | | | | ek | m_Indicator | _Indicator | _Indicator | ek_Indicator |
|==============================================================================================================================================================================================================|
| 1 | Best | 12.773 | 0.21059 | 12.773 | 12.773 | -20.6 | 15 | 92979 | 0 | 40 | 1 | 1 | 0 | 0 |
| 2 | Infeas | 4.617 | 0.070298 | 12.773 | 12.773 | 1 | 2 | 0 | 493.39 | 82 | 1 | 0 | 1 | 1 |
| 3 | Infeas | 3.6867 | 0.054993 | 12.773 | 12.773 | 1 | 15 | 0 | 0 | 74 | 1 | 0 | 0 | 1 |
| 4 | Best | 5.6532 | 0.06016 | 5.6532 | 5.6532 | -4.29 | 9 | 26173 | 1783.9 | 40 | 0 | 1 | 1 | 0 |
| 5 | Accept | 9.9227 | 0.10486 | 5.6532 | 5.6532 | -6.81 | 12 | 27158 | 3177.7 | 96 | 1 | 1 | 1 | 1 |
| 6 | Accept | 7.1618 | 0.023231 | 5.6532 | 5.6532 | -4.61 | 9 | 26178 | 2518.3 | 40 | 0 | 1 | 1 | 0 |
| 7 | Infeas | 1.6695 | 0.024691 | 5.6532 | 5.6532 | 1 | 9 | 0 | 0 | 60 | 0 | 0 | 0 | 1 |
| 8 | Infeas | 0.75096 | 0.038485 | 5.6532 | 5.6532 | 1 | 9 | 5561.8 | 0 | 40 | 0 | 1 | 0 | 0 |
| 9 | Infeas | 3.1082 | 0.025382 | 5.6532 | 5.6532 | 1 | 9 | 2582.6 | 0 | 3 | 0 | 1 | 0 | 1 |
| 10 | Best | 2.191 | 0.022102 | 2.191 | 2.191 | -1.16 | 9 | 16215 | 0 | 41 | 0 | 1 | 0 | 1 |
| 11 | Infeas | 1.5086 | 0.022778 | 2.191 | 2.191 | 1 | 7 | 9218.1 | 0 | 36 | 1 | 1 | 0 | 1 |
| 12 | Infeas | 1.5687 | 0.022827 | 2.191 | 2.191 | 1 | 5 | 129.76 | 0 | 40 | 1 | 1 | 0 | 0 |
| 13 | Accept | 2.6978 | 0.024336 | 2.191 | 2.191 | -1.57 | 14 | 13723 | 0 | 40 | 1 | 1 | 0 | 0 |
| 14 | Best | 1.703 | 0.01959 | 1.703 | 1.703 | -0.292 | 9 | 12613 | 0 | 40 | 0 | 1 | 0 | 0 |
| 15 | Accept | 2.0774 | 0.023835 | 1.703 | 1.703 | -0.4 | 9 | 11785 | 0 | 56 | 0 | 1 | 0 | 1 |
| 16 | Infeas | 1.6989 | 0.021259 | 1.703 | 1.703 | 1 | 9 | 9430.6 | 453.44 | 41 | 0 | 1 | 1 | 1 |
| 17 | Accept | 3.8311 | 0.051043 | 1.703 | 1.703 | -0.756 | 4 | 16515 | 0 | 69 | 1 | 1 | 0 | 1 |
| 18 | Accept | 1.7302 | 0.022291 | 1.703 | 1.703 | -0.137 | 12 | 9395.8 | 0 | 40 | 1 | 1 | 0 | 0 |
| 19 | Accept | 5.8608 | 0.022863 | 1.703 | 1.703 | -5.12 | 9 | 36094 | 0 | 1 | 0 | 1 | 0 | 1 |
| 20 | Best | 1.5397 | 0.022932 | 1.5397 | 1.5397 | -0.005 | 9 | 11403 | 0 | 40 | 0 | 1 | 0 | 0 |
|==============================================================================================================================================================================================================|
| Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | Constraint1 | education_num| capital_gain | capital_loss | hours_per_we-| education_nu-| capital_gain-| capital_loss-| hours_per_we-|
| | result | | runtime | (observed) | (estim.) | violation | | | | ek | m_Indicator | _Indicator | _Indicator | ek_Indicator |
|==============================================================================================================================================================================================================|
| 21 | Accept | 2.8858 | 0.024776 | 1.5397 | 1.5397 | -0.282 | 16 | 6590.3 | 0 | 40 | 1 | 1 | 0 | 0 |
| 22 | Accept | 13.693 | 0.020913 | 1.5397 | 1.5397 | -18.7 | 1 | 92113 | 233.84 | 97 | 1 | 1 | 1 | 1 |
| 23 | Best | 1.5053 | 0.021531 | 1.5053 | 1.5053 | -0.0558 | 10 | 10763 | 0 | 40 | 1 | 1 | 0 | 0 |
| 24 | Infeas | 1.5687 | 0.041456 | 1.5053 | 1.5053 | 1 | 13 | 12.669 | 0 | 40 | 1 | 1 | 0 | 0 |
| 25 | Accept | 1.5301 | 0.023617 | 1.5053 | 1.5053 | -0.0133 | 11 | 9730 | 0 | 40 | 1 | 1 | 0 | 0 |
| 26 | Accept | 1.948 | 0.021114 | 1.5053 | 1.5053 | -0.0067 | 9 | 12374 | 0 | 28 | 0 | 1 | 0 | 1 |
| 27 | Infeas | 11.469 | 0.029688 | 1.5053 | 1.5053 | 1.01 | 16 | 0 | 4315.5 | 2 | 1 | 0 | 1 | 1 |
| 28 | Accept | 17.214 | 0.025916 | 1.5053 | 1.5053 | -19.8 | 1 | 96349 | 4259.1 | 10 | 1 | 1 | 1 | 1 |
| 29 | Accept | 8.657 | 0.030098 | 1.5053 | 1.5053 | -11.8 | 16 | 50823 | 0 | 94 | 1 | 1 | 0 | 1 |
| 30 | Accept | 6.4715 | 0.02971 | 1.5053 | 1.5053 | -5.62 | 1 | 41921 | 0 | 40 | 1 | 1 | 0 | 0 |
| 31 | Infeas | 11.336 | 0.022861 | 1.5053 | 1.5053 | 1 | 1 | 0 | 4233.1 | 76 | 1 | 0 | 1 | 1 |
| 32 | Accept | 1.5713 | 0.036967 | 1.5053 | 1.5053 | -0.0375 | 9 | 11621 | 0 | 39 | 0 | 1 | 0 | 1 |
| 33 | Accept | 1.737 | 0.052326 | 1.5053 | 1.5053 | -0.108 | 11 | 9246.8 | 0 | 51 | 1 | 1 | 0 | 1 |
| 34 | Best | 1.4789 | 0.031443 | 1.4789 | 1.4788 | -0.00778 | 10 | 10561 | 0 | 40 | 1 | 1 | 0 | 0 |
| 35 | Accept | 4.5489 | 0.021642 | 1.4789 | 1.4788 | -0.785 | 16 | 11846 | 0 | 1 | 1 | 1 | 0 | 1 |
| 36 | Accept | 1.5376 | 0.019771 | 1.4789 | 1.4788 | -0.00126 | 9 | 11388 | 0 | 40 | 0 | 1 | 0 | 0 |
| 37 | Accept | 8.4915 | 0.04055 | 1.4789 | 1.4788 | -10.9 | 16 | 54410 | 0 | 1 | 1 | 1 | 0 | 1 |
| 38 | Accept | 7.537 | 0.020479 | 1.4789 | 1.4788 | -6.25 | 2 | 39324 | 0 | 95 | 1 | 1 | 0 | 1 |
| 39 | Infeas | 4.9217 | 0.021068 | 1.4789 | 1.4788 | 1 | 9 | 6554 | 0 | 98 | 0 | 1 | 0 | 1 |
| 40 | Infeas | 1.0018 | 0.026839 | 1.4789 | 1.4788 | 1 | 9 | 0 | 6.8891 | 28 | 0 | 0 | 1 | 1 |
|==============================================================================================================================================================================================================|
| Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | Constraint1 | education_num| capital_gain | capital_loss | hours_per_we-| education_nu-| capital_gain-| capital_loss-| hours_per_we-|
| | result | | runtime | (observed) | (estim.) | violation | | | | ek | m_Indicator | _Indicator | _Indicator | ek_Indicator |
|==============================================================================================================================================================================================================|
| 41 | Accept | 14.39 | 0.040977 | 1.4789 | 1.4788 | -16.3 | 15 | 63071 | 4309.6 | 87 | 1 | 1 | 1 | 1 |
| 42 | Accept | 4.2317 | 0.034849 | 1.4789 | 1.4788 | -1.26 | 15 | 9084.1 | 1332.7 | 40 | 1 | 1 | 1 | 0 |
| 43 | Infeas | 1.1832 | 0.031062 | 1.4789 | 1.4788 | 1 | 12 | 0 | 50.725 | 40 | 1 | 0 | 1 | 0 |
| 44 | Accept | 1.4752 | 0.024253 | 1.4789 | 1.4789 | -0.001 | 10 | 10532 | 0 | 40 | 1 | 1 | 0 | 0 |
| 45 | Infeas | 5.839 | 0.02437 | 1.4789 | 1.4789 | 1 | 1 | 0 | 1522.9 | 2 | 1 | 0 | 1 | 1 |
| 46 | Accept | 1.4757 | 0.01924 | 1.4752 | 1.4751 | -0.00204 | 10 | 10537 | 0 | 40 | 1 | 1 | 0 | 0 |
| 47 | Infeas | 2.671 | 0.035312 | 1.4752 | 1.4751 | 1 | 9 | 0 | 1075.1 | 43 | 0 | 0 | 1 | 1 |
| 48 | Accept | 1.5341 | 0.019335 | 1.4752 | 1.4751 | -0.00671 | 9 | 11089 | 0 | 44 | 0 | 1 | 0 | 1 |
| 49 | Accept | 5.6243 | 0.027164 | 1.4752 | 1.4751 | -5.12 | 16 | 23405 | 0 | 85 | 1 | 1 | 0 | 1 |
| 50 | Accept | 1.5383 | 0.029368 | 1.4752 | 1.4751 | -0.116 | 10 | 11017 | 0 | 40 | 1 | 1 | 0 | 0 |
__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 50 reached.
Total function evaluations: 50
Total elapsed time: 43.7385 seconds
Total objective function evaluation time: 1.7372
Best observed feasible point:
education_num capital_gain capital_loss hours_per_week education_num_Indicator capital_gain_Indicator capital_loss_Indicator hours_per_week_Indicator
_____________ ____________ ____________ ______________ _______________________ ______________________ ______________________ ________________________
10 10532 0 40 1 1 0 0
Observed objective function value = 1.4752
Estimated objective function value = 1.4751
Function evaluation time = 0.024253
Observed constraint violations =[ -0.001004 ]
Best estimated feasible point (according to models):
education_num capital_gain capital_loss hours_per_week education_num_Indicator capital_gain_Indicator capital_loss_Indicator hours_per_week_Indicator
_____________ ____________ ____________ ______________ _______________________ ______________________ ______________________ ________________________
10 10532 0 40 1 1 0 0
Estimated objective function value = 1.4751
Estimated function evaluation time = 0.025262
Estimated constraint violations =[ -0.000996 ]

counterExamples=10×12 table
age workClass education_num marital_status occupation relationship race sex capital_gain capital_loss hours_per_week native_country
___ _________ _____________ ______________ _________________ _____________ _____ ____ ____________ ____________ ______________ ______________
38 Private 10 Divorced Handlers-cleaners Not-in-family White Male 10532 0 40 United-States
38 Private 10 Divorced Handlers-cleaners Not-in-family White Male 10537 0 40 United-States
38 Private 10 Divorced Handlers-cleaners Not-in-family White Male 10561 0 40 United-States
38 Private 10 Divorced Handlers-cleaners Not-in-family White Male 10763 0 40 United-States
38 Private 11 Divorced Handlers-cleaners Not-in-family White Male 9730 0 40 United-States
38 Private 9 Divorced Handlers-cleaners Not-in-family White Male 11089 0 44 United-States
38 Private 9 Divorced Handlers-cleaners Not-in-family White Male 11388 0 40 United-States
38 Private 10 Divorced Handlers-cleaners Not-in-family White Male 11017 0 40 United-States
38 Private 9 Divorced Handlers-cleaners Not-in-family White Male 11403 0 40 United-States
38 Private 9 Divorced Handlers-cleaners Not-in-family White Male 11621 0 39 United-States
The counterExamples table contains the predictor data for the counterfactual examples, listed in order of proximity to observation. If the function does not find enough feasible points during the optimization process, the number of counterfactual examples can be less than the default.
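Because the call to counterfactuals specified AnomalyModel=anomalyMdl, none of the returned examples should be anomalies. As an optional sanity check (a sketch, not part of the original workflow), you can confirm this with the isanomaly function of the one-class SVM model:

```matlab
% Confirm that no counterfactual example is flagged as an anomaly;
% any(tf) should be logical 0 (false) when the anomaly constraint holds
tf = isanomaly(anomalyMdl,counterExamples);
any(tf)
```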
Train a tree classifier to predict whether a corporate customer has a "good" or "poor" credit rating. For a customer who is predicted to have a poor rating, find counterfactual examples by using the counterfactuals function. That is, determine a minimal set of changes to the customer profile that leads to a predicted credit rating of "good." Display diagnostics for the counterfactual examples.
Read the sample file CreditRating_Historical.dat into a table. The predictor data consists of financial ratios and industry sector information for a list of corporate customers. The response variable consists of credit ratings assigned by a rating agency. Preview the first few rows of the data set.
creditrating = readtable("CreditRating_Historical.dat");
head(creditrating)
ID WC_TA RE_TA EBIT_TA MVE_BVTD S_TA Industry Rating
_____ ______ ______ _______ ________ _____ ________ _______
62394 0.013 0.104 0.036 0.447 0.142 3 {'BB' }
48608 0.232 0.335 0.062 1.969 0.281 8 {'A' }
42444 0.311 0.367 0.074 1.935 0.366 1 {'A' }
48631 0.194 0.263 0.062 1.017 0.228 4 {'BBB'}
43768 0.121 0.413 0.057 3.647 0.466 12 {'AAA'}
39255 -0.117 -0.799 0.01 0.179 0.082 4 {'CCC'}
62236 0.087 0.158 0.049 0.816 0.324 2 {'BBB'}
39354 0.005 0.181 0.034 2.597 0.388 7 {'AA' }
In the Rating response variable, combine the AAA, AA, A, and BBB ratings into a category of "good" ratings, and the BB, B, and CCC ratings into a category of "poor" ratings. Also, convert the Industry variable to a categorical variable.
Rating = categorical(creditrating.Rating);
Rating = mergecats(Rating,["AAA","AA","A","BBB"],"good");
Rating = mergecats(Rating,["BB","B","CCC"],"poor");
creditrating.Rating = Rating;
creditrating.Industry = categorical(creditrating.Industry);
Train a tree classifier using the creditrating data. Specify the Rating column of creditrating as the response, and the other columns, excluding ID, as predictors.
predictors = ["WC_TA","RE_TA","EBIT_TA","MVE_BVTD","S_TA", ...
    "Industry"];
rng(0,"twister") % For reproducibility
Mdl = fitctree(creditrating,"Rating",PredictorNames=predictors)
Mdl =
ClassificationTree
PredictorNames: {'WC_TA' 'RE_TA' 'EBIT_TA' 'MVE_BVTD' 'S_TA' 'Industry'}
ResponseName: 'Rating'
CategoricalPredictors: 6
ClassNames: [good poor]
ScoreTransform: 'none'
NumObservations: 3932
Properties, Methods
Mdl is a ClassificationTree object. You can use the model object along with its predict function to predict whether a customer has a good credit rating.
Display the sixth observation in creditrating. Determine the predicted credit rating for the corporate customer whose information is in observation.
observation = creditrating(6,:)
observation=1×8 table
ID WC_TA RE_TA EBIT_TA MVE_BVTD S_TA Industry Rating
_____ ______ ______ _______ ________ _____ ________ ______
39255 -0.117 -0.799 0.01 0.179 0.082 4 poor
label = predict(Mdl,observation)
label = categorical
poor
The corporate customer has a predicted credit rating of poor.
Generate five counterfactual examples for the observation (corporate customer with ID 39255). Additionally return diagnostics for the counterfactual examples. By default, the counterfactuals function uses a Bayesian optimization routine with 50 iterations. The optimization process can take some time.
rng(0,"twister")
[counterExamples,metrics] = counterfactuals(Mdl,observation, ...
    NumCounterfactualExamples=5);
|==========================================================================================================================================================================================================================================================================|
| Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | Constraint1 | WC_TA | RE_TA | EBIT_TA | MVE_BVTD | S_TA | Industry | WC_TA_Indica-| RE_TA_Indica-| EBIT_TA_Indi-| MVE_BVTD_Ind-| S_TA_Indicat-| Industry_Ind-|
| | result | | runtime | (observed) | (estim.) | violation | | | | | | | tor | tor | cator | icator | or | icator |
|==========================================================================================================================================================================================================================================================================|
| 1 | Infeas | 0.26817 | 0.50708 | NaN | 0.26817 | 1 | -0.117 | -0.17048 | 0.16531 | 0.179 | 0.29959 | 4 | 0 | 1 | 1 | 0 | 1 | 0 |
| 2 | Infeas | 0.48947 | 0.26762 | NaN | 0.26817 | 1 | -0.117 | -0.799 | 0.20204 | 0.179 | 6.4577 | 3 | 0 | 0 | 1 | 0 | 1 | 1 |
| 3 | Infeas | 0.3406 | 0.066527 | NaN | 0.26817 | 1 | -1.5205 | 1.0139 | -0.52017 | 0.179 | 0.082 | 12 | 1 | 1 | 1 | 0 | 0 | 1 |
| 4 | Best | 0.4911 | 0.028147 | 0.4911 | 0.4911 | -0.5 | -0.117 | 1.6522 | 0.01 | 73.705 | 0.082 | 3 | 0 | 1 | 0 | 1 | 0 | 1 |
| 5 | Best | 0.32483 | 0.05205 | 0.32483 | 0.32483 | -0.5 | -0.117 | 1.0724 | 0.01 | 72.905 | 0.082 | 4 | 0 | 1 | 0 | 1 | 0 | 0 |
| 6 | Accept | 0.32517 | 0.013555 | 0.32483 | 0.32483 | -0.5 | -0.117 | 1.3803 | 0.01 | 69.491 | 0.082 | 4 | 0 | 1 | 0 | 1 | 0 | 1 |
| 7 | Infeas | 0.1716 | 0.013136 | 0.32483 | 0.32483 | 1 | -0.117 | -3.0319 | -0.59091 | 0.179 | 1.2919 | 4 | 0 | 1 | 1 | 0 | 1 | 0 |
| 8 | Accept | 0.33178 | 0.014379 | 0.32483 | 0.32483 | -0.367 | -0.117 | -2.9936 | -0.58945 | 82.953 | 2.2296 | 4 | 0 | 1 | 1 | 1 | 1 | 0 |
| 9 | Best | 0.32309 | 0.01392 | 0.32309 | 0.32309 | -0.367 | -0.117 | -2.6046 | 0.01 | 71.686 | 1.0103 | 4 | 0 | 1 | 0 | 1 | 1 | 0 |
| 10 | Infeas | 0.010967 | 0.0124 | 0.32309 | 0.32309 | 1 | -0.117 | -2.9336 | 0.01 | 0.29214 | 0.082 | 4 | 0 | 1 | 0 | 1 | 0 | 0 |
| 11 | Best | 0.16316 | 0.012652 | 0.16316 | 0.16316 | -0.367 | -0.117 | -3.246 | 0.01 | 43.108 | 0.082 | 4 | 0 | 1 | 0 | 1 | 0 | 0 |
| 12 | Infeas | 0.0080855 | 0.013371 | 0.16316 | 0.16316 | 1 | -0.117 | -0.799 | 0.01 | 0.038794 | 0.082 | 4 | 0 | 0 | 0 | 1 | 0 | 1 |
| 13 | Accept | 0.2112 | 0.011575 | 0.16316 | 0.16316 | -0.5 | -0.13893 | -0.799 | -0.43091 | 0.66632 | 2.8912 | 4 | 1 | 0 | 1 | 1 | 1 | 0 |
| 14 | Accept | 0.23857 | 0.011147 | 0.16316 | 0.16316 | -0.5 | -0.117 | 0.1938 | -0.52596 | 34.596 | 0.082 | 4 | 0 | 1 | 1 | 1 | 0 | 0 |
| 15 | Infeas | 0.2415 | 0.011438 | 0.16316 | 0.16316 | 1 | 0.13161 | -0.799 | 0.01 | 0.179 | 0.76163 | 4 | 1 | 0 | 0 | 0 | 1 | 1 |
| 16 | Accept | 0.60903 | 0.011517 | 0.16316 | 0.16316 | -0.367 | 0.42198 | -1.7231 | 0.20164 | 2.9758 | 2.1236 | 4 | 1 | 1 | 1 | 1 | 1 | 0 |
| 17 | Best | 0.16287 | 0.011607 | 0.16287 | 0.16287 | -0.367 | -0.117 | -2.6267 | 0.01 | 23.678 | 0.082 | 4 | 0 | 1 | 0 | 1 | 0 | 0 |
| 18 | Best | 0.10977 | 0.011277 | 0.10977 | 0.10977 | -0.367 | -0.15617 | -0.799 | 0.01 | 1.8443 | 0.082 | 4 | 1 | 0 | 0 | 1 | 0 | 0 |
| 19 | Best | 0.077053 | 0.01167 | 0.077053 | 0.077053 | -0.167 | -1.6761 | -0.799 | -0.25553 | 0.9777 | 0.082 | 4 | 1 | 0 | 1 | 1 | 0 | 1 |
| 20 | Infeas | 0.1371 | 0.013347 | 0.077053 | 0.077053 | 1 | -1.0156 | 0.013992 | -0.57955 | 1.684 | 0.082 | 4 | 1 | 1 | 1 | 1 | 0 | 0 |
|==========================================================================================================================================================================================================================================================================|
| Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | Constraint1 | WC_TA | RE_TA | EBIT_TA | MVE_BVTD | S_TA | Industry | WC_TA_Indica-| RE_TA_Indica-| EBIT_TA_Indi-| MVE_BVTD_Ind-| S_TA_Indicat-| Industry_Ind-|
| | result | | runtime | (observed) | (estim.) | violation | | | | | | | tor | tor | cator | icator | or | icator |
|==========================================================================================================================================================================================================================================================================|
| 21 | Infeas | 0.16022 | 0.011291 | 0.077053 | 0.077053 | 1 | 0.26254 | -0.80631 | -0.26419 | 0.51331 | 0.082 | 4 | 1 | 1 | 1 | 1 | 0 | 0 |
| 22 | Infeas | 0.16041 | 0.015516 | 0.077053 | 0.077053 | 1 | -2.2258 | -0.799 | 0.01 | 8.5207 | 0.082 | 4 | 1 | 0 | 0 | 1 | 0 | 0 |
| 23 | Accept | 0.077562 | 0.014002 | 0.077053 | 0.077054 | -0.167 | -1.68 | -0.799 | -0.16012 | 0.98831 | 0.082 | 4 | 1 | 0 | 1 | 1 | 0 | 0 |
| 24 | Infeas | 0.15958 | 0.013708 | 0.077053 | 0.077053 | 1 | -1.631 | -0.799 | 0.023051 | 4.5989 | 0.082 | 4 | 1 | 0 | 1 | 1 | 0 | 0 |
| 25 | Accept | 0.11964 | 0.011669 | 0.077053 | 0.077053 | -0.367 | -0.10478 | -0.799 | 0.01 | 2.2284 | 0.082 | 4 | 1 | 0 | 0 | 1 | 0 | 0 |
| 26 | Infeas | 0.013425 | 0.011841 | 0.077053 | 0.077053 | 1 | -1.2076 | -2.6688 | 0.01 | 0.050326 | 0.082 | 4 | 1 | 1 | 0 | 1 | 0 | 0 |
| 27 | Infeas | 0.053852 | 0.010968 | 0.077053 | 0.077053 | 1 | -1.2803 | -0.799 | -0.29726 | 0.69133 | 0.082 | 4 | 1 | 0 | 1 | 1 | 0 | 0 |
| 28 | Accept | 0.22825 | 0.011956 | 0.077053 | 0.077053 | -0.367 | 0.094888 | -3.1318 | 0.01 | 7.7412 | 0.082 | 4 | 1 | 1 | 0 | 1 | 0 | 0 |
| 29 | Infeas | 0.20276 | 0.015039 | 0.077053 | 0.077053 | 1 | -0.117 | -0.799 | 0.12028 | 0.69785 | 0.082 | 4 | 0 | 0 | 1 | 1 | 0 | 0 |
| 30 | Infeas | 0.092288 | 0.012645 | 0.077053 | 0.077053 | 1 | -0.5544 | -2.1103 | 0.01 | 1.3011 | 0.082 | 4 | 1 | 1 | 0 | 1 | 0 | 0 |
| 31 | Infeas | 0.048894 | 0.010991 | 0.077053 | 0.077053 | 1 | -0.52267 | -0.799 | 0.01 | 0.71116 | 0.082 | 4 | 1 | 0 | 0 | 1 | 0 | 0 |
| 32 | Infeas | 0.066904 | 0.011913 | 0.077053 | 0.077053 | 1 | -1.5836 | -0.799 | -0.22985 | 0.83931 | 0.082 | 4 | 1 | 0 | 1 | 1 | 0 | 0 |
| 33 | Infeas | 0.0052675 | 0.012152 | 0.077053 | 0.077053 | 1 | -1.3814 | -0.799 | 0.01 | 0.179 | 0.082 | 4 | 1 | 0 | 0 | 0 | 0 | 0 |
| 34 | Infeas | 0.0037843 | 0.010922 | 0.077053 | 0.077053 | 1 | -0.117 | -3.2625 | 0.01 | 0.179 | 0.082 | 4 | 0 | 1 | 0 | 0 | 0 | 0 |
| 35 | Infeas | 0.085719 | 0.011322 | 0.077053 | 0.077053 | 1 | 0.13048 | -0.799 | 0.01 | 0.179 | 0.082 | 4 | 1 | 0 | 0 | 0 | 0 | 0 |
| 36 | Infeas | 0.0050132 | 0.012732 | 0.077053 | 0.077053 | 1 | -0.72619 | -0.799 | 0.01 | 0.179 | 0.082 | 4 | 1 | 0 | 0 | 0 | 0 | 0 |
| 37 | Infeas | 0.0033817 | 0.010955 | 0.077053 | 0.077053 | 1 | -0.1869 | -0.799 | 0.01 | 0.179 | 0.082 | 4 | 1 | 0 | 0 | 0 | 0 | 0 |
| 38 | Accept | 0.12117 | 0.010979 | 0.077053 | 0.077053 | -0.367 | -0.24026 | -0.799 | 0.01 | 2.2306 | 0.082 | 4 | 1 | 0 | 0 | 1 | 0 | 0 |
| 39 | Infeas | 0.081291 | 0.011551 | 0.077053 | 0.077053 | 1 | -0.14034 | -0.799 | 0.01 | 1.172 | 0.082 | 4 | 1 | 0 | 0 | 1 | 0 | 0 |
| 40 | Infeas | 0.0061786 | 0.011493 | 0.077053 | 0.077053 | 1 | -0.31018 | -3.0656 | 0.01 | 0.179 | 0.082 | 4 | 1 | 1 | 0 | 0 | 0 | 0 |
|==========================================================================================================================================================================================================================================================================|
| Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | Constraint1 | WC_TA | RE_TA | EBIT_TA | MVE_BVTD | S_TA | Industry | WC_TA_Indica-| RE_TA_Indica-| EBIT_TA_Indi-| MVE_BVTD_Ind-| S_TA_Indicat-| Industry_Ind-|
| | result | | runtime | (observed) | (estim.) | violation | | | | | | | tor | tor | cator | icator | or | icator |
|==========================================================================================================================================================================================================================================================================|
| 41 | Infeas | 0.16946 | 0.013665 | 0.077053 | 0.077053 | 1 | -1.115 | 1.4854 | 0.01 | 0.179 | 0.082 | 4 | 1 | 1 | 0 | 0 | 0 | 0 |
| 42 | Infeas | 0.14333 | 0.012446 | 0.077053 | 0.077053 | 1 | -1.682 | -1.0086 | -0.19245 | 0.179 | 0.42391 | 4 | 1 | 1 | 1 | 0 | 1 | 0 |
| 43 | Infeas | 0.0038267 | 0.013989 | 0.077053 | 0.077053 | 1 | -0.24297 | -0.799 | 0.01 | 0.179 | 0.082 | 4 | 1 | 0 | 0 | 0 | 0 | 0 |
| 44 | Infeas | 0.0052675 | 0.012501 | 0.077053 | 0.077053 | 1 | -1.0416 | -0.799 | 0.01 | 0.179 | 0.082 | 4 | 1 | 0 | 0 | 0 | 0 | 0 |
| 45 | Infeas | 0.0029792 | 0.011013 | 0.077053 | 0.077053 | 1 | -0.11117 | -1.6065 | 0.01 | 0.179 | 0.082 | 4 | 1 | 1 | 0 | 0 | 0 | 0 |
| 46 | Accept | 0.3093 | 0.011682 | 0.077053 | 0.077053 | -0.5 | -0.117 | 0.55139 | 0.01 | 26.936 | 0.082 | 4 | 0 | 1 | 0 | 1 | 0 | 0 |
| 47 | Accept | 0.16202 | 0.011139 | 0.077053 | 0.077053 | -0.367 | -0.117 | -1.9857 | 0.01 | 19.911 | 0.082 | 4 | 0 | 1 | 0 | 1 | 0 | 0 |
| 48 | Infeas | 0.0053734 | 0.012275 | 0.077053 | 0.077053 | 1 | -0.18197 | -2.4817 | 0.01 | 0.179 | 0.082 | 4 | 1 | 1 | 0 | 0 | 0 | 0 |
| 49 | Infeas | 0.0064329 | 0.012885 | 0.077053 | 0.077053 | 1 | -1.2316 | -1.6373 | 0.01 | 0.179 | 0.082 | 4 | 1 | 1 | 0 | 0 | 0 | 0 |
| 50 | Infeas | 0.0036996 | 0.012831 | 0.077053 | 0.077053 | 1 | -0.16775 | -0.97682 | 0.01 | 0.179 | 0.082 | 4 | 1 | 1 | 0 | 0 | 0 | 0 |
__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 50 reached.
Total function evaluations: 50
Total elapsed time: 35.5041 seconds
Total objective function evaluation time: 1.4765
Best observed feasible point:
WC_TA RE_TA EBIT_TA MVE_BVTD S_TA Industry WC_TA_Indicator RE_TA_Indicator EBIT_TA_Indicator MVE_BVTD_Indicator S_TA_Indicator Industry_Indicator
_______ ______ ________ ________ _____ ________ _______________ _______________ _________________ __________________ ______________ __________________
-1.6761 -0.799 -0.25553 0.9777 0.082 4 1 0 1 1 0 1
Observed objective function value = 0.077053
Estimated objective function value = 0.077053
Function evaluation time = 0.01167
Observed constraint violations =[ -0.166667 ]
Best estimated feasible point (according to models):
WC_TA RE_TA EBIT_TA MVE_BVTD S_TA Industry WC_TA_Indicator RE_TA_Indicator EBIT_TA_Indicator MVE_BVTD_Indicator S_TA_Indicator Industry_Indicator
_______ ______ ________ ________ _____ ________ _______________ _______________ _________________ __________________ ______________ __________________
-1.6761 -0.799 -0.25553 0.9777 0.082 4 1 0 1 1 0 1
Estimated objective function value = 0.077053
Estimated function evaluation time = 0.011687
Estimated constraint violations =[ -0.166667 ]

The counterExamples table contains the predictor data for the counterfactual examples, listed in order of proximity to observation. The metrics table contains additional information about the counterfactual examples.
Combine the metrics and counterExamples tables.
Tbl = [metrics,counterExamples]
Tbl=5×10 table
Distance NumModifiedPredictors LearnerScore AnomalyScore WC_TA RE_TA EBIT_TA MVE_BVTD S_TA Industry
________ _____________________ ____________ ____________ ________ ______ ________ ________ _____ ________
0.077053 3 0.66667 NaN -1.6761 -0.799 -0.25553 0.9777 0.082 4
0.077562 3 0.66667 NaN -1.68 -0.799 -0.16012 0.98831 0.082 4
0.10977 2 0.86667 NaN -0.15617 -0.799 0.01 1.8443 0.082 4
0.11964 2 0.86667 NaN -0.10478 -0.799 0.01 2.2284 0.082 4
0.12117 2 0.86667 NaN -0.24026 -0.799 0.01 2.2306 0.082 4
The first counterfactual example is the smallest distance away from the original observation. However, this counterfactual example has more modified predictors (3) and a lower learner score (0.66667) than the third, fourth, and fifth counterfactual examples. The lower learner score indicates that the classifier is less confident in its prediction for the first counterfactual example than in its predictions for the third, fourth, and fifth counterfactual examples.
Verify the model predictions for the counterfactual examples by using the predict function of Mdl.
[labels,scores] = predict(Mdl,counterExamples);
predictions = array2table(scores,VariableNames=string(Mdl.ClassNames))
predictions=5×2 table
good poor
_______ _______
0.66667 0.33333
0.66667 0.33333
0.86667 0.13333
0.86667 0.13333
0.86667 0.13333
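As a quick sanity check, you can also confirm that every counterfactual example flips the predicted class label. This sketch assumes the labels variable returned by the predict call above:

```matlab
% Sketch: confirm that each counterfactual example is predicted as "good".
% labels is the first output of the predict call above; string() handles
% categorical, string, and cell-array-of-character-vector label types.
assert(all(string(labels) == "good"), ...
    "A counterfactual example does not have the expected class label.")
```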
Input Arguments
Binary classification model, specified as a full or compact classification model object, as given in the following table of supported models.
| Model | Full or Compact Model Object |
|---|---|
| Discriminant analysis classifier | ClassificationDiscriminant, CompactClassificationDiscriminant |
| Ensemble of learners for classification | ClassificationEnsemble, CompactClassificationEnsemble, ClassificationBaggedEnsemble |
| Generalized additive model (GAM) | ClassificationGAM, CompactClassificationGAM |
| Gaussian kernel classification model using random feature expansion | ClassificationKernel |
| k-nearest neighbor classifier | ClassificationKNN |
| Linear classification model | ClassificationLinear |
| Naive Bayes model | ClassificationNaiveBayes, CompactClassificationNaiveBayes |
| Neural network classifier | ClassificationNeuralNetwork, CompactClassificationNeuralNetwork |
| Support vector machine (SVM) for binary classification | ClassificationSVM, CompactClassificationSVM |
| Decision tree for binary classification | ClassificationTree, CompactClassificationTree |
If Mdl is a model object that does not contain predictor data (for
example, a compact model), you must provide the input argument
X.
Observation for which to compute counterfactual examples, specified as a numeric row vector or a table with one row.
For a numeric row vector:
For a table with one row:
- If you trained Mdl using a table (for example, Tbl), then all predictor variables in observation must have the same variable names and data types as those in Tbl. However, the column order of observation does not need to correspond to the column order of Tbl.
- If you trained Mdl using a numeric matrix, then the predictor names in Mdl.PredictorNames and the corresponding predictor variable names in observation must be the same. To specify predictor names during training, use the PredictorNames name-value argument. All predictor variables in observation must be numeric vectors.
- observation can contain additional variables (response variable, observation weights, and so on), but counterfactuals ignores them.
- counterfactuals does not support multicolumn variables or cell arrays other than cell arrays of character vectors.
Data Types: single | double | table
Data used to train the model Mdl, specified as a numeric matrix or a
table. Each row of X
corresponds to one observation, and each column
corresponds to one variable.
For a numeric matrix:
- The variables that make up the columns of X must have the same order as the predictor variables that trained Mdl (that is, Mdl.PredictorNames).
- If you trained Mdl using a table, then X can be a numeric matrix if the table contains all numeric predictor variables.
For a table:
- If you trained Mdl using a table (for example, Tbl), then all predictor variables in X must have the same variable names and data types as those in Tbl. However, the column order of X does not need to correspond to the column order of Tbl.
- If you trained Mdl using a numeric matrix, then the predictor names in Mdl.PredictorNames and the corresponding predictor variable names in X must be the same. To specify predictor names during training, use the PredictorNames name-value argument. All predictor variables in X must be numeric vectors.
- X can contain additional variables (response variables, observation weights, and so on), but counterfactuals ignores them.
- counterfactuals does not support multicolumn variables or cell arrays other than cell arrays of character vectors.
Specify X only when Mdl does not contain the
predictor data used during training.
Data Types: single | double | table
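For instance, when the model is compact, pass the training predictor data explicitly. This is a sketch, not output from the example above; creditTbl is a hypothetical table containing the training predictor data:

```matlab
% Sketch: a compact model does not store its training data, so the
% counterfactuals function requires the training data as a second input.
% creditTbl is a hypothetical table of training predictor data.
compactMdl = compact(Mdl);
counterExamples = counterfactuals(compactMdl,creditTbl,observation);
```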
Name-Value Arguments
Specify optional pairs of arguments as
Name1=Value1,...,NameN=ValueN, where Name is
the argument name and Value is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.
Example:
counterfactuals(Mdl,observation,NumCounterfactualExamples=1,MaxNumModifiablePredictors=5)
specifies to return one counterfactual example with
at most five predictor values that differ from the
original observation.
Number of counterfactual examples to return in counterExamples, specified
as a positive integer scalar.
counterExamples can include
fewer observations than specified if constraints
prevent the software from finding
NumCounterfactualExamples
feasible observations.
Example: NumCounterfactualExamples=5
Data Types: single | double
Maximum number of predictors that can be changed in each counterfactual example, specified as
a positive integer scalar. For each counterfactual example, the function assumes that
all predictor values can be changed from those in observation, by
default.
To see the total number of modified predictors for each counterfactual example, return the
metrics table and view the
NumModifiedPredictors column.
Example: MaxNumModifiablePredictors=10
Data Types: single | double
Indices or names of the predictors that can be changed in the counterfactual examples,
specified as a numeric vector, character vector, string scalar, string array, or cell
array of character vectors. By default, the function assumes that all predictor values
can be changed from those in observation.
| Value | Description |
|---|---|
| Positive integer vector | Each entry in the vector is an index value indicating that the corresponding variable is a modifiable predictor. The index values are between 1 and p, where p is the number of predictors listed in Mdl.PredictorNames. |
| Character vector or string scalar | The value is the name of a modifiable predictor in
Mdl.PredictorNames. |
| String array or cell array of character vectors | Each element in the array is the name of a modifiable predictor. The
names must match the entries in
Mdl.PredictorNames. |
Example: ModifiablePredictors=["workClass","hours-per-week"]
Data Types: single | double | char | string | cell
Anomaly detection model, specified as one of the following model objects. The software uses the anomaly detection model to identify observations that are anomalies and prevent them from being returned as counterfactual examples.
| Model | Anomaly Detection Model Object |
|---|---|
| Isolation forest model | IsolationForest |
| Local outlier factor model | LocalOutlierFactor |
| One-class support vector machine (SVM) model | OneClassSVM |
| Robust random cut forest model | RobustRandomCutForest |
Anomaly constraint threshold, specified as a numeric scalar. The software identifies observations with anomaly scores above the threshold as anomalies and prevents them from being returned as counterfactual examples. The following table describes the range and default values for the threshold.
| Model | MaxAnomalyScore Range | Default MaxAnomalyScore |
|---|---|---|
| Isolation forest model | [0,1] | ScoreThreshold value of AnomalyModel |
| Local outlier factor model | [0,∞) | ScoreThreshold value of AnomalyModel |
| One-class support vector machine (SVM) model | (–∞,∞) | ScoreThreshold value of AnomalyModel |
| Robust random cut forest model | [0,∞) | ScoreThreshold value of AnomalyModel |
To specify an anomaly constraint threshold, you must specify an anomaly detection model by
using the AnomalyModel
name-value argument.
Example: MaxAnomalyScore=0.75
Data Types: single | double
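For example, you can train an isolation forest on the training data and use it to keep anomalous candidates out of the counterfactual search. This is a sketch under stated assumptions: creditTbl is a hypothetical table of training predictor data, and the 0.6 threshold is illustrative rather than a recommended value:

```matlab
% Sketch: constrain the counterfactual search with an isolation forest.
% creditTbl is a hypothetical table of training predictor data.
forest = iforest(creditTbl);
counterExamples = counterfactuals(Mdl,observation, ...
    AnomalyModel=forest,MaxAnomalyScore=0.6);
```

If you omit MaxAnomalyScore, the software falls back to the ScoreThreshold value of the anomaly detection model, as described in the table above.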
Option to perform computations in parallel using a parallel pool of workers, specified as one of these values:
- false (0) — Run in serial on the MATLAB® client.
- true (1) — Use a parallel pool if one is open or if MATLAB can automatically create one. If a parallel pool is not available, run in serial on the MATLAB client.
If you do not have a parallel pool open and automatic pool creation is enabled, MATLAB opens a pool using the default cluster profile. To use a parallel pool to run computations in MATLAB, you must have Parallel Computing Toolbox™. For more information, see Run MATLAB Functions with Automatic Parallel Support (Parallel Computing Toolbox).
Example: UseParallel=true
Data Types: single | double | logical
Predicted class score threshold, specified as a numeric scalar. The software uses the
threshold for the class k to
which observation does not
belong. The software ensures that counterfactual
examples have predicted scores for class
k that are above the threshold.
- If Mdl returns predicted scores that are posterior probabilities, then the value of MinLearnerScore must be in the range (0.5,1]. The default threshold is 0.5.
- If Mdl returns predicted scores that are not posterior probabilities, then the value of MinLearnerScore must be greater than 0. The default threshold is 0.
Example: MinLearnerScore=0.6
Data Types: single | double
Output Arguments
Counterfactual examples, returned as a numeric matrix or a table. Each row corresponds to an
observation, and each column corresponds to a
predictor variable. The order of the observations
depends on the distance between the counterfactual
examples and observation,
with the closest counterfactual example appearing
in the first row. For more information, see Counterfactual Examples and Bayesian Optimization.
Diagnostics for the counterfactual examples, returned as a table. Each row corresponds to a
counterfactual example (row in counterExamples), and each column
corresponds to a metric.
| Column | Description |
|---|---|
| Distance | Distance between a counterfactual example and the original observation (observation) |
| NumModifiedPredictors | Number of predictor values in a counterfactual example that differ from the original observation |
| LearnerScore | Confidence of the model (Mdl) that a counterfactual example has its predicted class label |
| AnomalyScore | Measure of how atypical a counterfactual example is relative to the training data (Mdl.X or X) |
Limitations
Ordered categorical predictors are not supported.
Algorithms
The counterfactuals function uses Bayesian optimization to find and
return counterfactual examples. The function selects counterfactual candidates
z that are a minimal distance away from the original observation
x (observation), have a different predicted label
from x, have few predictor values that differ from x,
and (optionally) are not anomalies. That is, the function finds z such
that the distance d(x,z) is minimized, subject to the following constraints:
- f(z) ≠ f(x), where f(z) is the predicted class label for z and f(x) is the predicted class label for x.
- ||x − z||0 ≤ p, where ||·||0 is the L0 norm and p is the maximum number of predictors that can be changed (MaxNumModifiablePredictors).
- a(z) ≤ amax, where a(z) is the anomaly score for z and amax is the anomaly constraint threshold (MaxAnomalyScore).
The distance depends on the mix of modifiable predictors
(ModifiablePredictors). The distance is the fast standardized
Euclidean distance if all the modifiable predictors are continuous (numeric), the Hamming
distance if all the modifiable predictors are categorical, and the modified Goodall distance
if the modifiable predictors are a mix of continuous and categorical variables. For more
information, see Distance Metrics.
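As an illustration of the all-continuous case (a sketch of the metric itself, not the function's internal implementation), the standardized Euclidean distance between an observation and a candidate can be computed with pdist2, scaling each predictor by its standard deviation in the training data. Here x, z, and Xtrain are hypothetical numeric row vectors and a hypothetical numeric training matrix:

```matlab
% Sketch: standardized Euclidean distance between an observation x and a
% counterfactual candidate z. Each coordinate difference is scaled by the
% standard deviation of the corresponding predictor in Xtrain.
d = pdist2(x,z,"seuclidean",std(Xtrain,[],1));
```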
The following image illustrates the position of a counterfactual example in relation to the original observation.

The original observation belongs to class 0, and observations below the decision boundary are predicted as belonging to class 0. The software tries to find the closest observation beyond the decision boundary that meets the required constraints.
The counterfactuals function uses the bayesopt
function to find and evaluate counterfactual
candidates. The software displays the optimization
results one iteration at a time. At each iteration,
the software finds a candidate and computes the
distance between it and the original observation
(observation).
- The display shows the candidate's predictor values as separate columns and the distance value in the Objective column. The Eval result column indicates whether a candidate violates the optimization constraints. If a candidate has a positive Constraint1 violation value, the software rejects the candidate. That is, the observation is infeasible.
- The plot shows in blue the shortest distance computed so far for counterfactual candidates that satisfy the optimization constraints. The green line corresponds to an estimate of the shortest distance, which often matches the shortest computed distance.
After 50 iterations, the Bayesian optimization process ends. The software selects counterfactual examples from the list of feasible candidates, in order of shortest to longest distance from the original observation.
Extended Capabilities
To run in parallel, set the UseParallel name-value argument to
true in the call to this function.
For more general information about parallel computing, see Run MATLAB Functions with Automatic Parallel Support (Parallel Computing Toolbox).
Version History
Introduced in R2026a