Bias Mitigation in Credit Scoring by Disparate Impact Removal
Disparate impact removal is a preprocessing technique for bias mitigation. This technique modifies the original credit score data to increase group fairness while preserving the rank ordering of observations within each group. A credit scoring model trained on the repaired data introduces less bias than a model trained on the original data. You perform disparate impact removal using the disparateImpactRemover class from Statistics and Machine Learning Toolbox™. This class returns a remover object along with a table containing the new predictor values. Before you can predict with the fitted credit scoring model, you must apply the transform method of the remover object to the test data.
The disparate impact removal technique works only on the distribution of the data within each numeric predictor for each subgroup of a sensitive attribute. The disparateImpactRemover class has no knowledge of, or relation to, the response variable. In this example, you repair all of the numeric predictors, time at address (TmAtAddress), customer income (CustIncome), time with bank (TmWBank), average monthly balance (AMBalance), and utilization rate (UtilRate), with respect to the sensitive attribute, customer age group (AgeGroup).
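The toolbox treats the details of the repair as internal to the disparateImpactRemover class, but the underlying idea, aligning the per-group distributions while preserving rank order within each group, can be sketched with quantile matching. The following local function is an illustrative sketch only, not the toolbox implementation; the function name repairSketch and the fixed quantile grid are assumptions made for this illustration.

```
function xRepaired = repairSketch(x,g)
% Illustrative quantile-matching repair (not the toolbox implementation).
% x is a numeric predictor; g is a categorical sensitive attribute.
    xRepaired = zeros(size(x));
    groups = categories(g);
    probs = 0:0.01:1;                              % common quantile grid
    Q = zeros(numel(groups),numel(probs));
    for k = 1:numel(groups)
        Q(k,:) = quantile(x(g==groups{k}),probs);  % per-group quantile curve
    end
    Qtarget = median(Q,1);                         % target "median" distribution
    for k = 1:numel(groups)
        idx = g == groups{k};
        xg = x(idx);
        p = (tiedrank(xg) - 0.5)/numel(xg);        % within-group rank -> quantile
        xRepaired(idx) = interp1(probs,Qtarget,p); % map onto target distribution
    end
end
```

Because every group is mapped through the same target quantile curve, the repaired per-group distributions roughly coincide, while the ordering of observations within each group is unchanged.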
Original Credit Scoring Model
This example uses a credit scoring workflow. Load CreditCardData.mat and use the data table. Discretize the customer ages into four groups and add the resulting AgeGroup variable to the table.
load CreditCardData.mat
AgeGroup = discretize(data.CustAge,[min(data.CustAge) 30 45 60 max(data.CustAge)], ...
    'categorical',{'Age < 30','30 <= Age < 45','45 <= Age < 60','Age >= 60'});
data = addvars(data,AgeGroup,'After','CustAge');
head(data)
    CustID    CustAge       AgeGroup       TmAtAddress    ResStatus     EmpStatus    CustIncome    TmWBank    OtherCC    AMBalance    UtilRate    status
    ______    _______    ______________    ___________    __________    _________    __________    _______    _______    _________    ________    ______

      1         53       45 <= Age < 60        62         Tenant        Unknown        50000         55         Yes       1055.9        0.22        0
      2         61       Age >= 60             22         Home Owner    Employed       52000         25         Yes       1161.6        0.24        0
      3         47       45 <= Age < 60        30         Tenant        Employed       37000         61         No        877.23        0.29        0
      4         50       45 <= Age < 60        75         Home Owner    Employed       53000         20         Yes       157.37        0.08        0
      5         68       Age >= 60             56         Home Owner    Employed       53000         14         Yes       561.84        0.11        0
      6         65       Age >= 60             13         Home Owner    Employed       48000         59         Yes       968.18        0.15        0
      7         34       30 <= Age < 45        32         Home Owner    Unknown        32000         26         Yes       717.82        0.02        1
      8         50       45 <= Age < 60        57         Other         Employed       51000         33         No        3041.2        0.13        0
Set the default random seed for reproducibility, and split the data set into training and testing data.
rng('default')
c = cvpartition(size(data,1),'HoldOut',0.3);
data_Train = data(c.training(),:);
data_Test = data(c.test(),:);
head(data_Train)
    CustID    CustAge       AgeGroup       TmAtAddress    ResStatus     EmpStatus    CustIncome    TmWBank    OtherCC    AMBalance    UtilRate    status
    ______    _______    ______________    ___________    __________    _________    __________    _______    _______    _________    ________    ______

      1         53       45 <= Age < 60        62         Tenant        Unknown        50000         55         Yes       1055.9        0.22        0
      2         61       Age >= 60             22         Home Owner    Employed       52000         25         Yes       1161.6        0.24        0
      3         47       45 <= Age < 60        30         Tenant        Employed       37000         61         No        877.23        0.29        0
      4         50       45 <= Age < 60        75         Home Owner    Employed       53000         20         Yes       157.37        0.08        0
      7         34       30 <= Age < 45        32         Home Owner    Unknown        32000         26         Yes       717.82        0.02        1
      8         50       45 <= Age < 60        57         Other         Employed       51000         33         No        3041.2        0.13        0
      9         50       45 <= Age < 60        10         Tenant        Unknown        52000         25         Yes       115.56        0.02        1
     10         49       45 <= Age < 60        30         Home Owner    Unknown        53000         23         Yes       718.5         0.17        1
Use creditscorecard to create a creditscorecard object and use fitmodel to fit a credit scoring model with the training data (data_Train).
PredictorVars = setdiff(data_Train.Properties.VariableNames, ...
    {'CustAge','AgeGroup','CustID','status'});
sc1 = creditscorecard(data_Train,'IDVar','CustID', ...
    'PredictorVars',PredictorVars,'ResponseVar','status');
sc1 = autobinning(sc1);
sc1 = fitmodel(sc1,'VariableSelection','fullmodel');
Generalized linear regression model:
    logit(status) ~ 1 + TmAtAddress + ResStatus + EmpStatus + CustIncome + TmWBank + OtherCC + AMBalance + UtilRate
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE        tStat       pValue
                   ________    ________    ________    __________

    (Intercept)     0.73924    0.077237      9.5711     1.058e-21
    TmAtAddress      1.2577     0.99118      1.2689       0.20448
    ResStatus         1.755       1.295      1.3552       0.17535
    EmpStatus       0.88652     0.32232      2.7504     0.0059516
    CustIncome      0.95991     0.19645      4.8862    1.0281e-06
    TmWBank           1.132      0.3157      3.5856    0.00033637
    OtherCC         0.85227      2.1198     0.40204       0.68765
    AMBalance        1.0773     0.31969      3.3698    0.00075232
    UtilRate       -0.19784     0.59565    -0.33214       0.73978

840 observations, 831 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 66.5, p-value = 2.44e-11
Use displaypoints to compute the points per predictor per bin for the creditscorecard model (sc1).
pointsinfo1 = displaypoints(sc1)
pointsinfo1=38×3 table
Predictors Bin Points
_______________ _________________ _________
{'TmAtAddress'} {'[-Inf,9)' } -0.17538
{'TmAtAddress'} {'[9,16)' } 0.05434
{'TmAtAddress'} {'[16,23)' } 0.096897
{'TmAtAddress'} {'[23,Inf]' } 0.13984
{'TmAtAddress'} {'<missing>' } NaN
{'ResStatus' } {'Tenant' } -0.017688
{'ResStatus' } {'Home Owner' } 0.11681
{'ResStatus' } {'Other' } 0.29011
{'ResStatus' } {'<missing>' } NaN
{'EmpStatus' } {'Unknown' } -0.097582
{'EmpStatus' } {'Employed' } 0.33162
{'EmpStatus' } {'<missing>' } NaN
{'CustIncome' } {'[-Inf,30000)' } -0.61962
{'CustIncome' } {'[30000,36000)'} -0.10695
{'CustIncome' } {'[36000,40000)'} 0.0010845
{'CustIncome' } {'[40000,42000)'} 0.065532
⋮
Use probdefault to determine the likelihood of default for the data_Test data set with the creditscorecard model (sc1). Then, apply a threshold of 0.35 to the probability of default to obtain the predicted labels.
pd1 = probdefault(sc1,data_Test);
threshold = 0.35;
predictions1 = double(pd1>threshold);
Use fairnessMetrics to compute fairness metrics at the model level as a baseline. Use report to generate the fairness metrics report.
modelMetrics1 = fairnessMetrics(data_Test,'status','Predictions',predictions1, ...
    'SensitiveAttributeNames','AgeGroup');
mmReport1 = report(modelMetrics1,'GroupMetrics','GroupCount')
mmReport1=4×8 table
ModelNames SensitiveAttributeNames Groups StatisticalParityDifference DisparateImpact EqualOpportunityDifference AverageAbsoluteOddsDifference GroupCount
__________ _______________________ ______________ ___________________________ _______________ __________________________ _____________________________ __________
Model1 AgeGroup Age < 30 0.54312 2.6945 0.47391 0.5362 22
Model1 AgeGroup 30 <= Age < 45 0.19922 1.6216 0.35645 0.22138 152
Model1 AgeGroup 45 <= Age < 60 0 1 0 0 156
Model1 AgeGroup Age >= 60 -0.15385 0.52 -0.18323 0.16375 30
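To make the metrics in this report concrete, you can recompute statistical parity difference and disparate impact by hand from the predictions. This sketch assumes '45 <= Age < 60' is the reference group, as in the report above; the variable names are illustrative, not part of the toolbox workflow.

```
% Positive prediction rate per age group: P(prediction = 1 | group).
groups = categories(data_Test.AgeGroup);
posRate = zeros(numel(groups),1);
for k = 1:numel(groups)
    posRate(k) = mean(predictions1(data_Test.AgeGroup == groups{k}));
end

% Compare each group with the reference group '45 <= Age < 60'.
refRate = posRate(strcmp(groups,'45 <= Age < 60'));
spd = posRate - refRate;   % statistical parity difference (0 is fair)
di  = posRate ./ refRate;  % disparate impact (1 is fair)
```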
Use plot to visualize the statistical parity difference ('spd') and disparate impact ('di') bias metrics.
figure
tiledlayout(2,1)
nexttile
plot(modelMetrics1,'spd')
nexttile
plot(modelMetrics1,'di')
Bias Mitigation by Disparate Impact Removal
For each of the five continuous predictors, 'TmAtAddress', 'CustIncome', 'TmWBank', 'AMBalance', and 'UtilRate', you can plot the original distributions of the data within each age group.
Choose a numeric predictor to plot.
predictor = "CustIncome";
[f1, xi1] = ksdensity(data_Train.(predictor)(data_Train.AgeGroup=='Age < 30'));
[f2, xi2] = ksdensity(data_Train.(predictor)(data_Train.AgeGroup=='30 <= Age < 45'));
[f3, xi3] = ksdensity(data_Train.(predictor)(data_Train.AgeGroup=='45 <= Age < 60'));
[f4, xi4] = ksdensity(data_Train.(predictor)(data_Train.AgeGroup=='Age >= 60'));
Create a disparateImpactRemover object, which returns the remover object and the newTrainTbl table with the new predictor values.
[remover, newTrainTbl] = disparateImpactRemover(data_Train,'AgeGroup', ...
    'PredictorNames',{'TmAtAddress','CustIncome','TmWBank','AMBalance','UtilRate'})
remover = 
  disparateImpactRemover with properties:

        RepairFraction: 1
        PredictorNames: {'TmAtAddress'  'CustIncome'  'TmWBank'  'AMBalance'  'UtilRate'}
    SensitiveAttribute: 'AgeGroup'
newTrainTbl=840×12 table
CustID CustAge AgeGroup TmAtAddress ResStatus EmpStatus CustIncome TmWBank OtherCC AMBalance UtilRate status
______ _______ ______________ ___________ __________ _________ __________ _______ _______ _________ ________ ______
1 53 45 <= Age < 60 58.599 Tenant Unknown 47000 51.733 Yes 1009.4 0.20099 0
2 61 Age >= 60 24 Home Owner Employed 41500 24.5 Yes 1203.9 0.25 0
3 47 45 <= Age < 60 30.5 Tenant Employed 33500 57.686 No 817.9 0.29 0
4 50 45 <= Age < 60 68.622 Home Owner Employed 49401 19.07 Yes 120.54 0.077267 0
7 34 30 <= Age < 45 30.5 Home Owner Unknown 35500 26.657 Yes 638.88 0.02 1
8 50 45 <= Age < 60 53.39 Other Employed 47000 27.971 No 2172.7 0.12 0
9 50 45 <= Age < 60 9 Tenant Unknown 49401 22.541 Yes 120.54 0.02 1
10 49 45 <= Age < 60 30.5 Home Owner Unknown 49401 22 Yes 664.51 0.14715 1
11 52 45 <= Age < 60 24 Tenant Unknown 30500 38.779 Yes 120.54 0.065 1
12 48 45 <= Age < 60 77.291 Other Unknown 40500 14.5 Yes 405.81 0.03 0
14 44 30 <= Age < 45 68.622 Other Unknown 44500 34.791 No 378.88 0.15657 0
17 39 30 <= Age < 45 9 Tenant Employed 37500 38.779 Yes 664.51 0.25 1
20 52 45 <= Age < 60 51.442 Other Unknown 38500 12.297 Yes 1157.5 0.19273 0
21 37 30 <= Age < 45 10.343 Tenant Unknown 36500 23.314 No 732.28 0.065 1
22 51 45 <= Age < 60 12.087 Home Owner Employed 31500 27.971 Yes 437.95 0.01 0
24 43 30 <= Age < 45 40 Tenant Employed 33500 11.18 Yes 263.13 0.077267 0
⋮
[nf1, nxi1] = ksdensity(newTrainTbl.(predictor)(newTrainTbl.AgeGroup=='Age < 30'));
[nf2, nxi2] = ksdensity(newTrainTbl.(predictor)(newTrainTbl.AgeGroup=='30 <= Age < 45'));
[nf3, nxi3] = ksdensity(newTrainTbl.(predictor)(newTrainTbl.AgeGroup=='45 <= Age < 60'));
[nf4, nxi4] = ksdensity(newTrainTbl.(predictor)(newTrainTbl.AgeGroup=='Age >= 60'));
Plot the original and the repaired distributions.
figure
tiledlayout(2,1)

ax1 = nexttile;
plot(xi1, f1, 'LineWidth', 1.5)
hold on
plot(xi2, f2, 'LineWidth', 1.5)
plot(xi3, f3, 'LineWidth', 1.5)
plot(xi4, f4, 'LineWidth', 1.5)
legend(["Age < 30"; "30 <= Age < 45"; "45 <= Age < 60"; "Age >= 60"],'Location','northwest')
ax1.Title.String = "Original Distribution of " + predictor;
xlabel(predictor)
ylabel('pdf')
grid on

ax2 = nexttile;
plot(nxi1, nf1, 'LineWidth', 1.5)
hold on
plot(nxi2, nf2, 'LineWidth', 1.5)
plot(nxi3, nf3, 'LineWidth', 1.5)
plot(nxi4, nf4, 'LineWidth', 1.5)
legend(["Age < 30"; "30 <= Age < 45"; "45 <= Age < 60"; "Age >= 60"],'Location','northwest')
ax2.Title.String = "Repaired Distribution of " + predictor;
xlabel(predictor)
ylabel('pdf')
grid on

linkaxes([ax1, ax2], 'xy')
This plot demonstrates that the original distributions of CustIncome differ across the groups within the AgeGroup variable. Younger people tend to have a lower income on average, with more variation, than older people. This difference introduces bias, which the fitted model then reflects. The disparateImpactRemover function modifies the data so that the distributions of all the subgroups are more similar, as the second subplot, Repaired Distribution of CustIncome, shows. Using this new data, you can fit a second credit scoring model and then compare its model-level metrics with those of the original creditscorecard model (sc1).
New Credit Scoring Model
Use creditscorecard to create a creditscorecard object and use fitmodel to fit a credit scoring model with the new data (newTrainTbl). Then, you can compute model-level bias metrics using fairnessMetrics.
sc2 = creditscorecard(newTrainTbl,'IDVar','CustID', ...
    'PredictorVars',PredictorVars,'ResponseVar','status');
sc2 = autobinning(sc2);
sc2 = fitmodel(sc2,'VariableSelection','fullmodel');
Generalized linear regression model:
    logit(status) ~ 1 + TmAtAddress + ResStatus + EmpStatus + CustIncome + TmWBank + OtherCC + AMBalance + UtilRate
    Distribution = Binomial

Estimated Coefficients:
                   Estimate        SE       tStat       pValue
                   _________    _______    _______    __________

    (Intercept)      0.74041    0.07641     9.6899     3.327e-22
    TmAtAddress       1.1658    0.87564     1.3313       0.18308
    ResStatus         1.8719     1.2848     1.4569       0.14513
    EmpStatus        0.88699    0.31991     2.7727       0.00556
    CustIncome       0.98269    0.28725      3.421    0.00062396
    TmWBank           1.1392    0.30677     3.7135    0.00020442
    OtherCC          0.55005     2.0969    0.26231       0.79308
    AMBalance         1.0478     0.3607     2.9049     0.0036734
    UtilRate       -0.071972    0.58704    -0.1226       0.90242

840 observations, 831 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 50.4, p-value = 3.36e-08
Use displaypoints to compute the points per predictor per bin for the creditscorecard model (sc2).
pointsinfo2 = displaypoints(sc2)
pointsinfo2=35×3 table
Predictors Bin Points
_______________ _________________ _________
{'TmAtAddress'} {'[-Inf,9)' } -0.11003
{'TmAtAddress'} {'[9,15.1453)' } -0.091424
{'TmAtAddress'} {'[15.1453,Inf]'} 0.14546
{'TmAtAddress'} {'<missing>' } NaN
{'ResStatus' } {'Tenant' } -0.024878
{'ResStatus' } {'Home Owner' } 0.11858
{'ResStatus' } {'Other' } 0.30343
{'ResStatus' } {'<missing>' } NaN
{'EmpStatus' } {'Unknown' } -0.097539
{'EmpStatus' } {'Employed' } 0.3319
{'EmpStatus' } {'<missing>' } NaN
{'CustIncome' } {'[-Inf,31500)' } -0.30942
{'CustIncome' } {'[31500,38500)'} -0.09789
{'CustIncome' } {'[38500,45000)'} 0.21233
{'CustIncome' } {'[45000,Inf]' } 0.494
{'CustIncome' } {'<missing>' } NaN
⋮
Before computing probabilities of default with the test data, you must transform the test data using the same transformation as for the training data. To make this transformation, use the transform method of the remover object and pass it the data_Test data set. Then, use probdefault to compute the likelihood of default for the transformed test data.
newTestTbl = transform(remover,data_Test);
pd2 = probdefault(sc2,newTestTbl);
predictions2 = double(pd2>threshold);
Use fairnessMetrics to compute fairness metrics at the model level and use report to generate a fairness metrics report.
modelMetrics2 = fairnessMetrics(newTestTbl,'status','Predictions',predictions2, ...
    'SensitiveAttributeNames','AgeGroup');
mmReport2 = report(modelMetrics2,'GroupMetrics','GroupCount')
mmReport2=4×8 table
ModelNames SensitiveAttributeNames Groups StatisticalParityDifference DisparateImpact EqualOpportunityDifference AverageAbsoluteOddsDifference GroupCount
__________ _______________________ ______________ ___________________________ _______________ __________________________ _____________________________ __________
Model1 AgeGroup Age < 30 0.082751 1.2226 0.18696 0.10408 22
Model1 AgeGroup 30 <= Age < 45 -0.0033738 0.99093 0.07902 0.076333 152
Model1 AgeGroup 45 <= Age < 60 0 1 0 0 156
Model1 AgeGroup Age >= 60 0.028205 1.0759 0.015528 0.026143 30
Use plot to visualize the statistical parity difference ('spd') and disparate impact ('di') bias metrics.
figure
tiledlayout(2,1)
nexttile
plot(modelMetrics2,'spd')
nexttile
plot(modelMetrics2,'di')
Plot Disparate Impact and Accuracy for Different Repair Fraction Values
In this example, the bias mitigation process uses disparateImpactRemover with RepairFraction = 1 to fully repair the data. However, it is useful to see how the disparate impact and accuracy vary with a change in the RepairFraction value. For example, using the AgeGroup sensitive attribute, plot the disparate impact and accuracy for one subgroup across different values of RepairFraction.
subgroup = 4;
r = 0:0.1:1;
Accuracy = zeros(11,1);
di = zeros(11,1);

for i = 0:1:10
    [rmvr, trainTbl] = disparateImpactRemover(data_Train,'AgeGroup', ...
        'PredictorNames',{'TmAtAddress','CustIncome','TmWBank','AMBalance','UtilRate'}, ...
        'RepairFraction',i/10);
    testTbl = transform(rmvr, data_Test);

    sc = creditscorecard(trainTbl,'IDVar','CustID', ...
        'PredictorVars',PredictorVars,'ResponseVar','status');
    sc = autobinning(sc);
    sc = fitmodel(sc,'VariableSelection','fullmodel','Display','off');

    pd = probdefault(sc,testTbl);
    predictions = double(pd>threshold);

    % Measure fairness on the test table repaired with the same fraction.
    modelMetrics = fairnessMetrics(testTbl,'status','Predictions',predictions, ...
        'SensitiveAttributeNames','AgeGroup');
    mmReport = report(modelMetrics,'BiasMetrics','di','GroupMetrics','accuracy');

    di(i+1) = mmReport.DisparateImpact(subgroup);
    Accuracy(i+1) = mmReport.Accuracy(subgroup);
end

figure
yyaxis left
plot(r, di,'LineWidth', 1.5)
title('Bias Mitigation in AgeGroup')
xlabel('Repair Fraction')
ylabel('Disparate Impact')
yyaxis right
plot(r, Accuracy,'LineWidth', 1.5)
ylabel('Accuracy')
grid on
If you select the subgroup 'Age < 30' in this plot, you see that the accuracy increases as the RepairFraction value increases. Although this result seems counterintuitive, the GroupCount column of the mmReport2 table shows that this age group has only 22 observations. This small number of observations explains the anomaly in the plot.
One way to mitigate the issue of not having enough data for a subgroup is to combine all unprivileged groups and compare them as one group against the privileged group. The following code sets the majority group (45 <= Age < 60) as the privileged group and combines every other group into a single unprivileged group.
privilegedGroup = '45 <= Age < 60';

twoAgeGroups_TrainTbl = data_Train;
twoAgeGroups_TrainTbl.AgeGroup = addcats(twoAgeGroups_TrainTbl.AgeGroup,'Other','After','Age >= 60');
twoAgeGroups_TrainTbl.AgeGroup(twoAgeGroups_TrainTbl.AgeGroup ~= privilegedGroup) = 'Other';

twoAgeGroups_TestTbl = data_Test;
twoAgeGroups_TestTbl.AgeGroup = addcats(twoAgeGroups_TestTbl.AgeGroup,'Other','After','Age >= 60');
twoAgeGroups_TestTbl.AgeGroup(twoAgeGroups_TestTbl.AgeGroup ~= privilegedGroup) = 'Other';

r = 0:0.1:1;
Accuracy = zeros(11,1);
di = zeros(11,1);

for i = 0:1:10
    [rmvr, trainTbl] = disparateImpactRemover(twoAgeGroups_TrainTbl,'AgeGroup', ...
        'PredictorNames',{'TmAtAddress','CustIncome','TmWBank','AMBalance','UtilRate'}, ...
        'RepairFraction',i/10);
    testTbl = transform(rmvr, twoAgeGroups_TestTbl);

    sc = creditscorecard(trainTbl,'IDVar','CustID', ...
        'PredictorVars',PredictorVars,'ResponseVar','status');
    sc = autobinning(sc);
    sc = fitmodel(sc,'VariableSelection','fullmodel','Display','off');

    pd = probdefault(sc,testTbl);
    predictions = double(pd>threshold);

    modelMetrics = fairnessMetrics(twoAgeGroups_TestTbl,'status','Predictions',predictions, ...
        'SensitiveAttributeNames','AgeGroup','ReferenceGroup','45 <= Age < 60');
    mmReport = report(modelMetrics,'BiasMetrics','di','GroupMetrics','accuracy');

    di(i+1) = mmReport.DisparateImpact(2);
    Accuracy(i+1) = mmReport.Accuracy(2);
end

figure
yyaxis left
plot(r, di,'LineWidth', 1.5)
title('Bias Mitigation in AgeGroup')
xlabel('Repair Fraction')
ylabel('Disparate Impact')
yyaxis right
plot(r, Accuracy,'LineWidth', 1.5)
ylabel('Accuracy')
grid on
You can use this privileged group and unprivileged group method if the goal is not to measure the bias of each individual group against the privileged group, but rather to measure the overall fairness of all customers who are not part of the privileged group.
References
[1] Nielsen, Aileen. "Chapter 4. Fairness PreProcessing." Practical Fairness. O'Reilly Media, Inc., Dec. 2020.
[2] Mehrabi, Ninareh, et al. "A Survey on Bias and Fairness in Machine Learning." arXiv:1908.09635 [cs], Sept. 2019, https://arxiv.org/abs/1908.09635.
[3] Wachter, Sandra, et al. Bias Preservation in Machine Learning: The Legality of Fairness Metrics Under EU Non-Discrimination Law. SSRN Scholarly Paper, ID 3792772, Social Science Research Network, 15 Jan. 2021. papers.ssrn.com, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3792772.
See Also
creditscorecard | autobinning | fitmodel | displaypoints | probdefault