Compare Lifetime PD Models Using Cross-Validation
This example shows how to compare three lifetime PD models using cross-validation.
Load Data
Load the portfolio data, which includes loan and macro information. This is a simulated data set used for illustration purposes.
load RetailCreditPanelData.mat
data = join(data,dataMacro);
disp(head(data))
    ID    ScoreGroup    YOB    Default    Year     GDP     Market
    __    __________    ___    _______    ____    _____    ______
     1    Low Risk       1        0       1997     2.72      7.61
     1    Low Risk       2        0       1998     3.57     26.24
     1    Low Risk       3        0       1999     2.86      18.1
     1    Low Risk       4        0       2000     2.43      3.19
     1    Low Risk       5        0       2001     1.26    -10.51
     1    Low Risk       6        0       2002    -0.59    -22.95
     1    Low Risk       7        0       2003     0.63      2.78
     1    Low Risk       8        0       2004     1.85      9.48
Cross-Validation
Because the data is panel data, there are multiple rows for each customer. You set up cross validation partitions over the customer IDs, not over the rows of the data set. In this way, a customer can be in either a training set or a test set, but the rows corresponding to the same customer are not split between training and testing.
nIDs = max(data.ID);
uniqueIDs = unique(data.ID);
NumFolds = 5;
rng('default'); % for reproducibility
c = cvpartition(nIDs,'KFold',NumFolds);
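As a quick sanity check (this snippet is illustrative and not part of the original example), you can confirm that the ID-level partition keeps every customer's rows on one side of the split:

```matlab
% Illustrative check for the first fold: no customer ID should appear
% both in the training rows and in the test rows.
TrainDataInd = ismember(data.ID,uniqueIDs(training(c,1)));
TestDataInd  = ismember(data.ID,uniqueIDs(test(c,1)));
SharedIDs = intersect(data.ID(TrainDataInd),data.ID(TestDataInd));
assert(isempty(SharedIDs),'Customer rows are split across training and testing.')
```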
Compare Logistic, Probit, and Cox lifetime PD models using the same variables.
CVModels = ["logistic";"probit";"cox"];
NumModels = length(CVModels);
AUROC = zeros(NumFolds,NumModels);
RMSE = zeros(NumFolds,NumModels);
for ii=1:NumFolds
   fprintf('Fitting models, fold %d\n',ii);
   % Get indices for ID partition
   TrainIDInd = training(c,ii);
   TestIDInd = test(c,ii);
   % Convert to row indices
   TrainDataInd = ismember(data.ID,uniqueIDs(TrainIDInd));
   TestDataInd = ismember(data.ID,uniqueIDs(TestIDInd));
   % For each model, fit with training data, measure with test data
   for jj=1:NumModels
      % Fit model with training data
      pdModel = fitLifetimePDModel(data(TrainDataInd,:),CVModels(jj),...
         'IDVar','ID','AgeVar','YOB','LoanVars','ScoreGroup',...
         'MacroVars',{'GDP','Market'},'ResponseVar','Default');
      % Measure discrimination on test data
      DiscMeasure = modelDiscrimination(pdModel,data(TestDataInd,:));
      AUROC(ii,jj) = DiscMeasure.AUROC;
      % Measure calibration on test data, grouping by YOB (age) and score group
      CalMeasure = modelCalibration(pdModel,data(TestDataInd,:),["YOB" "ScoreGroup"]);
      RMSE(ii,jj) = CalMeasure.RMSE;
   end
end
Fitting models, fold 1 Fitting models, fold 2 Fitting models, fold 3 Fitting models, fold 4 Fitting models, fold 5
Using the discrimination and accuracy measures for the different folds, you can compare the models. In this example, the fold-by-fold metrics are displayed. You could also compare the mean AUROC, the mean RMSE, or the proportion of folds in which one model outperforms the others in discrimination or accuracy. The three models in this example are very comparable.
AUROCTable = array2table(AUROC,"RowNames",strcat("Fold ",string(1:NumFolds)),"VariableNames",strcat("AUROC_",CVModels))
AUROCTable=5×3 table
AUROC_logistic AUROC_probit AUROC_cox
______________ ____________ _________
Fold 1 0.69558 0.6957 0.69565
Fold 2 0.70265 0.70335 0.70366
Fold 3 0.69055 0.69037 0.69008
Fold 4 0.70268 0.70232 0.70296
Fold 5 0.68784 0.68781 0.68811
RMSETable = array2table(RMSE,"RowNames",strcat("Fold ",string(1:NumFolds)),"VariableNames",strcat("RMSE_",CVModels))
RMSETable=5×3 table
RMSE_logistic RMSE_probit RMSE_cox
_____________ ___________ __________
Fold 1 0.0019412 0.0020972 0.0020048
Fold 2 0.0011167 0.0011644 0.0011612
Fold 3 0.0011536 0.0011802 0.0012766
Fold 4 0.0010269 0.00097877 0.00099473
Fold 5 0.0015965 0.001485 0.0015829
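One way to summarize the comparison, sketched below, is to aggregate the fold-level metrics computed above; the summary table shown here (its variable names are illustrative) reports the mean of each metric and the proportion of folds in which each model is best:

```matlab
% Mean metrics across folds (higher AUROC and lower RMSE are better)
MeanAUROC = mean(AUROC);
MeanRMSE = mean(RMSE);
% Proportion of folds in which each model attains the best metric
[~,BestDisc] = max(AUROC,[],2);
[~,BestCal] = min(RMSE,[],2);
PropBestAUROC = mean(BestDisc==(1:NumModels));
PropBestRMSE = mean(BestCal==(1:NumModels));
SummaryTable = table(CVModels,MeanAUROC',MeanRMSE',PropBestAUROC',PropBestRMSE',...
   'VariableNames',{'Model','MeanAUROC','MeanRMSE','PropBestAUROC','PropBestRMSE'})
```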
See Also
fitLifetimePDModel | predict | predictLifetime | modelDiscrimination | modelCalibration | modelCalibrationPlot | Logistic | Probit | Cox