setmodel
Set model predictors and coefficients
Description
sc = setmodel(sc,ModelPredictors,ModelCoefficients) sets the predictors and coefficients of a linear logistic regression model fitted outside the creditscorecard object and returns an updated creditscorecard object. The predictors and coefficients are used for the computation of scorecard points. Use setmodel in lieu of fitmodel, which fits a linear logistic regression model, because setmodel offers increased flexibility. For example, when a model fitted with fitmodel needs to be modified, you can use setmodel. For more information, see Workflows for Using setmodel.
Note
When using setmodel, the following assumptions apply:
- The model coefficients correspond to a linear logistic regression model (where only linear terms are included in the model and there are no interactions or any other higher-order terms).
- The model was previously fitted using Weight of Evidence (WOE) data with the response mapped so that ‘Good’ is 1 and ‘Bad’ is 0.
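Under these assumptions, the model that setmodel expects can be written in the form below. This is a sketch in notation that does not appear in the original text: p_Good is the probability that the response is 'Good', b_0 is the intercept, b_j is the coefficient of the j-th of N predictors, and WOE_j(x_j) is the WOE value of the bin that contains the predictor value x_j.

```latex
\operatorname{logit}(p_{\text{Good}})
  = \ln\frac{p_{\text{Good}}}{1 - p_{\text{Good}}}
  = b_0 + \sum_{j=1}^{N} b_j \,\mathrm{WOE}_j(x_j)
```

The right-hand side contains one linear term per predictor and no products of predictors, which is what the first assumption requires.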
Examples
Modify a GLM Model Fitted with fitmodel
This example shows how to use setmodel to make modifications to a logistic regression model initially fitted using the fitmodel function, and then set the new logistic regression model predictors and coefficients back into the creditscorecard object.
Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011).
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID')
sc = 
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {'CustID'  'CustAge'  'TmAtAddress'  'ResStatus'  'EmpStatus'  'CustIncome'  'TmWBank'  'OtherCC'  'AMBalance'  'UtilRate'  'status'}
        NumericPredictors: {'CustAge'  'TmAtAddress'  'CustIncome'  'TmWBank'  'AMBalance'  'UtilRate'}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
           BinMissingData: 0
                    IDVar: 'CustID'
            PredictorVars: {'CustAge'  'TmAtAddress'  'ResStatus'  'EmpStatus'  'CustIncome'  'TmWBank'  'OtherCC'  'AMBalance'  'UtilRate'}
                     Data: [1200x11 table]
Perform automatic binning.
sc = autobinning(sc);
The standard workflow is to use the fitmodel function to fit a logistic regression model using a stepwise method. However, fitmodel only supports limited options regarding the stepwise procedure. You can use the optional mdl output argument from fitmodel to get a copy of the fitted GeneralizedLinearModel object, to later modify.
[sc,mdl] = fitmodel(sc);
1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1441.994, Chi2Stat = 5.3511754, PValue = 0.020708306
6. Adding ResStatus, Deviance = 1437.8756, Chi2Stat = 4.118404, PValue = 0.042419078
7. Adding OtherCC, Deviance = 1433.707, Chi2Stat = 4.1686018, PValue = 0.041179769

Generalized linear regression model:
    logit(status) ~ 1 + CustAge + ResStatus + EmpStatus + CustIncome + TmWBank + OtherCC + AMBalance
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE       tStat       pValue  
                   ________    ________    ______    __________
    (Intercept)    0.70239     0.064001    10.975    5.0538e-28
    CustAge        0.60833     0.24932     2.44      0.014687
    ResStatus      1.377       0.65272     2.1097    0.034888
    EmpStatus      0.88565     0.293       3.0227    0.0025055
    CustIncome     0.70164     0.21844     3.2121    0.0013179
    TmWBank        1.1074      0.23271     4.7589    1.9464e-06
    OtherCC        1.0883      0.52912     2.0569    0.039696
    AMBalance      1.045       0.32214     3.2439    0.0011792

1200 observations, 1192 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 89.7, p-value = 1.4e-16
Suppose you want to include, or "force," the predictor 'UtilRate' in the logistic regression model, even though the stepwise method did not include it in the fitted model. You can add 'UtilRate' to the logistic regression model using the GeneralizedLinearModel object mdl directly.
mdl = mdl.addTerms('UtilRate')
mdl = 
Generalized linear regression model:
    logit(status) ~ 1 + CustAge + ResStatus + EmpStatus + CustIncome + TmWBank + OtherCC + AMBalance + UtilRate
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE        tStat        pValue  
                   ________    ________    ________    __________
    (Intercept)    0.70239     0.064001    10.975      5.0538e-28
    CustAge        0.60843     0.24936     2.44        0.014687
    ResStatus      1.3773      0.6529      2.1096      0.034896
    EmpStatus      0.88556     0.29303     3.0221      0.0025103
    CustIncome     0.70146     0.2186      3.2089      0.0013324
    TmWBank        1.1071      0.23307     4.7503      2.0316e-06
    OtherCC        1.0882      0.52918     2.0563      0.03975
    AMBalance      1.0413      0.36557     2.8483      0.004395
    UtilRate       0.013157    0.60864     0.021618    0.98275

1200 observations, 1191 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 89.7, p-value = 5.26e-16
Use setmodel to update the model predictors and model coefficients in the creditscorecard object. The ModelPredictors input argument does not explicitly include a string for the intercept. However, the ModelCoefficients input argument does have the intercept information as its first element.
ModelPredictors = mdl.PredictorNames
ModelPredictors = 8x1 cell
{'CustAge' }
{'ResStatus' }
{'EmpStatus' }
{'CustIncome'}
{'TmWBank' }
{'OtherCC' }
{'AMBalance' }
{'UtilRate' }
ModelCoefficients = mdl.Coefficients.Estimate
ModelCoefficients = 9×1
0.7024
0.6084
1.3773
0.8856
0.7015
1.1071
1.0882
1.0413
0.0132
sc = setmodel(sc,ModelPredictors,ModelCoefficients);
Verify that 'UtilRate' is part of the scorecard predictors by displaying the scorecard points.
pi = displaypoints(sc)
pi=41×3 table
Predictors Bin Points
______________ ________________ _________
{'CustAge' } {'[-Inf,33)' } -0.17152
{'CustAge' } {'[33,37)' } -0.15295
{'CustAge' } {'[37,40)' } -0.072892
{'CustAge' } {'[40,46)' } 0.033856
{'CustAge' } {'[46,48)' } 0.20193
{'CustAge' } {'[48,58)' } 0.21787
{'CustAge' } {'[58,Inf]' } 0.46652
{'CustAge' } {'<missing>' } NaN
{'ResStatus' } {'Tenant' } -0.043826
{'ResStatus' } {'Home Owner' } 0.11442
{'ResStatus' } {'Other' } 0.36394
{'ResStatus' } {'<missing>' } NaN
{'EmpStatus' } {'Unknown' } -0.088843
{'EmpStatus' } {'Employed' } 0.30193
{'EmpStatus' } {'<missing>' } NaN
{'CustIncome'} {'[-Inf,29000)'} -0.46956
⋮
Fit a Logistic Regression Model Outside of the creditscorecard Object
This example shows how to fit a logistic regression model directly, without using the fitmodel function, and then use setmodel to set the new model predictors and coefficients back into the creditscorecard object. This approach gives more flexibility regarding options to control the stepwise procedure. This example fits a logistic regression model with a nondefault value for the 'PEnter' parameter, the criterion to admit a new predictor in the logistic regression model during the stepwise procedure.
Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011). Use the 'IDVar' argument to indicate that 'CustID' contains ID information and should not be included as a predictor variable.
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID')
sc = 
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {'CustID'  'CustAge'  'TmAtAddress'  'ResStatus'  'EmpStatus'  'CustIncome'  'TmWBank'  'OtherCC'  'AMBalance'  'UtilRate'  'status'}
        NumericPredictors: {'CustAge'  'TmAtAddress'  'CustIncome'  'TmWBank'  'AMBalance'  'UtilRate'}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
           BinMissingData: 0
                    IDVar: 'CustID'
            PredictorVars: {'CustAge'  'TmAtAddress'  'ResStatus'  'EmpStatus'  'CustIncome'  'TmWBank'  'OtherCC'  'AMBalance'  'UtilRate'}
                     Data: [1200x11 table]
Perform automatic binning.
sc = autobinning(sc);
The logistic regression model needs to be fit with Weight of Evidence (WOE) data. The WOE transformation is a special case of binning, since the data first needs to be binned, and then the binned information is mapped to the corresponding WOE values. This transformation is done using the bindata function. bindata has an argument that prepares the data for the model fitting step. Setting the bindata name-value pair argument 'OutputType' to 'WOEModelInput' has the following effects:
- All predictors are converted to WOE values.
- The output contains only predictors and response (no 'IDVar' or any unused variables).
- Predictors with infinite or undefined (NaN) WOE values are discarded.
- The response values are mapped so that "Good" is 1 and "Bad" is 0 (this implies that higher unscaled scores correspond to better, less risky customers).
bd = bindata(sc,'OutputType','WOEModelInput');
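To make the bin-to-WOE mapping concrete, the following sketch recomputes the WOE values of one predictor from the per-bin counts that bininfo reports and compares them with the WOE column of the same table. It assumes the usual definition of WOE, the natural logarithm of the ratio between a bin's share of all "Good" observations and its share of all "Bad" observations, and it assumes that the last row of the bininfo table is a totals row. The names bi, bins, distGood, distBad, woeManual, and maxDiff are illustrative.

```matlab
% Sketch: recompute WOE for 'CustAge' from per-bin Good/Bad counts.
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID');
sc = autobinning(sc);

bi   = bininfo(sc,'CustAge');   % columns include Bin, Good, Bad, WOE
bins = bi(1:end-1,:);           % assumption: the last row holds the totals

% Assumed definition: WOE_i = log( (Good_i/sum(Good)) / (Bad_i/sum(Bad)) )
distGood  = bins.Good / sum(bins.Good);
distBad   = bins.Bad  / sum(bins.Bad);
woeManual = log(distGood ./ distBad);

% Largest absolute difference from the WOE values reported by bininfo
maxDiff = max(abs(woeManual - bins.WOE))
```

If the assumed definition matches the one used by the toolbox, maxDiff should be zero up to floating-point error.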
For example, the first ten rows in the original data for the variables 'CustAge', 'ResStatus', 'CustIncome', and 'status' (response variable) look like this:
data(1:10,{'CustAge' 'ResStatus' 'CustIncome' 'status'})
ans=10×4 table
CustAge ResStatus CustIncome status
_______ __________ __________ ______
53 Tenant 50000 0
61 Home Owner 52000 0
47 Tenant 37000 0
50 Home Owner 53000 0
68 Home Owner 53000 0
65 Home Owner 48000 0
34 Home Owner 32000 1
50 Other 51000 0
50 Tenant 52000 1
49 Home Owner 53000 1
Here is how the same ten rows look after calling bindata with the name-value pair argument 'OutputType' set to 'WOEModelInput':
bd(1:10,{'CustAge' 'ResStatus' 'CustIncome' 'status'})
ans=10×4 table
CustAge ResStatus CustIncome status
________ _________ __________ ______
0.21378 -0.095564 0.47972 1
0.62245 0.019329 0.47972 1
0.18758 -0.095564 -0.026696 1
0.21378 0.019329 0.47972 1
0.62245 0.019329 0.47972 1
0.62245 0.019329 0.47972 1
-0.39568 0.019329 -0.29217 0
0.21378 0.20049 0.47972 1
0.21378 -0.095564 0.47972 0
0.21378 0.019329 0.47972 0
Fit a linear logistic regression model using a stepwise method with the Statistics and Machine Learning Toolbox™ function stepwiseglm, but use nondefault values for the 'PEnter' and 'PRemove' optional arguments. With the default options for the stepwise procedure, the predictors 'ResStatus' and 'OtherCC' would normally be included in the model.
mdl = stepwiseglm(bd,'constant','Distribution','binomial',...
    'Upper','linear','PEnter',0.025,'PRemove',0.05)
1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1441.994, Chi2Stat = 5.3511754, PValue = 0.020708306
mdl = 
Generalized linear regression model:
    logit(status) ~ 1 + CustAge + EmpStatus + CustIncome + TmWBank + AMBalance
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE       tStat       pValue  
                   ________    ________    ______    __________
    (Intercept)    0.70263     0.063759    11.02     3.0544e-28
    CustAge        0.57265     0.2482      2.3072    0.021043
    EmpStatus      0.88356     0.29193     3.0266    0.002473
    CustIncome     0.70399     0.21781     3.2321    0.001229
    TmWBank        1.1         0.23185     4.7443    2.0924e-06
    AMBalance      1.0313      0.32007     3.2221    0.0012724

1200 observations, 1194 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 81.4, p-value = 4.18e-16
Use setmodel to update the model predictors and model coefficients in the creditscorecard object. The ModelPredictors input argument does not explicitly include a string for the intercept. However, the ModelCoefficients input argument does have the intercept information as its first element.
ModelPredictors = mdl.PredictorNames
ModelPredictors = 5x1 cell
{'CustAge' }
{'EmpStatus' }
{'CustIncome'}
{'TmWBank' }
{'AMBalance' }
ModelCoefficients = mdl.Coefficients.Estimate
ModelCoefficients = 6×1
0.7026
0.5726
0.8836
0.7040
1.1000
1.0313
sc = setmodel(sc,ModelPredictors,ModelCoefficients);
Verify that the desired model predictors are part of the scorecard predictors by displaying the scorecard points.
pi = displaypoints(sc)
pi=30×3 table
Predictors Bin Points
______________ _________________ _________
{'CustAge' } {'[-Inf,33)' } -0.10354
{'CustAge' } {'[33,37)' } -0.086059
{'CustAge' } {'[37,40)' } -0.010713
{'CustAge' } {'[40,46)' } 0.089757
{'CustAge' } {'[46,48)' } 0.24794
{'CustAge' } {'[48,58)' } 0.26294
{'CustAge' } {'[58,Inf]' } 0.49697
{'CustAge' } {'<missing>' } NaN
{'EmpStatus' } {'Unknown' } -0.035716
{'EmpStatus' } {'Employed' } 0.35417
{'EmpStatus' } {'<missing>' } NaN
{'CustIncome'} {'[-Inf,29000)' } -0.41884
{'CustIncome'} {'[29000,33000)'} -0.065161
{'CustIncome'} {'[33000,35000)'} 0.092353
{'CustIncome'} {'[35000,40000)'} 0.12173
{'CustIncome'} {'[40000,42000)'} 0.13259
⋮
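The displayed points can be related back to the coefficients passed to setmodel. The following is a sketch, assuming the scorecard is unscaled (no scaling has been applied with formatpoints) and assuming the common allocation in which the intercept is spread evenly over the p model predictors. The symbols b_0, b_j, p, and WOE_j(i) are notation introduced here. The check uses the rounded values displayed in this example: the 'CustAge' coefficient 0.57265, the intercept 0.70263, p = 5 predictors, and the WOE value 0.21378 that the bindata output shows for a customer aged 53, who falls in the bin [48,58).

```latex
\text{Points}_{j}(i) = b_j \,\mathrm{WOE}_j(i) + \frac{b_0}{p}

\text{Points}_{\text{CustAge}}\bigl([48,58)\bigr)
  \approx 0.57265 \times 0.21378 + \frac{0.70263}{5}
  \approx 0.12242 + 0.14053
  = 0.26295
```

This agrees with the displayed value 0.26294 for that bin, up to rounding of the inputs.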
Input Arguments
sc — Credit scorecard model
creditscorecard object
Credit scorecard model, specified as a creditscorecard object. Use creditscorecard to create a creditscorecard object.
ModelPredictors — Predictor names included in fitted model
cell array of character vectors with predictor values {'PredictorName1','PredictorName2',...}
Predictor names included in the fitted model, specified as a cell array of character vectors as {'PredictorName1','PredictorName2',...}. The predictor names must match predictor variable names in the creditscorecard object.
Note
Do not include a character vector for the constant term in ModelPredictors. setmodel internally handles the '(Intercept)' term based on the number of model coefficients (see ModelCoefficients).
Data Types: cell
ModelCoefficients — Model coefficients corresponding to model predictors
numeric array with values [coeff1,coeff2,...]
Model coefficients corresponding to the model predictors, specified as a numeric array of model coefficients, [coeff1,coeff2,...]. If N is the number of predictor names provided in ModelPredictors, the size of ModelCoefficients can be N or N+1. If ModelCoefficients has N+1 elements, then the first coefficient is used as the '(Intercept)' of the fitted model. Otherwise, the '(Intercept)' is set to 0.
Data Types: double
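The two accepted sizes of ModelCoefficients can be compared with a short sketch. The coefficient values below are made up for illustration and do not come from a fitted model, and the variable names scWithIntercept, scNoIntercept, pWith, pNo, and comparison are hypothetical.

```matlab
% Sketch: the same predictors set with and without an intercept.
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID');
sc = autobinning(sc);

ModelPredictors = {'CustAge','CustIncome'};   % N = 2 predictor names

% N+1 coefficients: the first element is used as the '(Intercept)'.
% The numeric values are made up for illustration.
scWithIntercept = setmodel(sc,ModelPredictors,[0.70; 0.61; 0.70]);

% N coefficients: the '(Intercept)' is set to 0.
scNoIntercept = setmodel(sc,ModelPredictors,[0.61; 0.70]);

% Put the resulting points side by side for comparison.
pWith = displaypoints(scWithIntercept);
pNo   = displaypoints(scNoIntercept);
comparison = table(pWith.Predictors,pWith.Bin,pWith.Points,pNo.Points, ...
    'VariableNames',{'Predictors','Bin','PointsWithIntercept','PointsNoIntercept'})
```

Both calls use the same two predictors, so the two points tables list the same bins and differ only in the Points column.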
Output Arguments
sc — Credit scorecard model
creditscorecard object
Credit scorecard model, returned as an updated creditscorecard object. The creditscorecard object contains information about the model predictors and coefficients of the fitted model. For more information on using the creditscorecard object, see creditscorecard.
More About
Workflows for Using setmodel
When using setmodel, there are two possible workflows to set the final model predictors and model coefficients into a creditscorecard object.
The first workflow is:
1. Use fitmodel to get the optional output argument mdl. This is a GeneralizedLinearModel object and you can add and remove terms, or modify the parameters of the stepwise procedure. Only linear terms can be in the model (no interactions or any other higher-order terms).
2. Once the GeneralizedLinearModel object is satisfactory, set the final model predictors and model coefficients into the creditscorecard object using the setmodel input arguments for ModelPredictors and ModelCoefficients.
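The two steps of the first workflow can be sketched as follows. Removing 'OtherCC' is an arbitrary choice made only to illustrate editing the model, and the sketch assumes that the default stepwise fit on this data set selects 'OtherCC' in the first place.

```matlab
% Sketch of the first workflow: fit with fitmodel, edit the model, set it back.
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID');
sc = autobinning(sc);

% Step 1: keep the GeneralizedLinearModel object returned by fitmodel.
[sc,mdl] = fitmodel(sc,'Display','off');

% Edit the model; removing 'OtherCC' is an arbitrary illustrative choice.
mdl = removeTerms(mdl,'OtherCC');

% Step 2: set the edited predictors and coefficients into the scorecard.
sc = setmodel(sc,mdl.PredictorNames,mdl.Coefficients.Estimate);
```

Because mdl.Coefficients.Estimate includes the intercept as its first element, it has one more element than mdl.PredictorNames.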
An alternate workflow is:
1. Obtain the Weight of Evidence (WOE) data using bindata. Use the 'WOEModelInput' option for the 'OutputType' name-value pair argument in bindata to ensure that:
   - The predictors data is transformed to WOE.
   - Only predictors whose bins have finite WOE values are included.
   - The response variable is placed in the last column.
   - The response variable is mapped (“Good” is 1 and “Bad” is 0).
2. Use the data from the previous step to fit a linear logistic regression model (only linear terms in the model, no interactions, or any other higher-order terms). See, for example, stepwiseglm.
3. Once the GeneralizedLinearModel object is satisfactory, set the final model predictors and model coefficients into the creditscorecard object using the setmodel input arguments for ModelPredictors and ModelCoefficients.
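The alternate workflow can be sketched as follows. This sketch swaps in fitglm with a full linear model in place of the stepwise selection done by stepwiseglm, as one possible way to fit the model outside the creditscorecard object. It relies on the response variable being the last column of the bindata output, as described in step 1.

```matlab
% Sketch of the alternate workflow with a full (non-stepwise) linear model.
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID');
sc = autobinning(sc);

% Step 1: WOE-transformed predictors, with the mapped response in the last column.
bd = bindata(sc,'OutputType','WOEModelInput');

% Step 2: linear logistic regression on all WOE predictors (no interactions).
mdl = fitglm(bd,'linear','Distribution','binomial');

% Step 3: set the fitted predictors and coefficients into the scorecard.
sc = setmodel(sc,mdl.PredictorNames,mdl.Coefficients.Estimate);
```

Unlike the stepwise fit, this keeps every predictor that survives the WOE transformation, whether or not it is statistically significant.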
References
[1] Anderson, R. The Credit Scoring Toolkit. Oxford University Press, 2007.
[2] Refaat, M. Credit Risk Scorecards: Development and Implementation Using SAS. lulu.com, 2011.
Version History
Introduced in R2014b
See Also
creditscorecard | autobinning | bininfo | predictorinfo | modifypredictor | plotbins | modifybins | bindata | displaypoints | formatpoints | score | stepwiseglm | fitglm | fitmodel | probdefault | validatemodel | GeneralizedLinearModel