Panel data regression comparison

Nick, 2022-3-25
Answered: Gabo, 2024-8-27
I have a very large panel dataset and would like to apply a number of simple machine learning techniques (logistic regression, decision trees, bagged trees).
While preparing, I came across fitglm and fitLifetimePDModel, the latter of which is designed for panel data. I was trying to understand whether and how it differs from fitglm, because when I run the code below, the results are exactly the same. Is that right?
Why is that? For example, with fitglm I'm not telling the program that each customer can have more than one data point.
Thank you
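load RetailCreditPanelData.mat   % loads the panel table 'data'
% Lifetime PD model: logistic regression with the panel structure declared (ID and age variables)
pdModel_1 = fitLifetimePDModel(data,"Logistic", 'AgeVar','YOB', 'IDVar','ID', 'LoanVars','ScoreGroup','ResponseVar','Default');
disp(pdModel_1.Model)
% Plain logistic regression on the same predictors; no customer ID information is passed
pdModel_2 = fitglm(data,'Default ~ 1 + ScoreGroup + YOB', 'Distribution','binomial', 'link', 'logit');
disp(pdModel_2)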

Answers (2)

Sai Pavan, 2023-10-20
Hi Nick,
I understand that you want to learn the difference between the “fitglm” and “fitLifetimePDModel” functions and why they produce the same results.
  • The fitLifetimePDModel function is designed specifically for lifetime PD models on panel data, where each customer contributes multiple observations over time, and it accounts for the dependence and correlation among each individual's observations when fitting the model.
  • fitglm, on the other hand, is a general function for fitting generalized linear models, such as logistic and Poisson regression. It treats each observation as independent and does not consider any panel structure.
  • The results are exactly the same because both fitLifetimePDModel and fitglm fit a logistic regression with the same link function (logit) and distribution (binomial). In your “fitglm” call, you explicitly specified a logistic regression formula that matches the one used by fitLifetimePDModel, so the resulting models are identical.
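To illustrate the second point, here is a minimal sketch (not part of the original answer) showing that fitglm ignores the panel structure: shuffling the rows, which scatters each customer's records, should leave the coefficients unchanged up to floating-point round-off. It assumes RetailCreditPanelData.mat from Risk Management Toolbox and the formula from the question; the variable names are illustrative.

```matlab
% Sketch: fitglm treats every row as an independent observation, so row order
% (and therefore which rows belong to the same customer) does not affect the fit.
load RetailCreditPanelData.mat   % provides the panel table 'data'

formula = 'Default ~ 1 + ScoreGroup + YOB';
mdlOriginal = fitglm(data, formula, 'Distribution','binomial', 'Link','logit');

% Randomly permute the rows, breaking up each customer's consecutive records
rng(0);                                      % reproducible shuffle
shuffled = data(randperm(height(data)), :);
mdlShuffled = fitglm(shuffled, formula, 'Distribution','binomial', 'Link','logit');

% Same formula and categorical levels, so the coefficients line up row by row
maxDiff = max(abs(mdlOriginal.Coefficients.Estimate - mdlShuffled.Coefficients.Estimate));
disp(maxDiff)
```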
Please refer to the documentation for the “fitglm” and “fitLifetimePDModel” functions to learn more.
Hope it helps.
Regards,
Sai Pavan

Gabo, 2024-8-27
Hi Nick,
You (and Sai) are correct that the model coefficients you get from fitglm and from fitLifetimePDModel with the 'logistic' option are the same. For 'logistic' and 'probit', fitLifetimePDModel calls fitglm under the hood; however, the model you get back is a "wrapper", if you will, that offers additional functionality (see the next paragraph). The fitLifetimePDModel function does use the panel data structure to estimate the time interval between consecutive rows, which is very important for the 'Cox' lifetime PD model but not as important for 'logistic' and 'probit'.
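A quick way to confirm that the coefficients match is to compare the two coefficient tables by name. This sketch assumes, as the question's disp(pdModel_1.Model) call suggests, that the Model property of a "Logistic" lifetime PD model holds the fitted GLM; terms are matched by row name in case the two formulas list them in a different order.

```matlab
% Sketch: compare the coefficients of the two models from the question.
load RetailCreditPanelData.mat   % provides the panel table 'data'

pdModel_1 = fitLifetimePDModel(data, "Logistic", 'AgeVar','YOB', 'IDVar','ID', ...
    'LoanVars','ScoreGroup', 'ResponseVar','Default');
pdModel_2 = fitglm(data, 'Default ~ 1 + ScoreGroup + YOB', ...
    'Distribution','binomial', 'Link','logit');

% The underlying GLM of the lifetime PD model is stored in its Model property
c1 = pdModel_1.Model.Coefficients;
c2 = pdModel_2.Coefficients;

% Match rows by coefficient name (e.g. '(Intercept)', 'YOB') rather than by position
names = c1.Properties.RowNames;
maxDiff = max(abs(c1.Estimate - c2{names, 'Estimate'}));
disp(maxDiff)
```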
Lifetime PD models have the predict method (to predict conditional PDs), as well as predictLifetime (to predict cumulative, survival, and marginal probabilities) and the validation functions modelDiscrimination, modelDiscriminationPlot, modelCalibration, and modelCalibrationPlot. There is a lot of information in the documentation, but this page may be a good starting point: https://www.mathworks.com/help/risk/overview-of-lifetime-probability-of-default.html.
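As a sketch of these methods, the following fits the same logistic lifetime PD model, computes the four probability types for a single customer, and runs the discrimination and calibration checks. Picking ID 1 is arbitrary, the code assumes ID is stored as a numeric column, and YOB is used as the grouping variable for calibration; please confirm the exact arguments in the documentation for your release.

```matlab
% Sketch: lifetime prediction and validation for a logistic lifetime PD model.
load RetailCreditPanelData.mat   % provides the panel table 'data'

pdModel = fitLifetimePDModel(data, "Logistic", 'AgeVar','YOB', 'IDVar','ID', ...
    'LoanVars','ScoreGroup', 'ResponseVar','Default');

% All rows for one customer (ID 1 chosen arbitrarily; assumes ID is numeric)
dataCustomer1 = data(data.ID == 1, :);

condPD     = predict(pdModel, dataCustomer1);             % conditional PD per period
cumPD      = predictLifetime(pdModel, dataCustomer1);     % cumulative PD (default type)
marginalPD = predictLifetime(pdModel, dataCustomer1, 'ProbabilityType','marginal');
survProb   = predictLifetime(pdModel, dataCustomer1, 'ProbabilityType','survival');

disp(table(dataCustomer1.YOB, condPD, cumPD, marginalPD, survProb, ...
    'VariableNames', {'YOB','Conditional','Cumulative','Marginal','Survival'}))

% Validation on the full dataset
discMeasure = modelDiscrimination(pdModel, data);       % discrimination summary (AUROC)
calMeasure  = modelCalibration(pdModel, data, 'YOB');   % calibration summary, grouped by YOB
disp(discMeasure)
disp(calMeasure)

figure
modelDiscriminationPlot(pdModel, data);                 % ROC curve
figure
modelCalibrationPlot(pdModel, data, 'YOB');             % observed vs predicted default rates
```

Because the marginal PDs are the period-by-period increments of the cumulative PD, their running sum should reproduce the cumulative column, and cumulative plus survival probability should equal 1 in every period.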
There is also the fitglme function, for generalized linear mixed-effects models. Take a look at this example in the documentation and search for 'fitglme': https://www.mathworks.com/help/risk/stress-testing-retail-credit-default-probabilities-using-panel-data-1.html. It includes a discussion of training a mixed-effects model on the same data, along with the syntax to do it. That model would take the panel data information into account as well.
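A minimal sketch of such a model, assuming a customer-level random intercept, (1|ID), added to the question's predictors; this is not necessarily the exact specification used in the linked example. Mixed-effects fits with one random effect per customer can be slow on a large panel, so the sketch prototypes on a subset of IDs; the cutoff of 5000 is a hypothetical value chosen only for illustration.

```matlab
% Sketch: logistic regression with a random intercept per customer.
load RetailCreditPanelData.mat   % provides the panel table 'data'

% Prototype on a subset of customers (hypothetical cutoff, for speed only)
dataME = data(data.ID <= 5000, :);
dataME.ID = categorical(dataME.ID);   % use the customer ID as a grouping variable

% (1|ID) adds a customer-level random intercept, so rows from the same
% customer are no longer treated as independent
glme = fitglme(dataME, 'Default ~ 1 + ScoreGroup + YOB + (1|ID)', ...
    'Distribution','binomial', 'Link','logit');
disp(glme)

% Predicted PDs; by default these include the estimated random effects
pdME = predict(glme, dataME);
```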
For any model you train without fitLifetimePDModel (for example, decision trees, bagged trees, or mixed-effects models), if you are interested in the lifetime prediction or the discrimination/calibration capabilities of lifetime PD models, you can train your model and then wrap it as a "custom" lifetime PD model using customLifetimePDModel; see, for example: https://www.mathworks.com/help/risk/create-custom-pd-model-for-decision-tree-using-function-handle.html.
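A rough sketch of that workflow for a decision tree follows. The MinLeafSize value and the helper treePD are hypothetical choices for illustration, and the name-value arguments passed to customLifetimePDModel are assumed to mirror those of fitLifetimePDModel; please confirm them against the linked example. customLifetimePDModel likely requires a newer release than the R2020b listed for this question.

```matlab
% Sketch: wrap a decision tree as a custom lifetime PD model.
load RetailCreditPanelData.mat   % provides the panel table 'data'

% Classification tree on the question's predictors; MinLeafSize is an arbitrary choice
tree = fitctree(data, 'Default ~ ScoreGroup + YOB', 'MinLeafSize', 500);

% Function handle mapping a table to a column of PDs (see treePD below).
% Assumption: these name-value arguments mirror fitLifetimePDModel.
pdTree = customLifetimePDModel(@(tbl) treePD(tree, tbl), ...
    'IDVar','ID', 'AgeVar','YOB', 'LoanVars','ScoreGroup', 'ResponseVar','Default');

% The lifetime PD validation tools now apply to the tree
discMeasure = modelDiscrimination(pdTree, data);
disp(discMeasure)

function pd = treePD(mdl, tbl)
% Hypothetical helper: return the predicted probability of the class Default == 1
[~, score] = predict(mdl, tbl);
pd = score(:, mdl.ClassNames == 1);
end
```

For bagged trees, the same pattern should apply with, for example, fitcensemble(data, 'Default ~ ScoreGroup + YOB', 'Method','Bag') in place of fitctree.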
Hope this helps,
Gabo

Category: Hypothesis Tests

Release: R2020b