Panel data regression comparison

Question

Nick 2022-3-25

Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/1680749-panel-data-regression-comparison

Answered: Gabo 2024-8-27

I have a very large panel data set and would like to apply a number of simple machine learning techniques (logistic regression, decision trees, bagged trees).

While preparing, I came across fitglm and fitLifetimePDModel, the latter of which is meant to handle panel data. I am trying to understand how, or whether, it differs from fitglm, because when I run the code below, the results are exactly the same. Is that right?

Why is that? For example, with fitglm I'm not telling the program that each customer can have more than one data point.

Thank you

load RetailCreditPanelData.mat
pdModel_1 = fitLifetimePDModel(data,"Logistic", 'AgeVar','YOB', 'IDVar','ID', 'LoanVars','ScoreGroup','ResponseVar','Default');
disp(pdModel_1.Model)
pdModel_2 = fitglm(data,'Default ~ 1 + ScoreGroup + YOB', 'Distribution','binomial', 'link', 'logit');
disp(pdModel_2)


Answer 1

Sai Pavan 2023-10-20

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/1680749-panel-data-regression-comparison#answer_1337191

Hi Nick,

I understand that you want to learn the difference between the “fitglm” and “fitLifetimePDModel” functions and why they produce the same results.

The “fitLifetimePDModel” function is specifically designed for panel data in lifetime models, where each customer has multiple observations over time.
The “fitglm” function, on the other hand, is a more general function for fitting generalized linear models, including logistic regression and Poisson regression. It treats each observation as independent and does not consider any panel structure.
The results are exactly the same because both “fitLifetimePDModel” and “fitglm” fit a logistic regression with the same link function (logit) and distribution (binomial). In your “fitglm” call, you explicitly specified a formula that matches the one used by “fitLifetimePDModel”, with the same response and predictors. Therefore, the resulting models are identical.

Please refer to the documentation for the “fitglm” and “fitLifetimePDModel” functions to learn more.

Hope it helps.

Regards,

Sai Pavan


Answer 2

Gabo 2024-8-27

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/1680749-panel-data-regression-comparison#answer_1505934

Hi Nick,

You (and Sai) are correct that the model coefficients you get from fitglm and from fitLifetimePDModel with the 'logistic' option are the same. For 'logistic' and 'probit', fitLifetimePDModel calls fitglm under the hood. The model you get back, however, is a "wrapper" that offers additional functionality (see the next paragraph). The fitLifetimePDModel function does use the panel data structure to estimate the time interval between consecutive rows, which is very important for the 'Cox' lifetime PD model but less important for 'logistic' and 'probit'.
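
A minimal sketch of how the coefficient match could be checked, reusing the two calls from the question. The variable names (pdModel, glmModel, maxAbsDiff) are only illustrative, and the Model property is taken from the question's code; its name may differ between releases.

```matlab
% Load the example panel data used in the question (provides the table 'data').
load RetailCreditPanelData.mat

% Lifetime PD model (logistic) and plain GLM with the same response and predictors.
pdModel = fitLifetimePDModel(data, "Logistic", ...
    'AgeVar','YOB', 'IDVar','ID', 'LoanVars','ScoreGroup', 'ResponseVar','Default');
glmModel = fitglm(data, 'Default ~ 1 + ScoreGroup + YOB', ...
    'Distribution','binomial', 'Link','logit');

% Underlying GLM of the lifetime PD model.
% Assumption: the property is called Model, as in the question's code.
coefPD  = pdModel.Model.Coefficients;
coefGLM = glmModel.Coefficients;

% Align the two coefficient tables by row name, in case the term order differs.
coefGLM = coefGLM(coefPD.Properties.RowNames, :);

% Largest absolute difference between the two sets of estimates.
maxAbsDiff = max(abs(coefPD.Estimate - coefGLM.Estimate));
fprintf('Largest coefficient difference: %g\n', maxAbsDiff);
```

If the two fits really are the same model, the printed difference should be zero or at the level of numerical round-off.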

The lifetime PD models have the predict method (to predict PDs), the predictLifetime method (to predict cumulative, survival, or marginal probabilities), and the validation functions modelDiscrimination, modelDiscriminationPlot, modelCalibration, and modelCalibrationPlot. There is a lot of information in the documentation, and this page may be a good starting point: https://www.mathworks.com/help/risk/overview-of-lifetime-probability-of-default.html
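
As a rough sketch of that extra functionality, assuming the same data and model as in the question: the three-customer sample and the choice of ScoreGroup as the grouping variable for calibration are illustrative assumptions, and modelCalibration may carry a different name in older releases.

```matlab
% Load the example panel data and fit the logistic lifetime PD model from the question.
load RetailCreditPanelData.mat
pdModel = fitLifetimePDModel(data, "Logistic", ...
    'AgeVar','YOB', 'IDVar','ID', 'LoanVars','ScoreGroup', 'ResponseVar','Default');

% Take the rows of the first three customers as a small illustration sample.
ids = unique(data.ID, 'stable');
sample = data(ismember(data.ID, ids(1:3)), :);

% Conditional (one-period) PD for each row.
condPD = predict(pdModel, sample);

% Lifetime probabilities, computed ID by ID along the panel.
cumPD = predictLifetime(pdModel, sample);                                 % cumulative (default type)
survP = predictLifetime(pdModel, sample, 'ProbabilityType', 'survival');  % survival

% Validation on the full data set.
discMeasure = modelDiscrimination(pdModel, data);               % discrimination measure (AUROC)
calMeasure  = modelCalibration(pdModel, data, 'ScoreGroup');    % calibration, grouped by ScoreGroup
disp(discMeasure)
disp(calMeasure)
```

None of these lifetime or validation calls are available on the object returned by fitglm, which is the practical difference between the two approaches for the logistic case.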

There is also the fitglme function, for generalized linear mixed-effects models. Take a look at this example in the documentation and search for 'fitglme': https://www.mathworks.com/help/risk/stress-testing-retail-credit-default-probabilities-using-panel-data-1.html. It discusses training a mixed-effects model with the same data and shows the syntax for doing so. That model would also take the panel data information into account.
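
A minimal sketch of a mixed-effects fit on the same data. The random intercept per customer, (1|ID), and the 2000-customer subset are assumptions made here for illustration; they are not necessarily the specification used in the linked example.

```matlab
% Load the example panel data (provides the table 'data').
load RetailCreditPanelData.mat

% Assumption: work on a subset of customers, since a random effect per ID
% on the full data set can take a long time to fit.
ids = unique(data.ID, 'stable');
sub = data(ismember(data.ID, ids(1:2000)), :);

% Logistic mixed-effects model: same fixed effects as in the question,
% plus a random intercept for each customer (hypothetical specification).
glme = fitglme(sub, 'Default ~ 1 + ScoreGroup + YOB + (1|ID)', ...
    'Distribution','Binomial', 'Link','logit');

% Fixed-effects estimates, comparable in meaning to the fitglm coefficients.
beta = fixedEffects(glme);
disp(glme.Coefficients)
```

Unlike the fitglm call, the grouping term tells the model that rows sharing an ID belong to the same customer.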

For any model you train without fitLifetimePDModel (for example, decision trees, bagged trees, or mixed-effects models), if you are interested in the lifetime prediction or the discrimination and calibration capabilities of lifetime PD models, you can train your model and then wrap it as a "custom" lifetime PD model using customLifetimePDModel. See, for example: https://www.mathworks.com/help/risk/create-custom-pd-model-for-decision-tree-using-function-handle.html
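
A hedged sketch of the wrapping idea for a decision tree. The helper treePD, the MinLeafSize value, and the ModelID are hypothetical choices made for illustration; check the customLifetimePDModel documentation for the exact options available in your release.

```matlab
% Load the example panel data (provides the table 'data').
load RetailCreditPanelData.mat

% Train a classification tree on the same predictors as in the question.
% MinLeafSize is an arbitrary illustrative choice to keep leaf estimates stable.
tree = fitctree(data, 'Default ~ ScoreGroup + YOB', 'MinLeafSize', 500);

% Function handle that maps a data table to a column of PDs.
pdFcn = @(tbl) treePD(tree, tbl);

% Wrap the tree as a custom lifetime PD model.
pdModel = customLifetimePDModel(pdFcn, ...
    'ModelID','TreePD', ...
    'IDVar','ID', 'AgeVar','YOB', 'LoanVars','ScoreGroup', 'ResponseVar','Default');

% The wrapped model can now be used like the built-in lifetime PD models.
pd = predict(pdModel, data(1:10,:));
disp(pd)

% Local helper (hypothetical): posterior probability of the default class.
% In a script file, local functions go at the end of the file.
function pd = treePD(tree, tbl)
    [~, score] = predict(tree, tbl);      % one score column per class
    pd = score(:, tree.ClassNames == 1);  % column for Default == 1
end
```

Once wrapped, the tree should be usable with predictLifetime and the validation functions in the same way as the logistic model.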

Hope this helps,

Gabo
