# 使用广义线性模型拟合数据

### 广义线性模型

```mu = @(x) -1.9+.23*x; x = 5:.1:15; yhat = mu(x); dy = -3.5:.1:3.5; sz = size(dy); k = (length(dy)+1)/2; x1 = 7*ones(sz); y1 = mu(x1)+dy; z1 = normpdf(y1,mu(x1),1); x2 = 10*ones(sz); y2 = mu(x2)+dy; z2 = normpdf(y2,mu(x2),1); x3 = 13*ones(sz); y3 = mu(x3)+dy; z3 = normpdf(y3,mu(x3),1); plot3(x,yhat,zeros(size(x)),'b-', ... x1,y1,z1,'r-', x1([k k]),y1([k k]),[0 z1(k)],'r:', ... x2,y2,z2,'r-', x2([k k]),y2([k k]),[0 z2(k)],'r:', ... x3,y3,z3,'r-', x3([k k]),y3([k k]),[0 z3(k)],'r:'); zlim([0 1]); xlabel('X'); ylabel('Y'); zlabel('Probability density'); grid on; view([-45 45]); ```

```mu = @(x) exp(-1.9+.23*x); x = 5:.1:15; yhat = mu(x); x1 = 7*ones(1,5); y1 = 0:4; z1 = poisspdf(y1,mu(x1)); x2 = 10*ones(1,7); y2 = 0:6; z2 = poisspdf(y2,mu(x2)); x3 = 13*ones(1,9); y3 = 0:8; z3 = poisspdf(y3,mu(x3)); plot3(x,yhat,zeros(size(x)),'b-', ... [x1; x1],[y1; y1],[z1; zeros(size(y1))],'r-', x1,y1,z1,'r.', ... [x2; x2],[y2; y2],[z2; zeros(size(y2))],'r-', x2,y2,z2,'r.', ... [x3; x3],[y3; y3],[z3; zeros(size(y3))],'r-', x3,y3,z3,'r.'); zlim([0 1]); xlabel('X'); ylabel('Y'); zlabel('Probability'); grid on; view([-45 45]); ```

### 拟合逻辑回归

```% A set of car weights weight = [2100 2300 2500 2700 2900 3100 3300 3500 3700 3900 4100 4300]'; % The number of cars tested at each weight tested = [48 42 31 34 31 21 23 23 21 16 17 21]'; % The number of cars failing the test at each weight failed = [1 2 0 3 8 8 14 17 19 15 17 21]'; % The proportion of cars failing for each weight proportion = failed ./ tested; plot(weight,proportion,'s') xlabel('Weight'); ylabel('Proportion'); ```

```linearCoef = polyfit(weight,proportion,1); linearFit = polyval(linearCoef,weight); plot(weight,proportion,'s', weight,linearFit,'r-', [2000 4500],[0 0],'k:', [2000 4500],[1 1],'k:') xlabel('Weight'); ylabel('Proportion'); ```

1) 线上存在小于 0 和大于 1 的预测比例值。

2) 因为这些比例值是有界的，因而不符合正态分布。这违反了拟合简单线性回归模型需满足的假设之一。

```[cubicCoef,stats,ctr] = polyfit(weight,proportion,3); cubicFit = polyval(cubicCoef,weight,[],ctr); plot(weight,proportion,'s', weight,cubicFit,'r-', [2000 4500],[0 0],'k:', [2000 4500],[1 1],'k:') xlabel('Weight'); ylabel('Proportion'); ```

```[logitCoef,dev] = glmfit(weight,[failed tested],'binomial','logit'); logitFit = glmval(logitCoef,weight,'logit'); plot(weight,proportion,'bs', weight,logitFit,'r-'); xlabel('Weight'); ylabel('Proportion'); ```

### 模型诊断

`glmfit` 函数提供几个输出，用于检查拟合和测试模型。例如，我们可以比较两个模型的偏差值，以确定平方项是否可以显著改善拟合。

```[logitCoef2,dev2] = glmfit([weight weight.^2],[failed tested],'binomial','logit'); pval = 1 - chi2cdf(dev-dev2,1) ```
```pval = 0.4019 ```

```logitFit2 = glmval(logitCoef2,[weight weight.^2],'logit'); plot(weight,proportion,'bs', weight,logitFit,'r-', weight,logitFit2,'g-'); legend('Data','Linear Terms','Linear and Quadratic Terms','Location','northwest'); ```

```[logitCoef,dev,stats] = glmfit(weight,[failed tested],'binomial','logit'); normplot(stats.residp); ```

### 计算模型预测

```weightPred = 2500:500:4000; [failedPred,dlo,dhi] = glmval(logitCoef,weightPred,'logit',stats,.95,100); errorbar(weightPred,failedPred,dlo,dhi,':'); ```

### 二项模型的联系函数

```eta = -5:.1:5; plot(eta,1 ./ (1 + exp(-eta)),'-', eta,normcdf(eta), '-', ... eta,1 - exp(-exp(eta)),'-', eta,exp(-exp(eta)),'-'); xlabel('Linear function of predictors'); ylabel('Predicted mean response'); legend('logit','probit','complementary log-log','log-log','location','east'); ```

```probitCoef = glmfit(weight,[failed tested],'binomial','probit'); probitFit = glmval(probitCoef,weight,'probit'); plot(weight,proportion,'bs', weight,logitFit,'r-', weight,probitFit,'g-'); legend('Data','Logit model','Probit model','Location','northwest'); ```