fitrepanel

Fit random effects panel data regression model

Since R2026a

Description

EstMdl = fitrepanel(X,Y) returns the random effects panel data regression model, EstMdl, from fitting the model to the input panel data in wide format. X is a T-by-n-by-p array of predictor data and Y is a T-by-n matrix of response data, where T is the greatest number of sampling time points among subjects, n is the number of sampled subjects, and p is the number of predictor variables. A data set in wide format must be organized as follows:

  • Rows correspond to time points in the sample. In other words, row t contains all p measurements for all n subjects at time t.

  • Columns correspond to sampled subjects. In other words, column c contains all p measurements over all T time points of subject c.

  • For X, pages correspond to predictor variables. In other words, page k contains measurements of predictor k for all n subjects and T time points in the sample. None of the predictors can represent an intercept. Among subjects, sampled time points correspond (fitrepanel assumes all subjects are measured simultaneously).

EstMdl is a panel data regression model PanelModel.
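
As a minimal sketch of the wide-format layout described above, the following code builds a small array and indexes it by time, subject, and predictor. The sizes and values are hypothetical and serve only to illustrate which dimension plays which role; the code does not call fitrepanel.

```matlab
% Hypothetical sizes: T = 3 time points, n = 2 subjects, p = 2 predictors
T = 3; n = 2; p = 2;
X = zeros(T,n,p);
X(:,:,1) = [1 4; 2 5; 3 6];        % page 1: predictor 1 (rows = times, columns = subjects)
X(:,:,2) = [10 40; 20 50; 30 60];  % page 2: predictor 2
Y = [0.1 0.4; 0.2 0.5; 0.3 0.6];   % T-by-n response matrix

% All p predictor measurements of subject 1 at time 2, as a 1-by-p row
x_t2_s1 = squeeze(X(2,1,:))';

% Predictor 1 of subject 2 over all T time points, as a T-by-1 column
x_s2_k1 = X(:,2,1);
```

Here, x_t2_s1 collects one value from each page, and x_s2_k1 is a single column of page 1.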

EstMdl = fitrepanel(X,Y,groups) fits a random effects panel data regression model to the panel data in long format. X is an m-by-p matrix of predictor data and Y is an m-by-1 vector of response data, where m is the number of observations (for example, m = Tn for a balanced panel data set). Each row is an observation (all measurements) associated with a particular subject at a particular time, and each column is a variable. The groups input specifies to which subject the observation belongs. For a subject, larger row indices indicate measurements taken later in the sample.
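
Because this syntax shows no separate time input and sampling order is read from row order within each subject, long-format data that arrives unsorted might need ordering first. The following is a minimal sketch of one way to do that, assuming a hypothetical vector t of sampling times recorded alongside the data; the variable names and values are illustrative only.

```matlab
% Hypothetical unsorted long-format data: 2 subjects, 3 time points each
groups = [2; 1; 2; 1; 1; 2];                     % subject identifiers
t      = [2008; 2007; 2006; 2006; 2008; 2007];   % sampling times (illustrative)
X      = [22; 11; 20; 10; 12; 21];               % one predictor (m-by-1)
Y      = [2.2; 1.1; 2.0; 1.0; 1.2; 2.1];         % response (m-by-1)

% Sort by subject first, then by sampling time within each subject
[~,idx] = sortrows([groups t]);
groups = groups(idx);
t = t(idx);
X = X(idx,:);
Y = Y(idx);
```

After sorting, the rows of each subject are contiguous and in increasing time order, which matches the row-order convention described above.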

EstMdl = fitrepanel(Tbl,PredictorVariables=predictorVariables,GroupVariable=groupVariable) fits a random effects panel data regression model to the predictor, response, and subject-assignment data in the table or timetable Tbl. Panel data in a table is in long format. The Tbl input argument has m rows; each row is an observation. The predictorVariables input specifies which table variables are predictor variables. groupVariable specifies to which subject the measurements in the rows of the data belong. The last table variable is the response variable.

EstMdl = fitrepanel(___,Name=Value) uses additional options specified by name-value arguments and any input-argument combination in the previous syntaxes. For example, fitrepanel(Tbl,PredictorVariables=predictors,GroupVariable="Country",ResponseVariable="LogGDP",FitEffects=false,Method="ssm") specifies that the table variable LogGDP contains the response data, the table variable Country contains the subject identifiers, and the arbitrary string vector predictors contains the predictor variable names in the table. This syntax skips fitting the unobserved effects and estimates the parameters by using maximum likelihood in the state-space model framework.

Examples

Fit a random effects panel data regression model to data using default options. The data is in wide format.

Load the simulated, balanced panel data set Data_SimulatedBalancedPanel.mat, which is available when you open this example. The data set contains 12 microeconomic measurements of 1000 randomly selected people taken yearly from 2006 through 2020. The response variable in the model is a series of log wages, while all other variables in the data are predictors.

load Data_SimulatedBalancedPanel

For details on the data set, enter Description at the command line.

The variable Data is a 3-D numeric array containing the predictor and response variables. Each row is a time point in the sampling period, each column is a subject in the sample, and each page is a variable. The final variable in Data is the response variable (log wages series), while all other variables are predictors.

Create separate variables for the predictor and response data.

X = Data(:,:,1:(end-1));
Y = Data(:,:,end);

X is a 15-by-1000-by-11 numeric array of predictor data and Y is a 15-by-1000 numeric matrix. For example, X(10,501,3) is the education experience of subject 501 in 2015.

Create a binary numeric variable for whether the subject is female (coded as 1), by using predictor 2, and a binary numeric variable for whether the subject is married (coded as 1), by using predictor 9.

X(:,:,2) = double(X(:,:,2) == 1);
X(:,:,9) = double(X(:,:,9) == 1);

Assume that the subject effect (heterogeneity) is not associated with the predictor variables. Fit a random effects panel data regression model to the data. Use default options.

EstMdl = fitrepanel(X,Y);
Panel data information: 
Number of cross-sectional units (N):  1000
Number of periods (T):  15
Number of observations:  15000
Method of estimation:  random effects (GLS)

                     | Estimator    SE      tStat   pValue 
-----------------------------------------------------------
 x1                  |   0.0485   0.0003  154.4327   0     
 x2                  |  -0.3381   0.0324  -10.4265  0.0000 
 x3                  |   0.1003   0.0036   28.0246  0.0000 
 x4                  |  -0.1370   0.0389   -3.5231  0.0004 
 x5                  |  -0.0620   0.0047  -13.1530  0.0000 
 x6                  |   0.0087   0.0053    1.6637  0.0962 
 x7                  |  -0.0390   0.0112   -3.4902  0.0005 
 x8                  |  -0.0191   0.0071   -2.6853  0.0072 
 x9                  |  -0.0516   0.0107   -4.8406  0.0000 
 x10                 |   0.0741   0.0051   14.6575  0.0000 
 x11                 |   0.0016   0.0003    5.7319  0.0000 
 DisturbanceVariance |   0.0302                            
 EffectVariance      |   0.0947                            

fitrepanel displays an estimation summary to the command line. Row xj contains, for predictor j, the coefficient estimate, standard error, t statistic, and p-value for a two-tailed t test of the null hypothesis that the coefficient is 0. At the 5% significance level, all predictor variables are significant except for x6 (p-value 0.0962).
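
As a rough check on how the tStat and pValue columns relate, the following sketch recomputes two-tailed p-values from two of the displayed t statistics. It assumes a standard normal reference distribution, which is an approximation chosen here for illustration; the exact reference distribution that fitrepanel uses is not stated in this section. With 15000 observations, a t distribution and the normal distribution give nearly identical tail probabilities.

```matlab
% t statistics copied from the display above for x6 and x8
tStat = [1.6637; -2.6853];

% Two-tailed p-value under a standard normal reference distribution:
% p = 2*(1 - Phi(|t|)) = erfc(|t|/sqrt(2))
pValue = erfc(abs(tStat)/sqrt(2));

% Flag coefficients that are significant at the 5% level
isSignificant = pValue < 0.05;
```

The recomputed values agree with the displayed p-values to the precision shown (about 0.096 for x6 and 0.007 for x8).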

Display the fitted model.

EstMdl
EstMdl = 
  PanelModel with properties:

             Coefficients: [11×1 double]
    CoefficientCovariance: [11×11 double]
      DisturbanceVariance: 0.0302
           EffectVariance: 0.0947
                  Effects: [4.5484 4.3304 4.6605 4.9304 4.9176 4.5704 3.8183 4.7566 4.4485 4.1725 4.6417 4.5086 4.7519 4.4794 4.2617 4.7618 4.5002 4.3778 4.1206 3.8693 3.9290 4.5513 4.5366 4.2080 4.6091 4.2711 4.6690 3.7448 3.9863 … ] (1×1000 double)
            LogLikelihood: 4.9624e+03
                  Summary: [13×4 table]
                     Type: "RandomEffects"

EstMdl is a PanelModel object. You can access its properties using dot notation.

Plot the empirical distribution of the heterogeneity.

effects = EstMdl.Effects;
histogram(effects,Normalization="probability")

Figure contains an axes object. The axes object contains an object of type histogram.

Fit a random effects panel data regression model to data using default options. The data is in long format.

Load the simulated, balanced panel data set Data_SimulatedBalancedPanel.mat, which is available when you open this example. The data set contains 12 microeconomic measurements of 1000 randomly selected people taken yearly from 2006 through 2020. The response variable in the model is a series of log wages, while all other variables in the data are predictors.

Load and Extract Data

load Data_SimulatedBalancedPanel

For details on the data set, enter Description at the command line.

The variable Data is a 3-D numeric array containing the predictor and response variables. Each row is a time point in the sampling period, each column is a subject in the sample, and each page is a variable. The final variable in Data is the response variable (log wages series), while all other variables are predictors. This data format is wide.

Create separate variables for the predictor and response data.

X = Data(:,:,1:(end-1));
Y = Data(:,:,end);
[T,n,p] = size(X)
T = 
15
n = 
1000
p = 
11

X is a 15-by-1000-by-11 numeric array of predictor data and Y is a 15-by-1000 numeric matrix. For example, X(10,501,3) is the education experience of subject 501 in 2015.

Convert Data to Long Format

Data in long format must have the following characteristics:

  • The response data is a Tn-by-1 vector, where T is the number of periods in the time base and n is the number of subjects. In this example, the long-format response data is a 15000-by-1 vector.

  • The predictor data is a Tn-by-p matrix, where p is the number of predictors. In this example, the long-format predictor data is a 15000-by-11 matrix.

  • The software must be able to identify to which subject the observation belongs by a Tn-by-1 vector of subject IDs.

  • For each subject, the software assumes that observations in higher rows were sampled later.

Convert the response data to long format by stacking the columns of Y using linear indexing with a single colon.

YLong = Y(:);
size(YLong)
ans = 1×2

       15000           1

For selected subjects, verify that the responses are arranged by blocks of subjects, increasing by sampling time within each block. To check a different subject, change the value of j.

j = 3; % Subject index
YSubj = Y(1:T,j);
YLongSubj = YLong((T*(j-1)+1):(T*j));
sum(YSubj - YLongSubj)
ans = 
0

Convert the predictor data to long format by stacking the columns of X and setting its pages to columns using reshape.

XLong = reshape(X,T*n,11);
size(XLong)
ans = 1×2

       15000          11

XLong is arranged such that all subject-specific measurements are blocked together and stacked, and within-subject blocks of observations are arranged in increasing order by sampling time.

For selected subjects, verify that the predictor data are arranged by blocks of subjects, increasing by sampling time within each block. To check a different subject, change the value of j.

j = 6; 
XSubj = squeeze(X(1:T,j,:));
XLongSubj = XLong((T*(j-1)+1):(T*j),:);
sum(sum(XSubj - XLongSubj))
ans = 
0
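
A sum of differences can equal zero even when individual entries differ, because positive and negative differences can cancel. As a stricter alternative, the following self-contained sketch checks every subject block with isequal. It uses a small random array with hypothetical sizes in place of the example data.

```matlab
rng(0,"twister")                 % for reproducibility
T = 4; n = 3; p = 2;             % hypothetical small sizes
X = randn(T,n,p);                % wide-format predictors
Y = randn(T,n);                  % wide-format responses

XLong = reshape(X,T*n,p);        % stack subject blocks; pages become columns
YLong = Y(:);                    % stack the columns of Y

% Compare each subject block in long format with the wide-format slice
tfBlocks = true;
for j = 1:n
    rows = (T*(j-1)+1):(T*j);    % rows of subject j in long format
    tfBlocks = tfBlocks && isequal(XLong(rows,:),squeeze(X(:,j,:))) ...
        && isequal(YLong(rows),Y(:,j));
end
```

tfBlocks is true only if every entry of every subject block matches exactly.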

Observations are arranged by blocks of subjects. Create a numeric vector, which identifies each subject, by repeating each integer in the interval [1,n] T times, and then stacking the results.

Groups = repmat(1:n,T,1);
Groups = Groups(:);
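
An equivalent way to build the same identifier vector is repelem, which repeats each element of a vector a given number of times. The following sketch, with hypothetical small sizes, compares the two constructions.

```matlab
T = 3; n = 4;                          % hypothetical small sizes

% Construction used above: replicate the row 1:n down T rows, then stack columns
GroupsRepmat = repmat(1:n,T,1);
GroupsRepmat = GroupsRepmat(:);

% Alternative: repeat each subject ID T times in a column vector
GroupsRepelem = repelem((1:n)',T);

tf = isequal(GroupsRepmat,GroupsRepelem);
```

Both vectors are Tn-by-1 and list subject 1 T times, then subject 2 T times, and so on.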

Preprocess Data

Create a binary numeric variable for whether the subject is female (coded as 1), by using predictor 2, and a binary numeric variable for whether the subject is married (coded as 1), by using predictor 9.

XLong(:,2) = double(XLong(:,2) == 1);
XLong(:,9) = double(XLong(:,9) == 1);

Fit the Model to Data

Assume that the heterogeneity is not associated with the predictor variables. Fit a random effects panel data regression model to the long-format data. Specify the grouping variable. Use default options.

EstMdl = fitrepanel(XLong,YLong,Groups);
Panel data information: 
Number of cross-sectional units (N):  1000
Number of periods (T):  15
Number of observations:  15000
Method of estimation:  random effects (GLS)

                     | Estimator    SE      tStat   pValue 
-----------------------------------------------------------
 x1                  |   0.0485   0.0003  154.4327   0     
 x2                  |  -0.3381   0.0324  -10.4265  0.0000 
 x3                  |   0.1003   0.0036   28.0246  0.0000 
 x4                  |  -0.1370   0.0389   -3.5231  0.0004 
 x5                  |  -0.0620   0.0047  -13.1530  0.0000 
 x6                  |   0.0087   0.0053    1.6637  0.0962 
 x7                  |  -0.0390   0.0112   -3.4902  0.0005 
 x8                  |  -0.0191   0.0071   -2.6853  0.0072 
 x9                  |  -0.0516   0.0107   -4.8406  0.0000 
 x10                 |   0.0741   0.0051   14.6575  0.0000 
 x11                 |   0.0016   0.0003    5.7319  0.0000 
 DisturbanceVariance |   0.0302                            
 EffectVariance      |   0.0947                            

The results are the same as the results from the model fit to data in wide format.

Fit a random effects panel data regression model to data using default options. The data is in a timetable.

Load the simulated, balanced panel data set Data_SimulatedBalancedPanel.mat, which is available when you open this example. The data set contains 12 microeconomic measurements of 1000 randomly selected people taken yearly from 2006 through 2020. The response variable in the model is a series of log wages, while all other variables in the data are predictors.

load Data_SimulatedBalancedPanel

For details on the data set, enter Description at the command line.

The variable DataTimeTable is a timetable containing the data. LogWage is the response variable, Group is the subject ID (grouping) variable, and all other variables are predictors. Each row is an observation for a subject at a time point in the sampling period (in other words, this data format is long).

Display the head and size of the timetable of data.

head(DataTimeTable)
    Time    WorkExperience    Gender    Education    Ethnicity    IsBlueCollar    IsManufacturing    IsSouth    IsCity    MaritalStatus    IsUnion    WeeksWorked    Group    LogWage
    ____    ______________    ______    _________    _________    ____________    _______________    _______    ______    _____________    _______    ___________    _____    _______

    2006          29          female       12            0             1                 0              0         0       nevermarried        0           48           1      6.7956 
    2007          30          female       12            0             1                 0              0         0       nevermarried        0           49           1      6.6592 
    2008          31          female       12            0             1                 0              0         0       nevermarried        0           51           1      6.9801 
    2009          32          female       12            0             0                 0              0         0       nevermarried        0           45           1      7.2397 
    2010          33          female       12            0             0                 0              0         0       nevermarried        0           25           1       7.123 
    2011          34          female       12            0             0                 0              0         0       nevermarried        0           42           1      6.9183 
    2012          35          female       12            0             0                 0              0         0       nevermarried        0           48           1      7.1639 
    2013          36          female       12            0             0                 0              0         0       nevermarried        0           49           1      7.0534 
size(DataTimeTable)
ans = 1×2

       15000          13

Create a new timetable TT containing a binary numeric variable for whether the subject is female, by using Gender, and a binary numeric variable for whether the subject is married, by using MaritalStatus. Then, remove the original variables Gender and MaritalStatus from TT.

TT = DataTimeTable;
TT.IsFemale = double(TT.Gender == "female");
TT = movevars(TT,"IsFemale","Before","Gender");
TT.Gender = [];
TT.IsMarried = double(TT.MaritalStatus == "married");
TT = movevars(TT,"IsMarried","Before","MaritalStatus");
TT.MaritalStatus = [];

Fit a random effects panel data regression model of the log wage series (LogWage) to all other variables in the timetable except the subject ID (Group). Specify the predictor and grouping variable names; fitrepanel assumes the final variable is the response variable.

prednames = TT.Properties.VariableNames(1:end-2);
EstMdl = fitrepanel(TT,PredictorVariables=prednames,GroupVariable="Group");
Panel data information: 
Number of cross-sectional units (N):  1000
Number of periods (T):  15
Number of observations:  15000
Method of estimation:  random effects (GLS)

                     | Estimator    SE      tStat   pValue 
-----------------------------------------------------------
 WorkExperience      |   0.0485   0.0003  154.4327   0     
 IsFemale            |  -0.3381   0.0324  -10.4265  0.0000 
 Education           |   0.1003   0.0036   28.0246  0.0000 
 Ethnicity           |  -0.1370   0.0389   -3.5231  0.0004 
 IsBlueCollar        |  -0.0620   0.0047  -13.1530  0.0000 
 IsManufacturing     |   0.0087   0.0053    1.6637  0.0962 
 IsSouth             |  -0.0390   0.0112   -3.4902  0.0005 
 IsCity              |  -0.0191   0.0071   -2.6853  0.0072 
 IsMarried           |  -0.0516   0.0107   -4.8406  0.0000 
 IsUnion             |   0.0741   0.0051   14.6575  0.0000 
 WeeksWorked         |   0.0016   0.0003    5.7319  0.0000 
 DisturbanceVariance |   0.0302                            
 EffectVariance      |   0.0947                            

The results are the same as the results from the model fit to data in wide format.

Estimate a random effects panel data regression model of log wages as a function of a set of predictors by viewing the model as a linear state-space model.

By default, fitrepanel uses GLS to estimate the model. Alternatively, you can specify that fitrepanel view the model as a linear state-space model and apply maximum likelihood to estimate the parameters.

Load the simulated, balanced panel data set Data_SimulatedBalancedPanel.mat, which is available when you open this example. The data set contains 12 microeconomic measurements of 1000 randomly selected people taken yearly from 2006 through 2020. The response variable in the model is a series of log wages, while all other variables in the data are predictors.

load Data_SimulatedBalancedPanel

For details on the data set, enter Description at the command line.

Create a new timetable TT containing a binary numeric variable for whether the subject is female, by using Gender, and a binary numeric variable for whether the subject is married, by using MaritalStatus. Then, remove the original variables Gender and MaritalStatus from TT.

TT = DataTimeTable;
TT.IsFemale = double(TT.Gender == "female");
TT = movevars(TT,"IsFemale","Before","Gender");
TT.Gender = [];
TT.IsMarried = double(TT.MaritalStatus == "married");
TT = movevars(TT,"IsMarried","Before","MaritalStatus");
TT.MaritalStatus = [];

Fit a random effects panel data regression model of the log wage series (LogWage) to all other variables in the processed timetable TT except the subject ID. Specify the state-space model estimation method.

varnames = TT.Properties.VariableNames;
prednames = varnames(~ismember(varnames,["Group" "LogWage"]));
EstMdl = fitrepanel(TT,PredictorVariables=prednames,GroupVariable="Group",Method="ssm");
Panel data information: 
Number of cross-sectional units (N):  1000
Number of periods (T):  15
Number of observations:  15000
Method of estimation:  random effects (SSM)

                     | Estimator    SE      tStat   pValue 
-----------------------------------------------------------
 WorkExperience      |   0.0485   0.0003  154.4004   0     
 IsFemale            |  -0.3381   0.0325  -10.3976  0.0000 
 Education           |   0.1003   0.0036   27.9469  0.0000 
 Ethnicity           |  -0.1370   0.0390   -3.5132  0.0004 
 IsBlueCollar        |  -0.0620   0.0047  -13.1543  0.0000 
 IsManufacturing     |   0.0088   0.0053    1.6647  0.0960 
 IsSouth             |  -0.0389   0.0112   -3.4838  0.0005 
 IsCity              |  -0.0191   0.0071   -2.6810  0.0073 
 IsMarried           |  -0.0516   0.0107   -4.8464  0.0000 
 IsUnion             |   0.0741   0.0051   14.6588  0.0000 
 WeeksWorked         |   0.0016   0.0003    5.7332  0.0000 
 DisturbanceVariance |   0.0302   0.0004   84.7569   0     
 EffectVariance      |   0.0953   0.0040   23.7771  0.0000 

The results are nearly the same as the results from the model fit using GLS. This similarity occurs because, in this problem, the maximum likelihood and GLS estimators are asymptotically equivalent.

Fit a random effects panel data regression model to data; fix the random effects variance to a known value.

Load the simulated, balanced panel data set Data_SimulatedBalancedPanel.mat, which is available when you open this example. The data set contains 12 microeconomic measurements of 1000 randomly selected people taken yearly from 2006 through 2020. The response variable in the model is a series of log wages, while all other variables in the data are predictors.

load Data_SimulatedBalancedPanel

For details on the data set, enter Description at the command line.

Create a new timetable TT containing a binary numeric variable for whether the subject is female, by using Gender, and a binary numeric variable for whether the subject is married, by using MaritalStatus. Then, remove the original variables Gender and MaritalStatus from TT.

TT = DataTimeTable;
TT.IsFemale = double(TT.Gender == "female");
TT = movevars(TT,"IsFemale","Before","Gender");
TT.Gender = [];
TT.IsMarried = double(TT.MaritalStatus == "married");
TT = movevars(TT,"IsMarried","Before","MaritalStatus");
TT.MaritalStatus = [];

Fit a random effects panel data regression model of the log wage series (LogWage) to all other variables in the timetable except the subject ID (Group). Specify the predictor and grouping variable names. Fix the random effects variance to 0.05, 0.1, 0.5, and then 1. Suppress the estimation display.

prednames = TT.Properties.VariableNames(1:end-2);
sigma2alpha = [0.05 0.1 0.5 1];
m = numel(sigma2alpha);
% Preallocate one entry per effects variance setting
Coefficients = cell(m,1);
DisturbanceVariance = zeros(m,1);

for j = 1:m
    EstMdl = fitrepanel(TT,PredictorVariables=prednames,GroupVariable="Group", ...
        EffectVariance=sigma2alpha(j),Display=false);
    Coefficients{j} = EstMdl.Coefficients;
    DisturbanceVariance(j) = EstMdl.DisturbanceVariance;
end

cell2mat(Coefficients')
ans = 11×4

    0.0486    0.0485    0.0484    0.0483
   -0.3376   -0.3381   -0.3387   -0.3388
    0.1003    0.1004    0.1004    0.1004
   -0.1378   -0.1370   -0.1359   -0.1357
   -0.0621   -0.0620   -0.0618   -0.0618
    0.0082    0.0088    0.0094    0.0096
   -0.0451   -0.0385   -0.0294   -0.0278
   -0.0231   -0.0189   -0.0148   -0.0143
   -0.0413   -0.0523   -0.0653   -0.0673
    0.0744    0.0741    0.0740    0.0740
    0.0016    0.0016    0.0016    0.0016

DisturbanceVariance
DisturbanceVariance = 4×1

    0.0302
    0.0302
    0.0302
    0.0302

The estimates are nearly the same among the effects variance settings, which shows that the estimators are robust to moderate to extreme levels of the effects variance.

Fit a random effects panel data regression model to data and obtain robust estimates.

Load the simulated, balanced panel data set Data_SimulatedBalancedPanel.mat, which is available when you open this example. The data set contains 12 microeconomic measurements of 1000 randomly selected people taken yearly from 2006 through 2020. The response variable in the model is a series of log wages, while all other variables in the data are predictors.

load Data_SimulatedBalancedPanel

For details on the data set, enter Description at the command line.

Create separate variables for the predictor and response data.

X = Data(:,:,1:(end-1));
[T,n,p] = size(X);
Y = Data(:,:,end);

Create a binary numeric variable for whether the subject is female (coded as 1), by using predictor 2, and a binary numeric variable for whether the subject is married (coded as 1), by using predictor 9.

X(:,:,2) = double(X(:,:,2) == 1);
X(:,:,9) = double(X(:,:,9) == 1);

Simulate heteroscedasticity in the system by using unmeasured, subject-specific predictor variables $z_j$ such that, for each subject $j = 1,\ldots,n$, $z_j \sim \mathrm{Pois}(\lambda_j)$ and $\lambda_j \sim \mathrm{Uniform}(1,\ldots,50)$. For each subject, simulate T values.

rng(1,"twister")
Z = zeros(T,n);
for j = 1:n
    lambda = randi(50);
    Z(:,j) = poissrnd(lambda,T,1);
end

Add the simulated predictor data to the response data with coefficient $\beta_z = 2$.

YSim = Y + 2*Z;

Assume that the heterogeneity is not associated with the predictor variables. Fit a random effects panel data regression model to the predictor data without z and the simulated response data. Use default options.

EstMdl = fitrepanel(X,YSim);
Panel data information: 
Number of cross-sectional units (N):  1000
Number of periods (T):  15
Number of observations:  15000
Method of estimation:  random effects (GLS)

                     | Estimator    SE     tStat   pValue 
----------------------------------------------------------
 x1                  |    0.0196  0.0190   1.0279  0.3040 
 x2                  |    1.4279  3.0659   0.4657  0.6414 
 x3                  |   -0.1095  0.3385  -0.3236  0.7462 
 x4                  |   -2.0201  3.6775  -0.5493  0.5828 
 x5                  |    0.3700  0.2763   1.3392  0.1805 
 x6                  |   -0.2306  0.3087  -0.7472  0.4550 
 x7                  |    0.9527  0.6997   1.3616  0.1733 
 x8                  |    0.1847  0.4256   0.4341  0.6642 
 x9                  |    0.3463  0.6632   0.5223  0.6015 
 x10                 |   -0.0337  0.2965  -0.1137  0.9094 
 x11                 |    0.0070  0.0161   0.4345  0.6640 
 DisturbanceVariance |  101.9401                          
 EffectVariance      |  859.3624                          

Compute model residuals $\hat{\varepsilon}_{ti} = Y_{ti} - X_{ti}\hat{\beta} - \hat{\alpha}_i$, and plot them against the fitted responses. Color the residuals according to subject ID.

betahat = reshape(EstMdl.Coefficients,1,1,p);
alphahat = EstMdl.Effects;
Yhat = sum(X.*betahat,3) + alphahat;
Residuals = YSim - Yhat;
figure
plot(Yhat,Residuals,'.')
hold on
yline(0,"--")
hold off
title("Residuals by Subject")
ylabel("Residual")
xlabel("Fitted Value")

Figure contains an axes object. The axes object with title Residuals by Subject, xlabel Fitted Value, ylabel Residual contains 1001 objects of type line, constantline.

The residuals scatter more widely as the fitted values increase. This behavior is indicative of heteroscedasticity. Also, residuals appear clustered by groups.

Refit the model; compute robust covariance estimates.

EstMdlRobust = fitrepanel(X,YSim,RobustCovariance=true);
Panel data information: 
Number of cross-sectional units (N):  1000
Number of periods (T):  15
Number of observations:  15000
Method of estimation:  random effects (GLS)

                     | Estimator    SE     tStat   pValue 
----------------------------------------------------------
 x1                  |    0.0196  0.0191   1.0222  0.3067 
 x2                  |    1.4279  2.8975   0.4928  0.6222 
 x3                  |   -0.1095  0.3417  -0.3205  0.7486 
 x4                  |   -2.0201  3.5807  -0.5642  0.5726 
 x5                  |    0.3700  0.2755   1.3431  0.1792 
 x6                  |   -0.2306  0.3215  -0.7174  0.4731 
 x7                  |    0.9527  0.6701   1.4218  0.1551 
 x8                  |    0.1847  0.4344   0.4253  0.6706 
 x9                  |    0.3463  0.6501   0.5327  0.5942 
 x10                 |   -0.0337  0.2976  -0.1133  0.9098 
 x11                 |    0.0070  0.0159   0.4414  0.6589 
 DisturbanceVariance |  101.9401                          
 EffectVariance      |  859.3624                          

The coefficient estimates between the regular and robust runs are the same; the difference between the runs is in the inferences.

Plot a heatmap of the relative difference between the robust and non-robust estimated coefficient covariance matrices.

seriesSim = series(1:p);
heatmap(seriesSim,seriesSim,(EstMdlRobust.CoefficientCovariance-EstMdl.CoefficientCovariance)./EstMdl.CoefficientCovariance)

Figure contains an object of type heatmap.

The estimated covariance of the coefficients of Education and WeeksWorked shows the greatest relative difference between the robust and non-robust analyses.

Input Arguments

Predictor data X, specified as an m-by-p numeric matrix or a T-by-n-by-p numeric 3-D array, where m is the total number of observations, p is the number of predictor variables, n is the number of sampled subjects (groups), and T is the largest number of sampling time points among subjects.

When X is a matrix, the data sets are in long format and the following conditions apply:

  • You must provide the subject identifiers input groups.

  • Each row is an observation taken at a particular time from a particular subject. Put differently, row j contains measurements for all predictors at time t for the subject g, where groups(j) is g.

  • Suppose Xg = X(groups == g,:) identifies the observations of subject g. Thus, row t1 < row t2 implies Xg(t1,:) was observed earlier than Xg(t2,:).

  • For each sampling time t, fitrepanel assumes all subjects were measured simultaneously.

  • Column k contains measurements of predictor variable k.

  • You must provide the response data Y as an m-by-1 vector.

When X is a 3-D array, the data sets are in wide format and the following conditions apply:

  • Row t contains all measurements taken at time t. Row t1 < row t2 implies the sampling time t1 < sampling time t2.

  • Column c contains all measurements of subject c.

  • Page k contains measurements of predictor variable k.

  • You must provide the response data Y as a T-by-n matrix.

Do not include a predictor variable entirely composed of ones in X to represent the model intercept.

NaN values in X indicate missing measurements. For unbalanced data in wide format, you must insert rows of NaN values for unmeasured time points among all pages.
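
The NaN padding described above can be sketched as follows. The example assumes hypothetical unbalanced data in long format with an explicit time index (period) and integer subject IDs; those names and values are illustrative and are not fitrepanel inputs.

```matlab
% Hypothetical unbalanced data: subject 1 has 3 periods, subject 2 has 2
groups = [1; 1; 1; 2; 2];          % integer subject IDs
period = [1; 2; 3; 1; 2];          % time index of each observation (illustrative)
XLong  = [10 100; 11 101; 12 102; 20 200; 21 201];   % m-by-p predictors
YLong  = [1.0; 1.1; 1.2; 2.0; 2.1];                  % m-by-1 responses

T = max(period); n = max(groups); p = size(XLong,2);
X = NaN(T,n,p);                    % unmeasured time points stay NaN on all pages
Y = NaN(T,n);
for i = 1:numel(groups)
    X(period(i),groups(i),:) = XLong(i,:);   % fill all p pages for this cell
    Y(period(i),groups(i)) = YLong(i);
end
```

Subject 2 has no observation in period 3, so X(3,2,:) and Y(3,2) remain NaN.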

For more details, see Panel Data.

Data Types: double

Response data Y, specified as an m-by-1 numeric vector or a T-by-n numeric matrix.

When Y is a vector, the data sets are in long format and the following conditions apply:

  • You must provide the subject identifiers input groups.

  • Row j contains the response at time t for the subject g, where input groups(j) is g.

  • Suppose Yg = Y(groups == g) identifies the responses of subject g. Thus, fitrepanel assumes that, if Yg(j) was observed at time t, Yg(j + 1) was observed at time t + f, where f is the sampling frequency.

  • For each sampling time t, fitrepanel assumes all subjects were measured simultaneously.

  • You must provide the predictor data X as an m-by-p matrix.

When Y is a matrix, the data sets are in wide format and the following statements apply:

  • Row t contains all measurements taken at time t. Row t1 < row t2 implies the sampling time t1 < sampling time t2.

  • Column c contains all measurements of subject c.

  • You must provide the predictor data X as a T-by-n-by-p 3-D array.

NaN values in Y indicate missing responses. For unbalanced data in wide format, you must insert rows of NaN values for unmeasured time points.

For more details, see Panel Data.

Data Types: double

Subject (group) identifiers for unobserved effects, specified as an m-by-1 vector. The unique values in groups identify sampled subjects.

When you specify data in long format, you must specify groups.

NaN values in groups indicate missing group identifiers for the corresponding observations. fitrepanel removes entire observations from the data when it cannot assign them to a group. Such data removal can cause unbalanced panel data.
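
To see how missing identifiers can unbalance a panel, the following sketch counts the observations that remain for each subject after dropping rows with a NaN identifier. It uses a hypothetical numeric groups vector and base MATLAB functions only; it mimics the removal described above and is not fitrepanel's internal procedure.

```matlab
% Hypothetical numeric group vector with one missing identifier
groups = [1; 1; 1; 2; NaN; 2; 3; 3; 3];

valid = ~isnan(groups);                    % observations that can be assigned
[ids,~,idx] = unique(groups(valid));       % subject IDs and the subject index of each row
counts = accumarray(idx,1);                % number of observations per subject
isBalanced = all(counts == counts(1));     % true only if all subjects have equal counts
```

In this sketch, subject 2 keeps two observations while subjects 1 and 3 keep three, so isBalanced is false.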

For more details, see Panel Data.

Data Types: double | categorical | cell | char | string

Panel data in long format, to which fitrepanel fits the model, specified as a table or timetable with numvars variables and m rows.

When you specify Tbl, the following conditions apply:

  • Each row is an observation taken at a particular time from a particular subject. Put differently, row j contains measurements for all variables at time t for the subject g.

  • Suppose Tblg = Tbl(Tbl.groupVariable == g,:) identifies the observations of subject g. Thus, fitrepanel assumes that, if Tblg(j,:) was observed at time t, Tblg(j + 1,:) was observed at time t + f, where f is the sampling frequency.

  • For each sampling time t, fitrepanel assumes all subjects were measured simultaneously.

  • Specify the predictor variables in the model X by setting PredictorVariables=predictorVariables. Each selected predictor variable must be a numeric vector.

  • Specify the subject (group) identifier variable by setting GroupVariable=groupVariable. The selected subject identifier variable can be a numeric, categorical, or text vector.

  • By default, the last variable is the response variable Y, but you can specify a different variable by setting ResponseVariable=responseVariable. The selected response variable must be a numeric vector.

Do not include a predictor variable entirely composed of ones in Tbl to represent the model intercept.

NaN values in Tbl indicate missing measurements. NaN values in the group variable indicate missing group identifiers for the corresponding observations. fitrepanel removes entire observations from the data when it cannot assign them to a group. Such data removal can cause unbalanced panel data.

For more details, see Panel Data.
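The conditions above can be illustrated with a small synthetic table. The variable names (Country, Year, GDP, Rate, Wages) and all values are hypothetical. The response Wages is the last table variable, so the sketch does not set ResponseVariable. The call to fitrepanel is guarded because the function requires R2026a or later.

```matlab
rng(2)                                   % reproducible synthetic data
Country = categorical(repelem(["FR";"DE";"JP"],4));   % subject identifiers
Year    = repmat((2001:2004)',3,1);      % sampling times, used only to order rows
GDP     = randn(12,1);
Rate    = randn(12,1);
alpha   = randn(3,1);                    % subject-specific effects
Wages   = 2*GDP - Rate + alpha(repelem((1:3)',4)) + 0.1*randn(12,1);

Tbl = table(Country,Year,GDP,Rate,Wages);
Tbl = Tbl(randperm(12),:);               % simulate rows arriving in arbitrary order

% Within each subject, later rows must correspond to later sampling times.
Tbl = sortrows(Tbl,["Country" "Year"]);

if exist("fitrepanel","file")            % skip the fit on releases without fitrepanel
    EstMdl = fitrepanel(Tbl,PredictorVariables=["GDP" "Rate"], ...
        GroupVariable="Country");
end
```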

Predictor variables X to select from Tbl, which contain predictor data, specified as one of the following data types:

  • String vector or cell vector of character vectors containing p variable names in Tbl.Properties.VariableNames

  • A length p vector of unique indices (positive integers) of variables to select from Tbl.Properties.VariableNames

  • A length numvars = width(Tbl) logical vector, where PredictorVariables(j) = true selects variable j from Tbl.Properties.VariableNames, and sum(PredictorVariables) is p

Example: PredictorVariables=["M1SL" "TB3MS" "UNRATE"]

Example: PredictorVariables=[true false true false] or PredictorVariables=[1 3] selects the first and third table variables to supply the predictor data.

Data Types: double | logical | char | cell | string
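The three ways of selecting predictor variables are interchangeable. The following sketch, which uses a hypothetical four-variable table, checks that a name vector, an index vector, and a logical vector pick out the same variables.

```matlab
% Hypothetical table with numvars = 4 variables.
Tbl = table([1;2],[3;4],[5;6],[7;8], ...
    'VariableNames',{'M1SL','Country','UNRATE','Wages'});

byName    = ["M1SL" "UNRATE"];           % variable names
byIndex   = [1 3];                       % variable indices
byLogical = [true false true false];     % one element per table variable

names = string(Tbl.Properties.VariableNames);
sameByIndex   = isequal(names(byIndex),byName);
sameByLogical = isequal(names(byLogical),byName);
```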

Subject (group) variable to select from Tbl, which contains subject identifier data for the unobserved effects, specified as one of the following data types:

  • String scalar or character vector containing the variable name to select from Tbl.Properties.VariableNames

  • Variable index (positive integer) to select from Tbl.Properties.VariableNames

  • A length numvars logical vector, where GroupVariable(j) = true selects variable j from Tbl.Properties.VariableNames, and sum(GroupVariable) is 1

Example: GroupVariable="Country"

Example: GroupVariable=[false false true false] or GroupVariable=3 selects the third table variable as the subject variable.

Data Types: double | logical | char | cell | string

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: fitrepanel(Tbl,PredictorVariables=predictors,GroupVariable="Subjects",ResponseVariable="Response",FitEffects=false,Method="ssm") specifies that the table variable "Response" contains the response data, the table variable "Subjects" contains the subject identifiers, and the arbitrary string vector predictors contains the predictor variable names in the table. This syntax skips fitting the unobserved effects and estimates the parameters using maximum likelihood in the state-space model framework.

Disturbance variance σ_ε^2, specified as a nonnegative numeric scalar.

When you specify DisturbanceVariance=sigma2eps, fitrepanel fixes the value of the disturbance variance to sigma2eps during estimation. When DisturbanceVariance=NaN, the default, fitrepanel estimates the disturbance variance with all other estimable parameters.

Example: DisturbanceVariance=5

Data Types: double

Variance of unobserved, subject-specific effects σ_ɑ^2, specified as a nonnegative numeric scalar.

When you specify EffectVariance=sigma2alpha, fitrepanel fixes the value of the effects variance to sigma2alpha during estimation. When EffectVariance=NaN, the default, fitrepanel estimates the effects variance with all other estimable parameters.

Example: EffectVariance=1

Data Types: double
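To see how the two variance parameters interact, consider the standard random effects specification (see [1]). The symbols c and β for the intercept and coefficients are assumptions here, because this section does not define them.

```latex
\begin{aligned}
y_{it} &= c + x_{it}\beta + \alpha_i + \varepsilon_{it},
  \qquad \operatorname{Var}(\alpha_i) = \sigma_\alpha^2,
  \quad \operatorname{Var}(\varepsilon_{it}) = \sigma_\varepsilon^2, \\
\operatorname{Var}(\alpha_i + \varepsilon_{it}) &= \sigma_\alpha^2 + \sigma_\varepsilon^2, \\
\operatorname{Cov}(\alpha_i + \varepsilon_{it},\, \alpha_i + \varepsilon_{is}) &= \sigma_\alpha^2
  \quad (t \neq s), \\
\rho &= \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\varepsilon^2}.
\end{aligned}
```

Under the usual assumption that the effects and disturbances are uncorrelated, fixing EffectVariance=1 and DisturbanceVariance=5 implies a within-subject error correlation of ρ = 1/6.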

Parameter estimation method, specified as a value in this table:

Value     Description
"gls"     Generalized least squares
"ssm"     Maximum likelihood estimation using a linear state-space formulation of the model

For more details, see Estimation Method Descriptions.

Example: Method="ssm"

Data Types: char | string

Robust covariance estimation flag, specified as a value in this table:

Value     Description
false     fitrepanel does not compute cluster-robust covariance estimates.
true      fitrepanel computes cluster-robust covariance estimates.

The coefficient estimates are the same whether or not you request robust covariance estimation. The coefficient covariance estimates, and therefore inferences, are not necessarily equal.

Tip

Set RobustCovariance=true when residuals show evidence of heteroscedasticity or serial correlation. [1] suggests using this setting whenever it is feasible.

Example: RobustCovariance=true

Data Types: logical
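The idea behind cluster-robust covariance estimation can be sketched with a sandwich estimator for pooled ordinary least squares. This sketch is an illustration of the general technique on synthetic data, without a small-sample correction. It is not the computation that fitrepanel performs, which is based on its own estimation method.

```matlab
rng(3)                                   % reproducible synthetic data
n = 5; T = 6;                            % hypothetical panel dimensions
g = repelem((1:n)',T);                   % cluster (subject) index for each row
X = [ones(n*T,1) randn(n*T,1)];          % design matrix for the pooled OLS sketch
alpha = randn(n,1);                      % subject effects induce within-cluster correlation
y = X*[1; 2] + alpha(g) + randn(n*T,1);

b = X\y;                                 % pooled OLS coefficients
u = y - X*b;                             % residuals
XtX = X'*X;

meat = zeros(size(XtX));
for i = 1:n
    s = X(g == i,:)'*u(g == i);          % summed score of cluster i
    meat = meat + s*s';
end

Vrobust  = XtX\meat/XtX;                       % cluster-robust (sandwich) covariance
Vclassic = (u'*u/(n*T - 2))*(XtX\eye(2));      % conventional OLS covariance
```

The coefficient vector b is the same in both cases; only the covariance estimate, and therefore the standard errors, changes.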

Unobserved effects ɑ estimation flag, specified as a value in this table:

Value     Description
false     fitrepanel does not estimate ɑ.
true      fitrepanel estimates ɑ and reports its estimates.

To fit the model using fewer computational resources and obtain only coefficient estimates, covariance estimates, and inferences, set FitEffects=false.

For details on how fitrepanel estimates the random effects, see Latent Effects Estimation.

Example: FitEffects=false

Data Types: logical

Predictor variable names for displays when you specify X, specified as a string vector or cell vector of character vectors. VarNames must contain NumPredictors elements. VarNames(j) is the name of variable j in the predictor data X.

The default is ["x1" "x2" ... "xp"].

Example: VarNames=["UnemploymentRate"; "CPI"]

Data Types: string | cell | char

Estimation display flag, specified as a value in this table.

Value     Description
false     fitrepanel does not display estimation results to the command line.
true      fitrepanel displays estimation results to the command line.

Example: Display=false

Data Types: logical

Response variable Y to select from Tbl, which contains the response data, specified as one of the following data types:

  • String scalar or character vector containing a variable name in Tbl.Properties.VariableNames

  • Variable index (positive integer) to select from Tbl.Properties.VariableNames

  • A length numvars logical vector, where ResponseVariable(j) = true selects variable j from Tbl.Properties.VariableNames, and sum(ResponseVariable) is 1

Example: ResponseVariable="Wages"

Example: ResponseVariable=[false false true false] or ResponseVariable=3 selects the third table variable as the response variable.

Data Types: double | logical | char | cell | string

Output Arguments

Estimated panel model, returned as a PanelModel object. EstMdl contains properties that store the estimation results from fitting the random effects panel data regression model to the data. You can access its properties by using dot notation.

More About

Algorithms

References

[1] Wooldridge, Jeffrey M. Econometric Analysis of Cross Section and Panel Data, Second edition. Cambridge, MA: The MIT Press, 2010.

[2] Greene, William H. Econometric Analysis, Eighth edition. New York: Pearson, 2018.

Version History

Introduced in R2026a

See Also

Objects

Functions