bootstrap clustering at region level

Dear community,
I have city-year panel data with 200 cities and 13 years, so each variable has size 2600x1. I have 40 variables, so the whole dataset has size 2600x40. I have about 70 parameters. I am trying to use the bootstrap to get standard errors for the estimates of a nonlinear problem. However, observations within a region may be correlated. I have never done a bootstrap before, but my plan is to:
  1. Draw a sample j of 2600x40 data.
  2. Compute all 70 estimates. This step involves linear regression and fminsearch for a nonlinear problem. Store the estimates in the jth column of matrix A.
  3. Repeat steps above 500 times. So matrix A has size 70x500.
  4. Compute the standard deviation of each row. This is the standard error for each parameter.
Does this procedure seem right?
I see that MATLAB has two bootstrap commands: bootstrp and bootci. Can I ask:
  1. Which command is the correct one to use?
  2. How do I redraw samples at the regional level instead of the city level?
  3. Can I input the original data as 40 columns, or do I have to input it as one 2600x40 matrix? The documentation says data can be entered as d or d1, ..., dN, but I want to check that I understand it correctly...
Thank you very much for your help!!

Answers (1)

Maneet Kaur Bagga 2024-10-10
Hi,
Here are answers to your questions:
Bootstrap commands "bootstrp" and "bootci":
  1. bootstrp - Draws bootstrap samples and evaluates your estimation function on each one, returning one row of estimates per sample. It suits your workflow, since you are drawing 500 samples and storing the estimates for each sample.
  2. bootci - Computes bootstrap confidence intervals. Use it when you want confidence intervals in addition to standard errors.
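As a minimal sketch of how the two commands differ, the snippet below applies both to toy data. The data vector x and the statistic (mean and median) are assumptions chosen only for illustration; they stand in for your own data and estimation function.

```matlab
% Toy data (an assumption for illustration): 100 draws from a standard normal.
rng(1);                                  % fix the seed so the run is reproducible
x = randn(100, 1);
nboot = 500;

% Stand-in statistic returning two "parameters" as a row vector.
stat = @(v) [mean(v), median(v)];

% bootstrp stores one row per bootstrap sample: bootstat is nboot-by-2 here.
bootstat = bootstrp(nboot, stat, x);

% Standard errors: std works down the columns, giving one value per parameter.
se = std(bootstat);

% bootci returns the lower limits in row 1 and the upper limits in row 2
% (95% intervals by default).
ci = bootci(nboot, stat, x);
```

Note the orientation: bootstrp returns replicates in rows, so with 70 parameters and 500 samples the output would be 500x70 rather than 70x500, and std(bootstat) already gives one standard error per parameter.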
Sampling at Regional Level:
To sample at the regional (or city) level, resample entire blocks of data rather than individual observations. With "bootstrp", you can do this by resampling the list of region IDs and passing a custom function that collects all city-year observations for the drawn regions and computes the estimates. Please refer to the code snippet below as an example:
% Define the region (city) label of each of the 2600 observations as a 2600x1 vector.
% repelem assumes rows are sorted by city, then year; adjust to your data structure.
city_ids = repelem((1:200)', 13);
% Resample the 200 IDs (passed as a column so bootstrp resamples its rows), not the observations.
% your_custom_function is a placeholder: it should gather the rows for the drawn IDs and return the estimates.
region_bootstrap = bootstrp(500, @(indices) your_custom_function(indices, city_ids, data), (1:200)');
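To make the block-resampling idea concrete, here is one possible shape for such a custom function, run on a small toy panel. Everything in it is an assumption for illustration: the toy data, the names cluster_ids, rows_for, ols_on_rows and cluster_stat, and the use of ordinary least squares as a stand-in for the real linear regression plus fminsearch step.

```matlab
% Toy panel (assumed for illustration): 20 clusters observed for 5 periods each.
rng(1);
nClusters = 20;
T = 5;
cluster_ids = repelem((1:nClusters)', T);   % 100x1 label vector, rows sorted by cluster
x = randn(nClusters * T, 1);
y = 1 + 2 * x + randn(nClusters * T, 1);
X = [ones(size(x)), x];

% Rows belonging to the drawn clusters; a cluster drawn twice contributes twice.
rows_for = @(ids) cell2mat(arrayfun(@(c) find(cluster_ids == c), ids(:), ...
                                    'UniformOutput', false));

% Stand-in estimator (OLS via backslash), returned as a row vector.
ols_on_rows = @(r) (X(r, :) \ y(r))';
cluster_stat = @(ids) ols_on_rows(rows_for(ids));

% Resample cluster IDs, not individual observations.
nboot = 500;
bootstat = bootstrp(nboot, cluster_stat, (1:nClusters)');   % one row per replicate
se = std(bootstat);                                          % one standard error per parameter
```

If the correlation is within regions that each contain several cities, the same sketch should apply with a 2600x1 vector of region labels in place of cluster_ids and the list of distinct region labels as the data passed to bootstrp.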
"bootstrp" function documentation - https://in.mathworks.com/help/stats/bootstrp.html
Data Input in "bootstrp" :
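You can pass the data either as separate column arguments (d1, d2, ..., dN), in which case the estimation function receives N inputs, or as a single 2600x40 matrix, in which case it receives one matrix whose rows have been resampled. Note that either form resamples individual rows (city-year observations), so for region-level clustering use the ID-based approach above.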
% Assuming data is a 2600x40 matrix
bootstat = bootstrp(500, @your_estimation_function, data);
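The two input forms can be compared on a small example. This is only a sketch: the toy vectors x and y and the correlation statistic are assumptions standing in for your 40 variables and your estimation function.

```matlab
% Toy data (assumed for illustration): two correlated columns with 50 rows.
rng(1);
x = randn(50, 1);
y = 0.5 * x + randn(50, 1);
nboot = 500;

% Form 1: separate arguments. bootfun receives two resampled columns,
% drawn with the same row indices so pairs stay together.
b1 = bootstrp(nboot, @(a, b) corr(a, b), x, y);

% Form 2: one matrix. bootfun receives a single matrix with resampled rows
% and picks out the columns it needs.
D = [x, y];
b2 = bootstrp(nboot, @(d) corr(d(:, 1), d(:, 2)), D);
```

With 40 variables, the single-matrix form is usually easier to manage, since the estimation function then takes one input instead of 40.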
Hope this helps!
