Proving CLT using real data

I'm trying to understand the CLT using real data rather than only a simulation. For now, though, consider the following simulation that I made.
n = 50; % Sample size (observations per simulated sample)
m = 1e3; % Number of simulated samples
mu = 1;
sigma2 = 2;
X = normrnd(mu, sqrt(sigma2), n, m); % Supposing that my data follows a N(1,2) distribution (mean 1, variance 2)
X_bar = mean(X);
% Checking that the mean follows a normal distribution, according to the CLT
histogram(X_bar, 'Normalization', 'pdf')
hold on
x = linspace(0, 2, 5e2);
plot(x, normpdf(x, mu, sqrt(sigma2 / n)))
legend('Distr. of X\_bar', 'Asymp. Normal')
hold off
This code produces the following result:
[Figure: histogram of X_bar with the asymptotic normal pdf overlaid]
As expected, the sample mean follows a normal distribution, but in this example I already know that the underlying data are normal. Suppose I have raw data in the vector 'Temperatures_Countries_Democracies.Data' (please find the needed files here), which holds the 1913 temperatures of the countries that are democracies. How can I show that the mean of that raw data follows a normal distribution, i.e., prove the CLT, without knowing the distribution of the data?
You can get that raw data by executing the following script:
datetime.setDefaultFormats('default','yyyy-MM-dd');
T = readtable('GlobalLandTemperaturesByCountry.csv');
T.Properties.VariableNames = {'Date' 'Data' 'AverageTemperatureUncertainty' 'Country'};
T = clean_table(T);
Countries_Democracies = readtable('Full_Flawed_Democracy.csv');
Temperatures_Countries_Democracies = get_data(T,"1913",Countries_Democracies);
Data = Temperatures_Countries_Democracies.Data; % The raw data
% Clearly it does not follow a normal distribution, nor a Poisson or an exponential one...
create_histogram(Temperatures_Countries_Democracies,0,0.5,0,'Expected tmp. Democracies')
This is my best try so far:
% ... continuing the code
n = length(Data);
m = 1e3;
X = normrnd(mean(Data), std(Data), n, m);
X_bar = mean(X);
histogram(X_bar, 'Normalization', 'pdf') % This is normal, but only because I used normrnd(·)

Answers (1)

Jeff Miller 2019-5-5
I'm not sure exactly what you would accept as proving the CLT, but you might find this helpful:
X_bar = bootstrp(1000,@mean,Data);
histogram(X_bar,'Normalization', 'pdf');
The bootstrp function will produce 1000 different means by sampling the original data with replacement, and the histogram will show that these means look a lot more normal than the original scores.
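For anyone who wants to see what the resampling does step by step, here is a minimal sketch of the same idea written with randi instead of bootstrp. It is an illustration rather than a proof: the bootstrap only approximates the sampling distribution of the mean. The Data vector below is a made-up, skewed stand-in (an assumption, not the temperature data from the question), and the normal curve uses mean(Data) and std(Data)/sqrt(n) as plug-in estimates.

```matlab
% Hypothetical stand-in for the real temperature vector: replace with your own Data.
rng(0);                              % fixed seed so the sketch is reproducible
Data = -10 * log(rand(200, 1));      % skewed, clearly non-normal placeholder data

n = numel(Data);                     % size of the original sample
B = 1000;                            % number of bootstrap resamples

% Each column of idx holds n indices drawn with replacement from 1..n,
% so Data(idx) is an n-by-B matrix whose columns are the resamples.
idx = randi(n, n, B);
X_bar = mean(Data(idx), 1);          % 1-by-B vector of bootstrap means

% Normal approximation suggested by the CLT, using plug-in estimates.
se = std(Data) / sqrt(n);            % estimated standard error of the mean
x = linspace(mean(Data) - 4*se, mean(Data) + 4*se, 500);
normal_pdf = exp(-(x - mean(Data)).^2 / (2*se^2)) / (se * sqrt(2*pi));

histogram(X_bar, 'Normalization', 'pdf')
hold on
plot(x, normal_pdf)
legend('Bootstrap means', 'Normal approx.')
hold off
```

If the histogram of X_bar tracks the curve closely while a histogram of Data itself does not, that is the behaviour the CLT predicts. A normal probability plot of X_bar is another quick visual check.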
