Apply copulas for estimating a single missing marginal, is it possible?

Question

Barbab 2022-10-26

Edited: Barbab 2023-12-18

Consider this example from the MATLAB documentation (with minor changes):

load stockreturns
x = stocks(:,1);
y = stocks(:,2);
z = stocks(:,3);
u = ksdensity(x,x,'function','cdf');
v = ksdensity(y,y,'function','cdf');
w = ksdensity(z,z,'function','cdf');
[Rho,nu] = copulafit('t',[u v w],'Method','ApproximateML')
Rho = 3×3
    1.0000    0.7220    0.3652
    0.7220    1.0000    0.3659
    0.3652    0.3659    1.0000
nu = 1.2692e+08

Now, assume that Rho and nu are known. Let's consider (only for simplicity):

v(50)
ans = 0.6546

And

y(50)
ans = 0.3170

And assume that y has a missing observation:

v(50) = NaN;
y(50) = NaN;

How can I estimate the missing marginal v(50), and accordingly the missing observation y(50), knowing Rho, nu, x, y, z and u, v, w? In other words: how can I impute the value of a missing observation given the other marginals?

Thank you in advance for your help.

Answer 1

Paras Gupta 2023-12-17

Hi Barbab,

I understand that you want to impute the value of a missing observation knowing other marginals.

To estimate the missing value, we can use the conditional distribution of the t-copula given the known marginals. The following code illustrates one way to do this.

load stockreturns
x = stocks(:,1);
y = stocks(:,2);
z = stocks(:,3);
u = ksdensity(x,x,'function','cdf');
v = ksdensity(y,y,'function','cdf');
w = ksdensity(z,z,'function','cdf');
[Rho,nu] = copulafit('t',[u v w],'Method','ApproximateML');
% Assuming Rho, nu, x, y, z, u, v, w are known and v(50) and y(50) are missing
% Set the missing values to NaN
v(50) = NaN;
y(50) = NaN;
% Find indices of the non-missing data
nonMissingIdx = ~isnan(y);
% Estimate the CDF values for the non-missing y data
v_nonMissing = ksdensity(y(nonMissingIdx), y(nonMissingIdx), 'function', 'cdf');
% Fit the t-copula to the non-missing data
[Rho_nonMissing, nu_nonMissing] = copulafit('t', [u(nonMissingIdx) v_nonMissing w(nonMissingIdx)], 'Method', 'ApproximateML');
% For the missing observation, use the known values of x and z
known_x = x(50);
known_z = z(50);
% Calculate the CDF values of the known x and z
u_known = ksdensity(x, known_x, 'function', 'cdf');
w_known = ksdensity(z, known_z, 'function', 'cdf');
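% Evaluate the fitted t-copula at (u_known, v, w_known) as a function of v.
% Note: copulacdf returns the joint copula CDF C(u,v,w); it is used here as a
% proxy for the conditional CDF of v given u and w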
conditionalCdf = @(v) copulacdf('t', [u_known v w_known], Rho_nonMissing, nu_nonMissing);
% Find the quantile function (inverse CDF) for the non-missing y data
inv_v_nonMissing = @(p) ksdensity(y(nonMissingIdx), p, 'function', 'icdf');
% Use fminbnd to find the v value that makes the conditional CDF equal to 0.5
% This is a median estimate under the conditional distribution
v_estimate = fminbnd(@(v) abs(conditionalCdf(v) - 0.5), 0, 1);
% Convert the v_estimate to the corresponding y value using the inverse CDF
y_estimate = inv_v_nonMissing(v_estimate);

Please note that this is a simplified approach and assumes that the median of the conditional distribution is a reasonable estimate for the missing value. In practice, you may want to use more sophisticated imputation methods, or account for the uncertainty in the estimate by sampling from the conditional distribution multiple times.
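One caveat on the code above: copulacdf evaluates the joint copula CDF C(u, v, w), which is not the same object as the conditional CDF of v given u and w. For a t copula, the conditional distribution is available in closed form, so a root search is not strictly needed. The following is a minimal sketch of that closed form, not a definitive implementation. It reuses the Rho and nu printed in the question; u_known and w_known are hypothetical placeholder values, not the actual CDF values at observation 50.

```matlab
% Sketch: closed-form conditional of a t copula.
% Requires Statistics and Machine Learning Toolbox (tinv, tcdf, trnd).
Rho = [1.0000 0.7220 0.3652
       0.7220 1.0000 0.3659
       0.3652 0.3659 1.0000];     % values printed in the question
nu  = 1.2692e+08;                 % value printed in the question
u_known = 0.60;                   % hypothetical CDF value of x(50)
w_known = 0.55;                   % hypothetical CDF value of z(50)

k = [1 3];                        % indices of the known margins (u, w)
m = 2;                            % index of the missing margin (v)

% Map the known uniforms to t quantiles
q = tinv([u_known; w_known], nu);

% If (q_u, q_v, q_w) is multivariate t with correlation Rho and nu degrees of
% freedom, then q_v given (q_u, q_w) is a location-scale t with
% nu + numel(k) degrees of freedom, location mu and scale sigma:
Rkk   = Rho(k, k);
Rmk   = Rho(m, k);
mu    = Rmk * (Rkk \ q);
s2    = (nu + q' * (Rkk \ q)) / (nu + numel(k)) ...
        * (Rho(m, m) - Rmk * (Rkk \ Rmk'));
sigma = sqrt(s2);

% Conditional median of v: the conditional t is symmetric about mu
v_median = tcdf(mu, nu);

% Draws from the conditional distribution of v
rng('default')
numSamples = 1000;
v_samples  = tcdf(mu + sigma * trnd(nu + numel(k), numSamples, 1), nu);
```

To return to the scale of y, the inverse-CDF handle from the code above could be applied, for example inv_v_nonMissing(v_median) for a point estimate or inv_v_nonMissing(v_samples) for a set of draws.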

You can refer to the documentation links below for more information on the code above.

copulacdf - https://www.mathworks.com/help/stats/copulacdf.html
fminbnd - https://www.mathworks.com/help/matlab/ref/fminbnd.html

Hope this helps.

1 Comment

Barbab 2023-12-18

Edited: Barbab 2023-12-18

Thank you for your answer; that was exactly what I was looking for.

If I understand correctly, the function conditionalCdf ensures that the dependence structure estimated on the non-missing data is taken into account, so that the y_estimate value is consistent with the Rho_nonMissing matrix. Is that right?

When you mention sampling from the conditional distribution, do you mean something like this?

rng('default')
% Number of samples to draw from the conditional distribution
numSamples = 1000;
% Preallocate array to store sampled y values
sampled_y_values = NaN(numSamples, 1);
% Perform multiple samples from the conditional distribution
for i = 1:numSamples
    % Sample from the conditional distribution
    v_sample = fminbnd(@(v) abs(conditionalCdf(v) - rand), 0, 1);
    
    % Convert the sampled v value to the corresponding y value
    y_sample = inv_v_nonMissing(v_sample);
    
    % Store the sampled y value
    sampled_y_values(i) = y_sample;
end
% Calculate statistics or analyze the sampled y values as needed
y_estimate = mean(sampled_y_values);
% or
y_estimate = median(sampled_y_values);

Why is y_estimate so different from its "true" value, even though the Rho_nonMissing values are relatively high?

The true value of y(50) is 0.3170.
The median estimate (your code) gives 1.2224.
Numerical simulation (my code) gives -0.1749 (mean) or -0.0908 (median).
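A possible contributing factor, offered as a hedged observation rather than a confirmed diagnosis: copulacdf returns the joint CDF C(u, v, w), and any copula satisfies C(u, v, w) <= min(u, v, w). If min(u_known, w_known) is below the target probability, no v in [0, 1] can reach it, and fminbnd then simply pushes v toward the upper bound, which maps to a large y. The sketch below checks this bound on a small grid; u_known and w_known are hypothetical placeholder values, and Rho and nu are the values printed in the question.

```matlab
% Sketch: the joint copula CDF is capped by min(u_known, w_known).
% Requires Statistics and Machine Learning Toolbox (copulacdf).
Rho = [1.0000 0.7220 0.3652
       0.7220 1.0000 0.3659
       0.3652 0.3659 1.0000];     % values printed in the question
nu  = 1.2692e+08;                 % value printed in the question
u_known = 0.60;                   % hypothetical CDF value of x(50)
w_known = 0.55;                   % hypothetical CDF value of z(50)

% Evaluate C(u_known, v, w_known) on a grid of v values
vGrid = [0.25; 0.50; 0.75; 0.999];
U = [repmat(u_known, numel(vGrid), 1), vGrid, repmat(w_known, numel(vGrid), 1)];
jointVals = copulacdf('t', U, Rho, nu);

% Upper bound that holds for any copula
upperBound = min(u_known, w_known);

% True only if some grid value of v reaches the 0.5 target used for the median
targetReachable = max(jointVals) >= 0.5;
```

If targetReachable is false for the actual u_known and w_known, the minimizer of abs(conditionalCdf(v) - 0.5) sits at the edge of the search interval and is not a conditional median.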
