Fieldname
显示 更早的评论
Hello Matlab experts,
I have a doubt on calling the field names. I have a program, and it needs to call fopts =fieldnames(options). I already have the "options" defined. But i am having an error.
function [X, M, C, Xerr] = regem(X, options)
%REGEM Imputation of missing values with regularized EM algorithm.
%
% [X, M, C, Xerr] = REGEM(X, OPTIONS) replaces missing values
% (NaNs) in the data matrix X with imputed values. REGEM
% returns
%
% X, the data matrix with imputed values substituted for NaNs,
% M, the estimated mean of X,
% C, the estimated covariance matrix of X,
% Xerr, an estimated standard error of the imputed values.
%
% Missing values are imputed with a regularized expectation
% maximization (EM) algorithm. In an iteration of the EM algorithm,
% given estimates of the mean and of the covariance matrix are
% revised in three steps. First, for each record X(i,:) with
% missing values, the regression parameters of the variables with
% missing values on the variables with available values are
% computed from the estimates of the mean and of the covariance
% matrix. Second, the missing values in a record X(i,:) are filled
% in with their conditional expectation values given the available
% values and the estimates of the mean and of the covariance
% matrix, the conditional expectation values being the product of
% the available values and the estimated regression
% coefficients. Third, the mean and the covariance matrix are
% re-estimated, the mean as the sample mean of the completed
% dataset and the covariance matrix as the sum of the sample
% covariance matrix of the completed dataset and an estimate of the
% conditional covariance matrix of the imputation error.
%
% In the regularized EM algorithm, the parameters of the regression
% models are estimated by a regularized regression method. By
% default, the parameters of the regression models are estimated by
% an individual ridge regression for each missing value in a
% record, with one regularization parameter (ridge parameter) per
% missing value. Optionally, the parameters of the regression
% models can be estimated by a multiple ridge regression for each
% record with missing values, with one regularization parameter per
% record with missing values. The regularization parameters for the
% ridge regressions are selected as the minimizers of the
% generalized cross-validation (GCV) function. As another option,
% the parameters of the regression models can be estimated by
% truncated total least squares. The truncation parameter, a
% discrete regularization parameter, is fixed and must be given as
% an input argument. The regularized EM algorithm with truncated
% total least squares is faster than the regularized EM algorithm
% with with ridge regression, requiring only one eigendecomposition
% per iteration instead of one eigendecomposition per record and
% iteration. But an adaptive choice of truncation parameter has not
% been implemented for truncated total least squares. So the
% truncated total least squares regressions can be used to compute
% initial values for EM iterations with ridge regressions, in which
% the regularization parameter is chosen adaptively.
%
% As default initial condition for the imputation algorithm, the
% mean of the data is computed from the available values, mean
% values are filled in for missing values, and a covariance matrix
% is estimated as the sample covariance matrix of the completed
% dataset with mean values substituted for missing
% values. Optionally, initial estimates for the missing values and
% for the covariance matrix estimate can be given as input
% arguments.
%
% The OPTIONS structure specifies parameters in the algorithm:
%
% Field name Parameter Default
%
% OPTIONS.regress Regression procedure to be used: 'mridge'
% 'mridge': multiple ridge regression
% 'iridge': individual ridge regressions
% 'ttls': truncated total least squares
% regression
%
% OPTIONS.stagtol Stagnation tolerance: quit when 5e-3
% consecutive iterates of the missing
% values are so close that
% norm( Xmis(it)-Xmis(it-1) )
% <= stagtol * norm( Xmis(it-1) )
%
% OPTIONS.maxit Maximum number of EM iterations. 30
%
% OPTIONS.inflation Inflation factor for the residual 1
% covariance matrix. Because of the
% regularization, the residual covariance
% matrix underestimates the conditional
% covariance matrix of the imputation
% error. The inflation factor is to correct
% this underestimation. The update of the
% covariance matrix estimate is computed
% with residual covariance matrices
% inflated by the factor OPTIONS.inflation,
% and the estimates of the imputation error
% are inflated by the same factor.
%
% OPTIONS.disp Diagnostic output of algorithm. Set to 1
% zero for no diagnostic output.
%
% OPTIONS.regpar Regularization parameter. not set
% For ridge regression, set regpar to
% sqrt(eps) for mild regularization; leave
% regpar unset for GCV selection of
% regularization parameters.
% For TTLS regression, regpar must be set
% and is a fixed truncation parameter.
%
% OPTIONS.relvar_res Minimum relative variance of residuals. 5e-2
% From the parameter OPTIONS.relvar_res, a
% lower bound for the regularization
% parameter is constructed, in order to
% prevent GCV from erroneously choosing
% too small a regularization parameter.
%
% OPTIONS.minvarfrac Minimum fraction of total variation in 0
% standardized variables that must be
% retained in the regularization.
% From the parameter OPTIONS.minvarfrac,
% an approximate upper bound for the
% regularization parameter is constructed.
% The default value OPTIONS.minvarfrac = 0
% essentially corresponds to no upper bound
% for the regularization parameter.
%
% OPTIONS.Xmis0 Initial imputed values. Xmis0 is a not set
% (possibly sparse) matrix of the same
% size as X with initial guesses in place
% of the NaNs in X.
%
% OPTIONS.C0 Initial estimate of covariance matrix. not set
% If no initial covariance matrix C0 is
% given but initial estimates Xmis0 of the
% missing values are given, the sample
% covariance matrix of the dataset
% completed with initial imputed values is
% taken as an initial estimate of the
% covariance matrix.
%
% OPTIONS.Xcmp Display the weighted rms difference not set
% between the imputed values and the
% values given in Xcmp, a matrix of the
% same size as X but without missing
% values. By default, REGEM displays
% the rms difference between the imputed
% values at consecutive iterations. The
% option of displaying the difference
% between the imputed values and reference
% values exists for testing purposes.
%
% OPTIONS.neigs Number of eigenvalue-eigenvector pairs not set
% to be computed for TTLS regression.
% By default, all nonzero eigenvalues and
% corresponding eigenvectors are computed.
% By computing fewer (neigs) eigenvectors,
% the computations can be accelerated, but
% the residual covariance matrices become
% inaccurate. Consequently, the residual
% covariance matrices underestimate the
% imputation error conditional covariance
% matrices more and more as neigs is
% decreased.
% References:
% [1] T. Schneider, 2001: Analysis of incomplete climate data:
% Estimation of mean values and covariance matrices and
% imputation of missing values. Journal of Climate, 14,
% 853--871.
% [2] R. J. A. Little and D. B. Rubin, 1987: Statistical
% Analysis with Missing Data. Wiley Series in Probability
% and Mathematical Statistics. (For EM algorithm.)
% [3] P. C. Hansen, 1997: Rank-Deficient and Discrete Ill-Posed
% Problems: Numerical Aspects of Linear Inversion. SIAM
% Monographs on Mathematical Modeling and Computation.
% (For regularization techniques, including the selection of
% regularization parameters.)
%if plo; set(gcf,'Double','on'); end
[J, B]=xlsread('Data_ChinaJustin1.xls');%%%%%Reading the input data. A contains numerical values and B contains text values
A1(:,:,1)=J(2:101,6:45); %%%%%%%%%%%%%%storing required values from input data to the array A1, which contains 100 responses for each of 40 questions
ARow=A1;
for i=1:100
for j=1:40
if ARow(i,j)==-9
ARow(i,j)=NaN;
end
end
end
X=ARow
options='regress';
% if nargin<2
% options=1;
% end
% error(nargchk(1, 2, nargin)) % check number of input arguments
if ndims(X) > 2, error('X must be vector or 2-D array.'); end
% if X is a vector, make sure it is a column vector (a single variable)
if length(X)==prod(size(X))
X = X(:);
end
[n, p] = size(X);
% number of degrees of freedom for estimation of covariance matrix
dofC = n - 1; % use degrees of freedom correction
optreg = [];
% ============== process options ========================
if nargin ==1 || isempty(options)
fopts = [];
else
fopts = fieldnames(options);
end
% initialize options structure for regression modules
optreg = [];
if strmatch('regress', fopts)
regress = lower(options.regress);
switch regress
case {'mridge', 'iridge'}
% OK
case {'ttls'}
if isempty(strmatch('regpar', fopts))
error('Truncation parameter for TTLS regression must be given.')
else
trunc = min([options.regpar, n-1, p]);
end
if strmatch('neigs', fopts)
neigs = options.neigs;
else
neigs = min(n-1, p);
end
otherwise
error(['Unknown regression method ', regress])
end
else
regress = 'mridge';
end
if strmatch('stagtol', fopts)
stagtol = options.stagtol;
else
stagtol = 5e-3;
end
if strmatch('maxit', fopts)
maxit = options.maxit;
else
maxit = 30;
end
if strmatch('inflation', fopts)
inflation = options.inflation;
else
inflation = 1;
end
if strmatch('relvar_res', fopts)
optreg.relvar_res = options.relvar_res;
else
optreg.relvar_res = 5e-2;
end
if strmatch('minvarfrac', fopts)
optreg.minvarfrac = options.minvarfrac;
else
optreg.minvarfrac = 0;
end
h_given = 0;
if strmatch('regpar', fopts)
h_given = 1;
optreg.regpar = options.regpar;
if strmatch(regress, 'iridge')
regress = 'mridge';
end
end
if strmatch('disp', fopts);
dispon = options.disp;
else
dispon = 1;
end
if strmatch('Xmis0', fopts);
Xmis0_given= 1;
Xmis0 = options.Xmis0;
if any(size(Xmis0) ~= [n,p])
error('OPTIONS.Xmis0 must have the same size as X.')
end
else
Xmis0_given= 0;
end
if strmatch('C0', fopts);
C0_given = 1;
C0 = options.C0;
if any(size(C0) ~= [p, p])
error('OPTIONS.C0 has size incompatible with X.')
end
else
C0_given = 0;
end
if strmatch('Xcmp', fopts);
Xcmp_given = 1;
Xcmp = options.Xcmp;
if any(size(Xcmp) ~= [n,p])
error('OPTIONS.Xcmp must have the same size as X.')
end
sXcmp = std(Xcmp);
else
Xcmp_given = 0;
end
% ================= end options =========================
% get indices of missing values and initialize matrix of imputed values
indmis = find(isnan(X));
nmis = length(indmis);
if nmis == 0
warning('No missing value flags found.')
return % no missing values
end
[jmis,kmis] = ind2sub([n, p], indmis);
Xmis = sparse(jmis, kmis, NaN, n, p); % matrix of imputed values
Xerr = sparse(jmis, kmis, Inf, n, p); % standard error imputed vals.
% for each row of X, assemble the column indices of the available
% values and of the missing values
kavlr = cell(n,1);
kmisr = cell(n,1);
for j=1:n
kavlr{j} = find(~isnan(X(j,:)));
kmisr{j} = find(isnan(X(j,:)));
end
if dispon
disp(sprintf('\nREGEM:'))
disp(sprintf('\tPercentage of values missing: %5.2f', nmis/(n*p)*100))
disp(sprintf('\tStagnation tolerance: %9.2e', stagtol))
disp(sprintf('\tMaximum number of iterations: %3i', maxit))
if (inflation ~= 1)
disp(sprintf('\tResidual (co-)variance inflation: %6.3f ', inflation))
end
if Xmis0_given & C0_given
disp(sprintf(['\tInitialization with given imputed values and' ...
' covariance matrix.']))
elseif C0_given
disp(sprintf(['\tInitialization with given covariance' ...
' matrix.']))
elseif Xmis0_given
disp(sprintf(['\tInitialization with given imputed values.']))
else
disp(sprintf('\tInitialization of missing values by mean substitution.'))
end
switch regress
case 'mridge'
disp(sprintf('\tOne multiple ridge regression per record:'))
disp(sprintf('\t==> one regularization parameter per record.'))
case 'iridge'
disp(sprintf('\tOne individual ridge regression per missing value:'))
disp(sprintf('\t==> one regularization parameter per missing value.'))
case 'ttls'
disp(sprintf('\tOne total least squares regression per record.'))
disp(sprintf('\tFixed truncation parameter: %4i', trunc))
end
if h_given
disp(sprintf('\tFixed regularization parameter: %9.2e', optreg.regpar))
end
if Xcmp_given
disp(sprintf(['\n\tIter \tmean(peff) \t|X-Xcmp|/std(Xcmp) ' ...
'\t|D(Xmis)|/|Xmis|']))
else
disp(sprintf(['\n\tIter \tmean(peff) \t|D(Xmis)| ' ...
'\t|D(Xmis)|/|Xmis|']))
end
end
% initial estimates of missing values
if Xmis0_given
% substitute given guesses for missing values
X(indmis) = Xmis0(indmis);
[X, M] = center(X); % center data to mean zero
else
[X, M] = center(X); % center data to mean zero
X(indmis) = zeros(nmis, 1); % fill missing entries with zeros
end
if C0_given
C = C0;
else
C = X'*X / dofC; % initial estimate of covariance matrix
end
it = 0;
rdXmis = Inf;
while (it < maxit & rdXmis > stagtol)
it = it + 1;
% initialize for this iteration ...
CovRes = zeros(p,p); % ... residual covariance matrix
peff_ave = 0; % ... average effective number of variables
% scale variables to unit variance
D = sqrt(diag(C));
const = (abs(D) < eps); % test for constant variables
nconst = ~const;
if sum(const) ~= 0 % do not scale constant variables
D = D .* nconst + 1*const;
end
X = X ./ repmat(D', n, 1);
% correlation matrix
C = C ./ repmat(D', p, 1) ./ repmat(D, 1, p);
if strmatch(regress, 'ttls')
% compute eigendecomposition of correlation matrix
[V, d] = peigs(C, neigs);
peff_ave = trunc;
end
for j=1:n % cycle over records
pm = length(kmisr{j}); % number of missing values in this record
if pm > 0
pa = p - pm; % number of available values in this record
% regression of missing variables on available variables
switch regress
case 'mridge'
% one multiple ridge regression per record
[B, S, h, peff] = mridge(C(kavlr{j},kavlr{j}), ...
C(kmisr{j},kmisr{j}), ...
C(kavlr{j},kmisr{j}), n-1, optreg);
peff_ave = peff_ave + peff*pm/nmis; % add up eff. number of variables
dofS = dofC - peff; % residual degrees of freedom
% inflation of residual covariance matrix
S = inflation * S;
% bias-corrected estimate of standard error in imputed values
Xerr(j, kmisr{j}) = dofC/dofS * sqrt(diag(S))';
case 'iridge'
% one individual ridge regression per missing value in this record
[B, S, h, peff] = iridge(C(kavlr{j},kavlr{j}), ...
C(kmisr{j},kmisr{j}), ...
C(kavlr{j},kmisr{j}), n-1, optreg);
peff_ave = peff_ave + sum(peff)/nmis; % add up eff. number of variables
dofS = dofC - peff; % residual degrees of freedom
% inflation of residual covariance matrix
S = inflation * S;
% bias-corrected estimate of standard error in imputed values
Xerr(j, kmisr{j}) = ( dofC * sqrt(diag(S)) ./ dofS)';
case 'ttls'
% truncated total least squares with fixed truncation parameter
[B, S] = pttls(V, d, kavlr{j}, kmisr{j}, trunc);
dofS = dofC - trunc; % residual degrees of freedom
% inflation of residual covariance matrix
S = inflation * S;
% bias-corrected estimate of standard error in imputed values
Xerr(j, kmisr{j}) = dofC/dofS * sqrt(diag(S))';
end
% missing value estimates
Xmis(j, kmisr{j}) = X(j, kavlr{j}) * B;
% add up contribution from residual covariance matrices
CovRes(kmisr{j}, kmisr{j}) = CovRes(kmisr{j}, kmisr{j}) + S;
end
end % loop over records
% rescale variables to original scaling
X = X .* repmat(D', n, 1);
Xerr = Xerr .* repmat(D', n, 1);
Xmis = Xmis .* repmat(D', n, 1);
C = C .* repmat(D', p, 1) .* repmat(D, 1, p);
CovRes = CovRes .* repmat(D', p, 1) .* repmat(D, 1, p);
% rms change of missing values
dXmis = norm(Xmis(indmis) - X(indmis)) / sqrt(nmis);
% relative change of missing values
nXmis_pre = norm(X(indmis) + M(kmis)') / sqrt(nmis);
if nXmis_pre < eps
rdXmis = Inf;
else
rdXmis = dXmis / nXmis_pre;
end
% update data matrix X
X(indmis) = Xmis(indmis);
% re-center data and update mean
[X, Mup] = center(X); % re-center data
M = M + Mup; % updated mean vector
% update covariance matrix estimate
C = (X'*X + CovRes)/dofC;
if dispon
if Xcmp_given
% imputed values in original scaling
Xmis(indmis) = X(indmis) + M(kmis)';
% relative error of imputed values (relative to values in Xcmp)
dXmis = norm( (Xmis(indmis)-Xcmp(indmis))./sXcmp(kmis)' ) ...
/ sqrt(nmis);
disp(sprintf(' \t%3i \t %8.2e \t %10.3e \t %10.3e', ...
it, peff_ave, dXmis, rdXmis))
else
disp(sprintf(' \t%3i \t %8.2e \t%9.3e \t %10.3e', ...
it, peff_ave, dXmis, rdXmis))
end
end % display of diagnostics
end % EM iteration
% add mean to centered data matrix
X = X + repmat(M, n, 1);
As you can see from the program, I need to call the regress. But I am having an error at this line fopts = fieldnames(options); the error is ??? Undefined function or method 'fieldnames' for input arguments of type 'char'. Help me
采纳的回答
更多回答(1 个)
Fangjun Jiang
2011-10-12
function fieldnames() is for structure. If the input argument is not a structure, it will cause the error.
s.a=1;
s.b=2;
fieldnames(s)
a='abc';
fieldnames(a)
4 个评论
manoraaju
2011-10-12
Walter Roberson
2011-10-12
Yup. options is being passed in as a formal parameter, but then there is the line
options='regress';
which overwrites it with a string.
Walter Roberson
2011-10-12
Looking at the code and the way it handles defaults, I would just remove the
options='regress';
line. If no options are passed in then nargin will be 1 and fopts will be assigned [], which will lead to regress being set to 'mridge' which sounds as valid as any default.
manoraaju
2011-10-12
类别
在 帮助中心 和 File Exchange 中查找有关 Nonlinear Regression 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!