knnimpute
Impute missing data using nearest-neighbor method
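Syntax
imputedData = knnimpute(data)
imputedData = knnimpute(data,k)
imputedData = knnimpute(data,k,Name,Value)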
Description
imputedData = knnimpute(data) returns imputedData after replacing NaNs in the input data with the corresponding value from the nearest-neighbor column. If the corresponding value from the nearest-neighbor column is also NaN, the next nearest column is used. The function calculates the Euclidean distance between observation columns by using only the rows with no NaN values. Thus, the data must have at least one row that contains no NaN.
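The following is a minimal sketch of the single-argument syntax, assuming Bioinformatics Toolbox is installed. The matrix A and the variables completeRows, d12, and d13 are illustrative names, not part of the function's interface. The distance check only mirrors the description above: rows containing a NaN are excluded before the columns are compared.

```matlab
% Requires Bioinformatics Toolbox for knnimpute.
A = [1 2 5; 4 5 7; NaN -1 8; 7 6 0];   % A(3,1) is missing

% Row 3 contains a NaN, so only rows 1, 2, and 4 enter the distances.
completeRows = ~any(isnan(A), 2);
d12 = norm(A(completeRows,1) - A(completeRows,2));   % sqrt(3)
d13 = norm(A(completeRows,1) - A(completeRows,3));   % sqrt(74)

% Column 2 is the nearest neighbor of column 1, so A(3,1) should be
% replaced by A(3,2), which is -1.
imputed = knnimpute(A);
```

All entries that were not NaN in A are left unchanged in imputed.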
imputedData = knnimpute(data,k) replaces NaNs in data with a weighted mean of the k nearest-neighbor columns. The weights are inversely proportional to the distances from the neighboring columns.
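The weighting can be sketched by hand for a single missing entry. The code below does not call knnimpute and is not the toolbox implementation. It assumes weights of the form 1/distance, normalized to sum to 1, which is one reading of "inversely proportional". It also assumes that the neighbor values are not NaN and that no distance is zero. All variable names are illustrative.

```matlab
A = [1 2 5; 4 5 7; NaN -1 8; 7 6 0];
k = 2;
target = 1;   % column holding the missing entry
row = 3;      % row holding the missing entry

% Compare columns using only the rows that contain no NaN.
completeRows = ~any(isnan(A), 2);
others = setdiff(1:size(A,2), target);

% Euclidean distance from the target column to each other column.
d = zeros(size(others));
for j = 1:numel(others)
    d(j) = norm(A(completeRows,target) - A(completeRows,others(j)));
end

% Keep the k nearest columns and weight them by inverse distance.
[dSorted, order] = sort(d);
nearest = others(order(1:k));
w = 1 ./ dSorted(1:k);
w = w / sum(w);

% Weighted mean of the neighbors' values in the missing row.
estimate = sum(w .* A(row, nearest));
```

Because column 2 is much closer to column 1 than column 3 is, the estimate lies nearer to A(3,2) = -1 than to A(3,3) = 8.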
imputedData = knnimpute(data,k,Name,Value) uses additional options specified by one or more name-value pair arguments. For example, imputedData = knnimpute(data,k,'Distance','mahalanobis') uses the Mahalanobis distance to compute the nearest-neighbor columns.
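A hedged usage sketch of the name-value syntax follows, assuming Bioinformatics Toolbox is installed. The random test matrix, the seed, and the choice of k = 3 are illustrative assumptions. The matrix is deliberately wide, 3 rows by 20 columns. Columns are the observations being compared, and the Mahalanobis distance needs a well-conditioned covariance estimate across the complete rows.

```matlab
% Requires Bioinformatics Toolbox for knnimpute.
rng(0);                   % fixed seed so the example is repeatable
data = randn(3, 20);      % 20 observation columns, 3 rows
data(1, 5) = NaN;         % one missing value; rows 2 and 3 stay complete

k = 3;
imputed = knnimpute(data, k, 'Distance', 'mahalanobis');

% Only the NaN entry should change; all other entries are preserved.
known = ~isnan(data);
```

Comparing imputed(known) with data(known) is a quick way to confirm that only the missing entry was filled in.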
Examples
Input Arguments
Output Arguments
References
[1] Speed, T. (2003). Statistical Analysis of Gene Expression Microarray Data. Chapman & Hall/CRC.
[2] Hastie, T., Tibshirani, R., Sherlock, G., Eisen, M., Brown, P., and Botstein, D. (1999). Imputing missing data for gene expression arrays. Technical Report, Division of Biostatistics, Stanford University.
[3] Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D., and Altman, R. (2001). Missing value estimation methods for DNA microarrays. Bioinformatics 17(6), 520–525.
Version History
Introduced before R2006a