How to fill in NaNs or <undefined> in data with the mode of each column
I have converted a table with a mix of categorical and double columns so that every column is of type double, by mapping each category in the categorical columns to a double.
I have a table of 40k rows, and 40 columns. I want to fill in NaNs via replacing each NaN value with the mode value for that column.
I found a clear looping method in R via this link, but couldn't find a simple loop in MATLAB to do it. inpaint_nans seems to be more focused on interpolation of the data.
knnimpute() also fails because I can have swathes of up to 1000 rows which are all NaNs (so I need 1200+ neighbours), as well as 40+ columns, so the algorithm has to loop through 40! times, which is very slow.
Any ideas?
Answers (1)
jgg
2015-12-22
Edited: jgg
2015-12-22
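Select the NaNs with a logical mask and set them to the mode of their column:
A = [1 2 NaN 4 5; 1 2 3 NaN 5; 1 NaN NaN NaN 5];
m = mode(A,1);                 % column-wise mode (mode ignores NaNs)
m = repmat(m,size(A,1), 1);    % repeat the mode row once per row of A, so m has the same size as A
A_f = A;
A_f(isnan(A)) = m(isnan(A));   % replace each NaN with the mode of its column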
Looping is not necessary if you use vectorized operations.
Note: if your matrix is very large, the repmat step can be replaced with a for loop over the columns in order to use less memory, but 40k by 40 is not that large, so it should be fine.
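The column-by-column variant mentioned in the note could look roughly like the sketch below. It is only an illustration, not part of the original answer: the variable names col and missing are made up here, and skipping columns that contain no non-NaN values is an added assumption, since such a column has nothing to fill from.

```matlab
% Example data (same matrix as in the answer)
A = [1 2 NaN 4 5; 1 2 3 NaN 5; 1 NaN NaN NaN 5];

A_f = A;                           % work on a copy, keep the original
for c = 1:size(A, 2)
    col = A(:, c);
    missing = isnan(col);          % logical mask of the NaNs in this column
    % Assumption: leave a column untouched if it is entirely NaN
    if any(missing) && ~all(missing)
        A_f(missing, c) = mode(col);   % mode ignores the NaN entries
    end
end
```

This avoids building a full-size copy of the mode matrix, at the cost of one pass per column. If several values tie for most frequent in a column, mode returns the smallest of them.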
2 Comments
jgg
2015-12-22
If you liked this answer, please accept it so other people can see it resolved your problem!