Loop to replace outliers with NaN is extremely slow
I want to replace outliers (values more than 3 standard deviations from their column's mean) with NaN in a large table. My code works in principle but is incredibly slow: it is still not done after 10 minutes. The table is about 2000x150. Is there a faster way, perhaps without the loop, and could someone tell me what is wrong with my version?
%Version 1: loop through column names
var_list = mytable.Properties.VariableNames(4:140)
for i = 1:length(var_list)
mytable.(var_list{i}) = filloutliers(mytable.(var_list{i}),nan,'mean','ThresholdFactor', 3)
end
%Version 2: loop through column indices
for i = 4:140
mytable(:,i) = filloutliers(mytable(:,i),nan,'mean','ThresholdFactor', 3)
end
2 Comments
Mathieu NOE
2022-1-21
Hello Tanja,
Just a question: is removing the outliers the "real" need, or would a smoothing approach also fit your needs?
Accepted Answer
Star Strider
2022-1-21
The way the table is indexed is likely the problem.
I’m not certain, but using parentheses () addresses the table itself (returning the selected variables as individual table arrays), while curly braces {} address the contents of the variables.
So, for example,
mytable(:,i) =
works with a one-variable table (a subtable of ‘mytable’), while
mytable{:,i} =
addresses only the contents of the variable.
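A minimal sketch of the difference, using a small hypothetical table built with array2table (not the asker's data). It shows that parentheses return a table while braces return the underlying double column, then rewrites Version 2 of the loop with brace indexing. It also adds the trailing semicolons that the original loops lack: without them MATLAB displays the whole table after every assignment, which by itself can take a long time for a 2000x150 table.

```matlab
% Hypothetical example data (not the asker's table): 200 rows, 5 numeric variables.
rng(0);                                   % make the random data reproducible
mytable = array2table(randn(200, 5));     % variables are named Var1 ... Var5
mytable.Var3(10) = 50;                    % plant one obvious outlier

% Parentheses return a (sub)table; braces return the variable's contents.
sub1 = mytable(:, 3);                     % 200x1 table
col3 = mytable{:, 3};                     % 200x1 double
disp(class(sub1))                         % displays: table
disp(class(col3))                         % displays: double

% Version 2 rewritten with brace indexing: each column is processed as a
% plain double vector. The semicolons suppress Command Window output.
for i = 1:width(mytable)                  % 4:140 in the question
    mytable{:, i} = filloutliers(mytable{:, i}, nan, 'mean', 'ThresholdFactor', 3);
end
```

In the asker's case the loop range would be 4:140 instead of 1:width(mytable).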
Again, I’m not certain what the problem is, but experimenting with the indexing method could provide a solution.
Also, I’m not certain the loop is even necessary: according to the documentation, filloutliers works on matrices as well as vectors and operates on each column separately.
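If every variable in columns 4 to 140 is numeric, the loop can probably be replaced by a single call. The sketch below assumes that, and uses small hypothetical data built with array2table rather than the asker's table: it extracts the numeric block with braces, lets filloutliers process each column separately, and writes the result back in one assignment.

```matlab
% Hypothetical example data: 300 rows, 8 numeric variables (Var1 ... Var8).
rng(1);                                   % reproducible random data
mytable = array2table(randn(300, 8));
mytable.Var5(7) = 40;                     % plant one obvious outlier

cols = 4:8;                               % plays the role of 4:140 in the question
% Braces extract Var4 ... Var8 as a single 300x5 double matrix (this assumes
% all selected variables are numeric). filloutliers works down each column,
% using that column's own mean and standard deviation, and the result is
% written back into the same table variables.
mytable{:, cols} = filloutliers(mytable{:, cols}, nan, 'mean', 'ThresholdFactor', 3);
```

Variables outside cols (here Var1 to Var3) are left unchanged. Recent MATLAB releases also document table input for filloutliers together with a 'DataVariables' option, which would avoid the brace extraction; whether that is available depends on your release, so check the filloutliers documentation for your version.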