Treat and handle missing hourly data (with daily profile), that might have large gaps

Question

Anwaar Alghamdi 2022-11-24

https://ww2.mathworks.cn/matlabcentral/answers/1861273-treat-and-handle-missing-hourly-data-with-daily-profile-that-might-have-large-gaps

Edited: Anwaar Alghamdi 2022-11-24

I want to clean a large, messy temperature dataset with many missing values (encoded as 999.9).

If only a few values are missing within a day, I would take the average of the values before and after the gap. But if there are large clusters of missing values (almost a full day missing, or up to 100 values in a row), I would average the 1 PM temperature from yesterday and the 1 PM temperature from tomorrow to get today's 1 PM value, and do the same for every other hour.

Note: I don't want to change the valid temperatures already assigned to each hour (which is what I expect interp1 would do to the order of the values).

What can I use to handle these data?

08/09/2016 	4:00:00	 26
08/09/2016 	5:00:00	 26
08/09/2016 	6:00:00	 25
08/09/2016 	6:00:00	 999.9
08/09/2016 	7:00:00	 24
08/09/2016 	8:00:00	 25
08/09/2016 	9:00:00	 24
08/09/2016 	9:00:00	 999.9
08/09/2016 	10:00:00 23

5 comments

Anwaar Alghamdi 2022-11-24

@Jiri Hajek

Also, if I do linear interpolation, the non-999 values will be messed up (at least their order). I don't want to touch the temperatures assigned to each hour, only estimate the 999 values.

Jiri Hajek 2022-11-24

As for the cluster identification, I can give you some hints; I will put them below in an answer. As for handling the large missing clusters, I would leave them out, i.e. constrain the scope.

Answer 1

Jiri Hajek 2022-11-24

https://ww2.mathworks.cn/matlabcentral/answers/1861273-treat-and-handle-missing-hourly-data-with-daily-profile-that-might-have-large-gaps#answer_1110293

To identify the clusters of outliers, one may use logical indexing and the time vector. This is just a skeletal draft of the algorithm, but you can get the idea.

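% Assumed inputs (column vectors of equal length, with at least one missing reading):
%   timeColumn            your datetime values
%   temperatureColumnRaw  your original temperatures, with 999.9 marking missing readings
outlierPoints = temperatureColumnRaw > 900;            % logical mask of the missing readings
outlierTimes = timeColumn(outlierPoints);              % timestamps of the missing readings
timeDifsOfOutliers = diff(outlierTimes);               % spacing between consecutive missing readings
samplingStep = mode(diff(timeColumn));                 % nominal sampling interval
% A missing reading starts a new cluster if it is more than one step after the previous one.
clusterStartsLogical = [true; timeDifsOfOutliers > samplingStep];
% It ends a cluster if the next missing reading is more than one step away.
clusterEndsLogical = [timeDifsOfOutliers > samplingStep; true];
clusterStartTimes = outlierTimes(clusterStartsLogical);
clusterEndTimes = outlierTimes(clusterEndsLogical);
nClusters = numel(clusterStartTimes);
clusterDurations = clusterEndTimes - clusterStartTimes;   % zero for a single missing reading
shortClusterIndices = clusterDurations <= hours(3);       % you define what counts as a short cluster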
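
For the filling step described in the question, a minimal sketch could look like the one below. It is not part of the answer above, only one possible way to combine the two rules. It assumes a regular hourly series with exactly one sample per hour (the posted sample contains repeated timestamps, which would need to be resolved first), so that "same hour yesterday/tomorrow" is simply an offset of 24 samples. The synthetic sine data, the threshold maxShort and the variable names truth, base and runLen are illustrative assumptions. Valid readings are never modified, because interp1 is evaluated only at the missing positions.

```matlab
% Synthetic example (not the asker's data): 3 days of hourly temperatures.
n = 72;
hourOfDay = mod((0:n-1)', 24);
truth = 25 + 5*sin(2*pi*(hourOfDay - 9)/24);
temp = truth;
temp([5 27:46]) = 999.9;          % one short gap and one long gap on day 2

temp(temp > 900) = NaN;           % mark the 999.9 sentinel as missing
missing = isnan(temp);

% Length of each run of missing values, stored at every sample of the run.
d = diff([0; double(missing); 0]);
runStart = find(d == 1);
runEnd = find(d == -1) - 1;
runLen = zeros(n, 1);
for k = 1:numel(runStart)
    runLen(runStart(k):runEnd(k)) = runEnd(k) - runStart(k) + 1;
end

maxShort = 3;                     % assumed threshold, in samples
filled = temp;

% Short gaps: linear interpolation between the neighbouring valid readings,
% evaluated only at the missing positions.
shortIdx = find(missing & runLen <= maxShort);
filled(shortIdx) = interp1(find(~missing), temp(~missing), shortIdx);

% Long gaps: mean of the same hour on the previous and the next day.
% base is a snapshot, so long-gap estimates never feed into each other.
base = filled;
longIdx = find(missing & runLen > maxShort);
longIdx = longIdx(longIdx > 24 & longIdx + 24 <= n);
filled(longIdx) = (base(longIdx - 24) + base(longIdx + 24)) / 2;
```

Any long-gap sample whose neighbour on the previous or next day is itself missing stays NaN, as do long gaps on the first or last day, which matches the suggestion to constrain the scope rather than guess.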
