Treat and handle missing hourly data (with a daily profile) that might have large gaps

I want to clean a large, messy temperature dataset with many missing values (recorded as 999.9).
If only a few values are missing within a day, I would take the average of the data before and after the gap. But for large missing clusters (almost a full day missing, or up to 100 values in a row), I would average the 1 PM temperature from yesterday and the 1 PM temperature from tomorrow to get today's 1 PM value, and likewise for every other hour.
Note: I don't want to change the valid temperatures assigned to their hours (as interp1 would do with the order of the values).
What can I use to handle these data?
08/09/2016 4:00:00 26
08/09/2016 5:00:00 26
08/09/2016 6:00:00 25
08/09/2016 6:00:00 999.9
08/09/2016 7:00:00 24
08/09/2016 8:00:00 25
08/09/2016 9:00:00 24
08/09/2016 9:00:00 999.9
08/09/2016 10:00:00 23
5 Comments
Anwaar Alghamdi 2022-11-24
Also, if I do linear interpolation, the non-999 values will be messed up (at least their order). I don't want to touch the temperatures assigned to each hour, only estimate the 999 values.
Jiri Hajek 2022-11-24
As for the cluster identification, I can give you some hints, which I will put below in an answer. As for the handling of large missing clusters, I would leave them out, i.e. constrain the scope.
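
On the worry that interpolation disturbs valid readings: a minimal sketch, assuming the 999.9 markers are first converted to NaN and the samples are evenly spaced, suggests that fillmissing only overwrites the missing entries and leaves every valid value at its original position. The vector temp below is made-up illustration data, not the poster's dataset.

```matlab
% Made-up readings; 999.9 marks a missing value (illustration only).
temp = [26; 26; 25; 999.9; 24; 25; 24; 999.9; 23];

% Replace the 999.9 marker with NaN so MATLAB treats it as missing.
tempNaN = standardizemissing(temp, 999.9);

% Linear interpolation over the NaN entries only.
tempFilled = fillmissing(tempNaN, 'linear');

% Valid readings are unchanged; only positions 4 and 8 were estimated.
valid = ~isnan(tempNaN);
unchanged = isequal(tempFilled(valid), temp(valid));
```

Here each missing entry sits midway between its two neighbours, so it is filled with their average.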


Answers (1)

Jiri Hajek 2022-11-24
To identify the clusters of outliers, one may use logical indexing and the time vector. This is just a skeletal draft of the algorithm, but you can get the idea.
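% timeColumn: your datetime values (column vector)
% temperatureColumnRaw: your original temperatures (column vector)
% Assumes at least one missing value is present.
outlierPoints = temperatureColumnRaw > 900; % 999.9 marks a missing value
outlierTimes = timeColumn(outlierPoints);
timeDifsOfOutliers = diff(outlierTimes);
samplingStep = mode(diff(timeColumn)); % most common time step
% A new cluster starts wherever the gap to the previous outlier exceeds one step.
% Use true (not 1) so the vector stays logical and can be used as an index.
clusterStartsLogical = [true; timeDifsOfOutliers > samplingStep];
clusterStartTimes = outlierTimes(clusterStartsLogical);
nClusters = length(clusterStartTimes);
clusterStartIndices = find(clusterStartsLogical);
% Each cluster ends just before the next one starts; the last ends at the final outlier.
clusterEndPoints = [clusterStartIndices(2:end)-1; length(outlierTimes)];
clusterEndTimes = outlierTimes(clusterEndPoints);
clusterDurations = clusterEndTimes - clusterStartTimes;
shortClusterIndices = clusterDurations <= hours(3); % you define what a short cluster is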
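
Once the clusters are classified, the filling rule described in the question could be sketched as follows. This is only an illustration under stated assumptions: the series is strictly hourly with no duplicate or skipped timestamps (so "same hour, adjacent day" is an offset of 24 samples), missing values are already NaN, and the data, the threshold maxShortGap, and all variable names are made up for the example. It works on sample indices rather than on the time vector used above.

```matlab
% Made-up example: four days of hourly readings with an identical daily cycle.
n = 96;
hourOfDay = mod((0:n-1)', 24);
truth = 20 + 5*sin(2*pi*(hourOfDay - 8)/24);
temp = truth;
temp(10) = NaN;        % short gap: one sample on day 1
temp(49:72) = NaN;     % long gap: all of day 3

maxShortGap = 3;       % assumed threshold, in samples (hours)

% Find runs of consecutive missing samples.
isMissing = isnan(temp);
edges = diff([false; isMissing; false]);
runStart = find(edges == 1);
runEnd = find(edges == -1) - 1;

% Step 1: linear interpolation, applied to short runs only.
linearFill = fillmissing(temp, 'linear', 'EndValues', 'none');
filled = temp;
for k = 1:numel(runStart)
    idx = runStart(k):runEnd(k);
    if numel(idx) <= maxShortGap
        filled(idx) = linearFill(idx);
    end
end

% Step 2: long runs take the mean of the same hour one day before and after.
source = filled;       % valid readings plus the short-gap estimates
for k = 1:numel(runStart)
    idx = runStart(k):runEnd(k);
    if numel(idx) > maxShortGap
        inRange = idx > 24 & idx <= n - 24;   % both neighbouring days must exist
        idx = idx(inRange);
        % Stays NaN if either neighbouring value is itself missing.
        filled(idx) = (source(idx - 24) + source(idx + 24)) / 2;
    end
end
```

Valid readings are never overwritten, because both steps assign only to indices inside a missing run. Gaps longer than one day, or gaps at the very start or end of the record, are left as NaN by this sketch and would need a separate decision.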
