if it is greater than 75 percent missing data
3 次查看(过去 30 天)
显示 更早的评论
Hello there
I have a matrix of (10200x7) from 01-01-2019 to 29-02-2020, which is iqual to the hourly temperature measured. The data is organizaed in the first 6 columns with date information (yy-mm-dd-hh-mm-ss) and then my temperature data (column 7). I already got the nanmean dayli values for a new matrix. But I would like to know how can I get a conditional, for example: if more than 75% of my data each 24 hours is missing, give me NaN for that day. If not, give me nanmean value.
I'm trying something like this, but I'm pretty sure I'm on the wrong way
%first I find the hourly values positions after 0 hs and before 23 hs
x = find(files_mat{1,1}(:,5)>= 0 {1,1}(:,5)<=23);
%%then, I'm trying to get the conditional to the tempearture column
if isnan (files_mat{1,1}(x,7))<= 0.75
M_24h
elseif isnan (files_mat{1,1}(x,7))>= 0.75
M_24h
Thank you
2 个评论
Rik
2020-10-26
I don't fully understand the shape of your data. But if you have an array, you can call isnan and get out a logical array. Using mean you can find the percentage of NaNs in each row, and using the comparator you will end up with a logical vector you can use to index into your output array and correct all non-NaNs.
I'm not posting this as an answer, as I can't really write a working example for you without understanding your data.
采纳的回答
Peter Perkins
2020-11-19
I think this would be much simpler if you created a timetable by combining the first 6 vars in the table. Actually, I have no idea if you even have a table, but you should. Use readtable to read in your data.
Given a timetable, this is a one-liner: use retime to aggregate to daily means, except use your own custom function that returns NaN if there's more than 18 missing hourly values. Something like this:
ttDaily = retime(ttHourly,'daily',myFun)
function y = myFun(x)
if sum(isnan(x)) > .75*length(x)
y = NaN;
else
y = mean(x,'omitnan');
end
0 个评论
更多回答(1 个)
Adam Danz
2020-10-26
编辑:Adam Danz
2020-10-27
Here's an approach you can apply to your data,
% Create a table of year-month-day-hour-min-second-rain data with some of the hour data missing (NaN)
dateTimeVals = datetime(1999,1,1) + hours(1:5000)';
rain = rand(size(dateTimeVals));
T = array2table([datevec(dateTimeVals), rain], 'VariableNames',{'Y','M','D','H','Mn','S','rain'});
% Replace some Hour data with missing vals
rng('default')
T.H(randi(height(T),1,3000)) = NaN;
% Show top of table
head(T)
% Add column to identify valid hour-data
T.dateValid = ~isnan(T.H);
head(T)
% Compute the percentage of valid hour-data per day
% This is achieved by applying mean() to the binary
% 'dateValid' column for each day.
Tgs = groupsummary(T,{'Y','M','D'},'mean', 'dateValid');
head(Tgs)
% This shows that for the first day of the first month of 1999,
% 47.8% of hour-data are non-nan. Let's confirm that manually.
daySelection = T.Y==1999 & T.M==1 & T.D==1;
mean(T.dateValid(daySelection),'omitnan') * 100
% To identify which days have >75% hour-data,
Tgs(Tgs.mean_dateValid > 0.75, 1:3)
0 个评论
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Data Preprocessing 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!