Would like a script that removes repeat data

I'm looking to create a script that removes dates that repeat one after the other. For some reason, the program I used to collect the data sometimes sends a prompt twice on the same day, but I only want the prompt sent once. I want the repeated dates to be deleted. For example:
Dates_Wrong = ['2/4/21';'2/5/21';'2/5/21';'2/6/21';'2/7/21']
Dates_Wrong =
  5×6 char array
    '2/4/21'
    '2/5/21'
    '2/5/21'
    '2/6/21'
    '2/7/21'
As you can see, the 2/5/21 date repeats. I would like a script that removes that repeated entry.
The hard part is that I can't just call unique on the entire dates column, because different subjects have the same dates, and that is why I'm having trouble. The script has to identify two identical dates in sequence and remove the second one. Here is what the previous example would look like with the repeated date removed:
Dates_Right = ['2/4/21';'2/5/21';'2/6/21';'2/7/21']
Dates_Right =
  4×6 char array
    '2/4/21'
    '2/5/21'
    '2/6/21'
    '2/7/21'
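A minimal sketch of the "compare each date with the one directly above it" idea, working on the date text directly instead of converting to datetime. It assumes every date is written the same way (so '2/5/21' and '02/05/21' would not count as equal) and, like the example, it looks only at the dates, not at which subject each row belongs to.

```matlab
% Sample data from the question: a 5×6 char array, one date per row
Dates_Wrong = ['2/4/21';'2/5/21';'2/5/21';'2/6/21';'2/7/21'];

% Convert each row to a string so rows can be compared with ~=
s = string(cellstr(Dates_Wrong));

% Keep row k only if it differs from the row directly above it;
% the first row has nothing above it, so it is always kept
keep = [true; s(2:end) ~= s(1:end-1)];

Dates_Right = Dates_Wrong(keep, :)
```

Because only neighbouring rows are compared, the same date appearing again later (for example, for another subject further down) is left alone, which is the difference from unique.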
This is sort of what I was thinking of doing, but I'm not sure if it makes sense:
for x=1:length(MorningPrompt.SurveyStartedDate)
    % This is where I'm having trouble. I think the rest of the script is fine,
    % but I'm not sure how to make this comparison work for strings, since x
    % is the loop index, not the actual string stored in that variable.
    if x-1==x
        MorningPrompt(x,:) = [];
    end
end
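For reference, the loop idea can work if it compares the stored date text rather than the index, and if it runs backwards, so that deleting row x never shifts the rows still to be checked (counting upward to the original length would eventually index past the end of the shrinking table). The table below is a hypothetical stand-in for MorningPrompt, assuming SurveyStartedDate holds text dates, one row per prompt.

```matlab
% Hypothetical stand-in for MorningPrompt (assumption: a table whose
% SurveyStartedDate variable holds text dates, one row per prompt)
MorningPrompt = table({'2/4/21';'2/5/21';'2/5/21';'2/6/21';'2/7/21'}, ...
    'VariableNames', {'SurveyStartedDate'});

% Compare the date text itself, not the loop index x
dates = string(MorningPrompt.SurveyStartedDate);

% Loop backwards: deleting row x leaves rows 1..x-1 where they are,
% and those are the only rows still to be visited
for x = height(MorningPrompt):-1:2
    if dates(x) == dates(x-1)
        MorningPrompt(x,:) = [];
    end
end

disp(MorningPrompt.SurveyStartedDate)
```

The vectorized approaches in the answer below do the same comparison without a loop and are usually preferred in MATLAB.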

Accepted Answer

Steven Lord 2022-9-23
Dates_Wrong = ['2/4/21';'2/5/21';'2/5/21';'2/6/21';'2/7/21']
Dates_Wrong =
  5×6 char array
    '2/4/21'
    '2/5/21'
    '2/5/21'
    '2/6/21'
    '2/7/21'
dt = datetime(Dates_Wrong, 'InputFormat', 'M/d/yy')
dt =
  5×1 datetime array
   04-Feb-2021
   05-Feb-2021
   05-Feb-2021
   06-Feb-2021
   07-Feb-2021
differences = diff(dt)
differences =
  4×1 duration array
   24:00:00
   00:00:00
   24:00:00
   24:00:00
repeated = differences ~= 0
repeated =
  4×1 logical array
   1
   0
   1
   1
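Note that differences and repeated are both one element shorter than dt, and that repeated is true wherever a date differs from the one before it, so it marks rows to keep. Add a true as the first element to keep the first date of each run of repeats (the first element of dt is then always kept), or as the last element to keep the last one (the last element of dt is then always kept).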
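Putting that note into code, a short sketch on the same example data: prepending true keeps the first date of each run of repeats, appending it keeps the last, and either padded vector is the same length as dt, so it can be used directly as a logical row index.

```matlab
Dates_Wrong = ['2/4/21';'2/5/21';'2/5/21';'2/6/21';'2/7/21'];
dt = datetime(Dates_Wrong, 'InputFormat', 'M/d/yy');

% true where a date differs from the one before it (one element shorter than dt)
repeated = diff(dt) ~= 0;

% Prepend true: keeps the first of each run of repeats
keepFirst = [true; repeated];
% Append true: keeps the last of each run of repeats
keepLast = [repeated; true];

Dates_Right = Dates_Wrong(keepFirst, :)
```

With plain dates both choices give the same visible result; the difference matters when other columns (for example, the survey responses) differ between the two rows.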
BA 2022-9-23
Thank you! This is wonderful.
Just had a few questions.
1) Since I want the first value of each of the repeats, would I just have to set the last line of the logical array to be 1?
2) For the logical indexing, this is the command I'm using. I think it works, but it's the first time I've used logical indexing, so I'm not sure:
%My code adapted using your code
Dates = Dataset.Dates;
dt = datetime(Dates, 'InputFormat', 'M/d/yy');
differences = diff(dt);
repeated = differences ~= 0
%Indexing
NonRepeats = Dataset(repeated(:,1)==1, :);
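A sketch (not from the thread) touching both questions. Because diff returns one element fewer than there are rows, element k of repeated compares row k+1 with row k, so using it unpadded as a row index pairs each flag with the wrong row, and the last row has no flag at all. Prepending true fixes the alignment and keeps the first of each pair, which is what question 1 asks for. Also, repeated(:,1)==1 is equivalent to plain repeated, since it is already logical. Because the original question mentions several subjects in one column, the sketch treats a row as a repeat only when its subject also matches the row above; the SubjectID column and all values are hypothetical, and rows are assumed sorted by subject, then by date.

```matlab
% Hypothetical Dataset: the SubjectID column and its values are assumptions
% for illustration; rows are assumed sorted by subject, then by date
Dataset = table([1;1;1;1;1;2;2;2], ...
    {'2/4/21';'2/5/21';'2/5/21';'2/6/21';'2/7/21';'2/7/21';'2/8/21';'2/8/21'}, ...
    'VariableNames', {'SubjectID', 'Dates'});

dt = datetime(Dataset.Dates, 'InputFormat', 'M/d/yy');

% Element k of each mask compares row k+1 with row k
sameDate    = diff(dt) == 0;
sameSubject = diff(Dataset.SubjectID) == 0;

% A row is a repeat only if both its date and its subject match the row above.
% Prepending true aligns the mask with the rows and keeps the first of each pair.
keep = [true; ~(sameDate & sameSubject)];

NonRepeats = Dataset(keep, :)
```

Here subject 2's first row (2/7/21) is kept even though subject 1's last row has the same date, while each subject's own back-to-back duplicate is dropped.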

Release: R2022a
