Find max within due date and find max within due date by class section

6 次查看(过去 30 天)
I am able to find the max of the date_submitted. How do I find the max of the date_submitted but within the Lab_due_date column? Also, how do I find the max of the date_submitted within the Lab_due_date by class_section?

采纳的回答

Ayush Modi
Ayush Modi 2024-5-20
编辑:Ayush Modi 2024-5-20
Hi,
To find the latest date_submitted that is before the Lab_due_date for each class_section, you first need to format the data in the two column to be comparable. You can acheive this using datatime function.
data.date_submitted = datetime(data.date_submitted, 'InputFormat', 'MM/dd/yyyy HH:mm');
data.Lab_due_date = datetime(data.Lab_due_date, 'InputFormat', 'MM/dd/yyyy, hh:mm a zzz', 'TimeZone', 'America/Chicago', 'Locale', 'en_US');
data.Lab_due_date.TimeZone = '';% Group by user_id
Then, group the data by the desired category using findgroups function.
grouped = findgroups(data.user_id);
To get the date_submitted that is before the Lab_due_date:
uniqueGroups = unique(data.class_section); % Assuming class_section is a cell array of strings
maxDatesByGroup = NaT(size(uniqueGroups)); % Preallocate with Not-a-Time (NaT)
for i = 1:length(uniqueGroups)
% Filter data for the current group using strcmp for cell array comparison
currentGroupFilter = strcmp(data.class_section, uniqueGroups{i});
currentGroupData = data(currentGroupFilter, :);
% Ensure Lab_due_date and date_submitted are datetime arrays before comparison
validSubmissions = currentGroupData.date_submitted <= currentGroupData.Lab_due_date(1);
% Find the maximum date_submitted that meets the criteria
if any(validSubmissions)
maxDatesByGroup(i) = max(currentGroupData.date_submitted(validSubmissions));
end
end
Please refer to the following MathWorks documentation for more information on:
  3 个评论
Sunshine
Sunshine 2024-5-23
Ok. I updated the Lab_due_date column to a comparable date/time format to date_submitted prior to importing the data to make it easier, I thought.
From there, I followed each of the lines provided. It seemed to work until I got to the for loop.
>> for i = 1:length(uniqueGroups)
% Filter data for the current group using strcmp for cell array comparison
currentGroupFilter = strcmp(data.class_section, uniqueGroups{i});
currentGroupData = data(currentGroupFilter, :);
% Ensure Lab_due_date and date_submitted are datetime arrays before comparison
validSubmissions = currentGroupData.date_submitted <= currentGroupData.Lab_due_date(1);
% Find the maximum date_submitted that meets the criteria
if any(validSubmissions)
maxDatesByGroup(i) = max(currentGroupData.date_submitted(validSubmissions));
end
end
I received the error: Error using indexing
Cell contents reference from a non-cell array object.
So I updated this line: currentGroupFilter = strcmp(data.class_section, uniqueGroups{i});
to use parentheses instead of curly brackets. So now, I have
currentGroupFilter = strcmp(lab31437lowerparticipantscopyupdate.class_section, uniqueGroups(i));
But then I get the error: Index exceeds the number of array elements. Index must not exceed 0.
I'm not quite sure how to fix this. How do I fix this?
Sunshine
Sunshine 2024-5-23
Ok. After working with the code, I see that I needed to make the class_section column a string. It was imported as categorical, once I did that, I am able to follow the code and get the max dates by class_section but within the Lab_due_date. Thank you, thank you, thank you!
So now, how do I automate finding the max of the date_submitted but within the Lab_due_date column, meaning not by class_section but just the overall max date but within the Lab_due_date?

请先登录,再进行评论。

更多回答(1 个)

Peter Perkins
Peter Perkins 2024-5-29
This is much simpler than the accepted answers suggests. And you want to use categorical for categorical data, not text.
%t = readtable("lab3_14_37_lower_participants_copy.csv");
%t.date_submitted.TimeZone = "America/Chicago";
%t.Lab_due_date = datetime(t.Lab_due_date,InputFormat="MM/dd/uuuu, HH:mm a z",TimeZone="America/Chicago")
I'm gonna assume you don't want to work with time zones; your updated file does not contain the CDT offsets.
t = readtable("https://www.mathworks.com/matlabcentral/answers/uploaded_files/1702791/lab3_14_37_lower_participants_copy_update.csv");
t = convertvars(t,["user_id" "email" "class_section"],"categorical");
Your file has exactly one unique due date. Let's add some more data.
n = 50;
t = [t; t; t]
t = 150x8 table
user_id email class_section date_submitted is_submission score max_score Lab_due_date _______ ________ _______________ ________________ _____________ _____ _________ ________________ 226340 Student1 section 001/011 09/18/2023 19:26 1 0 5 09/27/2023 23:59 226340 Student1 section 001/011 09/18/2023 19:27 1 0 5 09/27/2023 23:59 226340 Student1 section 001/011 09/18/2023 19:36 1 0 5 09/27/2023 23:59 226340 Student1 section 001/011 09/18/2023 19:41 1 0 5 09/27/2023 23:59 226340 Student1 section 001/011 09/18/2023 19:48 1 0 5 09/27/2023 23:59 226340 Student1 section 001/011 09/21/2023 17:16 1 0 5 09/27/2023 23:59 226340 Student1 section 001/011 09/21/2023 17:20 1 0 5 09/27/2023 23:59 226340 Student1 section 001/011 09/21/2023 17:31 1 0 5 09/27/2023 23:59 226340 Student1 section 001/011 09/21/2023 17:32 1 0 5 09/27/2023 23:59 226340 Student1 section 001/011 09/21/2023 17:35 1 0 5 09/27/2023 23:59 226340 Student1 section 001/011 09/21/2023 17:38 1 0 5 09/27/2023 23:59 226340 Student1 section 001/011 09/21/2023 17:40 1 0 5 09/27/2023 23:59 1387758 Student2 section 004/014 12/12/2023 05:05 1 5 5 09/27/2023 23:59 1389302 Student3 section 004/014 12/08/2023 01:42 1 5 5 09/27/2023 23:59 1565527 Student4 section 003/013 09/20/2023 17:30 1 0 5 09/27/2023 23:59 1565527 Student4 section 003/013 09/20/2023 17:32 1 0 5 09/27/2023 23:59
t.date_submitted(51:100) = t.date_submitted(51:100) + calmonths(1);
t.Lab_due_date(51:100) = t.Lab_due_date(51:100) + calmonths(1);
t.date_submitted(101:150) = t.date_submitted(101:150) + calmonths(2);
t.Lab_due_date(101:150) = t.Lab_due_date(101:150) + calmonths(2);
Now that that's done, computing the max submit dates is one line. No loops.
groupsummary(t,"Lab_due_date","day","max","date_submitted")
ans = 3x3 table
day_Lab_due_date GroupCount max_date_submitted ________________ __________ __________________ 27-Sep-2023 50 12/12/2023 05:05 27-Oct-2023 50 01/12/2024 05:05 27-Nov-2023 50 02/12/2024 05:05
With two grouping vars, still one line.
groupsummary(t,["Lab_due_date" "class_section"],["day" 'none'],"max","date_submitted")
ans = 15x4 table
day_Lab_due_date class_section GroupCount max_date_submitted ________________ _______________ __________ __________________ 27-Sep-2023 section 001/011 21 09/25/2023 03:58 27-Sep-2023 section 002/012 3 09/19/2023 19:33 27-Sep-2023 section 003/013 14 09/24/2023 22:19 27-Sep-2023 section 004/014 10 12/12/2023 05:05 27-Sep-2023 section 005/015 2 09/25/2023 03:54 27-Oct-2023 section 001/011 21 10/25/2023 03:58 27-Oct-2023 section 002/012 3 10/19/2023 19:33 27-Oct-2023 section 003/013 14 10/24/2023 22:19 27-Oct-2023 section 004/014 10 01/12/2024 05:05 27-Oct-2023 section 005/015 2 10/25/2023 03:54 27-Nov-2023 section 001/011 21 11/25/2023 03:58 27-Nov-2023 section 002/012 3 11/19/2023 19:33 27-Nov-2023 section 003/013 14 11/24/2023 22:19 27-Nov-2023 section 004/014 10 02/12/2024 05:05 27-Nov-2023 section 005/015 2 11/25/2023 03:54
  2 个评论
Sunshine
Sunshine 2024-5-29
编辑:Sunshine 2024-5-29
You are correct. I removed the timezone portion of the data. It just seemed easier to remove it. What you have is great and almost supplies what I am attempting to get. For now, I want to look at all dates submitted within the due date, and only those within the due date. If a date submitted is beyond the due date, I don't want to return that info in the results currently. I want to return the closest submitted date within the due date. How would I exclude data in which the date submitted is beyond the due date but return the next closest date that's within the due date? For instance, in "https://www.mathworks.com/matlabcentral/answers/uploaded_files/1702791/lab3_14_37_lower_participants_copy_update.csv", in section 004/014, the max date submitted is 12-Dec-2023 05:05:00 and the next max date is 08-Dec-2023 01:42:00. Both are beyond the due date of 27-Sep-2023 23:59:00. Just eyeballing the 50 rows in the csv, it looks like 24-Sep-2023 02:27:00 is the max within the due date for section 004/014. That's the max date I want to show for section 004/014. How do I return only the max dates within the due date for the 50 submissions and then by section?
Peter Perkins
Peter Perkins 2024-5-29
doc groupsummary: "You also can specify method as a function handle that returns one output per group whose first dimension has length 1. For table input data, the function operates on each table variable separately."

请先登录,再进行评论。

类别

Help CenterFile Exchange 中查找有关 Dates and Time 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by