- Load in a given table (say jj=1)
- create TT2{1}
- find the min of TT{1}.temperature
- add a third column to TT{1} containing this min in every row
- repeat for mean, max
- repeat for TT{2:34}
How to find the min, max and mean values of 34 timetables stored in a 1 x 34 cell and add them as extra columns to the respective timetables?
11 次查看(过去 30 天)
显示 更早的评论
I have 34 CSV files that each consist of N rows and 3 columns. My code loops through one CSV file at a time (storing them in a 1 x 34 cell) and firstly converts them into a N x 2 timetable, TT{jj}. I then filter this timetable by a timerange (the difference between two dates and times in format dd/MM/uuuu HH:mm) to leave all the values within that timerange which is stored in TT2{jj}. The timetables have 2 columns (date and time, temperature).
I now want to know how to find the min, max and mean values of the temperature column for each timetable and how to add these values to the original timetable to create a N x 5 timetable (date and time, temperature, min temp, max temp, mean temp) which would look like this:
e.g. for jj=1, the final table would look like:
Date/Time Temperature Min Max Mean
21/02/2020 08:00 20 16 20 18.25
21/02/2020 08:03 16
21/02/2020 08:06 18
21/02/2020 08:09 19
etc etc
Then it would loop again for jj=2 etc
The loop to create TT2{jj} is:
for jj = 1:34
thisfile{jj} = files{jj}.name; % creates a cell containing each CSV file name in directory
T{jj} = readtable(thisfile{jj},'Headerlines',19,'ReadVariableNames',true); % converts CSV to timetable ignoring rows 1-19
TT{jj} = table2timetable(T{jj}(:,[1 3])); % convert table to timetable and ignore column 2
TT2{jj} = TT{jj}(TR,:); % creates timetable containing all rows found within timerange TR
end
At the end of the loop (above) I export the 34 tables to one spreadsheet that has 34 tabs.
I have no idea how to proceed so any help would be appreciated.
0 个评论
回答(3 个)
Sindar
2020-4-30
Is this what you want to do:
If so, then:
for jj = 1:34
thisfile{jj} = files{jj}.name; % creates a cell containing each CSV file name in directory
T{jj} = readtable(thisfile{jj},'Headerlines',19,'ReadVariableNames',true); % converts CSV to timetable ignoring rows 1-19
TT{jj} = table2timetable(T{jj}(:,[1 3])); % convert table to timetable and ignore column 2
TT2{jj} = TT{jj}(TR,:); % creates timetable containing all rows found within timerange TR
TT2{jj}.min_T=repelem(min(TT2{jj}{:,2}), size(TT2{jj},1), 1);
TT2{jj}.max_T=repelem(max(TT2{jj}{:,2}), size(TT2{jj},1), 1);
TT2{jj}.mean_T=repelem(mean(TT2{jj}{:,2}), size(TT2{jj},1), 1);
end
If you want to modify T (or likewise TT):
...
T{jj}.min_T=repelem(min(TT2{jj}{:,2}), size(T{jj},1), 1);
...
4 个评论
Sindar
2020-5-13
if you only need the min/etc. in the xls file, but not in Matlab, you could print the 2-column table first then add the summary values in a second print statement:
for jj = 1:34
thisfile{jj} = files{jj}.name; % creates a cell containing each CSV file name in directory
T{jj} = readtable(thisfile{jj},'Headerlines',19,'ReadVariableNames',true); % converts CSV to timetable ignoring rows 1-19
TT{jj} = table2timetable(T{jj}(:,[1 3])); % convert table to timetable and ignore column 2
TT2{jj} = TT{jj}(TR,:); % creates timetable containing all rows found within timerange TR
tmp_table = table(min(TT2{jj}{:,2},max(TT2{jj}{:,2},mean(TT2{jj}{:,2},'VariableNames',{'min';'max';'mean'})
writetimetable(TT2{jj},"table"+jj+".xls")
writetable(tmp_table,"table"+jj+".xls",'Range','C1:D2')
end
Guillaume
2020-5-1
The simplest thing would be to add one column to each timetable to indicate the timetable of origin, then concatenate all these timetables into one timetable. Then with just one call to groupsummary, you can get your desired output.
However, if you get the mean, min and max for each timetable, so get one scalar value for each stat per timetable, I'm a bit unclear why you still want to store a datetime. Doesn't it become meaningless?
Anyway:
TT = cell(size(files));
for fileidx = 1:numel(files)
t = readtable(files(fileidx).name, 'Headerlines', 19, 'ReadVariableNames', true);
TT{fileidx} = table2timetable(t(:, [1, 3]))
TT{fileidx}.FileIndex(:) = fileidx; %add column with file number
end
alltimetables = vertcat(TT{:}); %concatenate all in one timetable
alltimetables = alltimetables(TR, :); %keep only desired timerange
ttstats = groupsummary(alltimetables, 'FileIndex', {'mean', 'min', 'max'}) %get mean min max for each FileIndex
Note that if you're using sufficiently recent version of matlab I'd replace the loop by:
TT = cell(size(files));
opts = detectImportOptions(files(1).name, 'NumHeaderLines', 19, 'ReadVariableNames', true);
opts.SelectedVariableNames = [1, 3]; %don't bother reading 2nd column
for fileidx = 1:numel(files)
TT{fileidx} = readtimetable(files(fileidx).name, opts);
TT{fileidx}.FileIndex(:) = fileidx; %add column with file number
end
%rest of code stays the same
Peter Perkins
2020-5-5
As others have said, it seems to make little sense to create new variables in each timetable, each of which are a column vector of a constant. Maybe you want something like the following.
First, make something like your data:
n = 3;
tt_list = cell(n,1);
for i = 1:3
X = rand(5,1);
Time = datetime(2020,5,i)+days(rand(5,1));
tt_list{i} = timetable(Time,X);
end
Now get the stats for each timetable, and put those in a table that also includes your cell array of timetables:
t = table(tt_list,zeros(n,1),zeros(n,1),zeros(n,1),'VariableNames',["Data" "Mean" "Min" "Max"]);
for i = 1:n
t.Mean(i) = mean(tt_list{i}.X);
t.Min(i) = min(tt_list{i}.X);
t.Max(i) = max(tt_list{i}.X);
end
From that, you end up with
>> t
t =
3×4 table
Data Mean Min Max
_______________ _______ _______ _______
{5×1 timetable} 0.67375 0.4607 0.94475
{5×1 timetable} 0.56289 0.15039 0.9865
{5×1 timetable} 0.52956 0.26661 0.91785
That's the brute force way. As Guillaume suggests, you might find it convenient to put all your timetables in one longer one. The following gets you essentially yhe same table as above.
tt = vertcat(tt_list{:});
tt.Source = repelem(1:n,5)';
fun = @(x) deal(mean(x),min(x),max(x));
t = rowfun(fun,tt,'GroupingVariable','Source','NumOutputs',3, ...
'OutputFormat','table','OutputVariableNames',["Mean" "Min" "Max"])
>> t =
3×5 table
Source GroupCount Mean Min Max
______ __________ _______ _______ _______
1 5 0.67375 0.4607 0.94475
2 5 0.56289 0.15039 0.9865
3 5 0.52956 0.26661 0.91785
I used rowfun; splitapply or groupsummary would also work.
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Data Type Conversion 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!