Comparing each row of a table to the same row of a different table. Then store the highest maximum temperature, lowest minimum temperature from that row.
13 次查看(过去 30 天)
显示 更早的评论
I have 3 years of data from 1981 through 1983. Each file is in text format labeled as temp_summary.05._%month_%year.txt. Of each file, has four columns (day, max_t, min_t, and avg_t) with the corresponding number of days for each month (31 days for January and etc.). I need to compare row 1 through x to the same corresponding row of the following years. For example, 1981 January row 1 (day 1) to 1982 January row 1 then to 1983 January row 1. The goal is to retrieve the highest temperature and lowest temperature of those rows then store into a different array. The following code works as it navigates through each file effortlessly while ignoring missing data in the tables. However, it's not retrieving the max temperature of that row but instead storing all maximum and minimum temperatures without the dates that it occurred on. What can I do? Need help.
clear all;
clc;
startyear = 1981;
endyear = 1983;
hightemps = [];
lowtemps = [];
avgtemps = [];
for year = startyear:endyear
for month = 1:12
for day = 1:31
filename = sprintf('temp_summary.05.%02d_%02d.txt', month, year);
data = readtable(filename);
data.Properties.VariableNames([1 2 3 4]) = {'Day' 'Maximum T' 'Minimum T' 'Average T'}
daydata = data(1:31,"Day");
hightemp = max(data(1:31,"Maximum T"));
lowtemp = min(data(1:31,"Minimum T"));
meantemp = mean(data(1:31,"Average T"));
DD = rmmissing(daydata);
HT = rmmissing(hightemp);
LT = rmmissing(lowtemp);
MT = rmmissing(meantemp);
hightemps = [hightemps;HT];
lowtemps = [lowtemps;LT];
avgtemps = [avgtemps;MT];
end
end
end
9 个评论
the cyclist
2023-10-16
Can you upload the data (or at least a small representative sample), so that we can run your code? You can use the paper clip icon in the INSERT section of the toolbar.
Jonathon Klepatzki
2023-10-16
I can't attach the original file however, I'll create 3 text files with random data shortly.
J. Alex Lee
2023-10-17
编辑:J. Alex Lee
2023-10-17
my approach would be to import all the data into a "master" table first - it's possible that you can do what you want within the context of tables, e.g.,
concatenate everything into a single table and add columns for "month" and "year", then you can make your intermediate tables of interest like
startyear = 1981;
endyear = 1983;
data = cell(12*(endyear-startyear+1),1);
cntr = 0;
for yr = startyear:endyear
for mo = 1:12
filename = sprintf('temp_summary.05.%02d_%02d.txt', mo, yr);
cntr = cntr+1;
tmp = readtable(filename);
tmp.Year = yr*ones(height(tmp),1);;
tmp.Month = mo*ones(height(tmp),1);
data{cntr} = tmp
end
end
MT = vertcat(data{:})
MT.Properties.VariableNames = {'Day' 'Maximum T' 'Minimum T' 'Average T','Year','Month'}
MT(MT.Month==1);
Jonathon Klepatzki
2023-10-17
I ran your code and it was nice! Much appreciated! However, if January 1 1981 had a maximum temperature of 55 but, January 1 1983 had a maximum temperature of 57, the 57 must be recorded into a separate array with the year, month, and day that it took place. The goal is to do this for every day of the year throughout 3 years.
When that's done, an average of the highest and lowest temperatures will be done.
J. Alex Lee
2023-10-17
I'm not sure if the new functionality about "pivot" will help once you have the table, with the "max" aggregation function or something...otherwise brute force extracting the rows that you care about might look like
for i = 1:31
for j = 1:12
MT(MT.Day==i & MT.Month==j) % which should return a table of height ~3
% and you can store it in an array or maybe preferably a new table
end
end
Jonathon Klepatzki
2023-10-18
编辑:Jonathon Klepatzki
2023-10-18
Matlab isn't allowing the MT(MT. Month == 1) as it gives me an error "subscripting into a table usng one subscript is not supporter".
I changed it slightly to:
for i = 1:31
for j = 1:12
MT(i,j)
end
end
However, I get the variable index exceeds table dimensions. Would you agree to using the linspace function? If so, how?
Cris LaPierre
2023-10-18
MT is a table with 2 dimensions, so you have to either index using (row,column) notation, or index a specific variable in the table.
MT(MT. Month == 1,"VarName")
% or
MT.VarName(MT.Month == 1)
J. Alex Lee
2023-10-18
My mistake, I was intending to extract whole rows from the table, which needs to be
MT(condition,:)
采纳的回答
Cris LaPierre
2023-10-18
I would use groupsummary.
Tbl1 = readtable("temp_summary.05.01_1981.txt","ConsecutiveDelimitersRule","join");
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
Tbl1.Day = datetime(1981,5,Tbl1.Day);
Tbl2 = readtable("temp_summary.05.01_1982.txt","ConsecutiveDelimitersRule","join");
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
Tbl2.Day = datetime(1982,5,Tbl2.Day);
Tbl3 = readtable("temp_summary.05.01_1983.txt","ConsecutiveDelimitersRule","join");
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
Tbl3.Day = datetime(1983,5,Tbl3.Day)
Tbl3 = 31×4 table
Day MaximumTemperature MinimumTemperature AverageTemperature
___________ __________________ __________________ __________________
01-May-1983 67 10 38.5
02-May-1983 65 24 44
03-May-1983 65 16 40.5
04-May-1983 59 21 40
05-May-1983 53 22 37
06-May-1983 59 16 32.5
07-May-1983 63 21 41
08-May-1983 58 10 32
09-May-1983 62 13 35.5
10-May-1983 65 17 40
11-May-1983 59 24 41
12-May-1983 58 38 48
13-May-1983 63 25 45.5
14-May-1983 65 17 42
15-May-1983 56 17 39
16-May-1983 58 21 42.5
sumTbl = groupsummary([Tbl1;Tbl2;Tbl3],["Day","Day"],["monthofyear","dayofmonth"],["max","min"],["MaximumTemperature" "MinimumTemperature"])
sumTbl = 31×7 table
monthofyear_Day dayofmonth_Day GroupCount max_MaximumTemperature min_MaximumTemperature max_MinimumTemperature min_MinimumTemperature
_______________ ______________ __________ ______________________ ______________________ ______________________ ______________________
5 1 3 67 63 12 10
5 2 3 70 65 28 24
5 3 3 67 65 17 16
5 4 3 61 57 22 21
5 5 3 58 46 24 20
5 6 3 63 59 18 16
5 7 3 63 60 25 21
5 8 3 60 58 20 10
5 9 3 64 62 13 11
5 10 3 65 64 17 12
5 11 3 59 54 999 22
5 12 3 60 58 40 23
5 13 3 66 63 30 25
5 14 3 65 63 19 17
5 15 3 61 56 19 16
5 16 3 64 58 23 21
From there, you can extract the columns (variables) you want. See here: https://www.mathworks.com/help/matlab/matlab_prog/access-data-in-a-table.html
maxTempMay = sumTbl.max_MaximumTemperature
maxTempMay = 31×1
67
70
67
61
58
63
63
60
64
65
minTempMay = sumTbl.min_MinimumTemperature
minTempMay = 31×1
10
24
16
21
20
16
21
10
11
12
93 个评论
Cris LaPierre
2023-10-18
编辑:Cris LaPierre
2023-10-18
Capitalization matters. Make sure your variable names are the same as in the files you shared here. If not, you will need to modify the code so that you know what the variable names will be.
Tbl1.Properties.VariableNames = {'Day' 'MaximumT' 'MinimumT' 'AverageT'}
Or you can use numeric indices intead a variable names.
Tbl1{:,1} = datetime(1981,5,Tbl1{:,1});
sumTbl = groupsummary([Tbl1;Tbl2;Tbl3],[1,1],["monthofyear","dayofmonth"],["max","min"],[2,3])
Jonathon Klepatzki
2023-10-18
That's awesome. However, did you specify Day as the total number of days of the month or no?
Cris LaPierre
2023-10-18
Sorry, I don't understand what you are asking.
I didn't specify anything about the days. It is using the values in your datafile to figure out the days.
Jonathon Klepatzki
2023-10-18
编辑:Jonathon Klepatzki
2023-10-18
Gotcha. I ran your example to test what I have and it's giving me an error. "unrecognized table variable name "day".
The files that I attached wasn't the files that I am trying to run. However, with that said they are similar with the only thing that's different is some text about those columns.
Quick question, could I use sprintf to read the data? I find that worked really well before.
Cris LaPierre
2023-10-18
I'm assuming you have data files for each month? Yes, you can use sprintf to format the file names.
I would use a filedatastore to load all the data into a single variable instead. It avoids looping, and the groupsummary approach would still work. You can see an example of how to use one to do this in this video from the Data Processing with MATLAB specialization on Coursera.
Here is the final code from that example. You can modify this to work for your data.
flightsDataStore = fileDatastore("flights*.csv","ReadFcn",@importFlightsData,"UniformRead",true);
flightsAll = readall(flightsDataStore)
Once complete, all the data from all files matching the pattern "flights*.csv" are loaded into the variable flightsAll.
Jonathon Klepatzki
2023-10-18
Sure do.
Thank you for the video, it helped. I currently have the following code:
%%%%%%%% works beautifully, created a matrix with all files listed in column 1 %%%%%%%%%%
Datafiles = fileDatastore("temp_summary*.txt","ReadFcn",@readtable,"FileExtensions",[".txt"])
dataAll = readall(Datafiles)
%%%%% this next code, was given an error %%%%%%%%%%%
* sumbTbl = groupsummary(dataAll,["Day", "Day"], ["monthofyear", "dayofmonth"], ["max", "min"], ["Maximum Temp" "Minimum Temp"]);
It's giving me an error "Grouping variables and data matrix must have same number of rows". This is expected since some months don't have 31 days. Could I use linspace to make it the same?
Cris LaPierre
2023-10-19
Using this syntax, it should be impossible to get this error. dataAll is a table, and it is grouping on variables in the table. Since it is impossible to have a different number of rows in your table variables, something else must be causing the error.
Can you please share the entire error message (all the red text)?
Jonathon Klepatzki
2023-10-19
编辑:Jonathon Klepatzki
2023-10-19
The remaining error code stated
[groupingData,groupsVars] = matlab.internal.math.parseGroupVars(groupVars,tableFlag,"groupsummary:Group",T);
However, I just checked the matrix and the names of the columns wasn't Max T and etc. They were Var1 through x. So, I tried using the following:
dataAll.Properties.VariableNames
however, the new error states "unable to perform assignment because dot indexing is not supported for variables of this type.
Any suggestions?
Jonathon Klepatzki
2023-10-19
编辑:Jonathon Klepatzki
2023-10-19
The remaining line of codes include:
Datafiles = fileDatastore("temp_summary*.txt","ReadFcn",@readtable,"FileExtensions",[".txt"]);
dataAll = readall(Datafiles)
dataAll.Properties.VariableNames([1 2 3 4]) = {"Day" "Maximum Temp" "Minimum Temp" "Average Temp"};
The result from this was a 1x1 matrix. Which is the problem.
Cris LaPierre
2023-10-19
You are not using the "UniformRead" option, so the results are being returned as a cell instead of a table.
Due to your file format, I would suggest creating a separate import function rather than calling readtable directly. Something like this.
Datafiles = fileDatastore("temp_summary*.txt","ReadFcn",@readMonth,"UniformRead",true);
dataAll = readall(Datafiles);
sumTbl = groupsummary(dataAll,["Day","Day"],["monthofyear","dayofmonth"],["max","min"],["MaximumT" "MinimumT"])
sumTbl = 31×7 table
monthofyear_Day dayofmonth_Day GroupCount max_MaximumT min_MaximumT max_MinimumT min_MinimumT
_______________ ______________ __________ ____________ ____________ ____________ ____________
5 1 3 67 63 12 10
5 2 3 70 65 28 24
5 3 3 67 65 17 16
5 4 3 61 57 22 21
5 5 3 58 46 24 20
5 6 3 63 59 18 16
5 7 3 63 60 25 21
5 8 3 60 58 20 10
5 9 3 64 62 13 11
5 10 3 65 64 17 12
5 11 3 59 54 999 22
5 12 3 60 58 40 23
5 13 3 66 63 30 25
5 14 3 65 63 19 17
5 15 3 61 56 19 16
5 16 3 64 58 23 21
maxTempMay = sumTbl.max_MaximumT
maxTempMay = 31×1
67
70
67
61
58
63
63
60
64
65
minTempMay = sumTbl.min_MinimumT
minTempMay = 31×1
10
24
16
21
20
16
21
10
11
12
function Tbl = readMonth(filename)
Tbl = readtable(filename,"ConsecutiveDelimitersRule","join","ReadVariableNames",false);
Tbl.Properties.VariableNames = {'Day' 'MaximumT' 'MinimumT' 'AverageT'};
% extract year and month from filename
d = str2double(extract(filename,digitsPattern));
% convert day to datetime
Tbl.Day = datetime(d(3),d(1),Tbl.Day);
end
Jonathon Klepatzki
2023-10-19
编辑:Jonathon Klepatzki
2023-10-19
I was getting an error which was the reason why I didn't use the uniformread from before.
Error using fileDatastore
Error using PreviewFcn @readMonth for file:
C:\Users\Jonathan\Documents\MATLAB\New Folder\temp_summary.05.01_1981.txt
Error using .
The VariableNames property must contain one name for each variable in the table.
Tbl.Properties.VariableNames = {'Day' 'MaximumT' 'MinimumT' 'AverageT'};
Cris LaPierre
2023-10-19
Still? It means that readtable is detecting a different number of variables in some of the data files. If you can identify which file is causing the error, you can modify the import function to account for the differences.
Jonathon Klepatzki
2023-10-19
编辑:Jonathon Klepatzki
2023-10-19
You don't think that a # or a * next too random values would have anything to do with it would you?
Or missing values (e.g. 999, -999, or NaN)?
I looked at the files that I need to run and there is some text above those four columns and below it. But I don't think that would cause an error.
Cris LaPierre
2023-10-19
The error is with the number of columns in the imported table. For at least 1 of your files the number of columns is not 4. Anything that impacts that is suspect.
% 9 columns if I import this way
Tbl1 = readtable("temp_summary.05.01_1981.txt")
Tbl1 = 31×9 table
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9
____ ____ ____ ____ ____ ____ ____ ____ ____
1 NaN 65 NaN NaN 12 NaN NaN 38.5
2 NaN 68 NaN NaN 28 NaN NaN 48
3 NaN 65 NaN NaN 17 NaN NaN 41
4 NaN 57 NaN NaN 22 NaN NaN 39.5
5 NaN 46 NaN NaN 24 NaN NaN 35
6 NaN 61 NaN NaN 18 NaN NaN 39.5
7 NaN 62 NaN NaN 25 NaN NaN 43.5
8 NaN 58 NaN NaN 12 NaN NaN 35
9 NaN 64 NaN NaN 11 NaN NaN 37.5
10 NaN 65 NaN NaN 14 NaN NaN 39.5
11 NaN 54 NaN NaN 22 NaN NaN 38
12 NaN 58 NaN NaN 40 NaN NaN 49
13 NaN 64 NaN NaN 27 NaN NaN 45.5
14 NaN 65 NaN NaN 19 NaN NaN 42
15 NaN 59 NaN NaN 19 NaN NaN 39
16 NaN 62 NaN NaN 23 NaN NaN 42.5
% 4 columns if I immport this way
Tbl2 = readtable("temp_summary.05.01_1981.txt","ConsecutiveDelimitersRule","join")
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
Tbl2 = 31×4 table
Day MaximumTemperature MinimumTemperature AverageTemperature
___ __________________ __________________ __________________
1 65 12 38.5
2 68 28 48
3 65 17 41
4 57 22 39.5
5 46 24 35
6 61 18 39.5
7 62 25 43.5
8 58 12 35
9 64 11 37.5
10 65 14 39.5
11 54 22 38
12 58 40 49
13 64 27 45.5
14 65 19 42
15 59 19 39
16 62 23 42.5
Jonathon Klepatzki
2023-10-19
编辑:Jonathon Klepatzki
2023-10-19
So, the issue wasn't the number of columns. It was the number of names for the variables. What I did is create bogus, bogus 1 and etc for Var 5 through 8.
However, there is another error.
"wrong number of arguments"
Error in untitled>readMonth (line 16)
Tbl.Day = datetime(d(3),d(1),Tbl.Day);
(before we continue, thank you again for the help and patience)
Cris LaPierre
2023-10-19
I guess that's a chicken or egg scenario. You can either modify the code so that it imports the desired number of columns, or adjust the code to work with the number of columns that are imported.
The code shared so far works for the data files for May. Please share one of your files that is still importing with more than 4 columns.
Jonathon Klepatzki
2023-10-19
Ah, that's the interesting part. On my end, when I used the readtable, every file had 8 columns. Var 5 - Var 8 didn't have any data. Var 1 (day), Var 2 (Maximum Temp), Var 3 (Minimum Temp), Var 4 (Average Temp) all contained their respected data.
The examples that I provided above were actually January files. However, I must navigate through each month with days varying (pending which month).
Can you explain to me Tbl.Day = datetime(d(3), d(1), Tbl.Day) line of code? I looked it up, it appears you used datevectors, is that correct?
Cris LaPierre
2023-10-19
编辑:Cris LaPierre
2023-10-19
Easiest is to point you to the documentation for datetime. I used this syntax
I use extract to pull the numeric imformation from the filename, then parse it to get the year and month inputs to datetime.
With the days as datetimes instead of numbers, I can use the grouping capabilities of groupsummary to find the max and min for each date, ignoring year: ["monthofyear","dayofmonth"]
Cris LaPierre
2023-10-19
Try this code to see if it is more robust.
Datafiles = fileDatastore("temp_summary*.txt","ReadFcn",@readMonth,"UniformRead",true);
dataAll = readall(Datafiles);
sumTbl = groupsummary(dataAll,["Day","Day"],["monthofyear","dayofmonth"],["max","min"],["MaxT" "MinT"]);
min_maxTempMay = [sumTbl.min_MinT sumTbl.max_MaxT]
min_maxTempMay = 31×2
10 67
24 70
16 67
21 61
20 58
16 63
21 63
10 60
11 64
12 65
function Tbl = readMonth(filename)
opts = detectImportOptions(filename);
opts.ConsecutiveDelimitersRule = 'join';
opts.MissingRule = 'omitvar';
opts = setvartype(opts,'double');
opts.VariableNames = ["Day","MaxT","MinT","AvgT"];
Tbl = readtable(filename,opts);
% extract year and month from filename
d = str2double(extract(filename,digitsPattern));
% convert day to datetime
Tbl.Day = datetime(d(3),d(1),Tbl.Day);
end
Jonathon Klepatzki
2023-10-19
Interesting. I'll see what I can do on my end and keep you posted. I'm assuming the script that you created doesn't run through all 12 months simultaneously?
Cris LaPierre
2023-10-19
编辑:Cris LaPierre
2023-10-20
It will run through all files that match the pattern defined in fileDatastore: "temp_summary*.txt"
Jonathon Klepatzki
2023-10-23
编辑:Jonathon Klepatzki
2023-10-23
Hi Cris,
I just ran the code and it works but only for May through 1981-1983. It left me with a 1197x8 matrix as it repeated itself. The bad news is, it also pulled the missing data (999) for the max temperatures and ran it.
What line of code do I need to adjust for all 12 months to be read? I will also need to adjust it for 1981-2021 as I have a larger data file to use. You've been awesome!
Cris LaPierre
2023-10-23
Are all files in the same folder? Is the naming convention consistent for all files?
Jonathon Klepatzki
2023-10-23
编辑:Jonathon Klepatzki
2023-10-23
Yup, same format. The only thing that changes are the month and year. For example,
(temp_summary.05.07_1981) the 05 is a station identifier, 7 is the month, and obviously 1981 is the year.
Cris LaPierre
2023-10-23
I had assumed the file name included month.day.year, so you should update the code in the readMonth function to use the correct values to build the date. I thought the filename contained the date in MM.dd.yy syntax. It is currently using the station identifier as the month, so everything is coming in as May.
Look into the TreatAsMissing Name-Value pair for readtable (see here). You can use this to treat specific values as missing values.
Jonathon Klepatzki
2023-10-23
I don't have much experience with building dates but, I'll give it a shot! Thank you and apologies for the mix up.
Cris LaPierre
2023-10-23
编辑:Cris LaPierre
2023-10-23
The code I shared already does all that. You just need to update it to use the correct value for month input to datetime. Please see Ch 5 of MATLAB Onramp if you are not sure how to do that.
Jonathon Klepatzki
2023-10-24
Thank you, thank you, thank you. After playing with the datetime code, I was able to figure out how to run through all of the months. You are a lifesaver!
Cris LaPierre
2023-10-25
Nicely done! Just to follow up, I saw a question about TreatAsMissing but see that it has been deleted. Just note that the input must be in quotes.
Tbl2 = readtable("temp_summary.05.01_1982.txt","ConsecutiveDelimitersRule","join",...
"TreatAsMissing","999")
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
Tbl2 = 31×4 table
Day MaximumTemperature MinimumTemperature AverageTemperature
___ __________________ __________________ __________________
1 63 12 38.5
2 70 28 48
3 67 17 41
4 61 22 39.5
5 58 20 35
6 63 18 39.5
7 60 23 43.5
8 60 20 35
9 62 11 37.5
10 64 12 39.5
11 57 NaN NaN
12 60 23 49
13 66 30 45.5
14 63 19 42
15 61 16 39
16 64 21 42.5
Jonathon Klepatzki
2023-10-25
编辑:Jonathon Klepatzki
2023-10-25
I would then just adjust the code to run through all of the data? (" temp_summary.* ")
Cris LaPierre
2023-10-25
That code only works for a single file. The point was to show how to use the 'TreatAsMissing' name-value pair. You can read more about all Name-Value pairs here: https://www.mathworks.com/help/matlab/ref/readtable.html#mw_47bd1f02-32a6-44af-aedd-70395ba4c587
You will need to edit the code in the custom readMonth function if you want it to apply to all files read in.
Jonathon Klepatzki
2023-10-25
编辑:Jonathon Klepatzki
2023-10-25
So, plots look good except however it still takes into account of missing data. I tried adjusting the code to treat as missing (see below)
opts = setvaropts(opts,VariableNames,'TreatAsMissing','999')
However, I get the error code unrecognized function or variable 'VariableNames'
I have also tried adjusting setvartype but got a different error with that one as well.
What I would like to do is average out the previous two days and replace 999.
Any suggestions?
Cris LaPierre
2023-10-25
You need to define a variable before you use it. Have you created a variable named VariableNames? Look at some of the examples on the setvaropts documentation page: https://www.mathworks.com/help/matlab/ref/matlab.io.text.delimitedtextimportoptions.setvaropts.html#bvetrdw-4
As for replacing the missing values, you will still need to first turn the 999s into NaN. You can do that as I've shown when importing, or use standardizeMissing on the imported data.
Once the data is imported, you can use fillmissing to replace the NaNs with a value. Look into the available methods to see if one will work for you. If not, you may have to write your own code to do what you want..
Jonathon Klepatzki
2023-11-1
Hello Again,
Under the function Tbl that you wrote. I have tried multiple of ways by using the references that you provided to filter out the 999 but, sadly I am still not getting anywhere. The current example that I wrote is below:
Tbl = readtable(filename,opts)
Tbl1 = standardizeMissing(Tbl, {999,'N/A'},"DataVariables",{'MaxT,'MinT','AvgT'})
This doesn't bother the program at all however, still doesn't replace 999.
What do you think?
Voss
2023-11-1
@Jonathon Klepatzki: standardizeMissing does replace 999 with NaN. See below.
filename = 'temp_summary.05.01_1981.txt';
opts = detectImportOptions(filename);
opts.ConsecutiveDelimitersRule = 'join';
opts.MissingRule = 'omitvar';
opts = setvartype(opts,'double');
opts.VariableNames = ["Day","MaxT","MinT","AvgT"];
Tbl = readtable(filename,opts);
Tbl(27:31,:) % 999 in here
ans = 5×4 table
Day MaxT MinT AvgT
___ ____ ____ ____
27 54 23 38.5
28 52 29 40.5
29 51 999 NaN
30 49 31 40
31 47 24 35.5
Tbl1 = standardizeMissing(Tbl, {999,'N/A'},"DataVariables",{'MaxT','MinT','AvgT'});
Tbl1(27:31,:) % no longer 999; NaN instead
ans = 5×4 table
Day MaxT MinT AvgT
___ ____ ____ ____
27 54 23 38.5
28 52 29 40.5
29 51 NaN NaN
30 49 31 40
31 47 24 35.5
Cris LaPierre
2023-11-1
@Jonathon Klepatzki perhaps you can share all your code so we can see if you are overlooking something. Looking at just a couple lines of code may not be giving us enough information.
My suspicion right now is that your function is returning the variable Tbl, and in the code snippet you shared, you have assigned the output of standardizeMissing to a variable named Tbl1.
Jonathon Klepatzki
2023-11-1
I actually solved the issue! It was actually the Tbl that you mentioned. I'm still rusty on functions so I was thinking that I needed to create a new variable, hence why I went with Tbl1.
The last thing to make this code solid, is writing a code to retrieve the years the hottest and coldest temperatures occurred.
Cris LaPierre
2023-11-2
Look into using the second output of the min and max functions.
Be sure to read about the behavior of these functions on the doc page so you understand what the outputs are.
Jonathon Klepatzki
2023-11-6
编辑:Jonathon Klepatzki
2023-11-6
Question, to find the year when the hottest or coldest temperature occurred on a particular day (e.g. January 1st hottest temperature was 60 in 1983), do I need to create a separate bin for 'year'?
I've read the min and max functions that was posted on the 2nd and nothing changes when the code is inserted into the function.
Cris LaPierre
2023-11-6
编辑:Cris LaPierre
2023-11-6
Year information is dropped when the data is grouped, so you will not be able to find year using the code I've shared earlier. A new/different approach will be necessary to get the min/max per day and the year.
Jonathon Klepatzki
2023-11-7
编辑:Jonathon Klepatzki
2023-11-7
Hi Cris,
Since the code you wrote extracts the year from the filename, could we adjust the code to find the year of the hottest and coldest temperatures through that way?
Cris LaPierre
2023-11-7
There is always a way, but the code I shared uses groupsummary to find the min and max temperatures. I think it would be a lot of work to maintain that approach and find a way to include year.
You might want to look into findgroups and splitapply. It might require a little creativity, but I think it can be done. Finding the groups should be very similar, and using splitapply to find the min and max is straightforward. Getting the year will take some creativity, but if you use the syntax I mentioned above, you can get the index of the min or max. You can use that to calculate the year.
Jonathon Klepatzki
2023-11-8
I have an idea Cris, is it possible to extract the year from the filename then add that year to a separate column within the Tbl that's created? Then we can pair the data that way using group summary?
Cris LaPierre
2023-11-8
To appear in the groupsummary table, it needs to be either a grouping variable, or a coluimn the method operates on. There is no harm in trying to see what is possible. Try your thought out and let us know where you get stuck. Remember to share your code along with your question.
Jonathon Klepatzki
2023-11-9
编辑:Cris LaPierre
2023-11-9
Here's the code that I adjusted. This one works well but still doesn't pair up the year's to the max and min. However, when I tried it with "Year","Day","Day" within the groupsummary it gave me an invalid grouping error. Any suggestions?
Tbl1 = fileDatastore("temp_summary*.txt","ReadFcn",@readMonth,"UniformRead",true);
dataAll = readall(Tbl1);
sumbTbl = groupsummary(dataAll,["Day","Day","Day"],["dayofyear","monthofyear","dayofMonth"],["max","min","mean"],["MaxT" "MinT" "AvgT"]);
plot(sumbTbl,"monthofyear_Day","max_MaxT","Color",[1 0 0])
title("Meda 05 Maximum Temperature Climatology")
xlabel("MonthofYear")
ylabel("Maximum Temperature")
grid on
% results show the maximum and minimum max temperature of each month
plot(sumbTbl,"monthofyear_Day","min_MinT","Color",[0 0 1])
title("Meda 05 Minimum Temperature Climatology")
xlabel("MonthofYear")
ylabel("Minimum Temperature")
grid on
%
disp(sumbTbl)
dayofyear_Day monthofyear_Day dayofmonth_Day GroupCount max_MaxT min_MaxT mean_MaxT max_MinT min_MinT mean_MinT max_AvgT min_AvgT mean_AvgT
_____________ _______________ ______________ __________ ________ ________ _________ ________ ________ _________ ________ ________ _________
121 5 1 3 67 63 65 12 10 11.333 38.5 38.5 38.5
122 5 2 3 70 65 67.667 28 24 26.667 48 44 46.667
123 5 3 3 67 65 65.667 17 16 16.667 41 40.5 40.833
124 5 4 3 61 57 59 22 21 21.667 40 39.5 39.667
125 5 5 3 58 46 52.333 24 20 22 37 35 35.667
126 5 6 3 63 59 61 18 16 17.333 39.5 32.5 37.167
127 5 7 3 63 60 61.667 25 21 23 43.5 41 42.667
128 5 8 3 60 58 58.667 20 10 14 35 32 34
129 5 9 3 64 62 62.667 13 11 11.667 37.5 35.5 36.833
130 5 10 3 65 64 64.667 17 12 14.333 40 39.5 39.667
131 5 11 3 59 54 56.667 24 22 23 41 38 39.5
132 5 12 3 60 58 58.667 40 23 33.667 49 48 48.667
133 5 13 3 66 63 64.333 30 25 27.333 45.5 45.5 45.5
134 5 14 3 65 63 64.333 19 17 18.333 42 42 42
135 5 15 3 61 56 58.667 19 16 17.333 39 39 39
136 5 16 3 64 58 61.333 23 21 21.667 42.5 42.5 42.5
137 5 17 3 63 61 61.667 34 28 31.333 46.5 46.5 46.5
138 5 18 3 65 63 64 24 20 22 42.5 42.5 42.5
139 5 19 3 68 64 66.333 29 27 28 45.5 45.5 45.5
140 5 20 3 70 68 69.333 19 17 18.333 44.5 44.5 44.5
141 5 21 3 69 66 67.333 21 20 20.667 43.5 43.5 43.5
142 5 22 3 64 61 63 24 22 22.667 43 42.5 42.833
143 5 23 3 66 62 63.333 35 33 33.667 47.5 40.5 45.167
144 5 24 3 62 57 60.333 17 15 16.333 37 36.5 36.833
145 5 25 3 60 55 58 16 14 14.667 34.5 32.5 33.833
146 5 26 3 54 45 50 12 10 10.667 28 27.5 27.667
147 5 27 3 54 54 54 23 21 22.333 38.5 32.5 36.5
148 5 28 3 54 52 53 31 29 29.667 42 40.5 41
149 5 29 3 55 51 53.333 28 21 24.5 41 36 38.5
150 5 30 3 52 49 50.667 31 26 29.333 40 33.5 37.833
151 5 31 3 50 47 48.333 24 24 24 35.5 31 34
function Tbl = readMonth(filename)
opts = detectImportOptions(filename);
opts.ConsecutiveDelimitersRule = 'join';
opts.MissingRule = 'omitvar';
opts = setvartype(opts,'double');
opts.VariableNames = ["Day","MaxT","MinT","AvgT"];
Tbl = readtable(filename,opts);
Tbl = standardizeMissing(Tbl,{999,'N/A'},"DataVariables",{'MaxT','MinT','AvgT'});
[~,basename] = fileparts(filename);
nameparts = regexp(basename, '\.', 'split');
dateparts = regexp(nameparts{end}, '_','split');
year_str = dateparts{end};
d = str2double(extract(filename,digitsPattern));
Tbl.Day = datetime(d(3),d(2),Tbl.Day);
end
Cris LaPierre
2023-11-9
"Day" is the name of a table variable. "Year" is not. See the doc for groupsummary for more details.
You can extract the year from the datetime values in Day using the year function. You probably want to try this.
sumbTbl = groupsummary(dataAll,["Day","Day","Day"],["year","monthofyear","dayofMonth"],["max","min","mean"],["MaxT" "MinT" "AvgT"]);
I have not tried this so am not saying it will ultimately give you what you want. I'm just helping fix the error you are seeing when trying it.
Jonathon Klepatzki
2023-11-9
I made the correction and got the attached result. It is still not doing what I ultimately want. However, I will repost with something that I tried earlier.
Cris LaPierre
2023-11-9
The data must satisfy all grouping criteria to be included in the group. When year is a grouping variable, then only data from that year will be in the group. I can see how that will make it challenging to get the results you are looking for.
I'd again suggest looking into findgroups and splitapply.
Cris LaPierre
2023-11-9
编辑:Cris LaPierre
2023-11-9
I took a stab at it. I will admit that findgroups and splitapply have a bit more of a learning curve to them compared with groupsummary. Groupsummary as has an interactive live task.
The second output of max and min is the index of where the min/max value came from. Here, it is looking in a row with 3 columns. Column 1 in 1981, columns 2 is data from 1982, etc. So the index is 1, 2, 3 etc. But that can be used to figure out the year that data came from.
%% Unchanged
Datafiles = fileDatastore("temp_summary*.txt","ReadFcn",@readMonth,"UniformRead",true);
dataAll = readall(Datafiles)
dataAll = 93×5 table
Day MaxT MinT AvgT Station
___________ ____ ____ ____ _______
01-Jan-1981 65 12 38.5 5
02-Jan-1981 68 28 48 5
03-Jan-1981 65 17 41 5
04-Jan-1981 57 22 39.5 5
05-Jan-1981 46 24 35 5
06-Jan-1981 61 18 39.5 5
07-Jan-1981 62 25 43.5 5
08-Jan-1981 58 12 35 5
09-Jan-1981 64 11 37.5 5
10-Jan-1981 65 14 39.5 5
11-Jan-1981 54 22 38 5
12-Jan-1981 58 40 49 5
13-Jan-1981 64 27 45.5 5
14-Jan-1981 65 19 42 5
15-Jan-1981 59 19 39 5
16-Jan-1981 62 23 42.5 5
%% This part is new
% ensures the data is in the anticipated order, since 'unique' returns a sorted list.
dataAll = sortrows(dataAll);
% Extract the year
yr = unique(year(dataAll.Day))
yr = 3×1
1981
1982
1983
G = findgroups(month(dataAll.Day,'monthofyear'),day(dataAll.Day,'dayofmonth'));
Month = splitapply(@(x)unique(month(x,'monthofyear')),dataAll.Day,G);
Day = splitapply(@(x)unique(day(x,'dayofmonth')),dataAll.Day,G);
[xMin, idxMn] = splitapply(@min,dataAll.MinT,G)
xMin = 31×1
10
24
16
21
20
16
21
10
11
12
idxMn = 31×1
3
3
3
3
2
3
3
3
1
2
yrMn = yr(idxMn);
[xMax, idxMx] = splitapply(@max,dataAll.MaxT,G);
yrMx = yr(idxMx);
resultsTbl = table(Month,Day,xMin,yrMn,xMax,yrMx,'VariableNames',["Month","Day","MinT","MinT Year","MaxT","MaxT Year"])
resultsTbl = 31×6 table
Month Day MinT MinT Year MaxT MaxT Year
_____ ___ ____ _________ ____ _________
1 1 10 1983 67 1983
1 2 24 1983 70 1982
1 3 16 1983 67 1982
1 4 21 1983 61 1982
1 5 20 1982 58 1982
1 6 16 1983 63 1982
1 7 21 1983 63 1983
1 8 10 1983 60 1982
1 9 11 1981 64 1981
1 10 12 1982 65 1981
1 11 22 1981 59 1983
1 12 23 1982 60 1982
1 13 25 1983 66 1982
1 14 17 1983 65 1981
1 15 16 1982 61 1982
1 16 21 1982 64 1982
%% Unchanged
function Tbl = readMonth(filename)
Tbl = readtable(filename,"ConsecutiveDelimitersRule","join","ReadVariableNames",false,"TreatAsMissing",{'999','N/A'});
Tbl.Properties.VariableNames = {'Day' 'MaxT' 'MinT' 'AvgT'};
% extract year and month from filename
d = str2double(extract(filename,digitsPattern));
% convert day to datetime
Tbl.Day = datetime(d(3),d(2),Tbl.Day);
Tbl.Station(:) = d(1);
end
Let me add a caveat that I have not verified that these results are correct. Before relying on the code, I suggest making sure the min/max values are correct (I did check that they are the same as the previous solution), and that the corresponding year is correct (I have not checked at all).
Cris LaPierre
2023-11-10
Another option that is likely a little more robust is to use unstack. I found the quickest way to get the desired settings was to do so interatively using the Unstack Table Variable live task.
% View starting table
load temp_summary_data.mat
dataAll.Year = year(dataAll.Day);
dataAll.Month = month(dataAll.Day);
dataAll.DD = day(dataAll.Day)
dataAll = 93×8 table
Day MaxT MinT AvgT Station Year Month DD
___________ ____ ____ ____ _______ ____ _____ __
01-Jan-1981 65 12 38.5 5 1981 1 1
02-Jan-1981 68 28 48 5 1981 1 2
03-Jan-1981 65 17 41 5 1981 1 3
04-Jan-1981 57 22 39.5 5 1981 1 4
05-Jan-1981 46 24 35 5 1981 1 5
06-Jan-1981 61 18 39.5 5 1981 1 6
07-Jan-1981 62 25 43.5 5 1981 1 7
08-Jan-1981 58 12 35 5 1981 1 8
09-Jan-1981 64 11 37.5 5 1981 1 9
10-Jan-1981 65 14 39.5 5 1981 1 10
11-Jan-1981 54 22 38 5 1981 1 11
12-Jan-1981 58 40 49 5 1981 1 12
13-Jan-1981 64 27 45.5 5 1981 1 13
14-Jan-1981 65 19 42 5 1981 1 14
15-Jan-1981 59 19 39 5 1981 1 15
16-Jan-1981 62 23 42.5 5 1981 1 16
% Unstack variables
minT_tbl = unstack(dataAll,"MinT","Year","GroupingVariables",["Month","DD"],"VariableNamingRule","preserve")
minT_tbl = 31×5 table
Month DD 1981 1982 1983
_____ __ ____ ____ ____
1 1 12 12 10
1 2 28 28 24
1 3 17 17 16
1 4 22 22 21
1 5 24 20 22
1 6 18 18 16
1 7 25 23 21
1 8 12 20 10
1 9 11 11 13
1 10 14 12 17
1 11 22 NaN 24
1 12 40 23 38
1 13 27 30 25
1 14 19 19 17
1 15 19 16 17
1 16 23 21 21
maxT_tbl = unstack(dataAll,"MaxT","Year","GroupingVariables",["Month","DD"],"VariableNamingRule","preserve")
maxT_tbl = 31×5 table
Month DD 1981 1982 1983
_____ __ ____ ____ ____
1 1 65 63 67
1 2 68 70 65
1 3 65 67 65
1 4 57 61 59
1 5 46 58 53
1 6 61 63 59
1 7 62 60 63
1 8 58 60 58
1 9 64 62 62
1 10 65 64 65
1 11 54 57 59
1 12 58 60 58
1 13 64 66 63
1 14 65 63 65
1 15 59 61 56
1 16 62 64 58
yrs =str2double(minT_tbl.Properties.VariableNames(3:end))';
% find min
[minT,idxMn] = min(minT_tbl{:,3:end},[],2);
minT_yr = yrs(idxMn);
% find max
[maxT,idxMx] = max(maxT_tbl{:,3:end},[],2);
maxT_yr = yrs(idxMx);
% Results
tempTbl = [maxT_tbl(:,["Month","DD"]), table(minT,minT_yr,maxT,maxT_yr)]
tempTbl = 31×6 table
Month DD minT minT_yr maxT maxT_yr
_____ __ ____ _______ ____ _______
1 1 10 1983 67 1983
1 2 24 1983 70 1982
1 3 16 1983 67 1982
1 4 21 1983 61 1982
1 5 20 1982 58 1982
1 6 16 1983 63 1982
1 7 21 1983 63 1983
1 8 10 1983 60 1982
1 9 11 1981 64 1981
1 10 12 1982 65 1981
1 11 22 1981 59 1983
1 12 23 1982 60 1982
1 13 25 1983 66 1982
1 14 17 1983 65 1981
1 15 16 1982 61 1982
1 16 21 1982 64 1982
Jonathon Klepatzki
2023-11-12
编辑:Jonathon Klepatzki
2023-11-12
This is incredibly odd, I used your code and got errors for using them.
I've attached the first.
Was I suppose to create a mat file to run the second script that you sent?
Cris LaPierre
2023-11-12
编辑:Cris LaPierre
2023-11-13
Not that odd. We've discussed this error previously in this thread (see here: https://www.mathworks.com/matlabcentral/answers/2034409-comparing-each-row-of-a-table-to-the-same-row-of-a-different-table-then-store-the-highest-maximum-t#comment_2928966)
Having only tested the code on a single month, it is possible that there is slightly different formatting in some of the other data files. Hard to say without seeing the files.
Jonathon Klepatzki
2023-11-13
I just wanted to thank you for the help you've given me. I rewrote the function and applied it to your example that you posted on November 9th. It works, I verified the results with the files that I have. Since I already had the other script to pull in the averages, I'll just keep both instead of rewriting the new one with average max t and etc.
Again, thank you for your patience and help during this!
Jonathon Klepatzki
2023-11-22
编辑:Cris LaPierre
2023-11-22
Hey bud, sorry to bother you once again. So, I decided to use your unstack code (everything must be on a single chart) and need to find one more thing. As you can see below, the code attempts to find the avgMax, avgMin, and dailyAvg.
Unfortunately, I am getting this error "Error using mean, too many output arguments". Any ideas?
close all;
clear all;
clc;
Datafiles = fileDatastore("temp_summary*.txt","ReadFcn",@readMonth,"UniformRead",true);
opts =
DelimitedTextImportOptions with properties:
Format Properties:
Delimiter: {'\t'}
Whitespace: '\b '
LineEnding: {'\n' '\r' '\r\n'}
CommentStyle: {}
ConsecutiveDelimitersRule: 'split'
LeadingDelimitersRule: 'keep'
TrailingDelimitersRule: 'ignore'
EmptyLineRule: 'skip'
Encoding: 'UTF-8'
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
ExtraColumnsRule: 'addvars'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'Var1', 'Var2', 'Var3' ... and 6 more}
VariableTypes: {'double', 'char', 'double' ... and 6 more}
SelectedVariableNames: {'Var1', 'Var2', 'Var3' ... and 6 more}
VariableOptions: [1-by-9 matlab.io.VariableImportOptions]
Access VariableOptions sub-properties using setvaropts/getvaropts
VariableNamingRule: 'modify'
Location Properties:
DataLines: [2 Inf]
VariableNamesLine: 0
RowNamesColumn: 0
VariableUnitsLine: 0
VariableDescriptionsLine: 0
To display a preview of the table, use preview
Tbl = 31×4 table
Day MaxT MinT AvgT
___ ____ ____ ____
1 65 12 38.5
2 68 28 48
3 65 17 41
4 57 22 39.5
5 46 24 35
6 61 18 39.5
7 62 25 43.5
8 58 12 35
9 64 11 37.5
10 65 14 39.5
11 54 22 38
12 58 40 49
13 64 27 45.5
14 65 19 42
15 59 19 39
16 62 23 42.5
Tbl = 31×4 table
Day MaxT MinT AvgT
___ ____ ____ ____
1 65 12 38.5
2 68 28 48
3 65 17 41
4 57 22 39.5
5 46 24 35
6 61 18 39.5
7 62 25 43.5
8 58 12 35
9 64 11 37.5
10 65 14 39.5
11 54 22 38
12 58 40 49
13 64 27 45.5
14 65 19 42
15 59 19 39
16 62 23 42.5
year_str = '1981'
Tbl = 31×4 table
Day MaxT MinT AvgT
___________ ____ ____ ____
01-May-0001 65 12 38.5
02-May-0001 68 28 48
03-May-0001 65 17 41
04-May-0001 57 22 39.5
05-May-0001 46 24 35
06-May-0001 61 18 39.5
07-May-0001 62 25 43.5
08-May-0001 58 12 35
09-May-0001 64 11 37.5
10-May-0001 65 14 39.5
11-May-0001 54 22 38
12-May-0001 58 40 49
13-May-0001 64 27 45.5
14-May-0001 65 19 42
15-May-0001 59 19 39
16-May-0001 62 23 42.5
dataAll = readall(Datafiles)
opts =
DelimitedTextImportOptions with properties:
Format Properties:
Delimiter: {'\t'}
Whitespace: '\b '
LineEnding: {'\n' '\r' '\r\n'}
CommentStyle: {}
ConsecutiveDelimitersRule: 'split'
LeadingDelimitersRule: 'keep'
TrailingDelimitersRule: 'ignore'
EmptyLineRule: 'skip'
Encoding: 'UTF-8'
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
ExtraColumnsRule: 'addvars'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'Var1', 'Var2', 'Var3' ... and 6 more}
VariableTypes: {'double', 'char', 'double' ... and 6 more}
SelectedVariableNames: {'Var1', 'Var2', 'Var3' ... and 6 more}
VariableOptions: [1-by-9 matlab.io.VariableImportOptions]
Access VariableOptions sub-properties using setvaropts/getvaropts
VariableNamingRule: 'modify'
Location Properties:
DataLines: [2 Inf]
VariableNamesLine: 0
RowNamesColumn: 0
VariableUnitsLine: 0
VariableDescriptionsLine: 0
To display a preview of the table, use preview
Tbl = 31×4 table
Day MaxT MinT AvgT
___ ____ ____ ____
1 65 12 38.5
2 68 28 48
3 65 17 41
4 57 22 39.5
5 46 24 35
6 61 18 39.5
7 62 25 43.5
8 58 12 35
9 64 11 37.5
10 65 14 39.5
11 54 22 38
12 58 40 49
13 64 27 45.5
14 65 19 42
15 59 19 39
16 62 23 42.5
Tbl = 31×4 table
Day MaxT MinT AvgT
___ ____ ____ ____
1 65 12 38.5
2 68 28 48
3 65 17 41
4 57 22 39.5
5 46 24 35
6 61 18 39.5
7 62 25 43.5
8 58 12 35
9 64 11 37.5
10 65 14 39.5
11 54 22 38
12 58 40 49
13 64 27 45.5
14 65 19 42
15 59 19 39
16 62 23 42.5
year_str = '1981'
Tbl = 31×4 table
Day MaxT MinT AvgT
___________ ____ ____ ____
01-May-0001 65 12 38.5
02-May-0001 68 28 48
03-May-0001 65 17 41
04-May-0001 57 22 39.5
05-May-0001 46 24 35
06-May-0001 61 18 39.5
07-May-0001 62 25 43.5
08-May-0001 58 12 35
09-May-0001 64 11 37.5
10-May-0001 65 14 39.5
11-May-0001 54 22 38
12-May-0001 58 40 49
13-May-0001 64 27 45.5
14-May-0001 65 19 42
15-May-0001 59 19 39
16-May-0001 62 23 42.5
opts =
DelimitedTextImportOptions with properties:
Format Properties:
Delimiter: {'\t'}
Whitespace: '\b '
LineEnding: {'\n' '\r' '\r\n'}
CommentStyle: {}
ConsecutiveDelimitersRule: 'split'
LeadingDelimitersRule: 'keep'
TrailingDelimitersRule: 'ignore'
EmptyLineRule: 'skip'
Encoding: 'UTF-8'
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
ExtraColumnsRule: 'addvars'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'Var1', 'Var2', 'Var3' ... and 6 more}
VariableTypes: {'double', 'char', 'double' ... and 6 more}
SelectedVariableNames: {'Var1', 'Var2', 'Var3' ... and 6 more}
VariableOptions: [1-by-9 matlab.io.VariableImportOptions]
Access VariableOptions sub-properties using setvaropts/getvaropts
VariableNamingRule: 'modify'
Location Properties:
DataLines: [2 Inf]
VariableNamesLine: 0
RowNamesColumn: 0
VariableUnitsLine: 0
VariableDescriptionsLine: 0
To display a preview of the table, use preview
Tbl = 31×4 table
Day MaxT MinT AvgT
___ ____ ____ ____
1 63 12 38.5
2 70 28 48
3 67 17 41
4 61 22 39.5
5 58 20 35
6 63 18 39.5
7 60 23 43.5
8 60 20 35
9 62 11 37.5
10 64 12 39.5
11 57 999 NaN
12 60 23 49
13 66 30 45.5
14 63 19 42
15 61 16 39
16 64 21 42.5
Tbl = 31×4 table
Day MaxT MinT AvgT
___ ____ ____ ____
1 63 12 38.5
2 70 28 48
3 67 17 41
4 61 22 39.5
5 58 20 35
6 63 18 39.5
7 60 23 43.5
8 60 20 35
9 62 11 37.5
10 64 12 39.5
11 57 NaN NaN
12 60 23 49
13 66 30 45.5
14 63 19 42
15 61 16 39
16 64 21 42.5
year_str = '1982'
Tbl = 31×4 table
Day MaxT MinT AvgT
___________ ____ ____ ____
01-May-0001 63 12 38.5
02-May-0001 70 28 48
03-May-0001 67 17 41
04-May-0001 61 22 39.5
05-May-0001 58 20 35
06-May-0001 63 18 39.5
07-May-0001 60 23 43.5
08-May-0001 60 20 35
09-May-0001 62 11 37.5
10-May-0001 64 12 39.5
11-May-0001 57 NaN NaN
12-May-0001 60 23 49
13-May-0001 66 30 45.5
14-May-0001 63 19 42
15-May-0001 61 16 39
16-May-0001 64 21 42.5
opts =
DelimitedTextImportOptions with properties:
Format Properties:
Delimiter: {'\t'}
Whitespace: '\b '
LineEnding: {'\n' '\r' '\r\n'}
CommentStyle: {}
ConsecutiveDelimitersRule: 'split'
LeadingDelimitersRule: 'keep'
TrailingDelimitersRule: 'ignore'
EmptyLineRule: 'skip'
Encoding: 'UTF-8'
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
ExtraColumnsRule: 'addvars'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'Var1', 'Var2', 'Var3' ... and 6 more}
VariableTypes: {'double', 'char', 'double' ... and 6 more}
SelectedVariableNames: {'Var1', 'Var2', 'Var3' ... and 6 more}
VariableOptions: [1-by-9 matlab.io.VariableImportOptions]
Access VariableOptions sub-properties using setvaropts/getvaropts
VariableNamingRule: 'modify'
Location Properties:
DataLines: [2 Inf]
VariableNamesLine: 0
RowNamesColumn: 0
VariableUnitsLine: 0
VariableDescriptionsLine: 0
To display a preview of the table, use preview
Tbl = 31×4 table
Day MaxT MinT AvgT
___ ____ ____ ____
1 67 10 38.5
2 65 24 44
3 65 16 40.5
4 59 21 40
5 53 22 37
6 59 16 32.5
7 63 21 41
8 58 10 32
9 62 13 35.5
10 65 17 40
11 59 24 41
12 58 38 48
13 63 25 45.5
14 65 17 42
15 56 17 39
16 58 21 42.5
Tbl = 31×4 table
Day MaxT MinT AvgT
___ ____ ____ ____
1 67 10 38.5
2 65 24 44
3 65 16 40.5
4 59 21 40
5 53 22 37
6 59 16 32.5
7 63 21 41
8 58 10 32
9 62 13 35.5
10 65 17 40
11 59 24 41
12 58 38 48
13 63 25 45.5
14 65 17 42
15 56 17 39
16 58 21 42.5
year_str = '1983'
Tbl = 31×4 table
Day MaxT MinT AvgT
___________ ____ ____ ____
01-May-0001 67 10 38.5
02-May-0001 65 24 44
03-May-0001 65 16 40.5
04-May-0001 59 21 40
05-May-0001 53 22 37
06-May-0001 59 16 32.5
07-May-0001 63 21 41
08-May-0001 58 10 32
09-May-0001 62 13 35.5
10-May-0001 65 17 40
11-May-0001 59 24 41
12-May-0001 58 38 48
13-May-0001 63 25 45.5
14-May-0001 65 17 42
15-May-0001 56 17 39
16-May-0001 58 21 42.5
dataAll = 93×4 table
Day MaxT MinT AvgT
___________ ____ ____ ____
01-May-0001 65 12 38.5
02-May-0001 68 28 48
03-May-0001 65 17 41
04-May-0001 57 22 39.5
05-May-0001 46 24 35
06-May-0001 61 18 39.5
07-May-0001 62 25 43.5
08-May-0001 58 12 35
09-May-0001 64 11 37.5
10-May-0001 65 14 39.5
11-May-0001 54 22 38
12-May-0001 58 40 49
13-May-0001 64 27 45.5
14-May-0001 65 19 42
15-May-0001 59 19 39
16-May-0001 62 23 42.5
dataAll.Year = year(dataAll.Day);
dataAll.Month = month(dataAll.Day);
dataAll.DD = day(dataAll.Day)
dataAll = 93×7 table
Day MaxT MinT AvgT Year Month DD
___________ ____ ____ ____ ____ _____ __
01-May-0001 65 12 38.5 1 5 1
02-May-0001 68 28 48 1 5 2
03-May-0001 65 17 41 1 5 3
04-May-0001 57 22 39.5 1 5 4
05-May-0001 46 24 35 1 5 5
06-May-0001 61 18 39.5 1 5 6
07-May-0001 62 25 43.5 1 5 7
08-May-0001 58 12 35 1 5 8
09-May-0001 64 11 37.5 1 5 9
10-May-0001 65 14 39.5 1 5 10
11-May-0001 54 22 38 1 5 11
12-May-0001 58 40 49 1 5 12
13-May-0001 64 27 45.5 1 5 13
14-May-0001 65 19 42 1 5 14
15-May-0001 59 19 39 1 5 15
16-May-0001 62 23 42.5 1 5 16
% Unstack variables
minT_tbl = unstack(dataAll,"MinT","Year","GroupingVariables",["Month","DD"],"VariableNamingRule","preserve")
minT_tbl = 31×3 table
Month DD 1
_____ __ ___
5 1 34
5 2 80
5 3 50
5 4 65
5 5 66
5 6 52
5 7 69
5 8 42
5 9 35
5 10 43
5 11 NaN
5 12 101
5 13 82
5 14 55
5 15 52
5 16 65
maxT_tbl = unstack(dataAll,"MaxT","Year","GroupingVariables",["Month","DD"],"VariableNamingRule","preserve")
maxT_tbl = 31×3 table
Month DD 1
_____ __ ___
5 1 195
5 2 203
5 3 197
5 4 177
5 5 157
5 6 183
5 7 185
5 8 176
5 9 188
5 10 194
5 11 170
5 12 176
5 13 193
5 14 193
5 15 176
5 16 184
AvgT_tbl = unstack(dataAll,"AvgT","Year","GroupingVariables",["Month","DD"],"VariableNamingRule","preserve")
AvgT_tbl = 31×3 table
Month DD 1
_____ __ _____
5 1 115.5
5 2 140
5 3 122.5
5 4 119
5 5 107
5 6 111.5
5 7 128
5 8 102
5 9 110.5
5 10 119
5 11 NaN
5 12 146
5 13 136.5
5 14 126
5 15 117
5 16 127.5
yrs =str2double(minT_tbl.Properties.VariableNames(3:end))';
% find min
[minT,idxMn] = min(minT_tbl{:,3:end},[],2);
minT_yr = yrs(idxMn);
% find max
[maxT,idxMx] = max(maxT_tbl{:,3:end},[],2);
maxT_yr = yrs(idxMx);
% find low high
[lowMaxT,idxMx] = min(maxT_tbl{:,3:end},[],2);
LowHighT_yr = yrs(idxMx);
% find high low
[highlowMnT,idxMn] = max(minT_tbl{:,3:end},[],2);
HighLowT_yr = yrs(idxMn);
% find avg high
[AvgMxT,idxMx] = mean(maxT_tbl{:,3:end},[],2);
Error using mean
Too many output arguments.
Too many output arguments.
% find avg low
[AvgMnT,idxMn] = mean(minT_tbl{:,3:end},[],2);
% find avg
[AvgT,idxmean] = mean(maxT_tbl,minT_tbl{:,3:end},[],2);
% Results
tempTbl = [maxT_tbl(:,["Month","DD"]), table(minT,minT_yr,maxT,maxT_yr,lowMaxT,LowHighT_yr,highlowMnT,HighLowT_yr,AvgMxT,AvgMnT,AvgT)]
disp(tempTbl)
function Tbl = readMonth(filename)
opts = detectImportOptions(filename)
opts.ConsecutiveDelimitersRule = 'join';
opts.MissingRule = 'omitvar';
opts = setvartype(opts,'double');
opts.VariableNames = ["Day","MaxT","MinT","AvgT"];
Tbl = readtable(filename,opts)
Tbl = standardizeMissing(Tbl,{999,'N/A'},"DataVariables",{'MaxT','MinT','AvgT'})
[~,basename] = fileparts(filename);
nameparts = regexp(basename, '\.', 'split');
dateparts = regexp(nameparts{end}, '_','split');
year_str = dateparts{end}
d = str2double(extract(filename,digitsPattern));
Tbl.Day = datetime(d(3),d(2),Tbl.Day)
end
Cris LaPierre
2023-11-22
The mean function does not support a second ouput. Be sure to use a syntax that is supported by the function. This looks like the syntax you should be using: M = mean(A,dim)
Jonathon Klepatzki
2023-11-22
It works now, thank you. I tried two different ways and forgot to set dim. My last question is this, if the avg temp is already calculated then how do we find it?
I just tried AvgT = [AvgT_tbl{:,3:end}] and got this result...all I need is the column data for AvgT. Any suggestions to narrow this down?
Cris LaPierre
2023-11-22
How do you want to combine the average temperature for that day across multiple years? You have to make that decision first.
Jonathon Klepatzki
2023-11-24
No combination is necessary. Just need to list that column's data. Only thing that warranted combination was the high and low temperature.
Cris LaPierre
2023-11-24
Ok, but in the example you have shared, you have 3 AvgT for each day. You want to display all 3 like in your screenshot? Asked a different way, what is wrong with the result you showed here? What should it be showing instead?
Jonathon Klepatzki
2024-1-4
编辑:Jonathon Klepatzki
2024-1-4
Hi Cris,
Weird question to ask. The code that was finalized is now having issues on MATLAB 2020B but works on MATLAB 2023A.
Same code, however it is now giving me an error using sum on line 28 "invalid data type. first argument must be numeric or logical".
% find avg high
AvgMxT = mean(maxT_tbl,2);
% find avg low
AvgMnT = mean(minT_tbl,2);
Are there any differences between the two licenses?
Cris LaPierre
2024-1-4
编辑:Cris LaPierre
2024-1-4
No, there are differences in the mean function. In R2020b, mean did not accept a table as input. That was added in R2023a (see here)
To be backward compatible, use varfun to compute the mean of a table (See this post: https://www.mathworks.com/matlabcentral/answers/797512-how-to-get-mean-of-all-variables-in-a-table) , or to extract the data to or convert the table to an array (see here or here ).
Jonathon Klepatzki
2024-1-4
编辑:Jonathon Klepatzki
2024-1-4
So like this?
% find avg high
AvgMxT.stringvar = ['maxT'];
varfun(@mean,AvgMxT,'InputVariables',@isnumeric)
% find avg low
AvgMnT.stringvar = ['minT'];
varfun(@mean,AvgMnT,'InputVariables',@isnumeric)
It does give me an error "check for missing argument or incorrect argument data type in call to function 'varfun' "
The second way that I tried it is below:
AvgMxT.stringvar = ['maxT'];
varfun(@mean,maxT_tbl,'InputVariables',@isnumeric)
% find avg low
AvgMnT.stringvar = ['minT'];
varfun(@mean,minT_tbl,'InputVariables',@isnumeric)
however, this way gives me an error on line 36 "all table variables must have the smae number of rows"
tempTbl = [maxT_tbl(:,["Month","DD"]), table(maxT,maxT_yr,AvgMxT,lowMaxT,LowMaxT_yr,minT,minT_yr,AvgMnT,highlowMnT,HighLowT_yr)]
Cris LaPierre
2024-1-4
You haven't shared enough for us to debug your specific scenario. Perhaps the most straightforward solution is to use table2array. Just beware that the output will be an array, not a table. This may impact downstream code. You could convert the result back to a table if necessary.
AvgMxT = mean(table2array(maxT_tbl),2);
If you want to pursue varfun, you'll need to use an anonymous function so that you can preserve taking the mean along the 2nd dimension. Here's an example using a built-in dataset
tbl = readtable('patients.xls');
newT = tbl(:,["Diastolic","Systolic"])
newT = 100×2 table
Diastolic Systolic
_________ ________
93 124
77 109
83 125
75 117
80 122
70 121
88 130
82 115
78 115
86 118
77 114
68 115
74 127
95 130
79 114
92 130
% using mean
meanT = mean(newT,2)
meanT = 100×1 table
mean
_____
108.5
93
104
96
101
95.5
109
98.5
96.5
102
95.5
91.5
100.5
112.5
96.5
111
% using varfun
meanT2 = varfun(@(x) mean(x,2),meanT,'InputVariables',@isnumeric)
meanT2 = 100×1 table
Fun_mean
________
108.5
93
104
96
101
95.5
109
98.5
96.5
102
95.5
91.5
100.5
112.5
96.5
111
% using table2array
meanT = mean(table2array(newT),2)
meanT = 100×1
108.5000
93.0000
104.0000
96.0000
101.0000
95.5000
109.0000
98.5000
96.5000
102.0000
Jonathon Klepatzki
2024-1-4
编辑:Cris LaPierre
2024-1-4
Hi Cris,
Thank you for the tip, I'll try it on my end. In the meant time, below is the rest of the finalized code.
close all;
clear all;
clc;
Datafiles = fileDatastore("temp_summary*.txt","ReadFcn",@readMonth,"UniformRead",true);
opts =
DelimitedTextImportOptions with properties:
Format Properties:
Delimiter: {'\t'}
Whitespace: '\b '
LineEnding: {'\n' '\r' '\r\n'}
CommentStyle: {}
ConsecutiveDelimitersRule: 'split'
LeadingDelimitersRule: 'keep'
TrailingDelimitersRule: 'ignore'
EmptyLineRule: 'skip'
Encoding: 'UTF-8'
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
ExtraColumnsRule: 'addvars'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'Var1', 'Var2', 'Var3' ... and 6 more}
VariableTypes: {'double', 'char', 'double' ... and 6 more}
SelectedVariableNames: {'Var1', 'Var2', 'Var3' ... and 6 more}
VariableOptions: [1-by-9 matlab.io.VariableImportOptions]
Access VariableOptions sub-properties using setvaropts/getvaropts
VariableNamingRule: 'modify'
Location Properties:
DataLines: [2 Inf]
VariableNamesLine: 0
RowNamesColumn: 0
VariableUnitsLine: 0
VariableDescriptionsLine: 0
To display a preview of the table, use preview
Tbl = 31×4 table
Day MaxT MinT AvgT
___ ____ ____ ____
1 65 12 38.5
2 68 28 48
3 65 17 41
4 57 22 39.5
5 46 24 35
6 61 18 39.5
7 62 25 43.5
8 58 12 35
9 64 11 37.5
10 65 14 39.5
11 54 22 38
12 58 40 49
13 64 27 45.5
14 65 19 42
15 59 19 39
16 62 23 42.5
Tbl = 31×4 table
Day MaxT MinT AvgT
___ ____ ____ ____
1 65 12 38.5
2 68 28 48
3 65 17 41
4 57 22 39.5
5 46 24 35
6 61 18 39.5
7 62 25 43.5
8 58 12 35
9 64 11 37.5
10 65 14 39.5
11 54 22 38
12 58 40 49
13 64 27 45.5
14 65 19 42
15 59 19 39
16 62 23 42.5
year_str = '1981'
Tbl = 31×4 table
Day MaxT MinT AvgT
___________ ____ ____ ____
01-May-0001 65 12 38.5
02-May-0001 68 28 48
03-May-0001 65 17 41
04-May-0001 57 22 39.5
05-May-0001 46 24 35
06-May-0001 61 18 39.5
07-May-0001 62 25 43.5
08-May-0001 58 12 35
09-May-0001 64 11 37.5
10-May-0001 65 14 39.5
11-May-0001 54 22 38
12-May-0001 58 40 49
13-May-0001 64 27 45.5
14-May-0001 65 19 42
15-May-0001 59 19 39
16-May-0001 62 23 42.5
dataAll = readall(Datafiles)
opts =
DelimitedTextImportOptions with properties:
Format Properties:
Delimiter: {'\t'}
Whitespace: '\b '
LineEnding: {'\n' '\r' '\r\n'}
CommentStyle: {}
ConsecutiveDelimitersRule: 'split'
LeadingDelimitersRule: 'keep'
TrailingDelimitersRule: 'ignore'
EmptyLineRule: 'skip'
Encoding: 'UTF-8'
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
ExtraColumnsRule: 'addvars'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'Var1', 'Var2', 'Var3' ... and 6 more}
VariableTypes: {'double', 'char', 'double' ... and 6 more}
SelectedVariableNames: {'Var1', 'Var2', 'Var3' ... and 6 more}
VariableOptions: [1-by-9 matlab.io.VariableImportOptions]
Access VariableOptions sub-properties using setvaropts/getvaropts
VariableNamingRule: 'modify'
Location Properties:
DataLines: [2 Inf]
VariableNamesLine: 0
RowNamesColumn: 0
VariableUnitsLine: 0
VariableDescriptionsLine: 0
To display a preview of the table, use preview
Tbl = 31×4 table
Day MaxT MinT AvgT
___ ____ ____ ____
1 65 12 38.5
2 68 28 48
3 65 17 41
4 57 22 39.5
5 46 24 35
6 61 18 39.5
7 62 25 43.5
8 58 12 35
9 64 11 37.5
10 65 14 39.5
11 54 22 38
12 58 40 49
13 64 27 45.5
14 65 19 42
15 59 19 39
16 62 23 42.5
Tbl = 31×4 table
Day MaxT MinT AvgT
___ ____ ____ ____
1 65 12 38.5
2 68 28 48
3 65 17 41
4 57 22 39.5
5 46 24 35
6 61 18 39.5
7 62 25 43.5
8 58 12 35
9 64 11 37.5
10 65 14 39.5
11 54 22 38
12 58 40 49
13 64 27 45.5
14 65 19 42
15 59 19 39
16 62 23 42.5
year_str = '1981'
Tbl = 31×4 table
Day MaxT MinT AvgT
___________ ____ ____ ____
01-May-0001 65 12 38.5
02-May-0001 68 28 48
03-May-0001 65 17 41
04-May-0001 57 22 39.5
05-May-0001 46 24 35
06-May-0001 61 18 39.5
07-May-0001 62 25 43.5
08-May-0001 58 12 35
09-May-0001 64 11 37.5
10-May-0001 65 14 39.5
11-May-0001 54 22 38
12-May-0001 58 40 49
13-May-0001 64 27 45.5
14-May-0001 65 19 42
15-May-0001 59 19 39
16-May-0001 62 23 42.5
opts =
DelimitedTextImportOptions with properties:
Format Properties:
Delimiter: {'\t'}
Whitespace: '\b '
LineEnding: {'\n' '\r' '\r\n'}
CommentStyle: {}
ConsecutiveDelimitersRule: 'split'
LeadingDelimitersRule: 'keep'
TrailingDelimitersRule: 'ignore'
EmptyLineRule: 'skip'
Encoding: 'UTF-8'
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
ExtraColumnsRule: 'addvars'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'Var1', 'Var2', 'Var3' ... and 6 more}
VariableTypes: {'double', 'char', 'double' ... and 6 more}
SelectedVariableNames: {'Var1', 'Var2', 'Var3' ... and 6 more}
VariableOptions: [1-by-9 matlab.io.VariableImportOptions]
Access VariableOptions sub-properties using setvaropts/getvaropts
VariableNamingRule: 'modify'
Location Properties:
DataLines: [2 Inf]
VariableNamesLine: 0
RowNamesColumn: 0
VariableUnitsLine: 0
VariableDescriptionsLine: 0
To display a preview of the table, use preview
Tbl = 31×4 table
Day MaxT MinT AvgT
___ ____ ____ ____
1 63 12 38.5
2 70 28 48
3 67 17 41
4 61 22 39.5
5 58 20 35
6 63 18 39.5
7 60 23 43.5
8 60 20 35
9 62 11 37.5
10 64 12 39.5
11 57 999 NaN
12 60 23 49
13 66 30 45.5
14 63 19 42
15 61 16 39
16 64 21 42.5
Tbl = 31×4 table
Day MaxT MinT AvgT
___ ____ ____ ____
1 63 12 38.5
2 70 28 48
3 67 17 41
4 61 22 39.5
5 58 20 35
6 63 18 39.5
7 60 23 43.5
8 60 20 35
9 62 11 37.5
10 64 12 39.5
11 57 NaN NaN
12 60 23 49
13 66 30 45.5
14 63 19 42
15 61 16 39
16 64 21 42.5
year_str = '1982'
Tbl = 31×4 table
Day MaxT MinT AvgT
___________ ____ ____ ____
01-May-0001 63 12 38.5
02-May-0001 70 28 48
03-May-0001 67 17 41
04-May-0001 61 22 39.5
05-May-0001 58 20 35
06-May-0001 63 18 39.5
07-May-0001 60 23 43.5
08-May-0001 60 20 35
09-May-0001 62 11 37.5
10-May-0001 64 12 39.5
11-May-0001 57 NaN NaN
12-May-0001 60 23 49
13-May-0001 66 30 45.5
14-May-0001 63 19 42
15-May-0001 61 16 39
16-May-0001 64 21 42.5
opts =
DelimitedTextImportOptions with properties:
Format Properties:
Delimiter: {'\t'}
Whitespace: '\b '
LineEnding: {'\n' '\r' '\r\n'}
CommentStyle: {}
ConsecutiveDelimitersRule: 'split'
LeadingDelimitersRule: 'keep'
TrailingDelimitersRule: 'ignore'
EmptyLineRule: 'skip'
Encoding: 'UTF-8'
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
ExtraColumnsRule: 'addvars'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'Var1', 'Var2', 'Var3' ... and 6 more}
VariableTypes: {'double', 'char', 'double' ... and 6 more}
SelectedVariableNames: {'Var1', 'Var2', 'Var3' ... and 6 more}
VariableOptions: [1-by-9 matlab.io.VariableImportOptions]
Access VariableOptions sub-properties using setvaropts/getvaropts
VariableNamingRule: 'modify'
Location Properties:
DataLines: [2 Inf]
VariableNamesLine: 0
RowNamesColumn: 0
VariableUnitsLine: 0
VariableDescriptionsLine: 0
To display a preview of the table, use preview
Tbl = 31×4 table
Day MaxT MinT AvgT
___ ____ ____ ____
1 67 10 38.5
2 65 24 44
3 65 16 40.5
4 59 21 40
5 53 22 37
6 59 16 32.5
7 63 21 41
8 58 10 32
9 62 13 35.5
10 65 17 40
11 59 24 41
12 58 38 48
13 63 25 45.5
14 65 17 42
15 56 17 39
16 58 21 42.5
Tbl = 31×4 table
Day MaxT MinT AvgT
___ ____ ____ ____
1 67 10 38.5
2 65 24 44
3 65 16 40.5
4 59 21 40
5 53 22 37
6 59 16 32.5
7 63 21 41
8 58 10 32
9 62 13 35.5
10 65 17 40
11 59 24 41
12 58 38 48
13 63 25 45.5
14 65 17 42
15 56 17 39
16 58 21 42.5
year_str = '1983'
Tbl = 31×4 table
Day MaxT MinT AvgT
___________ ____ ____ ____
01-May-0001 67 10 38.5
02-May-0001 65 24 44
03-May-0001 65 16 40.5
04-May-0001 59 21 40
05-May-0001 53 22 37
06-May-0001 59 16 32.5
07-May-0001 63 21 41
08-May-0001 58 10 32
09-May-0001 62 13 35.5
10-May-0001 65 17 40
11-May-0001 59 24 41
12-May-0001 58 38 48
13-May-0001 63 25 45.5
14-May-0001 65 17 42
15-May-0001 56 17 39
16-May-0001 58 21 42.5
dataAll = 93×4 table
Day MaxT MinT AvgT
___________ ____ ____ ____
01-May-0001 65 12 38.5
02-May-0001 68 28 48
03-May-0001 65 17 41
04-May-0001 57 22 39.5
05-May-0001 46 24 35
06-May-0001 61 18 39.5
07-May-0001 62 25 43.5
08-May-0001 58 12 35
09-May-0001 64 11 37.5
10-May-0001 65 14 39.5
11-May-0001 54 22 38
12-May-0001 58 40 49
13-May-0001 64 27 45.5
14-May-0001 65 19 42
15-May-0001 59 19 39
16-May-0001 62 23 42.5
dataAll.Year = year(dataAll.Day);
dataAll.Month = month(dataAll.Day);
dataAll.DD = day(dataAll.Day)
dataAll = 93×7 table
Day MaxT MinT AvgT Year Month DD
___________ ____ ____ ____ ____ _____ __
01-May-0001 65 12 38.5 1 5 1
02-May-0001 68 28 48 1 5 2
03-May-0001 65 17 41 1 5 3
04-May-0001 57 22 39.5 1 5 4
05-May-0001 46 24 35 1 5 5
06-May-0001 61 18 39.5 1 5 6
07-May-0001 62 25 43.5 1 5 7
08-May-0001 58 12 35 1 5 8
09-May-0001 64 11 37.5 1 5 9
10-May-0001 65 14 39.5 1 5 10
11-May-0001 54 22 38 1 5 11
12-May-0001 58 40 49 1 5 12
13-May-0001 64 27 45.5 1 5 13
14-May-0001 65 19 42 1 5 14
15-May-0001 59 19 39 1 5 15
16-May-0001 62 23 42.5 1 5 16
% Unstack variables
minT_tbl = unstack(dataAll,"MinT","Year","GroupingVariables", ["Month","DD"],"VariableNamingRule","preserve")
minT_tbl = 31×3 table
Month DD 1
_____ __ ___
5 1 34
5 2 80
5 3 50
5 4 65
5 5 66
5 6 52
5 7 69
5 8 42
5 9 35
5 10 43
5 11 NaN
5 12 101
5 13 82
5 14 55
5 15 52
5 16 65
maxT_tbl = unstack(dataAll,"MaxT","Year","GroupingVariables", ["Month","DD"],"VariableNamingRule","preserve")
maxT_tbl = 31×3 table
Month DD 1
_____ __ ___
5 1 195
5 2 203
5 3 197
5 4 177
5 5 157
5 6 183
5 7 185
5 8 176
5 9 188
5 10 194
5 11 170
5 12 176
5 13 193
5 14 193
5 15 176
5 16 184
yrs =str2double(minT_tbl.Properties.VariableNames(3:end))';
% find min
[minT,idxMn] = min(minT_tbl{:,3:end},[],2);
minT_yr = yrs(idxMn);
% find max
[maxT,idxMx] = max(maxT_tbl{:,3:end},[],2);
maxT_yr = yrs(idxMx);
% find low high
[lowMaxT,idxMx] = min(maxT_tbl{:,3:end},[],2);
LowMaxT_yr = yrs(idxMx);
% find high low
[highlowMnT,idxMn] = max(minT_tbl{:,3:end},[],2);
HighLowT_yr = yrs(idxMn);
% find avg high
AvgMxT = mean(table2array(maxT_tbl,2));
% find avg low
AvgMnT = mean(table2array(maxT_tbl,2));
% Results
tempTbl = [maxT_tbl(:,["Month","DD"]), table(maxT,maxT_yr,AvgMxT,lowMaxT,LowMaxT_yr,minT,minT_yr,AvgMnT,highlowMnT,HighLowT_yr)]
Error using table
All table variables must have the same number of rows.
All table variables must have the same number of rows.
tempTbl2 = splitvars(tempTbl)
writetable(tempTbl2,'Meda 05 Temperature Climatology.txt','Delimiter',' ')
type 'Meda 05 Temperature Climatology.txt'
function Tbl = readMonth(filename)
opts = detectImportOptions(filename)
opts.ConsecutiveDelimitersRule = 'join';
opts.MissingRule = 'omitvar';
opts = setvartype(opts,'double');
opts.VariableNames = ["Day","MaxT","MinT","AvgT"];
Tbl = readtable(filename,opts)
Tbl = standardizeMissing(Tbl,{999,'N/A'},"DataVariables",{'MaxT','MinT','AvgT'})
[~,basename] = fileparts(filename);
nameparts = regexp(basename, '\.', 'split');
dateparts = regexp(nameparts{end}, '_','split');
year_str = dateparts{end}
d = str2double(extract(filename,digitsPattern));
Tbl.Day = datetime(d(3),d(2),Tbl.Day)
end
Cris LaPierre
2024-1-4
编辑:Cris LaPierre
2024-1-4
Inspect the variables in the line of code that is giving you an error. You will find that AvgMnT and AvgMxT are 1x5 vectors, while all the other variables are 31x1. Hence, not all your table variables have the same number of rows.
It looks like you may have made a typo when adding table2array to your code.
I recommend Inspecting the tables you are taking the mean of. First, you will likely find another typo. Second, I don't think your code is producing the result you think it is.
Jonathon Klepatzki
2024-1-4
Haha, I just saw the typo. I just ran it and it created a text file. I'm going to verify the data and get back to you with the final code IF it's good to go. Thank you again, and Happy New Year.
Jonathon Klepatzki
2024-1-4
编辑:Cris LaPierre
2024-1-5
@Cris LaPierre verified results (all good!)...thank you again!
**2020 version Final Code***
close all;
clear all;
clc;
Datafiles = fileDatastore("temp_summary*.txt","ReadFcn",@readMonth,"UniformRead",true);
dataAll = readall(Datafiles)
dataAll.Year = year(dataAll.Day);
dataAll.Month = month(dataAll.Day);
dataAll.DD = day(dataAll.Day)
% Unstack variables
minT_tbl = unstack(dataAll,"MinT","Year","GroupingVariables",["Month","DD"],"VariableNamingRule","preserve")
maxT_tbl = unstack(dataAll,"MaxT","Year","GroupingVariables",["Month","DD"],"VariableNamingRule","preserve")
yrs =str2double(minT_tbl.Properties.VariableNames(3:end))';
% find min
[minT,idxMn] = min(minT_tbl{:,3:end},[],2);
minT_yr = yrs(idxMn);
% find max
[maxT,idxMx] = max(maxT_tbl{:,3:end},[],2);
maxT_yr = yrs(idxMx);
% find low high
[lowMaxT,idxMx] = min(maxT_tbl{:,3:end},[],2);
LowMaxT_yr = yrs(idxMx);
% find high low
[highlowMnT,idxMn] = max(minT_tbl{:,3:end},[],2);
HighLowT_yr = yrs(idxMn);
% find avg high
AvgMxT = mean(table2array(maxT_tbl),2)
% find avg low
AvgMnT = mean(table2array(minT_tbl),2);
% Results
tempTbl = [maxT_tbl(:,["Month","DD"]), table(maxT,maxT_yr,AvgMxT,lowMaxT,LowMaxT_yr,minT,minT_yr,AvgMnT,highlowMnT,HighLowT_yr)]
tempTbl2 = splitvars(tempTbl)
writetable(tempTbl2,'Meda 05 Temperature Climatology.txt','Delimiter',' ')
type 'Meda 05 Temperature Climatology.txt'
function Tbl = readMonth(filename)
opts = detectImportOptions(filename)
opts.ConsecutiveDelimitersRule = 'join';
opts.MissingRule = 'omitvar';
opts = setvartype(opts,'double');
opts.VariableNames = ["Day","MaxT","MinT","AvgT"];
Tbl = readtable(filename,opts)
Tbl = standardizeMissing(Tbl,{999,'N/A'},"DataVariables",{'MaxT','MinT','AvgT'})
[~,basename] = fileparts(filename);
nameparts = regexp(basename, '\.', 'split');
dateparts = regexp(nameparts{end}, '_','split');
year_str = dateparts{end}
d = str2double(extract(filename,digitsPattern));
Tbl.Day = datetime(d(3),d(2),Tbl.Day)
end
Cris LaPierre
2024-1-5
I still think there is an logic error in your code, but if you are happy, that's all that matters.
From my earlier comment "I recommend Inspecting the tables you are taking the mean of. ... I don't think your code is producing the result you think it is."
AvgMxT = mean(table2array(maxT_tbl),2)
Cris LaPierre
2024-1-5
Also, for robustness, I recommend updating the following line of code to use basename instead of filename.
d = str2double(extract(filename,digitsPattern));
Jonathon Klepatzki
2024-1-5
编辑:Jonathon Klepatzki
2024-1-5
Thank you for the tip, I'll make the changes.
I actually do need to make a change to the code that calculates the mean. I need to have it down to a tenth of a degree in my final result. Another issue that came up is, with my larger datasets, it is showing a large number of NaN values for the average calculations and therefore should be not used in the calculation.
Any suggestions?
Cris LaPierre
2024-1-5
I don't think you are looking at the table you are taking the mean of. Here's why I think you want to inspect it. I doubt you want to include month and day in your calculation of mean temperature.
About ignoring NaNs in your mean calculation, you can use the following syntax
Jonathon Klepatzki
2024-1-5
Oh wow, thank you. That's a good catch. I'll see what I can do and get back to you.
Jonathon Klepatzki
2024-1-5
编辑:Jonathon Klepatzki
2024-1-5
So, I made adjustments to the mean calculation and basename code(s). They both work fine. Please see below the mean calculation codes. I did look at the AvgMnT and AvgMxT tables.
% find avg high
AvgMxT = round(mean(table2array(maxT_tbl(:,3:end),'omitnan'),2));
% find avg low
AvgMnT = round(mean(table2array(minT_tbl(:,3:end),'omitnan'),2));
It's still creating NaN in the output text file (see attached image). How can I realign the columns, any suggestions?
Cris LaPierre
2024-1-6
编辑:Cris LaPierre
2024-1-6
One of your closing parentheses is in the wrong place. 'omitnan'is an input to mean, not table2array.
Jonathon Klepatzki
2024-1-10
编辑:Jonathon Klepatzki
2024-1-10
I finally figured it out. It wasn't the parentheses. 'omitnan' was in the wrong spot.
This is the correct method.
% find avg high
AvgMxT = round(mean(table2array(maxT_tbl(:,3:end)),2,'omitnan'));
% find avg low
AvgMnT = round(mean(table2array(minT_tbl(:,3:end)),2,'omitnan'));
Jonathon Klepatzki
2024-2-1
Is there a way we can adjust the code to ignore February 29th? This is causing issue with the dates in the outgoing text file.
Cris LaPierre
2024-2-1
Jonathon Klepatzki
2024-2-5
@Cris LaPierre, I tried eomday and unfortunately it doesn't work. The leapyear function isn't part of the 2020b version that I am using.
Reason for eomday, "Y must be an integer year number".
This is the way that I tried it.
E = eomday(dataAll.Year,2)
LY = dataAll.Year(E == 29) == []
Cris LaPierre
2024-2-5
Why do these new files have asteriscs and pound signs in them? That is causing errors with the import.
Jonathon Klepatzki
2024-2-5
编辑:Jonathon Klepatzki
2024-2-5
Those are indications of tieing or setting new records of that specific day and month. Unfortunately, I cannot change them.
Cris LaPierre
2024-2-5
编辑:Cris LaPierre
2024-2-5
Does your code not throw an error when loading these new files? It does for me.
They were not present in any of the previous sample data files provided. The * or # is getting read in as the next temp, and then everything after it is bumped one column to the right, meaning the MinT is now read in with the AvgT, etc.
readtable cannot correctly handle these characters. I think you need to redo how the files are read in, unless you've come up with a strategy that handles this already. Note the extra column here
filename = 'temp_summary.05.02_1981.txt';
opts = detectImportOptions(filename);
opts.ConsecutiveDelimitersRule = 'join';
opts.MissingRule = 'omitvar';
opts = setvartype(opts,'double');
opts.VariableNames = ["Day","MaxT","MinT","AvgT"];
Tbl = readtable(filename,opts)
Tbl = 29×5 table
Day MaxT MinT AvgT ExtraVar1
___ ____ ____ ____ __________
1 57 12 34.5 {0×0 char}
2 58 19 38.5 {0×0 char}
3 59 17 37 {0×0 char}
4 59 12 35.5 {0×0 char}
5 50 12 NaN {'31.5' }
6 54 31 42.5 {0×0 char}
7 60 13 36.5 {0×0 char}
8 51 29 40 {0×0 char}
9 52 36 44 {0×0 char}
10 59 24 41.5 {0×0 char}
11 61 36 48.5 {0×0 char}
12 67 28 46.5 {0×0 char}
13 63 21 42 {0×0 char}
14 63 29 46 {0×0 char}
15 70 26 47 {0×0 char}
16 72 29 51.5 {0×0 char}
Jonathon Klepatzki
2024-2-5
编辑:Cris LaPierre
2024-2-5
I am not receiving any kind of error (outside trying to find the Leapyear). However, February 29th and March 1st flip (meaning March 1st comes before the 29th). A few days do have the wrong temperature. It's entirely possible that the samples provided on the 1st are bad because the files that I have to use include * and #.
Below is the code that I am using right now. It's been modified slightly.
close all;
clear all;
clc;
Datafiles = fileDatastore("temp_summary*.txt","ReadFcn",@readMonth,"UniformRead",true);
dataAll = readall(Datafiles)
Error using matlab.io.datastore.FileDatastore/readall
All tables being vertically concatenated must have the same number of variables.
All tables being vertically concatenated must have the same number of variables.
dataAll.Year = year(dataAll.Day);
dataAll.Month = month(dataAll.Day);
dataAll.DD = day(dataAll.Day)
% Unstack variables
minT_tbl = unstack(dataAll,"MinT","Year","GroupingVariables", ["Month","DD"],"VariableNamingRule","preserve")
maxT_tbl = unstack(dataAll,"MaxT","Year","GroupingVariables", ["Month","DD"],"VariableNamingRule","preserve")
yrs =str2double(minT_tbl.Properties.VariableNames(3:end))';
% find min
[Tmin,idxMn] = min(minT_tbl{:,3:end},[],2,'omitnan');
Tmin_yr = yrs(idxMn);
% find max
[Tmax,idxMx] = max(maxT_tbl{:,3:end},[],2,'omitnan');
Tmax_yr = yrs(idxMx);
% find low high
[lowTMax,idxMx] = min(maxT_tbl{:,3:end},[],2,'omitnan');
LowTMax_yr = yrs(idxMx);
% find high low
[highlowTMn,idxMn] = max(minT_tbl{:,3:end},[],2,'omitnan');
HighLowT_yr = yrs(idxMn);
% find avg high
AvgTMx = round(mean(table2array(maxT_tbl(:,3:end)),2,'omitnan'));
% find avg low
AvgTMn = round(mean(table2array(minT_tbl(:,3:end)),2,'omitnan'));
% Results
tempTbl = [maxT_tbl(:,["Month","DD"]), table(Tmax,Tmax_yr,AvgTMx,lowTMax,LowTMax_yr,Tmin,Tmin_yr,AvgTMn,highlowTMn,HighLowT_yr)]
tempTbl2 = splitvars(tempTbl)
FID = fopen('Meda 05 Temperature Climatology.txt','w');
report_date = datetime('now','format','yyyy-MM-dd HH:MM');
fprintf(FID,'Meda 05 Temperature Climatology at %s \n', report_date);
fprintf(FID,"Month DD Temp Max (°F) Tmax_yr AvgTMax (°F) lowTMax (°F) LowTMax_yr TempMin (°F) TMin_yr AvgTMin (°F) HighlowTMin (°F) HighlowT_yr \n");
fprintf(FID,'%3d %6d %7d %14d %11d %11d %15d %11d %13d %10d %13d %17d \n', tempTbl2{:,1:end}');
fclose(FID);
winopen('Meda 05 Temperature Climatology.txt')
function Tbl = readMonth(filename)
opts = detectImportOptions(filename);
opts.ConsecutiveDelimitersRule = 'join';
opts.MissingRule = 'omitvar';
opts = setvartype(opts,'double');
opts.VariableNames = ["Day","MaxT","MinT","AvgT"];
Tbl = readtable(filename,opts);
Tbl = standardizeMissing(Tbl,{999,'N/A'},"DataVariables",{'MaxT','MinT','AvgT'});
Tbl = standardizeMissing(Tbl,{-99,'N/A'},"DataVariables",{'MaxT','MinT','AvgT'});
[~,basename] = fileparts(filename);
nameparts = regexp(basename, '\.', 'split');
dateparts = regexp(nameparts{end}, '_','split');
year_str = dateparts{end};
d = str2double(extract(filename,digitsPattern));
Tbl.Day = datetime(d(3),d(2),Tbl.Day);
end
Cris LaPierre
2024-2-5
编辑:Cris LaPierre
2024-2-5
There is an error in the code you shared about mismatched number of variables. I had to suppress all the output in your code to make it more obvious.
Note that none of the files you shared is a leap year. Can you share a file that is a leap year as well?
Jonathon Klepatzki
2024-2-5
编辑:Jonathon Klepatzki
2024-2-5
I was able to figure out the necessary lines of code to eliminate LY. See below. However, I would like to continue discussing about the error you mentioned. On my end, (see attached file) it gives me a max temperature of 157 on March 1st. I wonder if the * or # you mentioned earlier is the culprit behind this.
By inserting the following code below dataAll.DD = day(dataAll.Day), it eliminated the LY issue.
LY = (dataAll.Month(:)==2 & dataAll.DD(:)==29);
dataAll(LY,:) = [ ];
Cris LaPierre
2024-2-5
编辑:Cris LaPierre
2024-2-6
It appears all your February files have data for the 29th, but only 2016 is a leap year. This is leading to 2 rows of data for Mar 1 in 1981 and 1982.
Cris LaPierre
2024-2-6
I do not have access to the data for that month & year so can't investigate why that is happening. However, you should learn how to do this. I propose the following steps
- Inspect the raw text file. Is there a temp of 157 in March 1999?
- Inspect the dataAll table. Is there a temp of 157 for March 1 1999?
- Inspect the minT_tbl table. Is there a temp of 157 in the row for March 1?
Once you've figured out the source of the 157, next determine why it is being selected, or perhaps figure out what the correct value should be. This might mean looking at the raw files, etc. Once you know what the correct value shoudl be, and where the incorrect values are appearing, you can look at the relevant code and figure out what you need to change to fix it.
更多回答(0 个)
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Data Preprocessing 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!发生错误
由于页面发生更改,无法完成操作。请重新加载页面以查看其更新后的状态。
您也可以从以下列表中选择网站:
如何获得最佳网站性能
选择中国网站(中文或英文)以获得最佳网站性能。其他 MathWorks 国家/地区网站并未针对您所在位置的访问进行优化。
美洲
- América Latina (Español)
- Canada (English)
- United States (English)
欧洲
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom(English)
亚太
- Australia (English)
- India (English)
- New Zealand (English)
- 中国
- 日本Japanese (日本語)
- 한국Korean (한국어)