Can I remove the date from a big data set text file?

I have a huge text file that contains data points from a laser. Each line holds the date (e.g. 04/12/2023) followed by the time (e.g. 15:43:42.225) and then the corresponding output value (e.g. 0.7756). The problem I am running into is that the date can be read as the 12th of April or the 4th of December. When I run my code, it throws this message:
The DATETIME data was created using format 'MM/dd/uuuu HH:mm:ss.SSS' but also matched 'dd/MM/uuuu HH:mm:ss.SSS'.
To avoid ambiguity, supply a datetime format using SETVAROPTS, e.g.
opts = setvaropts(opts,varname,'InputFormat','MM/dd/uuuu HH:mm:ss.SSS');
I don't know how to use setvaropts, so I looked it up. However, all the code I have tested since hasn't worked. I get a lot of unknown variable messages. The thing is, I really don't care about the date in my data set. So, is there a way to completely ignore it so my code will run without getting stuck on that part?
This is my original code in case it is useful:
data = readtable('time0.txt');
Warning: The DATETIME data was created using format 'MM/dd/uuuu HH:mm:ss.SSS' but also matched 'dd/MM/uuuu HH:mm:ss.SSS'.
To avoid ambiguity, supply a datetime format using SETVAROPTS, e.g.
opts = setvaropts(opts,varname,'InputFormat','MM/dd/uuuu HH:mm:ss.SSS');
t0 = data{:,1};
y0 = data{:,2};
[pks,locs]=findpeaks(y0,"MinPeakProminence",2);
Average0 = mean(diff(locs));

Accepted Answer

dpb 2023-4-17
Edited: dpb 2023-4-17
Not easily; no, you can't ignore just the date, because the file is tab-delimited and the date/time is a single string. Ignoring it would also leave you without the time. The help message showed you how to use setvaropts; just follow the directions...
fn='https://www.mathworks.com/matlabcentral/answers/uploaded_files/1358713/time0.txt';
opts=detectImportOptions(fn) % create a default options object first; show what one looks like
opts =
  DelimitedTextImportOptions with properties:

   Format Properties:
                    Delimiter: {'\t'}
                   Whitespace: '\b '
                   LineEnding: {'\n' '\r' '\r\n'}
                 CommentStyle: {}
    ConsecutiveDelimitersRule: 'split'
        LeadingDelimitersRule: 'keep'
       TrailingDelimitersRule: 'ignore'
                EmptyLineRule: 'skip'
                     Encoding: 'UTF-8'

   Replacement Properties:
                  MissingRule: 'fill'
              ImportErrorRule: 'fill'
             ExtraColumnsRule: 'addvars'

   Variable Import Properties: Set types by name using setvartype
                VariableNames: {'Var1', 'Var2'}
                VariableTypes: {'datetime', 'double'}
        SelectedVariableNames: {'Var1', 'Var2'}
              VariableOptions: Show all 2 VariableOptions
        Access VariableOptions sub-properties using setvaropts/getvaropts
           VariableNamingRule: 'modify'

   Location Properties:
                    DataLines: [1 Inf]
            VariableNamesLine: 0
               RowNamesColumn: 0
            VariableUnitsLine: 0
     VariableDescriptionsLine: 0
        To display a preview of the table, use preview
We observe that it recognized the first of the two columns as a datetime. There are only two variables, so the file is tab-delimited and the date and time were written as a single string. We only have to resolve the conundrum of which date format is the correct one. It presumed month/day/year to be the more likely, so it gave that in the help message; if that's not correct, then swap MM and dd. So, follow the instructions...
opts=setvaropts(opts,opts.VariableNames(1),'InputFormat','MM/dd/uuuu HH:mm:ss.SSSSSS');
tData=readtable(fn,opts); % now read with the opts struct to tell it...
head(tData)
               Var1                   Var2
    __________________________    ________
    04/12/2023 15:42:59.575769     0.17703
    04/12/2023 15:42:59.575819     0.13004
    04/12/2023 15:42:59.575869    0.086921
    04/12/2023 15:42:59.575919    0.047983
    04/12/2023 15:42:59.575969    0.045087
    04/12/2023 15:42:59.576018    0.075336
    04/12/2023 15:42:59.576069     0.12747
    04/12/2023 15:42:59.576119     0.16737
And, if it's only a time series, as it looks like it may be, then you can convert the datetime to a duration and have only the elapsed time, which is probably what you're interested in...and make more user-friendly names besides--
tData.Properties.VariableNames={'Time','Response'};
tData.Time=tData.Time-tData.Time(1);
Which shows that the time-of-day format isn't particularly useful for this purpose, so
tData.Time.Format='mm:ss.SSSSSS';
head(tData)
        Time        Response
    ____________    ________
    00:00.000000     0.17703
    00:00.000049     0.13004
    00:00.000099    0.086921
    00:00.000149    0.047983
    00:00.000199    0.045087
    00:00.000249    0.075336
    00:00.000300     0.12747
    00:00.000350     0.16737
The duration display formatting is pretty weak; the actual durations are stored with full precision, but the options for displaying them are quite restrictive. I never could figure out why TMW did that. It might be more convenient to convert to microseconds. Or, if the data were sampled with a fixed A/D sample rate and not free-running, then, as you say, just forget the first variable entirely and generate your time vector from the sampling rate and number of samples.
The first variable is easy enough to ignore if you want to go in that direction; in that case, in the opts object, just say
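Both options mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: the small table is a synthetic stand-in built from the first few rows of the head() output, and the sample rate fs is hypothetical (the roughly 50 microsecond spacing in the sample output hints at something near 20 kHz, but that is a guess; use the real A/D rate if you know it).

```matlab
% Synthetic stand-in for tData after the datetime-to-duration step
% (values copied from the head() output shown earlier)
Time     = seconds([0; 49e-6; 99e-6; 149e-6]);
Response = [0.17703; 0.13004; 0.086921; 0.047983];
tData    = table(Time, Response);

% Option 1: elapsed time in microseconds as plain doubles
tMicro = seconds(tData.Time)*1e6;

% Option 2: ignore the recorded timestamps and build the time vector
% from a fixed sample rate (fs is an ASSUMED value, not from the file)
fs       = 20e3;                 % samples per second, hypothetical
N        = height(tData);        % number of samples
tUniform = (0:N-1).'/fs;         % elapsed time in seconds
```

Option 2 is only appropriate if the acquisition really was uniformly sampled; the recorded timestamps are the way to check that.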
opts.VariableNames(2)={'Response'}; % set the name here
opts.SelectedVariableNames=opts.VariableNames(2); % read it only
tData=readtable(fn,opts); % now read with the opts struct to tell it...
head(tData)
    Response
    ________
     0.17703
     0.13004
    0.086921
    0.047983
    0.045087
    0.075336
     0.12747
     0.16737
Now you don't have to tell it what the datetime format is; it's ignored so doesn't matter...
And, that's all there is to using setvaropts! <VBG>
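To try the same pattern without the original file, here is a minimal self-contained sketch. It writes a two-line temporary file that mimics the layout described in the question (the file name and contents are made up for illustration) and reads it with the day-first format, i.e. the "swap those two" case. Selecting the variable by index instead of by name is my choice here, not something the original code does.

```matlab
% Write a tiny tab-delimited file mimicking the laser data layout (illustrative only)
fn  = [tempname '.txt'];
fid = fopen(fn,'w');
fprintf(fid,'04/12/2023 15:42:59.575769\t0.17703\n');
fprintf(fid,'04/12/2023 15:42:59.575819\t0.13004\n');
fclose(fid);

% Build the options object first, then set the datetime format on column 1
opts = detectImportOptions(fn,'Delimiter','\t','ReadVariableNames',false);
opts = setvartype(opts,1,'datetime');   % make sure column 1 is read as datetime
opts = setvaropts(opts,1,'InputFormat','dd/MM/uuuu HH:mm:ss.SSSSSS');  % day first

tData = readtable(fn,opts);
m = month(tData{1,1});                  % 12 when the date is parsed day-first
delete(fn)                              % clean up the temporary file
```

The "unknown variable" messages mentioned in the question are what you would expect if setvaropts is called before opts exists; the options object has to be created with detectImportOptions first.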
3 Comments
dpb 2023-4-20
Edited: dpb 2023-4-20
fn='https://www.mathworks.com/matlabcentral/answers/uploaded_files/1361888/time0.txt';
opts = detectImportOptions(fn);
opts = setvaropts(opts,opts.VariableNames(1),'InputFormat','MM/dd/uuuu HH:mm:ss.SSSSSS');
tData = readtable(fn,opts);
tData.Properties.VariableNames = {'Time','Response'};
tData.Time = tData.Time - tData.Time(1);
tData.Time.Format = 'mm:ss.SSSSSS';
subplot(2,1,1)
findpeaks(tData.Response);
xlim([0 250])
subplot(2,1,2)
findpeaks(tData.Response,"MinPeakProminence",2);
xlim([0 250])
There's no peak of that magnitude in the data; with "MinPeakProminence" set to 2, you've screened all of them out.
Once you have the data in the table, use it; there's no reason to create more copies of it in some other fashion. You'll also note that I reverted to the tData variable name as a reminder that it is a table.
Moral: ALWAYS PLOT YOUR DATA FIRST!!!!
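One way to act on that, sketched here with synthetic data rather than the real file: look at the range of the signal before picking a prominence threshold, and set the threshold relative to that range. The sine wave, the 20 kHz rate and the 0.25 factor are all arbitrary assumptions for illustration; findpeaks is the same Signal Processing Toolbox function used in the question.

```matlab
% Synthetic stand-in for tData.Response (small oscillation, assumed for illustration)
t = (0:999).'/20e3;                    % 1000 samples at an assumed 20 kHz
y = 0.1 + 0.08*sin(2*pi*500*t);        % amplitude similar in scale to the real data

span = max(y) - min(y);                % peak-to-peak range of the signal
fprintf('Range of data: %.3g\n', span);

% A prominence threshold larger than the range can never be met,
% so choose it as a fraction of the range (0.25 is an arbitrary starting point)
[pks,locs] = findpeaks(y,'MinPeakProminence',0.25*span);
avgSpacing = mean(diff(locs));         % average peak spacing, in samples
```

With the real data, replace y by tData.Response and compare span against the threshold you intended to use.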
