Can I remove the date from big data set text file?

Question

Niki Wilson 2023-4-17

Commented: Niki Wilson 2023-4-20

time0.txt

I have a huge text file that contains data points from a laser. Each line holds the date (e.g. 04/12/2023), followed by the time (e.g. 15:43:42.225), and then the corresponding output value (e.g. 0.7756). The problem I am running into is that the date can be read as either the 12th of April or the 4th of December. When I run my code, I get this warning message:

The DATETIME data was created using format 'MM/dd/uuuu HH:mm:ss.SSS' but also matched 'dd/MM/uuuu HH:mm:ss.SSS'.

To avoid ambiguity, supply a datetime format using SETVAROPTS, e.g.

opts = setvaropts(opts,varname,'InputFormat','MM/dd/uuuu HH:mm:ss.SSS');

I don't know how to use setvaropts, so I looked it up. However, all the code I have tested since hasn't worked. I get a lot of unknown variable messages. The thing is, I really don't care about the date in my data set. So, is there a way to completely ignore it so my code will run without getting stuck on that part?

This is my original code in case it is useful:

data = readtable('time0.txt');
Warning: The DATETIME data was created using format 'MM/dd/uuuu HH:mm:ss.SSS' but also matched 'dd/MM/uuuu HH:mm:ss.SSS'.
To avoid ambiguity, supply a datetime format using SETVAROPTS, e.g. 
  opts = setvaropts(opts,varname,'InputFormat','MM/dd/uuuu HH:mm:ss.SSS');
t0 = data{:,1};
y0 = data{:,2};
[pks,locs]=findpeaks(y0,"MinPeakProminence",2);
Average0 = mean(diff(locs));

Frederic Rudawski 2023-4-17

Maybe the answer to this question helps you:

https://de.mathworks.com/matlabcentral/answers/412075-how-to-use-setvartype-to-get-the-variable-as-datetime-formatted-as-yyyy-mm-dd

Answer 1

dpb 2023-4-17

Edited: dpb 2023-4-17

Not easily, no. You can't ignore just the date, because the file is tab-delimited and the date and time form a single string; ignoring that column also leaves you without the time. The warning message showed you how to use setvaropts; just follow the directions...

fn='https://www.mathworks.com/matlabcentral/answers/uploaded_files/1358713/time0.txt';

opts=detectImportOptions(fn) % create a default options object first; show what one looks like

opts =
  DelimitedTextImportOptions with properties:

   Format Properties:
                    Delimiter: {'\t'}
                   Whitespace: '\b '
                   LineEnding: {'\n' '\r' '\r\n'}
                 CommentStyle: {}
    ConsecutiveDelimitersRule: 'split'
        LeadingDelimitersRule: 'keep'
       TrailingDelimitersRule: 'ignore'
                EmptyLineRule: 'skip'
                     Encoding: 'UTF-8'

   Replacement Properties:
                  MissingRule: 'fill'
              ImportErrorRule: 'fill'
             ExtraColumnsRule: 'addvars'

   Variable Import Properties: Set types by name using setvartype
                VariableNames: {'Var1', 'Var2'}
                VariableTypes: {'datetime', 'double'}
        SelectedVariableNames: {'Var1', 'Var2'}
              VariableOptions: Show all 2 VariableOptions
        Access VariableOptions sub-properties using setvaropts/getvaropts
           VariableNamingRule: 'modify'

   Location Properties:
                    DataLines: [1 Inf]
            VariableNamesLine: 0
               RowNamesColumn: 0
            VariableUnitsLine: 0
     VariableDescriptionsLine: 0
        To display a preview of the table, use preview

Notice that it recognized the first of the two columns as a datetime. There are only two variables because the file is tab-delimited and the date and time were written as one string. All that's left is to resolve the conundrum of which date format is the correct one. It presumed month/day/year was the more likely, so it gave that in the warning message; if that's not correct, swap 'MM' and 'dd'. So, follow the instructions...

opts=setvaropts(opts,opts.VariableNames(1),'InputFormat','MM/dd/uuuu HH:mm:ss.SSSSSS'); 
tData=readtable(fn,opts);           % now read with the opts struct to tell it...
Warning: The server returned "15233" bytes, but "-1" were expected. The reason is "transfer closed with outstanding read data remaining";
head(tData)
               Var1                 Var2  
    __________________________    ________

    04/12/2023 15:42:59.575769     0.17703
    04/12/2023 15:42:59.575819     0.13004
    04/12/2023 15:42:59.575869    0.086921
    04/12/2023 15:42:59.575919    0.047983
    04/12/2023 15:42:59.575969    0.045087
    04/12/2023 15:42:59.576018    0.075336
    04/12/2023 15:42:59.576069     0.12747
    04/12/2023 15:42:59.576119     0.16737
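
If the file turned out to be day-first instead, the swap mentioned above might look like the following. This is only a minimal sketch: it writes a hypothetical two-row sample in the same tab-delimited layout to a temporary file, and it builds the options object by hand with delimitedTextImportOptions (standing in for detectImportOptions, so the example does not depend on what auto-detection makes of such a tiny file). The variable names Time and Response are just illustrative.

```matlab
% Write a hypothetical two-row sample in the same layout as time0.txt
fn = [tempname '.txt'];
fid = fopen(fn,'w');
fprintf(fid,'04/12/2023 15:42:59.575769\t0.17703\n');
fprintf(fid,'04/12/2023 15:42:59.575819\t0.13004\n');
fclose(fid);

% Build the import options by hand (assumption: tab-delimited, no header line)
opts = delimitedTextImportOptions('NumVariables',2);
opts.Delimiter = '\t';
opts.DataLines = [1 Inf];
opts.VariableNames = {'Time','Response'};
opts.VariableTypes = {'datetime','double'};

% Day-first reading: 'dd' and 'MM' swapped relative to the warning's suggestion
opts = setvaropts(opts,'Time','InputFormat','dd/MM/uuuu HH:mm:ss.SSSSSS');

tData = readtable(fn,opts);
m = month(tData.Time(1));   % month number under the day-first reading
delete(fn);                 % remove the temporary file
```

With this format m comes out as 12 (December); with the 'MM/dd/uuuu ...' format from the warning the same line would be read as April.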

And, if it's only a time series, as it looks like it may be, then you can convert the datetime to a duration and keep only the elapsed time, which is probably what you're interested in, and give the variables more user-friendly names besides:

tData.Properties.VariableNames={'Time','Response'};
tData.Time=tData.Time-tData.Time(1);

Which shows the time-of-day format isn't particularly useful for this purpose so

tData.Time.Format='mm:ss.SSSSSS';
head(tData)
        Time        Response
    ____________    ________

    00:00.000000     0.17703
    00:00.000049     0.13004
    00:00.000099    0.086921
    00:00.000149    0.047983
    00:00.000199    0.045087
    00:00.000249    0.075336
    00:00.000300     0.12747
    00:00.000350     0.16737

The duration output formatting is pretty weak; while the actual durations are stored with full precision, the choice of display formats is quite restrictive; never could figure out why TMW did that. It might be more convenient to convert to microseconds. Or, if the data were sampled with a fixed A/D sample rate and not free-running, then, as you say, just forget the first variable entirely and generate your time vector from the sampling rate and the number of samples.

Ignoring the first column is easy enough if you want to go that direction; in that case, in the opts object, just say

opts.VariableNames(2)={'Response'};                 % set the name here
opts.SelectedVariableNames=opts.VariableNames(2);   % read it only
tData=readtable(fn,opts);           % now read with the opts struct to tell it...
head(tData)
    Response
    ________

     0.17703
     0.13004
    0.086921
    0.047983
    0.045087
    0.075336
     0.12747
     0.16737

Now you don't have to tell it what the datetime format is; it's ignored so doesn't matter...
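
The two alternatives mentioned above for the time axis (numeric microseconds, or a time vector built from the sample rate) could be sketched roughly as follows. The elapsed times are copied from the first rows of the table printed earlier; the 20 kHz sample rate is purely an assumption, guessed from the roughly 50 microsecond spacing of those rows, and would need to be replaced by the real acquisition rate.

```matlab
% Elapsed times of the first few samples, as a duration column
t = seconds([0; 49e-6; 99e-6; 149e-6]);

% Option 1: plain numeric microseconds instead of a formatted duration
t_us = seconds(t)*1e6;

% Option 2: ignore the recorded timestamps and rebuild the time axis
fs = 20e3;                 % ASSUMED sample rate in Hz, not stated in the thread
N = numel(t);              % number of samples
tFixed = (0:N-1).'/fs;     % evenly spaced time vector in seconds
```

Option 2 only makes sense if the acquisition really ran at a fixed rate; for free-running data the recorded timestamps are the only reliable time axis.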

And, that's all there is to using setvaropts! <VBG>

Niki Wilson 2023-4-20

time0.txt

Hello,

Thank you for your help. I tried the code you suggested and I'm running into something of a major issue. Maybe you can't help me, because it might have nothing to do with setvaropts, but in case you can, this is what my code looks like. I'm not getting a usable return value: the code runs fine, but pks and locs are empty, so Average0 can't be computed. I've messed with it for a while, but I can't make sense of why it's not working.

data1 = 'time0.txt';
opts = detectImportOptions(data1);
opts = setvaropts(opts,opts.VariableNames(1),'InputFormat','MM/dd/uuu HH:mm:ss.SSSSSS');
data = readtable(data1,opts);
data.Properties.VariableNames = {'Time','Response'};
data.Time = data.Time - data.Time(1);
data.Time.Format = 'mm:ss.SSSSSS';
% opts.VariableNames(2) = {'Response'};
% opts.SelectedVariableNames = opts.VariableNames(2);
% data = readtable(data1,opts);
% t0 = data(:,1);
% t0 = table2array(t0);
% t0 = seconds(t0);
y0 = data(:,2);
y0 = table2array(y0);
[pks,locs]=findpeaks(y0,"MinPeakProminence",2);
Average0 = mean(diff(locs))
Average0 = NaN

dpb 2023-4-20

Edited: dpb 2023-4-20

fn='https://www.mathworks.com/matlabcentral/answers/uploaded_files/1361888/time0.txt';
opts = detectImportOptions(fn);
opts = setvaropts(opts,opts.VariableNames(1),'InputFormat','MM/dd/uuuu HH:mm:ss.SSSSSS');
tData = readtable(fn,opts);
tData.Properties.VariableNames = {'Time','Response'};
tData.Time = tData.Time - tData.Time(1);            % elapsed time as a duration
tData.Time.Format = 'mm:ss.SSSSSS';
subplot(2,1,1)
findpeaks(tData.Response);                          % no outputs requested, so findpeaks plots the peaks it finds
xlim([0 250])
subplot(2,1,2)
findpeaks(tData.Response,"MinPeakProminence",2);    % same data with the prominence threshold applied
xlim([0 250])

There's no peak with a prominence of that size in the data; with "MinPeakProminence",2 you've screened all of them out.

Once you have the data in the table, use it; there's no reason to create more copies of it in some other fashion. You'll also note I reverted to the tData variable name as a reminder that it is a table.

Moral: ALWAYS PLOT YOUR DATA FIRST!!!!
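
To make the moral concrete, here is a small sketch of checking the size of the signal before picking a prominence threshold. The data are a synthetic stand-in (a hypothetical sine wave with roughly the amplitude seen in the table above), not the real file, and the 25 % fraction is an arbitrary illustrative choice. findpeaks requires the Signal Processing Toolbox.

```matlab
% Synthetic stand-in for the Response column (NOT the real laser data)
n = (0:999).';
tData = table(0.1 + 0.08*sin(2*pi*n/60),'VariableNames',{'Response'});

plot(tData.Response)       % look at the data before choosing any threshold

% Peak-to-peak span of the signal; a prominence threshold above this can never be met
span = max(tData.Response) - min(tData.Response);

% A threshold of 2 is far larger than the span, so nothing qualifies
[~,locsStrict] = findpeaks(tData.Response,'MinPeakProminence',2);

% A threshold tied to the span (25 % here, an arbitrary choice) keeps the real peaks
[pks,locs] = findpeaks(tData.Response,'MinPeakProminence',0.25*span);
Average0 = mean(diff(locs));   % mean peak spacing, in samples
```

Note that the table column is used directly via tData.Response; no table2array copy is needed.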

Niki Wilson 2023-4-20

Thank you SO much. Cheers!
