Extracting Time Data from Text File

In short, I've been trying to put together a script that scans a .txt file and pulls number values from it. The context is Rubik's Cube times; I have a program that produces .txt files from sessions, and the contents include some basic statistics at the top, a list of times, and the moves used to scramble the cube for each time. I've included an example of the beginning of one of these files below:
Average: 10.05
Best: 9.34
Worst: 14.26
Mean: 10.75
Standard Deviation: 1.79
1: 9.91 D2 L2 B2 L2 F2 D L2 B2 D' U L' U' L D' L F' R' B' U B2
2: 9.86 R2 U2 B2 F2 D L2 D' U' R2 U' B2 R U F' U' B' D B2 U R' D2 F2
3: (14.26) D2 L2 B' L2 B2 D2 L2 F2 D2 F R' D B R2 B' D' R2 B2 F L
4: (9.34) B F U2 B R2 B' U2 F' D2 R' D' B2 D' B' U2 R' B' D' B'
5: 10.39 B F' U2 B2 L2 U2 R2 B' R2 B U B2 R2 D2 R2 B' L B R D2
My goal is to pull the times from these files to run my own stats on them, but there are a lot of nuances that have made this difficult!
The biggest issues I've had:
1) As you can see, some of the times contain 3 sig figs, and some contain 4. I haven't found a clean way to pull both from the file at the same time.
2) I haven't found a way to ignore the statistics at the top (if this is too much for a script, it would be easy to just delete that section of the text file before I run the script).
What I have so far isn't pretty, but so far this has successfully created an array of the 4-digit times for me:
fileID = fopen("txt Files/text-4AB30AA43EF2-1.txt")
raw = fscanf(fileID,"%c")
fclose(fileID)
num = '[0123456789][0123456789].[0123456789][0123456789]';
out = regexp(raw,num,'match');
gen1 = str2double(out);
list = gen1'
Any help would be greatly appreciated, thank you!

Accepted Answer

Simpler and more efficient:
str = fileread('test_1.txt');
tkn = regexp(str,'^\s*(\d+):\D+(\d+\.?\d*)\W+([^\n\r]+)', 'tokens', 'lineanchors');
tkn = vertcat(tkn{:})
tkn = 5×3 cell array
    {'1'}    {'9.91' }    {'D2 L2 B2 L2 F2 D L2 B2 D' U L' U' L D' L F' R' B' U B2'   }
    {'2'}    {'9.86' }    {'R2 U2 B2 F2 D L2 D' U' R2 U' B2 R U F' U' B' D B2 U R' D2 F2'}
    {'3'}    {'14.26'}    {'D2 L2 B' L2 B2 D2 L2 F2 D2 F R' D B R2 B' D' R2 B2 F L'   }
    {'4'}    {'9.34' }    {'B F U2 B R2 B' U2 F' D2 R' D' B2 D' B' U2 R' B' D' B''   }
    {'5'}    {'10.39'}    {'B F' U2 B2 L2 U2 R2 B' R2 B U B2 R2 D2 R2 B' L B R D2'   }
mat = str2double(tkn(:,1:2))
mat = 5×2
    1.0000    9.9100
    2.0000    9.8600
    3.0000   14.2600
    4.0000    9.3400
    5.0000   10.3900
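Since the goal was to run your own stats, here is a minimal sketch of how the extracted times could be used. The sample text is copied from the question with the scrambles shortened, and the variable names (txt, times, trimmedAvg) are only illustrative. On these five solves the plain mean gives 10.752, dropping the best and worst time before averaging gives about 10.053, and std(times,1) gives about 1.785, so the header values appear to be a trimmed average and a population standard deviation. That is an inference from this one sample, not something the file format documents.

```matlab
% Sample text from the question (scrambles shortened for brevity).
txt = {'Average: 10.05', 'Best: 9.34', 'Worst: 14.26', 'Mean: 10.75', ...
       'Standard Deviation: 1.79', ...
       '1: 9.91 D2 L2 B2', ...
       '2: 9.86 R2 U2 B2', ...
       '3: (14.26) D2 L2 B''', ...
       '4: (9.34) B F U2', ...
       '5: 10.39 B F'' U2'};
str = strjoin(txt, char(10));   % join the lines with newline characters

% Same regular expression as above: only lines starting with "<digits>:" match,
% so the header statistics are skipped.
tkn = regexp(str,'^\s*(\d+):\D+(\d+\.?\d*)\W+([^\n\r]+)','tokens','lineanchors');
tkn = vertcat(tkn{:});
times = str2double(tkn(:,2));   % column vector of solve times

best       = min(times);
worst      = max(times);
meanTime   = mean(times);
sorted     = sort(times);
trimmedAvg = mean(sorted(2:end-1));   % drop the best and worst solve
popStd     = std(times,1);            % normalize by N rather than N-1

fprintf('best %.2f, worst %.2f, mean %.2f, trimmed %.2f, std %.2f\n', ...
        best, worst, meanTime, trimmedAvg, popStd);
```

For a real session file, replace the txt/str lines with str = fileread(...) as in the answer.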
Optional extras:
% tkn(:,1:2) = num2cell(mat); % optional, but it is much better to store numeric
% data in a numeric array, so probably best avoided.
spl = regexp(tkn(:,3),'\w+','match')
spl = 5×1 cell array
    {1×20 cell}
    {1×22 cell}
    {1×20 cell}
    {1×19 cell}
    {1×20 cell}
spl{:}
ans = 1×20 cell array
{'D2'} {'L2'} {'B2'} {'L2'} {'F2'} {'D'} {'L2'} {'B2'} {'D'} {'U'} {'L'} {'U'} {'L'} {'D'} {'L'} {'F'} {'R'} {'B'} {'U'} {'B2'}
ans = 1×22 cell array
{'R2'} {'U2'} {'B2'} {'F2'} {'D'} {'L2'} {'D'} {'U'} {'R2'} {'U'} {'B2'} {'R'} {'U'} {'F'} {'U'} {'B'} {'D'} {'B2'} {'U'} {'R'} {'D2'} {'F2'}
ans = 1×20 cell array
{'D2'} {'L2'} {'B'} {'L2'} {'B2'} {'D2'} {'L2'} {'F2'} {'D2'} {'F'} {'R'} {'D'} {'B'} {'R2'} {'B'} {'D'} {'R2'} {'B2'} {'F'} {'L'}
ans = 1×19 cell array
{'B'} {'F'} {'U2'} {'B'} {'R2'} {'B'} {'U2'} {'F'} {'D2'} {'R'} {'D'} {'B2'} {'D'} {'B'} {'U2'} {'R'} {'B'} {'D'} {'B'}
ans = 1×20 cell array
{'B'} {'F'} {'U2'} {'B2'} {'L2'} {'U2'} {'R2'} {'B'} {'R2'} {'B'} {'U'} {'B2'} {'R2'} {'D2'} {'R2'} {'B'} {'L'} {'B'} {'R'} {'D2'}
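One thing to be aware of: because the apostrophe is not a word character, '\w+' drops the prime marks, so D' comes out as D in the lists above. If the direction of each move matters for your analysis, splitting on whitespace instead is one option. A small sketch using the third scramble from the question (the variable names are illustrative):

```matlab
% Third scramble from the question; '' is an escaped apostrophe in a char vector.
scramble = 'D2 L2 B'' L2 B2 D2 L2 F2 D2 F R'' D B R2 B'' D'' R2 B2 F L';

movesWord = regexp(scramble,'\w+','match');   % word characters only: primes are lost
movesFull = regexp(scramble,'\S+','match');   % non-whitespace runs: primes are kept

disp(movesWord{3})   % the third move without its prime mark
disp(movesFull{3})   % the third move with its prime mark
```

Both calls return the same number of moves; only the content of the primed moves differs.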

5 Comments

Here is a breakdown of that regular expression. The three token groups collect the desired data:
^\s*(\d+):\D+(\d+\.?\d*)\W+([^\n\r]+)
^              match the start of the line
\s*            match zero or more whitespace (optional)
(   )          1st token group
  \d+          match one or more digits
:              match literal colon
\D+            match one or more non-digits
(   )          2nd token group
  \d+\.?\d*    match digits with optional decimal point
\W+            match non-word characters
(   )          3rd token group
  [^\n\r]+     match one or more non-newlines
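To see how this handles the two issues from the question, here is a small sketch that applies the same pattern to a single solve line and a single header line (both taken from the sample file, scramble shortened; the variable names are illustrative). The leading ^\s*(\d+): only matches lines that begin with a solve number, which is why the statistics at the top are ignored, and \D+ / \W+ absorb the parentheses around the best and worst times.

```matlab
pat = '^\s*(\d+):\D+(\d+\.?\d*)\W+([^\n\r]+)';

solveLine  = '3: (14.26) D2 L2 B'' L2';     % parenthesized time, 4 significant figures
headerLine = 'Standard Deviation: 1.79';    % does not start with "<digits>:"

tok = regexp(solveLine, pat,'tokens','once');   % 1x3 cell: index, time, scramble
hdr = regexp(headerLine,pat,'tokens','once');   % empty, because there is no match

solveTime = str2double(tok{2});
fprintf('solve %s took %.2f, header matched: %d\n', tok{1}, solveTime, ~isempty(hdr));
```

The 'once' option is used here only because each input is a single line; for a whole file the 'lineanchors' option from the answer is what makes ^ match at the start of every line.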
Wow, thank you so much Stephen! Definitely a more efficient route, and I really appreciate you splitting up the regular expression the way you did. I'm still new to conversion characters and how they interact, and you've passively taught me a lot here.
@Simon Montrose: if my explanation helped you please vote for my answer :)
Definitely worth voting for!
+1
@Stephen Of course, I didn't know I could switch them, my apologies! Still getting acquainted with the forums, but I've implemented your suggestions and have the script completely up and running :) Cheers!

Release

R2021b