How to turn a .txt file into a useful table
This seems like it should be exceedingly simple, but I haven't found anything here or anywhere else that addresses it. I have a text file delimited by periods that should be very easy to import with the readtable function, but readtable seems to import every column as a character array. I've tried using format strings, but I get errors. I would include my code, but it's simply one line, one function: readtable(filepath).
Trying to include a format string gets me:
"Unable to read the entire file. You may need to specify a different format, delimiter, or number of header lines.
Note: readtable detected the following parameters:
'HeaderLines', 0, 'ReadVariableNames', true
Error in redditAnalysis (line 4)
allData = readtable('C:\Users\John\Desktop\ChildrensNeurobio\MATLABproject\redditPractice\all.txt', 'Delimiter', '.', 'Format', '%f%f%f%f%f%s');
"
Any idea how to get the columns I need into a useful numeric vector format?
EDIT: the first few lines of the file:
rank.page.upvotes.comments.age.subreddit
1.1.40400.1283.3.OldSchoolCool
2.1.19200.906.4.funny
3.1.31800.1709.5.politics
4.1.40300.780.5.bestof
5.1.5844.1277.3.soccer
6.1.30200.256.5.aww
Answers (2)
Sailesh Sidhwani
2017-8-30
To achieve your workflow, you should pass "File Import Options" to the readtable() function along with the file. These options define how MATLAB reads the file. You can also set the variable names, variable types and delimiter in these import options. To learn more about import options, check the documentation link below:
See the following steps to achieve your workflow. "abc.txt" is the subset of your file from your question.
opts = detectImportOptions('abc.txt')
opts =
DelimitedTextImportOptions with properties:
Format Properties:
Delimiter: {','}
Whitespace: '\b\t '
LineEnding: {'\n' '\r' '\r\n'}
CommentStyle: {}
ConsecutiveDelimitersRule: 'split'
LeadingDelimitersRule: 'keep'
EmptyLineRule: 'skip'
Encoding: 'windows-1252'
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
ExtraColumnsRule: 'addvars'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'rank_page_upvotes_comments_age_subreddit'}
VariableTypes: {'char'}
SelectedVariableNames: {'rank_page_upvotes_comments_age_subreddit'}
VariableOptions: Show all 1 VariableOptions
Access VariableOptions sub-properties using setvaropts/getvaropts
Now change the delimiter and the variable names as required. (Variable types can be changed with setvartype, as the properties display above notes.)
opts.Delimiter = {'.'};
opts.VariableNames= {'rank','page','upvotes','comments','age','subreddit'}
opts =
DelimitedTextImportOptions with properties:
Format Properties:
Delimiter: {'.'}
Whitespace: '\b\t '
LineEnding: {'\n' '\r' '\r\n'}
CommentStyle: {}
ConsecutiveDelimitersRule: 'split'
LeadingDelimitersRule: 'keep'
EmptyLineRule: 'skip'
Encoding: 'windows-1252'
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
ExtraColumnsRule: 'addvars'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'rank', 'page', 'upvotes' ... and 3 more}
VariableTypes: {'char', 'char', 'char' ... and 3 more}
SelectedVariableNames: {'rank', 'page', 'upvotes' ... and 3 more}
VariableOptions: Show all 6 VariableOptions
Access VariableOptions sub-properties using setvaropts/getvaropts
Now pass "opts" to readtable as the File Import Options:
readtable('abc.txt',opts)
ans =
6×6 table
rank page upvotes comments age subreddit
____ ____ _______ ________ ___ _______________
'1' '1' '40400' '1283' '3' 'OldSchoolCool'
'2' '1' '19200' '906' '4' 'funny'
'3' '1' '31800' '1709' '5' 'politics'
'4' '1' '40300' '780' '5' 'bestof'
'5' '1' '5844' '1277' '3' 'soccer'
'6' '1' '30200' '256' '5' 'aww'
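The table above still holds every column as text, while the question asks for numeric vectors. A minimal sketch of one way to close that gap follows. It repeats the steps above and adds a single setvartype call before reading. The temporary file name is an assumption made for this sketch, and the sample rows are copied from the question.

```matlab
% Recreate the sample rows from the question in a scratch file.
% (The temp-file location is an assumption for this sketch.)
fname = fullfile(tempdir, 'abc.txt');
rows = {'rank.page.upvotes.comments.age.subreddit', ...
        '1.1.40400.1283.3.OldSchoolCool', ...
        '2.1.19200.906.4.funny', ...
        '3.1.31800.1709.5.politics', ...
        '4.1.40300.780.5.bestof', ...
        '5.1.5844.1277.3.soccer', ...
        '6.1.30200.256.5.aww'};
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', rows{:});   % one row per line
fclose(fid);

% Same steps as in the answer above.
opts = detectImportOptions(fname);
opts.Delimiter = {'.'};
opts.VariableNames = {'rank','page','upvotes','comments','age','subreddit'};

% Extra step: ask for the first five variables as double instead of char.
opts = setvartype(opts, 1:5, 'double');

t = readtable(fname, opts);
class(t.upvotes)   % display the class of one column to check the conversion
```

The subreddit column is left as text on purpose, since only the first five columns hold numbers.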
1 Comment
Jeremy Hughes
2017-8-31
Edited: Jeremy Hughes
2017-8-31
You can also set the types with:
>> opts = setvartype(opts,1:5,'double');
See my full answer for a slightly better approach.
Jeremy Hughes
2017-8-31
Edited: Jeremy Hughes
2017-8-31
Hi,
This is actually pretty simple:
>> opts = detectImportOptions('abc.txt','Delimiter','.')
>> opts.VariableNames= {'rank','page','upvotes','comments','age','subreddit'}
>> t = readtable('abc.txt',opts);
Without import options, readtable uses a slightly different reading method that scans for numbers and thus pulls the '.' (i.e. the decimal point) along for the ride. Without the 'Delimiter' parameter, detectImportOptions will not choose '.' as the delimiter, because it assumes a period is acting as a decimal separator.
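As a rough end-to-end sketch of this approach, the snippet below passes 'Delimiter' to detectImportOptions, inspects the detected types, and then pulls individual columns out of the table as plain vectors. The scratch file name is an assumption made for the sketch, and the rows are the sample from the question. Which types get detected may depend on the MATLAB release, so the sketch displays them rather than assuming a result.

```matlab
% Write the sample rows from the question to a scratch file.
% (The file name is an assumption for this sketch.)
fname = fullfile(tempdir, 'abc_delim.txt');
rows = {'rank.page.upvotes.comments.age.subreddit', ...
        '1.1.40400.1283.3.OldSchoolCool', ...
        '2.1.19200.906.4.funny', ...
        '3.1.31800.1709.5.politics', ...
        '4.1.40300.780.5.bestof', ...
        '5.1.5844.1277.3.soccer', ...
        '6.1.30200.256.5.aww'};
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', rows{:});
fclose(fid);

% Tell the detector up front that '.' separates fields.
opts = detectImportOptions(fname, 'Delimiter', '.');
opts.VariableNames = {'rank','page','upvotes','comments','age','subreddit'};
disp(opts.VariableTypes)   % inspect the type detected for each column

t = readtable(fname, opts);

% Each table variable can be pulled out as an ordinary column vector.
upvotes  = t.upvotes;
comments = t.comments;
isnumeric(upvotes)         % check that the column came back as numbers
```

If a column still comes back as text, setvartype can be applied to opts before the readtable call to force the type.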
Hope this helps,
Jeremy
1 Comment
Jeremy Hughes
2017-8-31
And if the variable names are already there in the file, you might not need that second line.
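One hedged way to act on this is to check what was detected and set the names only when they differ. The sketch below assumes a shortened copy of the question's file, header row included, written to a scratch location chosen for the example.

```matlab
% Shortened copy of the question's file, header row included.
% (The file name is an assumption for this sketch.)
fname = fullfile(tempdir, 'abc_header.txt');
rows = {'rank.page.upvotes.comments.age.subreddit', ...
        '1.1.40400.1283.3.OldSchoolCool', ...
        '2.1.19200.906.4.funny'};
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', rows{:});
fclose(fid);

opts = detectImportOptions(fname, 'Delimiter', '.');

% Set the names explicitly only if detection did not already find them.
expected = {'rank','page','upvotes','comments','age','subreddit'};
if ~isequal(opts.VariableNames, expected)
    opts.VariableNames = expected;
end

t = readtable(fname, opts);
disp(t.Properties.VariableNames)   % names the table ended up with
```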