Import text files with character and numeric data
Hello, I have the following text file (please find attached). I want to import it into MATLAB, and I need only the numeric data; the text is not required. I tried this using the import function in MATLAB. The problem is that the number of columns is not known and keeps changing, so the generated code stops working when the number of columns changes. How can I import the data with any number of columns and rows? Moreover, the data file I attached is a smaller version; the original data file has over 3 million rows. How can I import a text file of this type as fast as possible?
Thank you.
Accepted Answer
s=importdata('file.txt')
data=s.data
text=s.textdata
colheaders=s.colheaders
9 Comments
Thanks for the response. How can I extract the numbers associated with the result "text"?
I doubt that it can work this way. If you need to extract the array of numbers only, you can do it this way:
fId = fopen( 'Raw.txt', 'r' ) ;
data = textscan( fId, '%f %f %f', 'HeaderLines', 22 ) ;
fclose( fId ) ;
Then if you prefer to deal with a numeric array instead of a cell array of columns:
data = horzcat( data{:} ) ;
Now if you also need the numbers associated with the parameters from the header, one way to do it is to use a regular expression:
% - Similar to what we did above, but we get the file content in
% a string buffer.
content = fileread( 'Raw.txt' ) ;
data = textscan( content, '%f %f %f', 'HeaderLines', 22 ) ;
data = horzcat( data{:} ) ;
% - Now we process the buffer with REGEXP.
tokens = regexp( content, '(\w+)=(\S+)', 'tokens' ) ;
for tId = 1 : numel( tokens )
    parameters.(tokens{tId}{1}) = str2double( tokens{tId}{2} ) ;
end
With that you get:
>> data
data =
1.0e+04 *
0.0000 -0.8247 -0.9921
0.0000 -0.7204 -1.0678
0.0000 -0.8800 -1.2426
0.0000 -0.7581 -1.0489
0.0000 -0.7281 -1.1200
0.0001 -0.6932 -1.0733
0.0001 -0.6615 -0.9821
0.0001 -0.7036 -1.0141
0.0001 -0.6607 -1.1401
0.0001 -0.5457 -0.9972
0.0001 -0.6714 -0.9440
0.0001 -0.9144 -1.0676
>> parameters
parameters =
normal: 6.1000
dow: 1
Num: 209
ionconc: 1
Desnoise: 100
Time: 0.0080
hotmol: 0
dex: 1
elay: 11250
Des: 16
Max: 1500
Offset: 0
Mode: 1
Note that you can use IMPORTDATA, but you have to specify the delimiter (a tab in your case) and the number of header lines:
content = importdata( 'Raw.txt', '\t', 22 ) ;
>> content
content =
data: [12x3 double]
textdata: {22x3 cell}
colheaders: {'X' 'Wide' 'Resolution'}
Hope it helps!
Thanks for the response. The problem is that my header lines are not fixed; they keep changing. How can I automate this?
Can you provide a few files with different headers?
If you always had the 'Resolution' column header though, you could do something like:
% - Read file content.
content = fileread( 'Raw.txt' ) ;
% - Split on 'Resolution' column header.
content = strsplit( content, 'Resolution' ) ;
% - Parse array.
data = textscan( content{2}, '%f %f %f' ) ;
data = horzcat( data{:} ) ;
% - Parse parameters.
tokens = regexp( content{1}, '(\w+)=(\S+)', 'tokens' ) ;
for tId = 1 : numel( tokens )
    parameters.(tokens{tId}{1}) = str2double( tokens{tId}{2} ) ;
end
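If neither the number of header lines nor the number of columns is known in advance, one possible approach is to locate the first line whose fields all convert to numbers, count its fields, and build the TEXTSCAN format from that count. The following is only a sketch under assumptions that are not from the thread: the sample buffer is made up, the names fileLines, firstData, nCols and fmt are hypothetical, and it assumes that no header line consists solely of numbers (note that STR2DOUBLE also accepts strings such as 'Inf'). It has not been timed on a 3-million-row file.

```matlab
% Hypothetical sample standing in for fileread( 'Raw.txt' ): two parameter
% lines, one column-header line, then tab-separated numeric rows.
content = sprintf( 'Num=209\nTime=0.008\nX\tWide\tResolution\n1\t-8247\t-9921\n2\t-7204\t-10678\n' ) ;

% - Split the buffer into lines.
fileLines = regexp( content, '\r?\n', 'split' ) ;

% - Find the first non-empty line whose fields all convert to numbers.
firstData = 0 ;
for k = 1 : numel( fileLines )
    thisLine = strtrim( fileLines{k} ) ;
    if isempty( thisLine )
        continue
    end
    fields = strsplit( thisLine ) ;            % splits on whitespace
    if ~any( isnan( str2double( fields ) ) )
        firstData = k ;
        nCols = numel( fields ) ;
        break
    end
end
assert( firstData > 0, 'No numeric line found.' ) ;

% - Build the format from the detected column count and parse the array.
fmt = repmat( '%f', 1, nCols ) ;
data = textscan( content, fmt, 'HeaderLines', firstData - 1 ) ;
data = horzcat( data{:} ) ;
```

The header part of the buffer (lines 1 to firstData - 1) can still be passed to REGEXP as above to extract the parameters.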
Ok, the code in my comment above (with the split) should work. I almost never use IMPORTDATA to be honest, because I don't know what it does internally (see note *) and I never know whether it will work later if my format evolves a little. So I always develop parsers specifically for what I need to do, and I implement some flexibility if/when needed.
Note *: you can see how IMPORTDATA was implemented by typing
open importdata
in the command window. You can reverse engineer this version to understand it a bit better, but it is difficult to know how it will evolve in the future.
Thanks. I got it. How can I use the numbers in parameters as input in the next line of my program?
The file number? You can build a string using SPRINTF, for example
for fileId = 1 : 10
    filename = sprintf( 'Raw%d.txt', fileId ) ;
    content = fileread( filename ) ;
    ...
end
But you can also use DIR to get e.g. all text files, whatever their name:
D = dir( '*.txt' ) ;
for fileId = 1 : length( D )
    filename = D(fileId).name ;
    content = fileread( filename ) ;
    ...
end
This would catch Raw.txt for example, which has no number.
I just re-read your comment and realized that I misunderstood. The variable parameters is a struct, a variable with fields:
>> class( parameters )
ans =
struct
Its fields can be dot-indexed. If you want to address/index the field elay for example, you do it this way:
>> parameters.elay
ans =
11250
This is a numeric field of type/class double:
>> class( parameters.elay )
ans =
double
so you can compute with it:
>> parameters.elay / 10
ans =
1125
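Since the headers change from file to file, a given field may not always be present. A minimal sketch of guarding against that, assuming a hand-built parameters struct in place of the parsed one (the variables scaled, name and value are hypothetical):

```matlab
% Hypothetical struct standing in for the one built by the parser above.
parameters = struct( 'elay', 11250, 'Time', 0.0080 ) ;

% - Use a field only if the header actually provided it.
if isfield( parameters, 'elay' )
    scaled = parameters.elay / 10 ;
else
    scaled = NaN ;      % placeholder when the parameter is missing
end

% - A field can also be addressed through a name stored in a variable,
%   with the same dynamic field syntax that the parser uses.
name = 'Time' ;
value = parameters.(name) ;
```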