Reading data into MATLAB

Baba 2011-10-31
Hi, I have a text file of space-separated numbers that I need to import into MATLAB for some processing. I cannot use the "load" command to import the whole file because it is far too big (5 GB). The text file looks like this:
1.2 4.2 5.2 5.33 6.45 7.64 3.45 7.34 ........
2.34 5.23 .235 .2343 2.34 3.4 3.42........
and so on.
What I'd like to do is read in and store the first 10 values of each row into a column vector, then the next 10 values of each row, and so on, so that I end up with something like:
X=[row1 (1 thru 10); row2 (1 thru 10);...]
or more generally,
y=[row1 (start position thru end position); row2 (start position thru end position); ...]
Any help appreciated,
Thank you!

Accepted Answer

Walter Roberson 2011-10-31
I'm not so sure this will make you any happier, but...
To read in columns P through Q (inclusive) of file XYZ.TXT, ignoring H lines of headers:
fid = fopen('XYZ.TXT','rt');
Then for each combination of columns:
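fseek(fid, 0, -1); % rewind to the start of the file (frewind(fid) does the same)
% Skip columns 1..P-1, keep columns P..Q, discard the rest of each line.
% Note that textscan needs the file identifier fid as its first argument.
result = textscan(fid, [repmat('%*f',1,P-1) repmat('%f',1,Q-P+1) '%*[^\n]'], 'HeaderLines', H, 'CollectOutput', 1);
cols.(sprintf('C%d_%d',P,Q)) = result{1}; % store under a field such as C7_15
clear result % free the temporary cell array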
When you are done reading as much as memory can hold, or as much as you want to deal with:
fclose(fid);
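Putting those steps together, a minimal end-to-end sketch might look like the following. It is not the answer's exact code: it writes a small hypothetical demo file to a temporary location so it runs on its own, assumes H = 1 and a hypothetical `ranges` matrix of [P Q] pairs, uses frewind in place of fseek(fid, 0, -1), and loops over every column range within a single open/close of the file.

```matlab
% Hypothetical demo data: one header line plus two rows of six numbers.
filename = [tempname '.txt'];
fid = fopen(filename, 'wt');
fprintf(fid, 'header line\n');
fprintf(fid, '%g %g %g %g %g %g\n', 1:12);  % rows "1 2 3 4 5 6" and "7 8 9 10 11 12"
fclose(fid);

H = 1;                 % number of header lines to skip (demo value)
ranges = [1 2; 3 5];   % hypothetical column ranges, one [P Q] pair per row
cols = struct();

fid = fopen(filename, 'rt');
if fid < 0
    error('Could not open %s', filename);
end
for k = 1:size(ranges, 1)
    P = ranges(k, 1);
    Q = ranges(k, 2);
    frewind(fid);      % start each pass from the top of the file
    fmt = [repmat('%*f', 1, P-1) repmat('%f', 1, Q-P+1) '%*[^\n]'];
    result = textscan(fid, fmt, 'HeaderLines', H, 'CollectOutput', true);
    cols.(sprintf('C%d_%d', P, Q)) = result{1};
end
fclose(fid);
delete(filename);      % remove the demo file
```

For a real file you would drop the demo-file lines and substitute your own filename, H, and ranges. Each pass still rereads the file from the top, so reading many ranges means many full passes over the data.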
Feel free to use something other than a structure to hold the values. Keep in mind that you have not said you will be reading the same number of columns each time, so a plain numeric array might not work.
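As a small illustration of that point, the sketch below uses hypothetical blocks A and B of different widths, as you would get from column ranges 1-2 and 3-5. They cannot be stacked into one 3-D numeric array, but a cell array holds them without trouble.

```matlab
% Hypothetical blocks, as if read from columns 1-2 and 3-5 of the same rows.
A = [1 2; 7 8];          % 2 rows x 2 columns
B = [3 4 5; 9 10 11];    % 2 rows x 3 columns

% Try to stack them along a third dimension; this fails because the widths differ.
canStack = true;
try
    cat(3, A, B);
catch
    canStack = false;
end

% A cell array accepts blocks of any size.
blocks = {A, B};
ranges = [1 2; 3 5];     % keep the [P Q] pairs alongside, one row per block
```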
There is a more elegant way to skip leading columns, which I knew about 3 days ago, but I'm having a heck of a time digging it up at the moment.
3 Comments
Baba 2011-10-31
Walter, could you annotate your code with a little bit of explanation?
Walter Roberson 2011-10-31
%*f format means to read a floating point number and discard it. We repeat this read-discard enough times to read through to the column before the first one we are interested in.
%f format means to read a floating point number and save it. We repeat this read-save enough times to read from columns P to Q inclusive, which is Q-P+1 times.
The %*[^\n] format means to match a sequence of any characters (including spaces) _except_ \n, which means newline in this context; that is, read to the end of the line. The * part means to discard it. Overall, this reads whatever is left over after column Q on the line and discards it.
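To make the three pieces concrete, here is a sketch that builds the format string for a hypothetical choice of P = 3 and Q = 5 and prints it. Note that '\n' inside single quotes is a literal backslash followed by n, which textscan then interprets as newline.

```matlab
P = 3;  % first column to keep (example value)
Q = 5;  % last column to keep (example value)
skipLeading = repmat('%*f', 1, P-1);   % read and discard columns 1..P-1
keepCols    = repmat('%f', 1, Q-P+1);  % read and keep columns P..Q
skipRest    = '%*[^\n]';               % read and discard the rest of the line
fmt = [skipLeading keepCols skipRest];
disp(fmt)  % prints %*f%*f%f%f%f%*[^\n]
```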
CollectOutput means to put all of the %f values read (columns P through Q) into a single numeric array.
textscan() always wraps its output in a cell array, even if only one item is output, so result{1} extracts the numeric array.
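To see the cell wrapping and the effect of CollectOutput without a large file, note that textscan can also read from a character vector instead of a file identifier. This sketch uses hypothetical two-row sample data and keeps only columns 2 and 3.

```matlab
txt = sprintf('1 2 3 4\n5 6 7 8\n');  % two rows of sample data
fmt = '%*f%f%f%*[^\n]';               % keep columns 2 and 3 only

% With CollectOutput, both %f columns are merged into one 1x1 cell.
result = textscan(txt, fmt, 'CollectOutput', true);
X = result{1};                         % numeric array [2 3; 6 7]

% Without it, each %f column gets its own cell (1x2 here).
resultSplit = textscan(txt, fmt);
```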
sprintf('C%d_%d',P,Q) constructs strings like C7_15, intended to symbolize columns 7 through 15.
cols.(...) with the string above uses dynamic field-name referencing of a structure, so the assignment would be to, e.g.,
cols.C7_15
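A minimal sketch of that dynamic field-name step, assuming P = 7 and Q = 15 and using a block of zeros as placeholder data standing in for result{1}:

```matlab
P = 7;
Q = 15;
name = sprintf('C%d_%d', P, Q);   % 'C7_15'
cols = struct();
cols.(name) = zeros(4, Q-P+1);    % placeholder for result{1}: 4 rows x 9 columns

% The same field can be read back dynamically or by its literal name.
block1 = cols.(name);
block2 = cols.C7_15;
hasField = isfield(cols, name);   % true
```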
