Select rows from txt file

Question

Hi,

Could someone help me select specific rows from the sample below? (The original txt file is actually huge.)

Stock Date Time Price Volume Stock Category
> ETE 04/01/2010 10145959 18.34 500 Big Cap
> ETE 04/01/2010 10150000 18.34 70 Big Cap
> ETE 04/01/2010 10170000 18.34 430 Big Cap
> ABC 04/01/2010 10190000 18.34 200 Big Cap
> YYY 04/01/2010 10200000 18.34 100 Big Cap
> ETE 04/01/2010 10250000 18.34 40 Big Cap
> ETE 04/01/2010 10295959 18.34 215 Big Cap
> ETE 04/01/2010 10300000 18.34 500 Big Cap
> ETE 04/01/2010 10320000 18.34 500 Big Cap

For instance, can I keep only the rows for stock 'ABC' (column 1)?

Thanks in advance,

Panos

1 comment

Andrew Newell 2011-3-21

Is each string between >'s supposed to be a row? If so, please format it so people know what you're talking about.

Answer 1

Matt Tearle 2011-3-22

How are you going to store the data once you get it? Do you just want the text? Or do you actually want to read in data? If the latter, you'll need to store the data in some kind of flexible container, like a cell array. Unfortunately, this will increase your memory requirements (which could be a problem, given that you said the file is huge). Here are some options you could try. Try them on the small example to see the various formats for the result. In particular, keep an eye on the bytes used, if memory is an issue.

Option 1:

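fid = fopen('stocks.txt');
% Columns: Stock, Date (text); Time, Price, Volume (numeric); Stock Category (rest of line)
data = textscan(fid,'%s%s%f%f%f%[^\n]','delimiter',' ','headerlines',1);
fclose(fid);
% Logical index of the rows whose first column is 'ETE'
idx = strcmp('ETE',data{1});
% Apply the same row selection to every column
f = @(x) x(idx);
ETEdata = cellfun(f,data,'uniformoutput',false)
whos ETEdata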

Option 2:

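fid = fopen('stocks.txt');
% Read the header line first so the loop starts at the data rows
hdr = textscan(fid,'%s%s%s%s%s%[^\n]',1,'delimiter',' ');
ETEdata = {};
while ~feof(fid)
    % Read one row at a time so only matching rows are kept in memory
    thisdata = textscan(fid,'%s%s%f%f%f%[^\n]',1,'delimiter',' ');
    if strcmp(thisdata{1},'ETE')
        % Append the matching row as a new line of the cell array
        ETEdata = [ETEdata;{thisdata{1}{1},thisdata{2}{1},...
            thisdata{3:5},thisdata{6}{1}}];
    end
end
fclose(fid);
ETEdata
whos ETEdata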

Option 3 (requires Statistics Toolbox):

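% Read the header line to get the column names
fid = fopen('stocks.txt');
hdr = textscan(fid,'%s%s%s%s%s%[^\n]',1,'delimiter',' ');
fclose(fid);
hdr = [hdr{:}];
% Strip non-word characters (e.g. the space in 'Stock Category')
hdr = regexprep(hdr,'\W','');
% Read the data rows into a dataset array
data = dataset('file','stocks.txt','format','%s%s%f%f%f%[^\n]',...
    'delimiter',' ','headerlines',1,'readvarnames',false);
data.Properties.VarNames = hdr
% Convert the text columns to nominal arrays
data.Stock = nominal(data.Stock);
data.StockCategory = nominal(data.StockCategory);
% Keep only the ETE rows
ETEdata = data(data.Stock == 'ETE',:)
whos ETEdata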

7 comments (5 earlier comments not shown)

Matt Tearle 2011-3-22

There's a space between those single quotes (the ' ' after 'delimiter'), to tell textscan that the delimiter is whitespace.

The errors in your first attempt, as I assume you figured out, were due to missing quotes around the property names (delimiter and headerlines). Walter is correct about the error in the second attempt. You took out the delimiter property name, but left the empty quotes.
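
As a minimal sketch of the point about quoting (not the original poster's attempts, which are not shown here), the call below passes 'delimiter' as a quoted property name followed by its value, a single space inside quotes. It parses one sample row from the question as in-memory text; the variable names row, C, stock, price and volume are only illustrative.

```matlab
% One sample row from the question, used as in-memory input
% (assumption: textscan is given the text directly instead of a file id).
row = 'ETE 04/01/2010 10145959 18.34 500 Big Cap';

% Property names are quoted and each is followed by its value;
% ' ' (one space inside the quotes) is the delimiter value.
C = textscan(row, '%s%s%f%f%f%[^\n]', 'delimiter', ' ');

stock  = C{1}{1};   % text column: stock name
price  = C{4};      % numeric column: price
volume = C{5};      % numeric column: volume
```

The same name/value pattern applies to 'headerlines' when reading from a file, as in the options above.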

Pap 2011-3-22

Many many thanks guys

Answer 2

Walter Roberson 2011-3-21

In such a case I would probably use Perl. The same functionality can be written directly in MATLAB, but the I/O would be slower than in Perl.

Anyhow: fopen() your text file and call fgetl() once to read the header, then start a loop. In each iteration, fgetl() the next line and compare its first N+1 characters (N being the length of the stock name) to your target stock name followed by a blank. Save the line only if it matches, then continue the loop until the end of the file.
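
The loop described above could be sketched roughly as follows. This is one possible reading of the steps, not tested on a huge file. It assumes the file has one row per line with the stock name first; the temporary sample file and the names fname, target, key and kept are made up for illustration.

```matlab
% Write a few sample rows from the question to a temporary file so the
% sketch runs on its own; with real data, point fname at the actual file.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', ...
    'Stock Date Time Price Volume Stock Category', ...
    'ETE 04/01/2010 10145959 18.34 500 Big Cap', ...
    'ABC 04/01/2010 10190000 18.34 200 Big Cap', ...
    'YYY 04/01/2010 10200000 18.34 100 Big Cap', ...
    'ETE 04/01/2010 10250000 18.34 40 Big Cap');
fclose(fid);

target = 'ABC';        % stock to keep (column 1)
key = [target ' '];    % name plus a blank, so 'ABC' does not match 'ABCD'
n = numel(key);        % the N+1 characters to compare

fid = fopen(fname, 'r');
header = fgetl(fid);   % first line holds the column names
kept = {};             % matching lines, one per cell
tline = fgetl(fid);
while ischar(tline)    % fgetl returns -1 (not a char) at end of file
    if strncmp(tline, key, n)
        kept{end+1, 1} = tline;   % save the line only on a match
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);         % remove the temporary sample file
```

Here kept holds the matching lines as plain text. If numeric values are needed afterwards, each kept line could be parsed in a second step, for example with textscan.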
