How to split a huge string array efficiently
Hi everyone,
I'm trying to split a huge string (~8.5 MB, ~11,500 rows x ~400 columns) efficiently, but I can't find a way to avoid a quite slow "for" loop.
The number of columns may change from one file to another, so I can't determine a single fixed format up front and then import the file according to it.
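%% read the .txt file, one string per line => really fast
tic
disp('importing file');
% '%s' splits at whitespace, so this assumes the rows contain no spaces
a = string(textread(fullfile(pwd, 'test.txt'), '%s', 'headerlines', 1)); %#ok<*DTXTRD>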
toc
%% split each row into columns at the ";" delimiter => slow
tic
disp('splitting each row by ";"');
% preallocate using the number of fields in the first row
b = strings(length(a), length(strsplit(a(1), ';')));
for k = 1:length(a)
    b(k,:) = strsplit(a(k), ';');   % one strsplit call per row
end
toc
%% date(str) to datenum => really fast
tic
disp('conv date to datenum');
dat1 = datenum(b(:,1),'yyyy-mm-dd');
toc
%% str to logical => really fast
tic
disp('converting data to logical array')
dat2 = strcmp(b(:,2:end), '1'); % super fast; strcmp already returns a logical array
%dat2 = str2double(b(:,2:end)); %very slow
toc
% disp('converting data to logical array - 2'); %super fast as well
% tic
% dat2 = zeros(size(b,1), size(b,2)-1);   % one column fewer than b (date column excluded)
% dat2(strcmp(b(:,2:end),'1')) = 1;
% toc
Thanks everyone! :)
Source file sample
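A loop-free sketch of the splitting step, assuming every row has the same number of ';' separators (split() raises an error otherwise). The sample rows below are hypothetical; in the script above, `a` would come from the import step. Given an M-by-1 string array, split() returns an M-by-N string array in a single call, and because `b` is a string array the 0/1 columns can be compared with == directly.

```matlab
% Hypothetical sample rows standing in for the imported lines in a
a = ["2020-07-01;1;0;1"; "2020-07-02;0;0;1"; "2020-07-03;1;1;0"];

% One vectorized call: M-by-1 string array in, M-by-N string array out
b = split(a, ";");

dat1 = datenum(b(:,1), 'yyyy-mm-dd');   % first column: serial date numbers
dat2 = b(:,2:end) == "1";               % remaining columns: logical flags
```

Since the column count is taken from the data itself, this also copes with files whose number of columns differs.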
Accepted Answer
Walter Roberson
2020-7-24
Why not use readtable()?
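A minimal readtable() sketch, assuming ';' as the delimiter and a single header line. The file name readtable_demo.txt and its contents are hypothetical and are written to tempdir only so the example runs on its own. Depending on the release, the first column may come back as datetime or as text; the 0/1 columns come back as numbers.

```matlab
% Write a small hypothetical sample file so the sketch is self-contained
fname = fullfile(tempdir, 'readtable_demo.txt');
fid = fopen(fname, 'w');
fprintf(fid, 'date;c1;c2;c3\n');
fprintf(fid, '2020-07-01;1;0;1\n2020-07-02;0;0;1\n');
fclose(fid);

% Read the whole file in one call, skipping the header line
T = readtable(fname, 'Delimiter', ';', 'HeaderLines', 1, 'ReadVariableNames', false);

% The flag columns are numeric, so no strcmp() is needed
dat2 = logical(T{:,2:end});
```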
I would also point out that textscan() can process character vectors in which the lines are separated by newlines.
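A sketch of the textscan() route, assuming one header line, ';' as the delimiter, a date in the first field and numeric fields after it. Because the number of columns varies between files, the format string is built from the header line at run time. The text is generated with sprintf only for illustration; on a real file it would come from fileread().

```matlab
% Hypothetical file contents; on real data use txt = fileread(fname)
txt = sprintf('date;c1;c2;c3\n2020-07-01;1;0;1\n2020-07-02;0;0;1\n');

% Count the fields in the header line to build the format at run time
hdr  = strtok(txt, char(10));
ncol = numel(strfind(hdr, ';')) + 1;
fmt  = ['%s' repmat('%f', 1, ncol - 1)];   % date as text, the rest as numbers

% textscan accepts a character vector whose lines are separated by newlines
C = textscan(txt, fmt, 'Delimiter', ';', 'HeaderLines', 1);

dat1 = datenum(C{1}, 'yyyy-mm-dd');
dat2 = logical([C{2:end}]);                % concatenate the numeric columns
```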
Note: in your release, if you use detectImportOptions, it will probably figure out automatically that the first column is a date and convert it to datetime format.
It will probably also figure out that the other columns are numeric, in which case strcmp() would not be needed; with a being the table returned by readtable, just use
dat2 = logical(a{:,2:end});
You might need to use 'HeaderLines', 1, 'ReadVariableNames', false.
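A sketch of the detectImportOptions() route. Rather than relying only on what detection guesses, it displays the detected types and then sets them explicitly: datetime for the first column and double for the rest. The sample file readtable_opts_demo.txt is hypothetical, and the datetime format 'yyyy-MM-dd' is an assumption matching the question's 'yyyy-mm-dd' (datenum notation).

```matlab
% Hypothetical sample file with one header line
fname = fullfile(tempdir, 'readtable_opts_demo.txt');
fid = fopen(fname, 'w');
fprintf(fid, 'date;c1;c2;c3\n2020-07-01;1;0;1\n2020-07-02;0;0;1\n');
fclose(fid);

opts = detectImportOptions(fname, 'Delimiter', ';');
opts.DataLines = [2 Inf];        % data starts after the single header line
disp(opts.VariableTypes)         % inspect what detection chose

% Set the types explicitly rather than relying on detection
names = opts.VariableNames;
opts = setvartype(opts, names(1), 'datetime');
opts = setvaropts(opts, names(1), 'InputFormat', 'yyyy-MM-dd');
opts = setvartype(opts, names(2:end), 'double');

T = readtable(fname, opts);
dates = T{:,1};                  % datetime column
dat2  = logical(T{:,2:end});     % 0/1 columns as a logical matrix
```

The same opts object can be reused for every file with the same layout; for a file with a different number of columns, detectImportOptions is simply called again.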