How to split a huge string array efficiently

Hi everyone,
I'm trying to split a huge string array (~8.5 MB, ~11,500 rows x ~400 columns) efficiently, but so far I can only do it with a quite slow "for" loop that I have not been able to remove.
The number of columns may change from one file to another, so I cannot fix a single format for the file in advance and import it accordingly.
%% getting data from .txt => really fast
tic
disp('importing file');
a = string(textread([pwd '\test.txt'],'%s','headerlines',1)); %#ok<*DTXTRD>
toc
%% splitting each row into columns by delimiter ";" => slow
tic
disp('splitting each row by ";"');
b = strings(length(a),length(strsplit(a(1),';')));
for k=1:length(a)
b(k,:) = strsplit(a(k),';');
end
toc
%% date(str) to datenum => really fast
tic
disp('conv date to datenum');
dat1 = datenum(b(:,1),'yyyy-mm-dd');
toc
%% str to logical => really fast
tic
disp('converting data to logical array')
dat2 = logical(strcmp(b(:,2:end),'1')); %super fast
%dat2 = str2double(b(:,2:end)); %very slow
toc
% disp('converting data to logical array - 2'); %super fast as well
% tic
% dat2 = zeros(size(b));
% dat2(strcmp(b(:,2:end),'1')) = 1;
% toc
Thanks everyone! :)
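For reference, the row-by-row loop above can probably be avoided, because split() accepts a whole string array as long as every row contains the same number of delimiters. The following is only a minimal sketch under that assumption; the three sample rows are hypothetical stand-ins for the contents of test.txt, which is not reproduced here.

```matlab
% Hypothetical sample rows standing in for the lines read from test.txt
a = ["2020-07-22;1;0;1"; "2020-07-23;0;0;1"; "2020-07-24;1;1;0"];

% split() works on the whole string array at once; it requires every
% row to contain the same number of ';' delimiters
b = split(a, ';');                      % 3-by-4 string array

dat1 = datenum(b(:,1), 'yyyy-mm-dd');   % date column -> serial date numbers
dat2 = b(:,2:end) == "1";               % remaining columns -> logical array
```

If some rows have a different number of fields, split() raises an error, so this only applies when each file is rectangular.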
Source file sample
3 Comments
endystrike 2020-7-24
Thanks Walter, I fixed it by following your advice! :)
tic
a = readtable([pwd '\test.txt'],'delimiter',';');
dat1 = datenum(string(a{:,1}),'yyyy-mm-dd');
dat2 = logical(strcmp(string(a{:,2:end}),'1'));
toc
endystrike 2020-7-24
If you want to put it as an answer, I'll accept it: you helped me a lot and I fixed the issue! :)


Accepted Answer

Walter Roberson 2020-7-24
Why not use readtable()?
I would also point out that textscan() can process character vectors in which the lines are separated by newlines.
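As a rough illustration of the textscan() remark, the sketch below parses a character vector directly. The sample text, the column names in its header, and the assumption that every column after the first is numeric are all hypothetical; the header line is stripped by hand and used to count the columns, so the format does not have to be known in advance.

```matlab
% Hypothetical text standing in for the file contents: one header line,
% then rows of the form date;flag;flag;flag
txt = sprintf('date;c1;c2;c3\n2020-07-22;1;0;1\n2020-07-23;0;0;1\n');

% Separate the header from the data and count the columns from it
[hdr, rest] = strtok(txt, newline);
rest  = rest(2:end);                        % drop the leading newline
nCols = numel(strsplit(hdr, ';'));
fmt   = ['%s' repmat('%f', 1, nCols-1)];    % one text column, rest numeric

% textscan accepts the character vector in place of a file identifier
c = textscan(rest, fmt, 'Delimiter', ';', 'CollectOutput', true);

dat1 = datenum(c{1}, 'yyyy-mm-dd');   % cell array of date text -> datenum
dat2 = logical(c{2});                 % collected numeric block -> logical
```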
Note: in your release if you use detectImportOptions then it would probably automatically figure out that the first column is a date, and would convert it to datetime format.
It will probably also figure out that the other columns are numeric, in which case strcmp() would not be needed, just
dat2 = logical(a{:,2:end});
You might need to use 'HeaderLines', 1, 'ReadVariableNames', false
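A minimal sketch of the detectImportOptions route might look like the following. The temporary sample file and its header names are hypothetical, and whether the first column is detected as datetime and the others as numeric depends on the release and the data, so the code checks the type of the first column rather than assuming it.

```matlab
% Write a small hypothetical sample file so the sketch is self-contained
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'date;c1;c2;c3\n2020-07-22;1;0;1\n2020-07-23;0;0;1\n');
fclose(fid);

% Let MATLAB detect the column count and types for this particular file
opts = detectImportOptions(fname, 'Delimiter', ';');
a = readtable(fname, opts);

% The first column may or may not have been detected as datetime
d = a{:,1};
if ~isdatetime(d)
    d = datetime(d, 'InputFormat', 'yyyy-MM-dd');
end
dat1 = datenum(d);

% Assumes the remaining columns were detected as numeric
dat2 = logical(a{:, 2:end});

delete(fname);   % clean up the temporary file
```

Because the options are detected per file, the varying number of columns between files should not need any special handling.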

Release

R2019a
