Speeding up text reading

Question

Luis Eduardo Cofré Lizama 2018-11-28

Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/432657-speeding-up-text-reading

Closed: MATLAB Answer Bot, 2021-8-20

I am currently reading a "large" .txt file. I used the Import Tool in MATLAB and copied the generated code into my script. However, I think one section of that code (below) slows the process down quite significantly. I don't know what every single bit means, so I was wondering how I can get rid of whatever is slowing the reading down without affecting the output. Perhaps there are unnecessary processes or steps?

Thanks a lot in advance!

formatSpec = '%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%[^\n\r]';
                            fileID = fopen(filename,'r','n','UTF-8');
                            fseek(fileID, 3, 'bof');
                            textscan(fileID, '%[^\n\r]', startRow (1,b), 'ReturnOnError', false);
                            dataArray = textscan(fileID, formatSpec, endRow(1,b)-startRow(1,b), 'Delimiter', delimiter, 'ReturnOnError', false);
                            
                            raw = repmat({''},length(dataArray{1}),length(dataArray)-1);
                            for col=1:length(dataArray)-1
                                raw(1:length(dataArray{col}),col) = dataArray{col};
                            end
                            
                            numericData = NaN(size(dataArray{1},1),size(dataArray,2));
                            
                            for col=[1:419]
                                rawData = dataArray{col};
                                for row=1:size(rawData, 1)
                                    regexstr = '(?<prefix>.*?)(?<numbers>([-]*(\d+[\,]*)+[\.]{0,1}\d*[eEdD]{0,1}[-+]*\d*[i]{0,1})|([-]*(\d+[\,]*)*[\.]{1,1}\d+[eEdD]{0,1}[-+]*\d*[i]{0,1}))(?<suffix>.*)';
                                    try
                                        result = regexp(rawData{row}, regexstr, 'names');
                                        numbers = result.numbers;
                                        invalidThousandsSeparator = false;
                                        if any(numbers==',')
                                            thousandsRegExp = '^\d+?(\,\d{3})*\.{0,1}\d*$';
                                            if isempty(regexp(thousandsRegExp, ',', 'once'))
                                                numbers = NaN;
                                                invalidThousandsSeparator = true;
                                            end
                                        end
                                        if ~invalidThousandsSeparator
                                            numbers = textscan(strrep(numbers, ',', ''), '%f');
                                            numericData(row, col) = numbers{1};
                                            raw{row, col} = numbers{1};
                                        end
                                    catch me
                                    end
                                end
                            end
                            
                            R = cellfun(@(x) ~isnumeric(x) && ~islogical(x),raw);
                            raw(R) = {NaN};

1 Comment

KSSV 2018-11-29

Read about textscan. We can help once we know your file format.
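
As a rough illustration of the textscan route: the generated code reads every field as text (%s) and then runs regexp on each cell inside a double loop, which is most likely where the time goes. If the columns are purely numeric, textscan can parse them directly with %f. The sketch below is not the poster's code: readNumericBlock and its arguments are hypothetical names, and it assumes that every field in the data columns is numeric or empty, with no currency symbols, units, or thousands separators. With 'ReturnOnError' set to false, textscan raises an error if that assumption does not hold.

```matlab
function numericData = readNumericBlock(filename, nCols, nSkip, nRows, delimiter)
% readNumericBlock  Hypothetical helper: read a block of numeric columns with %f.
%   Assumption: the first nCols fields of every row are numeric or empty.
%   nSkip lines are skipped first, then up to nRows rows are read.

    fileID = fopen(filename, 'r', 'n', 'UTF-8');
    if fileID == -1
        error('readNumericBlock:openFailed', 'Could not open %s', filename);
    end
    closer = onCleanup(@() fclose(fileID));   % close the file even if textscan errors

    % Skip a UTF-8 byte order mark only if one is actually present.
    bom = fread(fileID, 3, 'uint8')';
    if ~isequal(bom, [239 187 191])
        frewind(fileID);
    end

    % One %f per data column; the last specifier captures the rest of the line,
    % mirroring the layout of the generated formatSpec.
    formatSpec = [repmat('%f', 1, nCols) '%[^\n\r]'];

    dataArray = textscan(fileID, formatSpec, nRows, ...
        'Delimiter', delimiter, ...
        'HeaderLines', nSkip, ...
        'EmptyValue', NaN, ...
        'ReturnOnError', false);

    % textscan returns one column vector per %f; join them into a matrix.
    numericData = [dataArray{1:nCols}];
end
```

With the variables from the question, a call might look like numericData = readNumericBlock(filename, 419, startRow(1,b), endRow(1,b)-startRow(1,b), delimiter), which keeps the same skip and row counts as the posted code. If the fields really do contain thousands separators, another option is to keep the %s read and replace the double loop with a single str2double call on the cell array, since str2double accepts a cell array of character vectors and returns NaN for anything it cannot parse. Either way, it is worth comparing the result against the original output on a small file first.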

This question is closed.

Answers (0)
