Grouping and Reading Files Sharing Unique Strings

Question

Connie Chang-Chien 2021-3-16

Direct link to this question

https://ww2.mathworks.cn/matlabcentral/answers/774822-grouping-and-reading-files-sharing-unique-strings

Answered: Zinea 2024-2-23

I'm currently trying to simplify and reduce the processing time needed to read through files in a folder.

One of the problems is that I need to group certain files together based on sharing the same numeric string, then pull variables from these related files to create a row in a table, and repeat this for all unique file numbers in the folder. However, a group can contain anywhere from one file up to three related files, so I can't work through the folder in fixed-size steps.

What corrections or alternative structure would decrease the processing time?

Here is my code. Even without the main processing part, it takes a long time:

reports = dir(fullfile(reports_folder, '*.doc'));
case_regex = '\d+-\d+'; % pattern for the case number
k = 1;
while k <= length(reports)
    baseFileName = reports(k).name;
    base_no = regexp(baseFileName, case_regex, 'match', 'once'); % ID case

    % Collect the names of files with the same case number (file names only, not full paths).
    % Assumes the listing is in alphabetical order, so related files are adjacent.
    list_same_case = {baseFileName};
    for offset = 1:2 % at most two further related files
        if k + offset > length(reports)
            break % no more files in the folder
        end
        possibleMatchFile = reports(k + offset).name;
        if ~isequal(regexp(possibleMatchFile, case_regex, 'match', 'once'), base_no)
            break % next file belongs to a different case
        end
        list_same_case{end+1} = possibleMatchFile;
    end

    filename = fullfile(reports_folder, baseFileName);
    % Read and grab variables from files of interest, store
    k = k + length(list_same_case);
end

Answer 1

Zinea 2024-2-23

Direct link to this answer

https://ww2.mathworks.cn/matlabcentral/answers/774822-grouping-and-reading-files-sharing-unique-strings#answer_1415018

Hi Connie Chang-Chien,

You can use a map data structure. This can reduce processing time because it avoids comparing file names against each other, as explained below:

One-time scan: The map is populated by scanning the list of files only once. Each file's case number is extracted and used as a key in the map. If the case number has already been encountered, the file is appended to the list associated with that case number; otherwise, a new list is created.
Constant-time access: Maps provide near-constant-time insertion and retrieval of values by key. This is much faster than searching through a list or array to check whether a case number is already present.
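
To make the two points above concrete, here is a minimal sketch on a small hard-coded list of file names. The names are made up for illustration and no folder is read; it only shows the single pass over the list and the key lookups involved.

```matlab
% Hypothetical file names, for illustration only
names = {'101-1_a.doc', '101-1_b.doc', '202-7_a.doc'};
case_regex = '\d+-\d+';

groups = containers.Map('KeyType', 'char', 'ValueType', 'any');
for i = 1:numel(names)
    key = regexp(names{i}, case_regex, 'match', 'once');
    if isKey(groups, key)          % lookup by key, no search through earlier names
        files = groups(key);       % read the existing list
        files{end+1} = names{i};   % append the new file name
        groups(key) = files;       % store the list back
    else
        groups(key) = names(i);    % first file for this case number (1x1 cell)
    end
end

disp(groups.Count)       % number of unique case numbers
disp(groups('101-1'))    % files grouped under case 101-1
```

Each file name is visited once, and whether its case number has been seen before is answered by `isKey` rather than by looking at neighbouring files.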

The following code groups the files using a map:

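reports = dir(fullfile(reports_folder, '*.doc'));
num_reports = length(reports);
case_regex = '\d+-\d+';
% Use a map to group files by their numeric string
file_map = containers.Map('KeyType', 'char', 'ValueType', 'any');
for i = 1:num_reports
    baseFileName = reports(i).name;
    case_number = regexp(baseFileName, case_regex, 'match', 'once'); % Extract case number
    if isempty(case_number)
        continue % Skip files whose name has no case number
    end

    % Check if the case number is already in the map
    if isKey(file_map, case_number)
        % containers.Map does not support chained indexing such as
        % file_map(key){end+1}, so read the list, append, and store it back
        files = file_map(case_number);
        files{end+1} = baseFileName;
        file_map(case_number) = files;
    else
        file_map(case_number) = {baseFileName};
    end
end
% Now iterate over each unique case number
for case_number = keys(file_map)
    list_same_case = file_map(case_number{1}); % Cell array of file names for this case
    % Read and grab variables from the files in list_same_case here
end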
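
As an alternative that is not part of the answer above, the same grouping can be sketched without per-file map lookups: extract all case numbers in one `regexp` call and group them with `unique`. The file names and the `NumFiles` column are made up for illustration. In real use the names would presumably come from `{reports.name}`, and each row would hold the variables read from the files.

```matlab
% Hypothetical file names; in practice something like names = {reports.name};
names = {'101-1_a.doc', '202-7_a.doc', '101-1_b.doc', '303-2_a.doc'};
case_regex = '\d+-\d+';

% Extract every case number in one call (cell array in, cell array out)
case_numbers = regexp(names, case_regex, 'match', 'once');

% Drop names that contain no case number
has_case = ~cellfun(@isempty, case_numbers);
names = names(has_case);
case_numbers = case_numbers(has_case);

% unique returns the sorted case numbers and, for each file, its group index
[unique_cases, ~, group_idx] = unique(case_numbers);

% Build one table row per case number
n = numel(unique_cases);
Files = cell(n, 1);
NumFiles = zeros(n, 1);
for g = 1:n
    Files{g} = names(group_idx == g);   % all file names for this case
    NumFiles(g) = numel(Files{g});
    % Read and grab variables from Files{g} here
end
result = table(unique_cases(:), Files, NumFiles, ...
    'VariableNames', {'CaseNumber', 'Files', 'NumFiles'});
disp(result)
```

This version does not depend on the files being listed in any particular order, and it handles groups of any size, not only one to three files.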
