Sort Excel files by file content?

2 次查看(过去 30 天)
Hi all, my new challenge with matlab involves filtering files with very inconsistent names. For example,
s =
'HI_B2_TTT9_D452_07052016.xlsx'
'HI_H2G_TTT7_D259_070516.xlsx'
'HI_B2C_TTT9_D1482_070516.xlsx'
'HI_A1C_468_070516_TTT4.xlsx'
'HI__TTT8_862_07052016_G1C.xlsx'
'HI_KA6_TTT4_148_07052016.xlsx'
'8C_HI_279_Potato_07052016.xlsx'
'HI_8C_279_Bacon_TTT52016.xlsx'
The files that I want are the first six files, which have different styles of naming even though they are the same type of files (TTT). While the last two files are undesired and need to be filter out indicated by keywords such as "Potato" and "Bacon".
My goal is to Extract files that contains the keywords "TTT" while eliminate files that have keywords "Potato" and "Bacon", this is not ideal since there are in fact hundreds of these files in my folders outside of this simple example that are constantly updating and I will need to look through them all for other potential unwanted keywords such as "Sour Cream", etc.
My ideal goal will be to extract those TTT files by its content, since all the TTT excel files have a sheet named "cooking is fun" inside while all the other ones do not. Is this feasible and is there a best way to do so?
Thank you so much for reading my concern and any inputs will be greatly appreciated!
  5 个评论
chlor thanks
chlor thanks 2016-7-6
I am thinking that maybe I can do:
Step 1: Extract all "TTT" files
Step 2: Further eliminate any files with keyword "Bacon"

请先登录,再进行评论。

采纳的回答

Guillaume
Guillaume 2016-7-6
编辑:Guillaume 2016-7-13
You could certainly inspect the content of the excel files to see if there's the worksheet you want. However, since it involves opening and closing every file, it's not going to be particularly fast. It's up to you whether that's important or not:
function hassheet = CheckExcelFilesForWorksheet(folder, filelist, sheetname)
%HASSHEET Check whether or not the given excel files have a sheet with the given name
%folder: folder where the excel files are located (1D char array / string)
%filelist: names of excel files (cell array of 1D char arrays / string array)
%sheetname: name of sheet to find in excel files (1D char array / string)
%hassheet: array the same size as filelist, indicating whether or not the excel file has a sheet with sheetname (logical)
hassheet = false(size(filelist)); %output
excel = actxserver('Excel.Application'); %start microsoft excel
cleanupobj = onCleanup(@() excel.Quit); %close excel when function returns or error occurs
for fileidx = 1:numel(filelist)
%open without updating link and as read only. read only ensure the file can be opened even if it's already in use:
workbook = excel.Workbooks.Open(fullfile(folder, filelist{fileidx}), false, true);
%get the list of worksheet name by iterating over the Sheets collection:
sheetnames = arrayfun(@(i) workbook.Sheets.Item(i).Name, 1:workbook.Sheets.Count, 'UniformOutput', false);
workbook.Close(false);
if ismember(sheetname, sheetnames)
hassheet(fileidx) = true;
end
end
end
edited 13/07/2016: fixed bugs: forgot to close the workbooks, typos
  6 个评论
chlor thanks
chlor thanks 2016-7-7
编辑:chlor thanks 2016-7-7
Ok. So I saved the function on its own, it now gives me the error...
Undefined function or variable 'numfiles'.
Error in CheckExcelFilesForWorksheet (line 10)
for fileidx = 1:numfiles
I am not sure why this line is invalid since I just checked, "numfiles" is exactly the same as if I use "numel(AllExcel)", where AllExcel is a pre-existing cell array of the names of all the available excel files that I have.
Guillaume
Guillaume 2016-7-7
This is a bug of your own. There's no numfiles in my original code. For reference, my original code for that line and the following was:
for fileidx = 1:numel(filelist)
%open without updating link and as read only. read only ensure the file can be opened even if it's already in use:
workbook = excel.Workbooks.Open(fullfile(folder, filelist{fileidx}), false, true);

请先登录,再进行评论。

更多回答(1 个)

Azzi Abdelmalek
Azzi Abdelmalek 2016-7-6
编辑:Azzi Abdelmalek 2016-7-6
Edited
a=regexp(s,'.+TTT.+','match','once')
b=regexprep(a,'\S+Bacon\S+','')
out=s(~cellfun(@isempty,b))
  3 个评论
Guillaume
Guillaume 2016-7-6
Certainly not! More likely, you'd get a syntax error. I'd recommend you learn the regular expression language (note that this is not a language specific to matlab).
If you want to replace Bacon or Potato, this regex would work:
regexprep(a, '\S+(?:Bacon|Potato)\S+', '')
However, this has nothing to do with the original question: "Sort Excel files by file content?"
chlor thanks
chlor thanks 2016-7-6
After you mention that it will require the opening and closing of each file, I realize that it may take forever for all my files to be filtered every time I run the codes. Thus I am considering doing a quick regexp before I make the attempt of opening any files.
Thank you very much for all your help!

请先登录,再进行评论。

类别

Help CenterFile Exchange 中查找有关 Data Import from MATLAB 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by