Downloading files from a website with conditions on the file names
This directory contains files whose filenames start with the letter "A" or "B".
The filenames in the directory look like this:
A_20080403.xml
A_20080403_1.xml
A_20080403_2.xml
A_20080404_1.xml
B_20080403_1.xml
That is
- Filenames are of the form "Capital letters"+"_"+"date"+"_"+"numbers".xml or "Capital letters"+"_"+"date".xml
- There are dates that do not have corresponding files
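The naming pattern described above can be written as a regular expression, which is one way to check candidate names. This is a minimal sketch, assuming the date part is always eight digits; the variable names (`pattern`, `names`, `wanted`) and the last sample name are illustrative and not part of the original question.

```matlab
% Pattern: capital letters, "_", 8-digit date, optional "_" + digits, ".xml"
pattern = '^[A-Z]+_\d{8}(_\d+)?\.xml$';
names = ["A_20080403.xml", "A_20080403_1.xml", "A_20080404_1.xml", ...
         "B_20080403_1.xml", "A20080403.xml"];
% Test each name separately; regexp(..., 'once') returns [] when there is no match
matchesPattern = arrayfun(@(n) ~isempty(regexp(n, pattern, 'once')), names);
% Keep only the names that fit the pattern and start with "A"
wanted = names(matchesPattern & startsWith(names, "A"));
disp(wanted)
```

The last sample name has no underscore after the letter, so it is rejected by the pattern even though it starts with "A".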
I would like to download all the files whose filenames start with the letter "A".
What has been tried:
(a) I was able to save a single file using the "websave" command.
(b) I asked a related question at https://www.mathworks.com/matlabcentral/answers/457470-writing-loops-to-download-files-using-matlab-websave?s_tid=srchtitle and received the following code:
for k = 20080401:20100101
    filename = sprintf('A%d.xml', k);
    url = ['https://www.somecompany.com/xml/' filename];
    outfilename = websave(filename,url);
end
Problems with the above code: it does not work because
- It assumes filenames of the form "Capital letters"+"date".xml rather than the forms described above
- It throws an error at the first date that has no corresponding file, and then stops
How can the above code be improved?
Answers (1)
Walter Roberson
2022-2-9
It would be more robust / faster if the site provided a way to list the available files, instead of having to do trial and error.
baseurl = "https://www.somecompany.com/xml/";
% 'Format' makes string(Day) below produce yyyyMMdd, matching the filenames
datelimits = datetime({'20080401', '20100101'}, 'InputFormat', 'yyyyMMdd', 'Format', 'yyyyMMdd');
subfile_limit = 5; % no more than _5 -- adjust as appropriate
% candidate suffixes: ".xml", "_1.xml", ..., "_5.xml"
subfile_modifier = ["", "_" + (1:subfile_limit)] + ".xml";
for Day = datelimits(1):datelimits(2)   % one iteration per calendar day
    daystr = string(Day);
    for Sub = subfile_modifier
        filename = "A_" + daystr + Sub;
        url = baseurl + filename;
        try
            outfilename = websave(filename, url);
            fprintf('fetched %s\n', filename);
        catch
            % skip remaining subfiles for this date upon first failure;
            % use "continue" instead if a date can have _1 without the unsuffixed file
            break;
        end
    end
end
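If the server does return an index page for the directory, the trial-and-error loop could be replaced by parsing that page. The following is only a sketch under that assumption: the thread does not confirm that `webread(baseurl)` returns a listing, so a made-up sample string (`html`) stands in for the response here and the parsing step can be run offline. The names `html` and `found` are illustrative.

```matlab
baseurl = "https://www.somecompany.com/xml/";
% Hypothetical index page; in practice this might come from: html = webread(baseurl);
html = ['<a href="A_20080403.xml">A_20080403.xml</a> ' ...
        '<a href="A_20080403_1.xml">A_20080403_1.xml</a> ' ...
        '<a href="B_20080403_1.xml">B_20080403_1.xml</a>'];
% Pull out every "A" filename; unique removes the href/link-text duplicates
found = unique(regexp(html, 'A_\d{8}(_\d+)?\.xml', 'match'));
for k = 1:numel(found)
    url = baseurl + found{k};
    % Replace the fprintf with websave(found{k}, url) to actually download
    fprintf('would fetch %s\n', url);
end
```

With a real listing, only files that exist are requested, so neither the `try`/`catch` probing nor the `subfile_limit` guess is needed.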
2 Comments
Walter Roberson
2022-3-12
datelimits = datetime({'20080401', '20100101'}, 'InputFormat', 'yyyyMMdd', 'Format', 'yyyyMMdd');
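The 'Format' option matters because `string(Day)` converts a datetime to text using its display format, while 'InputFormat' only controls how the input text is parsed. A small check of that behavior, with illustrative variable names (`d`, `nextDay`):

```matlab
% 'InputFormat' controls parsing; 'Format' controls conversion to text
d = datetime('20080401', 'InputFormat', 'yyyyMMdd', 'Format', 'yyyyMMdd');
daystr = string(d);
filename = "A_" + daystr + ".xml";
disp(filename)
% Date arithmetic keeps the same display format
nextDay = string(d + days(1));
disp(nextDay)
```

Without 'Format', the text produced by `string` follows the default datetime display format, which would not match the `yyyyMMdd` part of the filenames.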