Download a file from a website?
Ara
2022-8-6
Dear All,
I have code to download data from a website. It runs and creates the folder, but the folder is empty. Do you know where the problem is?
Accepted Answer
Walter Roberson
2022-8-6
Edited: Walter Roberson
2022-8-6
22 Comments
Ara
2022-8-7
Hi Walter, Thank you. Here is the code, though I cannot provide my user name and password. Do you know where the problem is?
-----------------------------
clear all; clc;
url = 'http://cdaac-www.cosmic.ucar.edu/cdaac/cgi_bin/fileFormats.cgi?type=scnLv1';
buffer = urlread (url);
% pattern = '
%(2)
% - Add the code for reading data, update the function so it outputs them
username = 'xxx' ;
password = 'xxx' ;
% - Define path to wget.
% wgetExec = '"C:\Program Files\GnuWin32\bin\wget"' ;
wgetExec = '"C:\Program Files (x86)\GnuWin32\bin\wget"' ;
% % - Define base path for accessing data (may have to update it).
dataBase = 'http://cdaac-www.cosmic.ucar.edu/cdaac/rest/tarservice/data/cosmic2013/scnLv1/2013.';
% - Disable warning when overwriting folders.
warning( 'off', 'MATLAB:MKDIR:DirectoryExists' ) ;
for dayId = 01:227
% - Create directory and cd.
dirName = sprintf( '2013_%03d', dayId ) ;
mkdir( dirName ) ;
cd( dirName ) ;
% - Build wget call string and make the call.
command = sprintf( '%s -nd -np -l 1 -r -w 2 --http-user=%s --http-passwd=%s %s%03d', wgetExec, username, password, dataBase, dayId ) ;
status = system( command, '-echo' ) ; % You may want to remove the echo when it works.
if status
warning( 'There was an issue with WGET on day %d.', dayId ) ;
end
% - Return to top directory.
cd( '..' ) ;
end
warning( 'on', 'MATLAB:MKDIR:DirectoryExists' ) ;
Walter Roberson
2022-8-7
Is there a reason you are calling an external wget instead of using webread()?
Ara
2022-8-8
I do not know if there is any specific reason for it. I did not write it by myself. Would you please tell me how to change it?
Walter Roberson
2022-8-8
You would use weboptions to construct the username and password information. You would construct a url from your dataBase and dayID . You would set the options to request binary data.
The output you get back would be a column vector of uint8. You would fopen() a file ending with .tar in the name, and fwrite() the data to the file. You would then use untar to extract the files.
You should probably check to be sure the stream of bytes was not empty... I suspect that you are either getting an error message about access or else there just isn't any data there.
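One way to act on the advice to check the bytes is to look at what actually came back before writing it to disk. This is a minimal sketch, assuming the server returns a POSIX (ustar) tar archive, which carries the string `ustar` at byte offset 257. Older tar variants lack that marker, so a negative result is only a hint that the reply may be an error page. The names `isUstar`, `fakeTar`, and `htmlReply` are made up for illustration.

```matlab
% Hypothetical helper: true when a uint8 vector carries the POSIX "ustar"
% magic at 0-based offset 257 (MATLAB indices 258:262).
isUstar = @(b) numel(b) >= 262 && ...
    isequal(char(reshape(b(258:262), 1, [])), 'ustar');

% Stand-ins for the uint8 column vector that webread would return.
fakeTar = zeros(512, 1, 'uint8');
fakeTar(258:262) = uint8('ustar');
htmlReply = uint8('<html>Access denied</html>')';

fprintf('fakeTar looks like tar:   %d\n', isUstar(fakeTar));
fprintf('htmlReply looks like tar: %d\n', isUstar(htmlReply));
```

If the check fails, printing `char(rawdata(1:min(200, end))')` usually shows whether the server sent a text message instead of an archive.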
Walter Roberson
2022-8-9
Edited: Walter Roberson
2022-8-9
url = 'http://cdaac-www.cosmic.ucar.edu/cdaac/cgi_bin/fileFormats.cgi?type=scnLv1';
username = 'xxx' ;
password = 'xxx' ;
options = weboptions('Username', username, 'Password', password, ...
'type', 'raw');
% % - Define base path for accessing data (may have to update it).
dataBase = 'http://cdaac-www.cosmic.ucar.edu/cdaac/rest/tarservice/data/cosmic2013/scnLv1/2013.';
origdir = pwd();
for dayId = 1:227
% - Create directory and cd.
dirName = sprintf( '2013_%03d', dayId ) ;
cd( origdir );
mkdir( dirName );
cd( dirName );
rawdata = webread( [dataBase dirName], options );
if isempty(rawdata)
fprintf('no data reading %s\n', dirName);
else
try
fid = fopen('rawdata.tar', 'w');
fwrite(fid, rawdata, 'uint8');
fclose(fid)
untar('rawdata.tar');
fprintf('success %s\n', dirName);
catch ME
fprintf('some failure on %s\n', dirName);
disp(ME.message);
end
end
cd( origdir );
end
Ara
2022-8-9
Thank you very much.
I got this error:
Error using weboptions
'type' is not a recognized parameter. For a list of valid name-value pair arguments,
see the documentation for weboptions.
Error in weboptions>parseInputs (line 638)
p.parse(args{:});
Error in weboptions (line 375)
inputs = parseInputs(options, varargin);
Error in main_dowloadFromWebsite (line 4)
options = weboptions('Username', username, 'Password', password, ...
Walter Roberson
2022-8-9
url = 'http://cdaac-www.cosmic.ucar.edu/cdaac/cgi_bin/fileFormats.cgi?type=scnLv1';
username = 'xxx' ;
password = 'xxx' ;
options = weboptions('Username', username, 'Password', password, ...
'ContentType', 'raw');
% % - Define base path for accessing data (may have to update it).
dataBase = 'http://cdaac-www.cosmic.ucar.edu/cdaac/rest/tarservice/data/cosmic2013/scnLv1/2013.';
origdir = pwd();
for dayId = 1:227
% - Create directory and cd.
dirName = sprintf( '2013_%03d', dayId ) ;
cd( origdir );
mkdir( dirName );
cd( dirName );
rawdata = webread( [dataBase dirName], options );
if isempty(rawdata)
fprintf('no data reading %s\n', dirName);
else
try
fid = fopen('rawdata.tar', 'w');
fwrite(fid, rawdata, 'uint8');
fclose(fid)
untar('rawdata.tar');
fprintf('success %s\n', dirName);
catch ME
fprintf('some failure on %s\n', dirName);
disp(ME.message);
end
end
cd( origdir );
end
Ara
2022-8-9
Thank you, Walter. I got an error. Please see below:
Error using matlab.internal.webservices.HTTPConnector/copyContentToByteArray (line
396)
The server returned the status 301 with message "Moved Permanently" in response to
the request to URL
http://cdaac-www.cosmic.ucar.edu/cdaac/rest/tarservice/data/cosmic2013/scnLv1/2013.2013_001.
Error in readContentFromWebService (line 46)
byteArray = copyContentToByteArray(connection);
Error in webread (line 125)
[varargout{1:nargout}] = readContentFromWebService(connection, options);
Error in main_dowloadFromWebsite (line 16)
rawdata = webread( [dataBase dirName], options );
Walter Roberson
2022-8-10
You would have to log the HTTP session; the details would show the new URL to use.
There are also ways to do it using telnet, but it is a nuisance to get right, especially with the authentication step.
I do not have an account with them so I cannot trace it myself.
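Logging the session can be sketched with the `matlab.net.http` interface (available from R2016b): send one request with redirects disabled and print the `Location` header of the reply. This is only a sketch. The target URL is the day-001 address assumed from `dataBase`, no credentials are sent, and the server may answer differently (or not at all) from what the error message above suggests.

```matlab
% Ask for the URL once without following redirects, then print the
% Location header, which names the address the server wants instead.
target = 'http://cdaac-www.cosmic.ucar.edu/cdaac/rest/tarservice/data/cosmic2013/scnLv1/2013.001';
opts = matlab.net.http.HTTPOptions('MaxRedirection', 0);  % stop at the first 3xx reply

try
    resp = send(matlab.net.http.RequestMessage(), matlab.net.URI(target), opts);
    fprintf('Status: %s\n', string(resp.StatusLine));
    loc = resp.getFields('Location');
    if ~isempty(loc)
        fprintf('Redirects to: %s\n', loc(1).Value);
    end
catch ME
    % No network, unknown host, and similar failures end up here.
    fprintf('Request failed: %s\n', ME.message);
end
```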
Ara
2022-8-11
Edited: Walter Roberson
2022-8-11
Dear Walter,
Here is the page that appears after entering the user name and password. When I run the code using this URL (https://cdaac-www.cosmic.ucar.edu/cdaac/tar/rest.html), it shows an error. Please see below.
I can share my username and password only with you (I can send them using the contact provided in your profile), but please keep them to yourself. Please let me know and I will send them to you.
Error using mkdir
Access is denied.
Error in main_dowloadFromWebsite (line 15)
mkdir( dirName );
Walter Roberson
2022-8-11
The following puts in error checks. It does not try to write into your current directory, as your earlier error messages show that you do not have write access to your current directory. Instead, it writes into a temporary directory that you should have write access to unless your system is misconfigured.
If you get "error reading from url" then the URL in dataBase is wrong, or you have an authentication error.
If you get "no error but no data available from" then reading from the URL did not error but no data was delivered.
If you get "got data but could not untar" then you received some data but it was not a valid tar file. Either the site sent an in-line message or else we did not properly figure out how to tell it how to download a file.
Any other warning message represents a problem on your side.
%url = 'http://cdaac-www.cosmic.ucar.edu/cdaac/cgi_bin/fileFormats.cgi?type=scnLv1';
username = 'xxx' ;
password = 'xxx' ;
options = weboptions('Username', username, 'Password', password, ...
'ContentType', 'raw');
% % - Define base path for accessing data (may have to update it).
dataBase = 'http://cdaac-www.cosmic.ucar.edu/cdaac/rest/tarservice/data/cosmic2013/scnLv1/2013.';
origdir = pwd();
td = tempdir;
outputdir = fullfile(td, 'cosmic2013');
if ~isdir(outputdir)
try
mkdir(outputdir)
catch ME
error('could not create output directory "%s"', outputdir);
end
end
fprintf('Extracting data into directory "%s"\n', outputdir);
for dayId = 1:227
% - Create directory and cd.
dirName = fullfile(outputdir, sprintf( '2013_%03d', dayId ) );
if ~isdir(dirName)
try
mkdir( dirName );
catch ME
warning('skipping day %d, could not create directory at "%s"', dayId, dirName);
continue;
end
end
cd( dirName );
thisurl = [dataBase dirName];
try
rawdata = webread( thisurl, options );
catch ME
warning('error reading from url "%s"', thisurl);
continue;
end
if isempty(rawdata)
warning('no error but no data available from "%s"', thisurl);
continue
else
tarname = fullfile(dirName, 'rawdata.tar');
fid = fopen(tarname, 'w');
if fid < 0
warning('skipping day %d, got data but could not create tar at "%s"', dayId, tarname);
continue;
end
fwrite(fid, rawdata, 'uint8');
fclose(fid)
try
untar(tarname);
catch ME
warning('got data but could not untar, examine "%s"', tarname);
continue
end
fprintf('success for day %d\n', dayId);
end
end
cd(origdir)
Ara
2022-8-11
Dear Walter,
Thank you very much. It runs, but the folder is empty. There is an error with the URL.
The URL is "https://cdaac-www.cosmic.ucar.edu/cdaac/tar/rest.html". After entering the username and password, I select cosmic2013 and then pick a day from the calendar; the folder that comes back is full of netCDF files from which I extract S4, time, etc. I would appreciate it if you could resolve the error. Which URL do I need to use?
Ara
2022-8-11
Here is the warning. It does not stop MATLAB.
Warning: error reading from url
"http://cdaac-www.cosmic.ucar.edu/cdaac/rest/tarservice/data/cosmic2013/scnLv1/2013.C:\Users\Aramesh\AppData\Local\Temp\cosmic2013\2013_006"
Walter Roberson
2022-8-11
%url = 'http://cdaac-www.cosmic.ucar.edu/cdaac/cgi_bin/fileFormats.cgi?type=scnLv1';
username = 'xxx' ;
password = 'xxx' ;
options = weboptions('Username', username, 'Password', password, ...
'ContentType', 'raw');
% % - Define base path for accessing data (may have to update it).
dataBase = 'http://cdaac-www.cosmic.ucar.edu/cdaac/rest/tarservice/data/cosmic2013/scnLv1/2013.';
origdir = pwd();
td = tempdir;
outputdir = fullfile(td, 'cosmic2013');
if ~isdir(outputdir)
try
mkdir(outputdir)
catch ME
error('could not create output directory "%s"', outputdir);
end
end
fprintf('Extracting data into directory "%s"\n', outputdir);
for dayId = 1:227
% - Create directory and cd.
dirName = sprintf('2013_%03d', dayId);
dayoutputdir = fullfile(outputdir, dirName);
if ~isdir(dayoutputdir)
try
mkdir( dayoutputdir );
catch ME
warning('skipping day %d, could not create directory at "%s"', dayId, dayoutputdir);
continue;
end
end
cd( dayoutputdir );
thisurl = [dataBase dirName];
try
rawdata = webread( thisurl, options );
catch ME
warning('error reading from url "%s"', thisurl);
continue;
end
if isempty(rawdata)
warning('no error but no data available from "%s"', thisurl);
continue
else
tarname = fullfile(dayoutputdir, 'rawdata.tar');
fid = fopen(tarname, 'w');
if fid < 0
warning('skipping day %d, got data but could not create tar at "%s"', dayId, tarname);
continue;
end
fwrite(fid, rawdata, 'uint8');
fclose(fid)
try
untar(tarname);
catch ME
warning('got data but could not untar, examine "%s"', tarname);
continue
end
fprintf('success for day %d\n', dayId);
end
end
cd(origdir)
Ara
2022-8-11
Thank you, Walter. Sorry to disturb you a lot. But again I got an error.
- Invalid expression. When calling a function or indexing a variable, use parentheses.
- Otherwise, check for mismatched delimiters.
Ara
2022-8-11
Thank you, Walter. Sorry to disturb you a lot.
I corrected the errors. It creates the folders on the C: drive, but they are empty. Do you have any idea how to fix this?
Walter Roberson
2022-8-11
The redirect error turned out to be because you were making an http request but the site wants https requests.
The below code is tested.
Note: the below code leaves the .tar file in place; you might want to delete those.
Note: if you want to interrupt, press Ctrl+C repeatedly. I built in a test that gives up after 5 errors, and that count turns out to include errors caused by interrupting the download.
%url = 'http://cdaac-www.cosmic.ucar.edu/cdaac/cgi_bin/fileFormats.cgi?type=scnLv1';
username = 'xxx' ;
password = 'xxx' ;
options = weboptions('Username', username, 'Password', password, ...
    'ContentType', 'raw');
% - Define base path for accessing data (may have to update it).
dataBase = 'https://cdaac-www.cosmic.ucar.edu/cdaac/rest/tarservice/data/cosmic2013/scnLv1/2013.';
origdir = pwd();
td = tempdir;
outputdir = fullfile(td, 'cosmic2013');
if ~isfolder(outputdir)
    try
        mkdir(outputdir)
    catch ME
        error('could not create output directory "%s"', outputdir);
    end
end
fprintf('Extracting data into directory "%s"\n', outputdir);
errorcount = 0;
maxerror = 5;
for dayId = 1:227
    % - Create the per-day output directory.
    dirName = sprintf('%03d', dayId);
    dayoutputdir = fullfile(outputdir, dirName);
    if ~isfolder(dayoutputdir)
        try
            mkdir( dayoutputdir );
        catch ME
            warning('skipping day %d, could not create directory at "%s"', dayId, dayoutputdir);
            errorcount = errorcount + 1; if errorcount >= maxerror; error('too many errors, giving up'); end
            continue;
        end
    end
    % - Download the day's archive as raw bytes.
    thisurl = [dataBase dirName];
    try
        rawdata = webread( thisurl, options );
    catch ME
        warning('error reading from url "%s"', thisurl);
        errorcount = errorcount + 1; if errorcount >= maxerror; error('too many errors, giving up'); end
        continue;
    end
    if isempty(rawdata)
        warning('no error but no data available from "%s"', thisurl);
        errorcount = errorcount + 1; if errorcount >= maxerror; error('too many errors, giving up'); end
        continue
    else
        % - Save the bytes as a .tar file, then extract it in place.
        tarname = fullfile(dayoutputdir, 'rawdata.tar');
        fid = fopen(tarname, 'w');
        if fid < 0
            warning('skipping day %d, got data but could not create tar at "%s"', dayId, tarname);
            errorcount = errorcount + 1; if errorcount >= maxerror; error('too many errors, giving up'); end
            continue;
        end
        fwrite(fid, rawdata, 'uint8');
        fclose(fid);
        try
            cd( dayoutputdir );
            untar(tarname);
            cd( origdir );
        catch ME
            cd( origdir );
            warning('got data but could not untar, examine "%s"', tarname);
            errorcount = errorcount + 1; if errorcount >= maxerror; error('too many errors, giving up'); end
            continue
        end
        fprintf('success for day %d\n', dayId);
    end
end
cd(origdir)
fprintf('Files extracted to "%s"\n', outputdir);
Example output file:
/private/var/folders/jq/wx1hzy713dj_408tpm5fck040000gn/T/cosmic2013/001/cosmic2013/scnLv1/2013.001/scnLv1_C001.2013.001.11.58.0005.G04.03_2013.3520_nc
This code creates the "cosmic2013/001" level, and untar creates the cosmic2013/scnLv1/2013.001 level under it. The /private/var/folders/jq/wx1hzy713dj_408tpm5fck040000gn/T prefix here is the temporary directory returned by tempdir().
I download into a directory relative to tempdir() because you do not seem to have write access to your current directory.
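Since the script leaves each rawdata.tar behind, a cleanup pass can remove them once extraction has succeeded. This is a minimal sketch, assuming MATLAB R2016b or later for the `**` recursive wildcard in `dir`. It runs against a throwaway folder (`demoRoot`, a made-up name) rather than the real download directory.

```matlab
% Build a throwaway tree that mimics <outputdir>/<day>/rawdata.tar.
demoRoot = tempname;
mkdir(fullfile(demoRoot, '001'));
fid = fopen(fullfile(demoRoot, '001', 'rawdata.tar'), 'w');
fclose(fid);

% Find every rawdata.tar below demoRoot and delete it.
tars = dir(fullfile(demoRoot, '**', 'rawdata.tar'));
for k = 1:numel(tars)
    delete(fullfile(tars(k).folder, tars(k).name));
end
remaining = dir(fullfile(demoRoot, '**', 'rawdata.tar'));
fprintf('deleted %d archive(s), %d left\n', numel(tars), numel(remaining));

rmdir(demoRoot, 's');   % remove the demo tree itself
```

To apply it to the real data, replace `demoRoot` with `outputdir` and drop the lines that create and remove the demo tree.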
Walter Roberson
2022-8-11
I tested the code on my system (Mac). For example,
>> ls /private/var/folders/jq/wx1hzy713dj_408tpm5fck040000gn/T/cosmic2013/001/cosmic2013/scnLv1/2013.001/
scnLv1_C001.2013.001.00.00.0004.G08.03_2013.3520_nc scnLv1_C002.2013.001.02.02.0033.G14.01_2013.3520_nc scnLv1_C002.2013.001.18.53.0027.G16.01_2013.3520_nc scnLv1_C005.2013.001.08.12.0008.G10.01_2013.3520_nc
scnLv1_C001.2013.001.00.01.0001.G22.02_2013.3520_nc scnLv1_C002.2013.001.02.03.0001.G20.02_2013.3520_nc scnLv1_C002.2013.001.18.54.0001.G05.02_2013.3520_nc scnLv1_C005.2013.001.08.18.0001.G04.02_2013.3520_nc
scnLv1_C001.2013.001.00.02.0001.G31.02_2013.3520_nc scnLv1_C002.2013.001.02.04.0019.G02.01_2013.3520_nc scnLv1_C002.2013.001.18.54.0023.G19.01_2013.3520_nc scnLv1_C005.2013.001.08.18.0001.G28.03_2013.3520_nc
(and more)
I did not test it on Windows (I am not sure I have a functioning Windows MATLAB installed at the moment.)
More Answers (1)
Ara
2022-8-11
Dear Walter,
It works. Folder 1 downloaded completely with all its data. Thank you very much. The only problem is that it is very slow: MATLAB stays busy for about 25 minutes per file. Is there any way to speed it up?
3 Comments
Ara
2022-8-11
Dear Walter,
Now I have to read the netCDF files to extract S4, time, longitude, and latitude. Do you know how I can do this? Do I need to open another question?
You did great work for me and I would greatly appreciate your help.
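Reading the extracted files can be sketched with MATLAB's high-level netCDF functions `ncinfo` and `ncread`. The variable name `S4` and the file below are stand-ins created just for the demonstration. The actual variable names in the scnLv1 files are not stated in this thread, so list them with `ncinfo` first.

```matlab
% Create a tiny stand-in netCDF file so the snippet runs on its own.
ncfile = [tempname '.nc'];
nccreate(ncfile, 'S4', 'Dimensions', {'time', 4});
ncwrite(ncfile, 'S4', [0.1; 0.2; 0.4; 0.3]);

% List the variables the file contains, then read one by name.
info = ncinfo(ncfile);
varNames = {info.Variables.Name};
fprintf('variables: %s\n', strjoin(varNames, ', '));
s4 = ncread(ncfile, 'S4');
fprintf('read %d samples, max = %.2f\n', numel(s4), max(s4));

delete(ncfile);   % remove the stand-in file
```

For a real file, pass the full path of one of the extracted scnLv1 files in place of `ncfile` and use the names reported by `ncinfo`.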
Walter Roberson
2022-8-12
You are mostly being limited by the speed of your internet connection.
If you change the assignment
td = tempdir();
you could change the download directory to an SSD if you have one. That could potentially make the untar step faster.
Ara
2022-8-12
Edited: Ara
2022-8-12
Oh, I see! Yes the internet connection is not very good.
Does SSD mean external memory? Would it be possible to download all the files to it instead of the C: drive? I do not know how to change it. Would you please tell me how to change the path to the current folder, or specifically to the external drive?
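Pointing the download at another drive only requires changing what `td` is set to. An SSD is a solid-state drive, which can be internal or external. This is a minimal sketch, assuming a hypothetical location such as `E:\cosmic_data`. Here `td` falls back to a folder under `tempdir` so the snippet runs anywhere, and the write test is an illustrative precaution rather than part of the original script.

```matlab
% Replace the right-hand side with your own location, for example
% 'E:\cosmic_data' (hypothetical) or pwd for the current folder.
td = fullfile(tempdir, 'cosmic_demo_target');

if ~exist(td, 'dir')
    mkdir(td);
end

% Confirm the folder is writable before starting a long download.
probe = [tempname(td) '.tmp'];
fid = fopen(probe, 'w');
canWrite = fid >= 0;
if canWrite
    fclose(fid);
    delete(probe);
end

outputdir = fullfile(td, 'cosmic2013');   % same layout as the script above
fprintf('Writable: %d, output folder: %s\n', canWrite, outputdir);
```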