Retrieving data from a web page

Hi all, I am having trouble retrieving some data from this website: http://ricercaweb.unibocconi.it/criospatstatdb/rep06a.php. I am not very experienced in programming, and this is the first time I have tried getting data from the web this way. I have read quite a few posts, but I don't understand what an API is or the issues related to web page formats.
To put it simply, I'd like MATLAB to go to that web page, type a company name (e.g. Novartis) into the search bar, retrieve the resulting table, and save it in a usable format (CSV, for example).
I ran this code, but it doesn't work:
url = 'http://ricercaweb.unibocconi.it/criospatstatdb/rep06a.php';
options = weboptions('Keyname','Novartis','Keyvalue','text','ContentType','auto');
data = webread(url,options);
Could anyone help me, please? Thanks.
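For reference: in weboptions, the KeyName/KeyValue pair adds a field to the HTTP request header (intended for things like API keys), so it never reaches the server as a search term. A search term is normally sent as a query parameter, which webread builds from name-value pairs placed after the URL. The minimal sketch below assumes the search endpoint get_rep06a.php with parameter q (taken from the curl command quoted later in this thread) and assumes the site is still online; a failed request is caught and leaves html empty.

```matlab
% Search endpoint and parameter name as shown in the curl command later in
% this thread; both are assumed to still be valid.
searchUrl = 'http://ricercaweb.unibocconi.it/criospatstatdb/get_rep06a.php';
company = 'Novartis';
html = '';
try
    % ContentType 'text' makes webread return the body as a character vector.
    options = weboptions('ContentType', 'text', 'Timeout', 15);
    % The name-value pair 'q', company is appended to the URL as ?q=Novartis.
    html = webread(searchUrl, 'q', company, options);
catch err
    fprintf('Request failed: %s\n', err.message);
end
disp(html)
```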

Accepted Answer

Paolo 2018-5-13
Hi Enrico, try the following:
% Base address of the CSV files; the company name is appended to it.
url = 'http://ricercaweb.unibocconi.it/criospatstatdb/csv/rep06a_';
prompt = 'Enter company of interest: ';
val = strtrim(input(prompt, 's'));
url = [url val '.csv'];
% Ask webread to return the response body as plain text.
options = weboptions('RequestMethod', 'get', 'ContentType', 'text');
try
    data = webread(url, options);
    disp('CSV formatted data:');
    disp(data);
catch
    disp('No information found.');
end
If by 'inserting the name of the company in the search bar' you meant building the URL for that company, then the code above should do the trick: simply enter the term of interest and it will retrieve the data for you.
On the other hand, if you were asking for code that fills in the HTML form with a POST request and then retrieves the resulting data with a GET request, some changes would need to be made. Let me know.
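Since the goal was to save the table in a usable format, it may be simpler to download the file straight to disk. The sketch below uses websave to store the CSV address built in the code above and readtable to load it for further work. It assumes the rep06a_<name>.csv naming pattern still holds and that the file parses as an ordinary delimited table; the output folder (tempdir) is just an example.

```matlab
% Example search term; whether the server is case-sensitive is not known.
company = 'Novartis';
csvUrl = ['http://ricercaweb.unibocconi.it/criospatstatdb/csv/rep06a_' company '.csv'];
% Example destination; replace tempdir with any folder you like.
outFile = fullfile(tempdir, ['rep06a_' company '.csv']);
savedPath = '';
try
    % websave writes the response body to outFile and returns its full path.
    savedPath = websave(outFile, csvUrl);
    % readtable tries to detect the delimiter and header row on its own.
    T = readtable(savedPath);
    disp(T(1:min(5, height(T)), :))
catch err
    fprintf('Download or parsing failed: %s\n', err.message);
end
```

The saved file can then be opened in any spreadsheet program, while T is ready for use in MATLAB.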
Paolo 2018-5-16
Hi Enrico,
I am not entirely sure whether the web page you are using supports POST requests, but you can achieve exactly the same result without one. When you submit the form, a GET request for the term of interest (e.g. Novartis) is sent to a certain address, and we can use that address directly. You can see it yourself by inspecting the form (right-click, Inspect) and then opening the Network tab of the developer tools (if you are using Chrome).
From there you can copy the GET request; the easiest option is to copy it as a cURL command, which will look something like this:
curl "http://ricercaweb.unibocconi.it/criospatstatdb/get_rep06a.php?q=novartis"
You can easily implement this in Matlab. There are two ways I can think of for doing it.
The first is as follows:
% Perform the search request at the GET URL.
first_url = 'http://ricercaweb.unibocconi.it/criospatstatdb/get_rep06a.php';
prompt = 'Enter company of interest: ';
val = strtrim(input(prompt, 's'));
options = weboptions('RequestMethod', 'get', 'ContentType', 'text');
% Read the response; webread appends the q parameter to the URL.
data = '';
try
    data = webread(first_url, 'q', val, options);
catch
    disp('No information found.');
end
% Search the response for the link to the .csv file, then perform a
% second GET request at that address.
if ~isempty(data)
    expression = 'https?://[^"''<>\s]+?\.csv';
    second_url = regexp(data, expression, 'match', 'once');
    if isempty(second_url)
        disp('No CSV link found in the response.');
    else
        try
            data = webread(second_url, options);
            disp('CSV formatted data:');
            disp(data);
        catch
            disp('No information found.');
        end
    end
end
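The regular-expression step is the fragile part of this approach. A pattern such as '(http://).*(.csv)' is greedy and its unescaped dot matches any character, so on a response containing more than one link it can return everything from the first http:// to the last "csv". The offline check below compares it with a tighter pattern that stops at quotes, angle brackets and whitespace. The sample response is hypothetical, since the thread does not show the real page markup; only the rep06a_ file-name pattern comes from the code above.

```matlab
% Hypothetical search response: a link to another page, then the CSV link.
sampleResponse = ['<a href="http://ricercaweb.unibocconi.it/criospatstatdb/">Home</a> ' ...
    '<a href="http://ricercaweb.unibocconi.it/criospatstatdb/csv/rep06a_novartis.csv">CSV</a>'];

% Greedy pattern: ".*" runs to the last "csv" preceded by any character.
greedyUrl = regexp(sampleResponse, '(http://).*(.csv)', 'match', 'once');

% Tighter pattern: the URL may not contain quotes, angle brackets or
% whitespace, the dot before csv is escaped, and "+?" stops at the first ".csv".
linkPattern = 'https?://[^"''<>\s]+?\.csv';
csvUrl = regexp(sampleResponse, linkPattern, 'match', 'once');

disp(greedyUrl)   % spans both links
disp(csvUrl)      % only the CSV address
```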
Alternatively, you can use the system function to run curl from MATLAB. You must make sure that curl is available on the system path.
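prompt = 'Enter company of interest: ';
val = strtrim(input(prompt, 's'));
% Check that curl can be found before using it.
[status, ~] = system('curl --version');
if status ~= 0
    error('curl was not found on the system path.');
end
% First request: the search page, which contains the link to the .csv file.
% -s silences curl's progress meter so it does not clutter cmdout.
command = sprintf('curl -s "http://ricercaweb.unibocconi.it/criospatstatdb/get_rep06a.php?q=%s"', val);
[~, cmdout] = system(command);
expression = 'https?://[^"''<>\s]+?\.csv';
url = regexp(cmdout, expression, 'match', 'once');
if isempty(url), error('No CSV link found in the response.'); end
% Second request: download the .csv file itself.
command = sprintf('curl -s "%s"', url);
[~, cmdout] = system(command);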
cmdout will contain the .csv response of the second GET request, which is the data you are interested in. For multiple company names, simply repeat both requests for each name. Hope this helps.
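Repeating the requests for several names fits naturally in a loop. The sketch below stays in MATLAB instead of calling curl: for each name it performs the search request, extracts the CSV link with a non-greedy pattern, and saves the file with websave. The company list, the output folder and the file names are examples only, and the endpoint is assumed to behave as described above; names that fail or return no link are reported and left empty in savedFiles.

```matlab
% Example names; replace with the companies you need.
companies = {'Novartis', 'Roche', 'Bayer'};
searchUrl = 'http://ricercaweb.unibocconi.it/criospatstatdb/get_rep06a.php';
linkPattern = 'https?://[^"''<>\s]+?\.csv';
savedFiles = cell(size(companies));   % '' marks a company with no result

for k = 1:numel(companies)
    savedFiles{k} = '';
    try
        opts = weboptions('ContentType', 'text', 'Timeout', 15);
        % First request: search page for this company.
        html = webread(searchUrl, 'q', companies{k}, opts);
        csvUrl = regexp(html, linkPattern, 'match', 'once');
        if isempty(csvUrl)
            fprintf('%s: no CSV link found.\n', companies{k});
            continue
        end
        % Second request: save the CSV file itself (example file name).
        outFile = fullfile(tempdir, sprintf('rep06a_%s.csv', lower(companies{k})));
        savedFiles{k} = websave(outFile, csvUrl);
        fprintf('%s: saved to %s\n', companies{k}, savedFiles{k});
    catch err
        fprintf('%s: request failed (%s).\n', companies{k}, err.message);
    end
end
```

Keeping everything inside MATLAB avoids depending on curl being installed and lets webread handle the query string.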
Enrico Scupola 2018-5-20
Thanks a lot Paolo, that really helped me!




Release: R2018a

