URLREAD2 - User Agent and Cookies

Dan 2016-1-26
Answered: Dan 2018-9-19
I'm at a loss as to how to get this sample code working, and I was hoping someone could review it and assess my assumptions about what may be wrong.
Problem: I would like to use MATLAB to access a webpage that is protected by a login screen. I am able to use wget and it works fine; however, wget does not load the AJAX/JavaScript embedded within the page. Therefore, I have turned to the urlread2 function available from the MATLAB File Exchange. All examples below are based on this function.
Example: I am trying to log in to a financial website, but I get the same error when testing with other sites, so for this example I am going to use fitbit.com. To mimic the behaviour of a browser, I pass the following combined headers into urlread2 (I have split the code to make it easier to see what I'm doing):
value = 'https://www.fitbit.com';
header = http_createHeader('Host',value);
value = 'keep-alive';
header2 = http_createHeader('Connection',value);
value = '278';
header3 = http_createHeader('Content-Length',value);
value = 'max-age=0';
header4 = http_createHeader('Cache-Control',value);
value = 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8';
header5 = http_createHeader('Accept',value);
value = 'https://www.fitbit.com';
header6 = http_createHeader('Origin',value);
value = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36';
header7 = http_createHeader('User-Agent',value);
value = 'application/x-www-form-urlencoded';
header8 = http_createHeader('Content-Type',value);
value = 'https://www.fitbit.com/login';
header9 = http_createHeader('Referer',value);
value = 'gzip, deflate';
header10 = http_createHeader('Accept-Encoding',value);
value = 'en-US,en;q=0.8';
header11 = http_createHeader('Accept-Language',value);
%Generate a combined header as required by urlread2
combined_header = [header header2 header3 header4 header5 header6 header7 header8 header9 header10 header11];
With the header information defined, I generate the query string required (this is for the post operation):
queryString = 'email=myemail&password=mypassword&login=Log+In';
Finally, bring it all together for the urlread2 function:
[output,extras] = urlread2('https://www.fitbit.com/login','post',queryString,combined_header);
The following response is embedded within the HTML:
'The owner of this website (www.fitbit.com) has banned your access based on your browser''s signature (2659bb18cf10354e-ua21).'
Possible problem 1:
It may well be that I'm passing in the headers incorrectly; however, when I send the same headers from Firefox, the page works correctly. Any advice on this would be greatly appreciated.
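One thing that may be worth checking in the headers listed earlier: the Host header normally carries only the host name (no scheme), and Content-Length should equal the byte length of the body actually sent, whereas '278' does not match the placeholder query string. The sketch below derives both values from the request itself. It is only an illustration and not a confirmed fix for the ban message; the variable names hostName and bodyLength are hypothetical, and http_createHeader is assumed to be on the path from the urlread2 package.

```matlab
% Sketch only: derive Host and Content-Length from the request itself
% instead of hard-coding them.
url         = 'https://www.fitbit.com/login';
queryString = 'email=myemail&password=mypassword&login=Log+In';

% Host carries only the host name, without the scheme or the path.
tokens   = regexp(url, '^https?://([^/]+)', 'tokens', 'once');
hostName = tokens{1};                                   % 'www.fitbit.com'

% Content-Length is the number of bytes in the body that is actually sent.
bodyLength = numel(unicode2native(queryString, 'UTF-8'));   % 46 for this placeholder string

% Build the two headers only if the urlread2 package is available.
if exist('http_createHeader', 'file') == 2
    hostHeader   = http_createHeader('Host', hostName);
    lengthHeader = http_createHeader('Content-Length', num2str(bodyLength));
end
```

With real credentials the body length changes, so computing it at run time avoids a stale value copied from a browser session.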
Possible problem 2:
I think the problem may be down to cookies, since neither urlread2 nor any other function in MATLAB supports cookies. If this is the case, does anyone have any suggestions on how to tackle this?
Thanks,
Dan
2 Comments
Yingyun Ai 2017-6-6
Can I ask whether you solved this problem afterwards? Thank you. Is there any way to use urlread2 with basic authentication?
Jyotsana Walia 2018-9-17
Hi, were you able to figure out passing basic auth to urlread2?
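On the basic authentication question, a minimal sketch, assuming the server accepts standard HTTP Basic authentication: the Authorization header is the word Basic followed by the Base64 encoding of username:password. The credentials and the URL in the commented-out call are placeholders, matlab.net.base64encode is assumed to be available (it ships with newer MATLAB releases), and this has not been verified against any particular site.

```matlab
% Sketch only: build an HTTP Basic Authorization header for urlread2.
username = 'myuser';        % placeholder
password = 'mypassword';    % placeholder

% Basic auth value: 'Basic ' followed by base64('username:password').
credentials = matlab.net.base64encode([username ':' password]);
authValue   = ['Basic ' credentials];

% Build the header only if the urlread2 package is available.
if exist('http_createHeader', 'file') == 2
    authHeader = http_createHeader('Authorization', authValue);
    % Hypothetical URL; replace with the protected resource:
    % [output, extras] = urlread2('https://example.com/protected', 'GET', '', authHeader);
end
```

Basic authentication sends the credentials in an easily reversible encoding, so it should only be used over HTTPS.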

Answers (1)

Dan 2018-9-19
Sadly, I did not solve this problem. In the end I used a custom Python script.