How can I prevent web() or webread() from converting special characters in a URL to percent-encoded hexadecimal?
I am trying to use the web or webread functions to access a URL whose result pages are selected with "#resultPage=[PAGE NUMBER]". With either function, if the URL contains a # character, every = character after the # becomes %3D. 3D is the hexadecimal ASCII code for =, so it appears that when the URL is passed to web or webread, the special characters following the # are replaced by their hexadecimal values. The URL then ends in "#resultPage%3D1", which does not open the desired page.
Testing with other special characters ($, %, ^, &) gives the same result: each is converted to a % sign followed by its hexadecimal value. I looked through the documentation for web and webread, the list of MATLAB special characters, and forums and bug reports, and found nothing about this particular issue. I tried both the default browser and the MATLAB built-in browser, and both behave the same way.
I assume this is intended behavior: # is a special character in its own right, and the characters after it are percent-encoded on the assumption that websites will still handle them. However, the site I'm trying to access does not.
If I replace the # in the original link with %23 (the hexadecimal code for #), the website opens properly in Chrome. Passed through web or webread, however, it does not open properly in either the default browser or the built-in one.
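The character-to-hex mapping described above can be checked directly in MATLAB. This is only a small sketch of the arithmetic (the variable names are illustrative); it does not reproduce what web or webread do internally.

```matlab
% Characters mentioned in the question, as a char row vector.
specialChars = '#=$%^&';

% Look up each character's ASCII code and format it as two hex digits.
hexCodes = dec2hex(double(specialChars), 2);   % one row per character

% Print the percent-encoded form next to each character.
for k = 1:numel(specialChars)
    fprintf('%s  ->  %%%s\n', specialChars(k), hexCodes(k, :));
end
```

The loop shows # mapping to %23 and = mapping to %3D, which matches the values seen in the mangled URL.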
Would greatly appreciate whatever advice could be given towards getting this to work as intended. For full transparency, I'm trying to scrape CarGurus.com for car listings so I can save them in an Excel sheet. I've attached my code for attempting to do this, as well as the link I'm attempting to reach below. Thank you!
(To use the function, call it with the provided link, as a character array, as its only input.)
Answers (2)
Fabian Schuette
2023-7-17
With MATLAB R2023a and Firefox, the example from Sarthak's answer results in this:
https://www.google.com/search?q=matlab#search_results_mode%3Dinline
The = symbol was replaced with %3D.
How can I avoid this?
Sarthak
2023-3-24
Hi James,
I tried using the web function to access URLs containing special characters such as '#' and '=', and the function behaved as intended. It would help if you could attach screenshots showing exactly what the error is and what you are trying to achieve.
url='https://www.google.com/search?q=matlab#search_results_mode=inline';
web(url)
You could also try encoding your URLs before passing them to the web and webread functions to see whether the issue persists.
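One way to read the encoding suggestion is to escape only the values placed into the URL, using MATLAB's urlencode, and leave separators such as ? and = literal. The sketch below assumes a made-up search term; it illustrates manual encoding in general and may not change how web treats the part after the #.

```matlab
% Hypothetical search term containing reserved characters.
searchTerm = 'matlab web & webread';

% urlencode escapes reserved characters (and turns spaces into '+'),
% so encode only the value, not the whole URL.
encodedTerm = urlencode(searchTerm);

% Assemble the URL by hand; the '?' and '=' separators stay literal.
url = ['https://www.google.com/search?q=' encodedTerm];
disp(url)
```

Encoding the whole URL in one call would also escape the :, /, ? and = separators, which is why only the value is passed to urlencode here.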
2 Comments
Aditya
2024-2-8
I am using R2023b with a URL containing = and &, and the web command does not handle it correctly.
Nathan Kalish
2024-6-20
An internal MATLAB function called inside the web function is mangling the URLs.
myUrl='https://www.google.com/search?q=matlab#search_results_mode=inline';
matlab.internal.web.resolveLocation(myUrl).EncodedURI
Result:
ans = "https://www.google.com/search?q=matlab#search_results_mode%3Dinline"
Is there a way to keep web(url) from triggering this behavior? The only way I can see to avoid it is to call several internal MATLAB methods directly (which I assume will change in future releases).
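A possible workaround, not confirmed by anyone in this thread, is to skip web entirely and hand the URL to the operating system's own URL opener through system. The sketch below assumes the standard start, open, and xdg-open commands are available on the respective platforms; it has not been checked against every MATLAB release or browser.

```matlab
% URL from the thread, with a literal '=' inside the fragment.
myUrl = 'https://www.google.com/search?q=matlab#search_results_mode=inline';

% Pick the operating system's own "open this URL" command.
if ispc
    cmd = sprintf('start "" "%s"', myUrl);     % Windows
elseif ismac
    cmd = sprintf('open "%s"', myUrl);         % macOS
else
    cmd = sprintf('xdg-open "%s"', myUrl);     % Linux (assumes xdg-open is installed)
end

% Hand the command to the shell; a nonzero status means the launch failed.
status = system(cmd);
```

This only helps when the goal is to open the page in a browser. For webread, keep in mind that the part of a URL after # is a fragment, which HTTP clients do not send to the server, so a page number carried only in the fragment is unlikely to change what webread receives.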