How can I stop web() or webread() from converting special characters in a URL to hexadecimal (percent-encoding)?

I am trying to use web or webread to access a URL whose pages are selected with "#resultPage=[PAGE NUMBER]". With either function, if the URL contains a # character, every = character after the # becomes %3D. The ASCII code for = is 3D in hexadecimal, so it seems that when the URL is passed to web or webread, the special characters following the # are replaced by their hexadecimal values. The URL ends up with "#resultPage%3D1", which does not open the desired page.
Testing with other special characters ($, %, ^, &) gives the same result: each is converted to a % sign followed by its hexadecimal value. I looked through the documentation pages for web and webread, the list of MATLAB special characters, and forums and bug reports, but could not find anything about this issue. I tried both the default browser and the MATLAB built-in browser, and both behave the same way.
I assume this is intended behavior, since # is a special character in its own right, and the remaining special characters are converted to % plus hexadecimal because websites are supposed to still work with that form. However, the site I'm trying to access doesn't accept it.
I can replace the # in the original link with %23 (the hexadecimal code for #), and the page opens properly when I enter that in Chrome. Passed through web or webread, however, it does not work with either the default browser or the built-in one.
I would greatly appreciate any advice on getting this to work as intended. For full transparency, I'm trying to scrape CarGurus.com for car listings so I can save them in an Excel sheet. I've attached my code for attempting to do this, as well as the link I'm attempting to reach below. Thank you!
(To use the function, call it with the link provided as the only input, passed as a character array.)

Answers (2)

Fabian Schuette 2023-7-17
With MATLAB R2023a and Firefox, the example from Sarthak's answer opens this URL:
https://www.google.com/search?q=matlab#search_results_mode%3Dinline
The = character was replaced with %3D.
How can I avoid this?

Sarthak 2023-3-24
Hi James,
I tried using the web function to access URLs containing special characters such as '#' and '='; however, the function behaves as intended for me. It would help if you could attach screenshots showing the exact error and describe what you are trying to achieve.
url='https://www.google.com/search?q=matlab#search_results_mode=inline';
web(url)
You could also try encoding your URLs before passing them to web and webread to see whether the issue persists.
  2 comments
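To make the encoding discussed in this thread concrete, here is a minimal sketch using the documented urlencode and urldecode functions. The fragment string is taken from the question; the variable names are only illustrative. It shows that = and # map to %3D and %23, so encoding the URL by hand yields the same %3D form that the question reports as failing, which suggests manual encoding alone is unlikely to fix the problem.

```matlab
% '=' has character code 61, which is 3D in hexadecimal; '#' is 35, or 23.
hexEquals = dec2hex(double('='));
hexHash   = dec2hex(double('#'));

% urlencode percent-encodes reserved characters in a string.
fragment = '#resultPage=1';
encoded  = urlencode(fragment);

% urldecode reverses the encoding and recovers the original text.
decoded  = urldecode(encoded);

fprintf('"=" -> %%%s, "#" -> %%%s\n', hexEquals, hexHash);
fprintf('%s -> %s -> %s\n', fragment, encoded, decoded);
```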
Nathan Kalish 2024-6-20
An internal MATLAB function called inside the web function is mangling the URLs:
myUrl='https://www.google.com/search?q=matlab#search_results_mode=inline';
matlab.internal.web.resolveLocation(myUrl).EncodedURI
ans = "https://www.google.com/search?q=matlab#search_results_mode%3Dinline"
Is there a way to avoid this behavior when calling web(url)? I don't see one short of calling several internal MATLAB methods, which I assume will change in future releases.
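One possible workaround that sidesteps web() entirely is to hand the URL to the operating system's own launcher through system(). The following is a minimal sketch, assuming the usual default handlers are available (start on Windows, open on macOS, xdg-open on Linux); the variable names are illustrative, and it has not been verified on every platform or MATLAB release. It only helps for opening a page in a browser: the part of a URL after # is interpreted by the browser and is not sent to the server in an HTTP request, so webread would not receive a different page for a different #resultPage value in any case.

```matlab
% URL containing a fragment with '=' that web() was reported to rewrite as %3D.
url = 'https://www.google.com/search?q=matlab#search_results_mode=inline';

% Build a launcher command for the current platform (assumed default handlers).
if ispc
    cmd = sprintf('start "" "%s"', url);    % the empty "" is the window title argument
elseif ismac
    cmd = sprintf('open ''%s''', url);
else
    cmd = sprintf('xdg-open ''%s''', url);
end

% Hand the URL to the operating system; a nonzero status means the launch failed.
status = system(cmd);
if status ~= 0
    warning('Could not launch the browser with command: %s', cmd);
end
```

Because the URL is passed as a plain string to the shell, MATLAB's internal URL resolution is not involved, so the = after the # should reach the browser unchanged.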

Release

R2022a
