urlfilter function giving the same value when the actual value is changing continuously

I'm trying to build a program that scrapes stock price data from Yahoo. I went into the HTML to find the specific target string that precedes the price, and the function works when I run it. However, it seems like the URL that MATLAB uses is either "old" or not updating, because it displays the same stock price over and over again in my timed while loop. I clear my URL each loop and set it again, but that doesn't seem to fix it.
Has anyone encountered this issue with web scraping before?

Answers (1)

Chetan 2024-1-5
I understand that you're facing an issue with scraping real-time stock price data from Yahoo Finance in MATLAB. It's quite common for repeated requests to the same URL to return cached data, served either by MATLAB's internal mechanisms or by the web server. Let's address this with a few steps to make sure you're getting live updates:
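1. Disable Cache with `weboptions`:
  • You can specify options for web service requests using the `weboptions` function. `weboptions` has no dedicated cache setting, so to discourage caching, send a `Cache-Control: no-cache` request header, here through the 'KeyName' and 'KeyValue' options.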
options = weboptions('ContentType', 'text', 'KeyName', 'Cache-Control', 'KeyValue', 'no-cache');
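As an alternative to the line above, the same header can be sent through the 'HeaderFields' option of `weboptions`, which accepts several name/value pairs at once. This is only a sketch: the extra 'Pragma' header and the 15-second timeout are assumptions added for illustration, and a server or intermediate cache is free to ignore either header.

```matlab
% Sketch: request fresh data by sending no-cache headers with every call.
% 'Pragma' and the timeout value are illustrative assumptions, not requirements.
headers = {'Cache-Control', 'no-cache'; ...
           'Pragma',        'no-cache'};
options = weboptions('ContentType', 'text', ...
                     'HeaderFields', headers, ...
                     'Timeout', 15);
```

The resulting `options` object is passed to `webread` exactly as before.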
2. Use a Unique URL Query:
  • By altering the URL on each request, you can prompt the server to return fresh data. A common approach is to append a timestamp as a query parameter.
  • Here is sample code for that:
while true
    % Create a unique timestamp for each iteration (SSS = milliseconds)
    t = datetime('now', 'Format', 'yyyyMMddHHmmssSSS');
    timestamp = char(t);
    % STOCK_SYMBOL is a placeholder for the ticker symbol
    url = ['https://finance.yahoo.com/quote/STOCK_SYMBOL?s=' timestamp];
    % Use webread with the modified URL and the options from step 1
    content = webread(url, options);
    % Parse the content to extract the stock price
    % (Your parsing code will be needed here)
    % Wait for a short interval before making the next request
    pause(5); % Adjust the pause as needed
end
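For the parsing placeholder in the loop, one possible approach is a regular expression that captures the number immediately following the target string. The sketch below runs on a made-up HTML fragment. Both `content` and `target` here are hypothetical stand-ins, since the actual markup on the Yahoo page is not shown in this thread and may differ.

```matlab
% Hypothetical HTML fragment standing in for the text returned by webread.
content = '<span data-field="regularMarketPrice">1,234.56</span>';
% Hypothetical target string that precedes the price in the page source.
target = 'data-field="regularMarketPrice">';

% Escape the target so it is matched literally, then capture the digits,
% thousands separators, and decimal part that follow it.
pattern = [regexptranslate('escape', target) '([\d,]+\.?\d*)'];
tokens = regexp(content, pattern, 'tokens', 'once');

if isempty(tokens)
    price = NaN;  % target not found in this response
else
    price = str2double(strrep(tokens{1}, ',', ''));
end
```

Returning `NaN` when the target is missing makes it easy to spot iterations where the page layout changed or the price was not part of the downloaded HTML.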
3. Dynamic Content Consideration:
  • If the stock price is rendered dynamically with JavaScript, `webread` might not be able to capture the updated content. In such cases, you might need to use a headless browser or another method that can execute the JavaScript on the page to retrieve the dynamic content.
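Before reaching for a headless browser, it may help to check whether the price is present in the raw HTML at all. The following is a rough diagnostic sketch, not a definitive test: the URL placeholder, the `target` string, and the 5-second gap are assumptions, and two responses can differ for reasons unrelated to the price (for example, embedded session tokens).

```matlab
% Hypothetical values; substitute the real page URL and target string.
url = 'https://finance.yahoo.com/quote/STOCK_SYMBOL';
target = 'regularMarketPrice';
options = weboptions('ContentType', 'text', 'Timeout', 15);

try
    first = webread(url, options);
    pause(5);
    second = webread(url, options);
    % 1 if the target appears in the downloaded HTML, 0 otherwise
    fprintf('Target present in static HTML: %d\n', contains(second, target));
    % 1 if both downloads are byte-for-byte identical, 0 otherwise
    fprintf('Two responses identical: %d\n', strcmp(first, second));
catch err
    fprintf('Request failed: %s\n', err.message);
end
```

If the target never appears in the downloaded text, the value is most likely filled in by JavaScript after the page loads, and no amount of cache busting with `webread` will surface it.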
For more details, you can refer to the MathWorks documentation for `webread` and `weboptions`.
Hope it helps

Release

R2022a
