Download files from a website from an Excel list

Hi,
I would like to download multiple files from a website: https://www.sec.gov/edgar/searchedgar/mutualsearch.htm
I have an Excel file that lists all the "tickers" to enter in the "ticker symbol" field (for instance LOCSX). For each one, I then click on the "CIK" number (0001014913 in the same example), then on the "Document" button of the first "485A..." filing, and finally download the first HTML file (ffi485a.htm in our example). Can anyone help me with that?
I enclose the Excel file that contains the list of tickers. Thanks

Answers (1)

Guillaume, 2016-8-1
Edited: Guillaume, 2016-8-1
I don't think you'll be able to simulate clicking on a webpage with MATLAB, and even if you could, it's totally the wrong approach.
Ideally, there would be a way to query the database directly and get the output as xml, json, or some other format easily parsable by a computer. This is the first thing I'd look for if I were you.
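If SEC's JSON data can be used instead of the HTML pages, finding the 485A filing for a given CIK takes a single request. A minimal sketch, assuming the `https://data.sec.gov/submissions/CIK##########.json` endpoint (CIK zero-padded to 10 digits), its `filings.recent` fields `form`, `accessionNumber` and `primaryDocument`, and the Archives URL layout used at the end; all of these are assumptions to check against SEC's current developer documentation. With `fetchLive = false` it runs on a hypothetical sample struct that only mimics that layout. `webread` decodes a JSON response into a struct, with arrays of strings becoming cell arrays.

```matlab
% Sketch: find the first 485A... filing via SEC's JSON submissions data.
% ASSUMPTION: endpoint URL, field names and Archives layout must be checked against SEC docs.
cik = 1014913;                                   % CIK from the question (0001014913)
jsonurl = sprintf('https://data.sec.gov/submissions/CIK%010d.json', cik);

fetchLive = false;                               % set to true to query SEC for real
if fetchLive
    % SEC asks automated clients to identify themselves; this value is hypothetical
    opts = weboptions('UserAgent', 'Your Name your.email@example.com');
    subs = webread(jsonurl, opts);
    recent = subs.filings.recent;
else
    % Hypothetical sample mimicking the assumed layout of filings.recent
    recent.form = {'497K'; '485APOS'; '485BPOS'};
    recent.accessionNumber = {'0000000000-16-000003'; '0000000000-16-000002'; '0000000000-16-000001'};
    recent.primaryDocument = {'a.htm'; 'ffi485a.htm'; 'b.htm'};
end

% Index of the first (most recent) filing whose form type starts with 485A
idx = find(strncmp(recent.form, '485A', 4), 1);
if isempty(idx)
    error('No 485A filing found for CIK %d', cik);
end

% ASSUMPTION: documents live at /Archives/edgar/data/<CIK>/<accession without dashes>/<file>
docurl = sprintf('https://www.sec.gov/Archives/edgar/data/%d/%s/%s', cik, ...
    strrep(recent.accessionNumber{idx}, '-', ''), recent.primaryDocument{idx});
disp(docurl)
```

With a live URL, `websave('ffi485a.htm', docurl)` would then save the file locally.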
Otherwise, a better approach is to simply generate the correct URL for each query. For example, the result of your first search is always the page: https://www.sec.gov/cgi-bin/series?&sc=companyseries&ticker=TICKERVALUE. So it's trivial to obtain its content with:
baseurl = 'https://www.sec.gov/cgi-bin/series';
tickervalue = 'CSEFX'; %for example
% webread appends the name/value pairs as the query string: ?sc=companyseries&ticker=CSEFX
cikpage = webread(baseurl, 'sc', 'companyseries', 'ticker', tickervalue);
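To run that query for every ticker in the Excel list, a loop is enough. A minimal sketch, assuming a hypothetical workbook `tickers.xlsx` with the tickers in the first column and no header row; when the file is missing it falls back to the tickers mentioned in this thread. The live requests only run when `fetchLive` is set to true, and the short pause between them is a courtesy choice, not something the site is documented here to require.

```matlab
% Sketch: run the series lookup for each ticker from the Excel list.
% ASSUMPTION: 'tickers.xlsx' and "tickers in column 1, no header" are hypothetical.
baseurl = 'https://www.sec.gov/cgi-bin/series';
xlsfile = 'tickers.xlsx';
if exist(xlsfile, 'file') == 2
    [~, txt] = xlsread(xlsfile);          % txt holds the text cells of the sheet
    tickers = strtrim(txt(:, 1));
else
    tickers = {'LOCSX'; 'CSEFX'};         % sample tickers from this thread
end

fetchLive = false;                        % set to true to actually query SEC
cikpages = cell(size(tickers));
queryurls = cell(size(tickers));
for k = 1:numel(tickers)
    % Equivalent URL of the query webread builds from the name/value pairs
    queryurls{k} = [baseurl '?sc=companyseries&ticker=' tickers{k}];
    if fetchLive
        cikpages{k} = webread(baseurl, 'sc', 'companyseries', 'ticker', tickers{k});
        pause(1);                         % space out successive requests
    end
end
```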
Unfortunately, you get the output as HTML, which is not easy to parse for information. As I said, it would be better to get the result as XML, JSON, or a similar format. Otherwise, the following regular expression may work (no guarantee that it works on every page):
% match the digits that follow the CIK link's opening <a> tag inside that table cell
cikvalue = regexp(cikpage, '(?<=<td colspan="3" nowrap="nowrap"><a class="search" href="[^"]*">)\d+', 'match', 'once')
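The same extraction can be written with a capture token instead of a lookbehind. Some regex engines (PCRE, for example) only accept fixed-length lookbehind, so this form is more portable, and it makes the "not found" case explicit. A minimal sketch; the HTML fragment is a hypothetical sample shaped the way the pattern above expects, not a copy of the real page.

```matlab
% Sketch: extract the CIK with a capture token rather than a lookbehind.
% The fragment below is a hypothetical sample, not the real SEC markup.
cikpage = ['<tr><td colspan="3" nowrap="nowrap">' ...
           '<a class="search" href="/cgi-bin/browse-edgar?action=getcompany&amp;CIK=0001014913">0001014913</a>' ...
           '</td></tr>'];

tok = regexp(cikpage, '<td colspan="3" nowrap="nowrap"><a class="search" href="[^"]*">(\d+)</a>', ...
             'tokens', 'once');
if isempty(tok)
    cikvalue = '';        % pattern not found: the page layout may differ
else
    cikvalue = tok{1};    % first (and only) captured group
end
disp(cikvalue)
```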
To get to the next page, it's then simply:
baseurl = 'https://www.sec.gov/cgi-bin/browse-edgar';
% list of filings for that company, looked up by CIK
searchresult = webread(baseurl, 'CIK', cikvalue, 'action', 'getcompany');
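Since the regex is not guaranteed to work on every page, it helps to chain both requests per ticker and record failures instead of stopping at the first bad one. A minimal sketch: `fetch` is a hypothetical wrapper around `webread` so the loop can also be exercised offline with a stand-in that returns a canned page; the CIK is extracted with a capture token rather than the lookbehind, otherwise using the same pattern.

```matlab
% Sketch: chain both lookups for each ticker, collecting failures instead of stopping.
% fetch is a hypothetical wrapper; the offline stand-in returns a canned page.
fetchLive = false;
if fetchLive
    fetch = @(url, varargin) webread(url, varargin{:});
else
    fetch = @(url, varargin) ['<td colspan="3" nowrap="nowrap">' ...
        '<a class="search" href="/x">0001014913</a>'];
end

tickers = {'LOCSX'; 'CSEFX'};             % e.g. read from the Excel file
cikvalues = cell(size(tickers));
searchresults = cell(size(tickers));
failed = {};
for k = 1:numel(tickers)
    try
        cikpage = fetch('https://www.sec.gov/cgi-bin/series', ...
            'sc', 'companyseries', 'ticker', tickers{k});
        tok = regexp(cikpage, ...
            '<td colspan="3" nowrap="nowrap"><a class="search" href="[^"]*">(\d+)</a>', ...
            'tokens', 'once');
        if isempty(tok)
            error('No CIK found for ticker %s', tickers{k});
        end
        cikvalues{k} = tok{1};
        searchresults{k} = fetch('https://www.sec.gov/cgi-bin/browse-edgar', ...
            'CIK', cikvalues{k}, 'action', 'getcompany');
    catch err
        % keep going; remember which ticker failed and why
        failed{end+1} = sprintf('%s: %s', tickers{k}, err.message); %#ok<AGROW>
    end
end
```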
You then have to parse the resulting HTML page to get the next link, which is something you'll have to figure out for yourself.
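As a starting point for that parsing, here is a minimal sketch that picks the "Documents" link of the first 485A* filing and, when run live, saves the first .htm document from that filing's index page with `websave`. ASSUMPTION: the markup matched here (a `<td nowrap="nowrap">` cell holding the form type followed by a link with `id="documentsbutton"`, and `/Archives/edgar/data/...htm` links on the index page) is a guess about EDGAR's layout, and the sample string is hypothetical; inspect the real pages and adjust the patterns before relying on it.

```matlab
% Sketch: find the "Documents" link of the first 485A* filing, then save its first .htm file.
% ASSUMPTION: the HTML patterns are guesses about EDGAR's pages; the sample is hypothetical.
searchresult = ['<tr><td nowrap="nowrap">497K</td>' ...
    '<td nowrap="nowrap"><a href="/Archives/x/1-index.htm" id="documentsbutton">&nbsp;Documents</a></td></tr>' ...
    '<tr><td nowrap="nowrap">485APOS</td>' ...
    '<td nowrap="nowrap"><a href="/Archives/x/2-index.htm" id="documentsbutton">&nbsp;Documents</a></td></tr>'];

tok = regexp(searchresult, ...
    '<td nowrap="nowrap">485A\w*</td>\s*<td nowrap="nowrap"><a href="([^"]+)" id="documentsbutton">', ...
    'tokens', 'once');
indexurl = '';
if ~isempty(tok)
    indexurl = ['https://www.sec.gov' tok{1}];   % link is site-relative
end

fetchLive = false;                  % set to true to follow the link and save the file
if fetchLive && ~isempty(indexurl)
    indexpage = webread(indexurl);
    docmatch = regexp(indexpage, 'href="(/Archives/edgar/data/[^"]+\.htm)"', 'tokens', 'once');
    if ~isempty(docmatch)
        [~, name, ext] = fileparts(docmatch{1});
        websave([name ext], ['https://www.sec.gov' docmatch{1}]);   % e.g. ffi485a.htm
    end
end
```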
