webbot
WEBBOT Java-based browser with download and PERL regular expressions. The function will extract all links from a web-page, and display them. The resulting documents can be downloaded.
WEBBOT(URL)
URL is a string indicating the base page address; the url must link to an html file. The function lists all links in the file. URL can also be a cell vector of url-strings.
WEBBOT(URL, WHAT)
displays only specific links. WHAT is a string:
'all_links': displays all links (default).
'page_links': displays all links to an html web page*.
'local_links': displays all local links on the server*.
'external_links': displays all links to external websites.
'image_links': displays all links to an image file**.
'image_tags': displays all image tags <img src="xxx">.
'.xxx.yyyy.zz': displays all links to each specific .xxx files; the case is ignored ('zip' will find 'ZiP'); e.g. '.zip.gz.gzip.tar.Z'.
WEBBOT(URL, WHAT, ACT)
performs an action on found links. ACT is a string:
'noaction': just display links (default)
'download': downloads all links found locally.
'cartoons': downloads all image tags found on linked pages. This is usefull for cartoons websites where each cartoon (e.g. "01.gif") is on its own html page (e.g. "c01.html").
<li>'follow.x': follows links to html pages and recursively performs the same action on the resulting page. 'x' is an integer indicating the ecursivity depth (0 is equivalent to 'noaction').
lks = WEBBOT(URL, ...)
returns an cell-array with links of URL{end}.
Notes: * Links explicitely pointing to a .htm or .html url.
** Image links are recognized by the following file types:
.jpg .jpeg .gif .pict .bmp .tif .tiff .ras .png (.giff)
Try it with:
webbot('http://www.unitedmedia.com/comics/dilbert/archive/', ...
'local_links', 'cartoons');
Written by L.Cavin, 28.09.2003, (c) CSE
This code is free to use and modify for non-commercial purposes.
Web address: http://ltcmail.ethz.ch/cavin/CSEDBLib.html#WEBBOT
引用格式
Laurent Cavin (2024). webbot (https://www.mathworks.com/matlabcentral/fileexchange/4023-webbot), MATLAB Central File Exchange. 检索来源 .
MATLAB 版本兼容性
平台兼容性
Windows macOS Linux类别
- MATLAB > External Language Interfaces > Web Services with MATLAB > Call Web Services from MATLAB Using HTTP >
标签
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!版本 | 已发布 | 发行说明 | |
---|---|---|---|
1.0.0.0 | Major update:
|