webbot

Version 1.0.0.0 (6.0 KB) by Laurent Cavin
A Java-based "web browser" that extracts all links from a web page and displays them.
8.6K downloads
Updated 2003/10/15


WEBBOT is a Java-based browser with download support and Perl regular expressions. The function extracts all links from a web page and displays them. The linked documents can then be downloaded.

WEBBOT(URL)
URL is a string giving the base page address; it must point to an HTML file. The function lists all links in that file. URL can also be a cell vector of URL strings.
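As an illustration of what "listing all links" involves, here is a minimal sketch that pulls href values out of an HTML string with a regular expression. It is not WEBBOT's actual code: the `html` snippet and variable names are made up for the example, and the pattern only handles double-quoted href attributes.

```matlab
% Toy HTML snippet standing in for a downloaded page (hypothetical content).
html = ['<a href="index.html">Home</a> ', ...
        '<A HREF="http://example.com/file.ZIP">Archive</A> ', ...
        '<img src="pics/01.gif">'];

% Capture the double-quoted value of every href attribute, ignoring case.
tokens = regexpi(html, '<a\s[^>]*?href\s*=\s*"([^"]*)"', 'tokens');

% Each token is itself a 1x1 cell; unwrap it to get a cell array of strings.
links = cellfun(@(t) t{1}, tokens, 'UniformOutput', false);

disp(links(:));
```

The image tag is skipped because the pattern only matches anchor tags; a real page would also need handling for single-quoted or unquoted attributes.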

WEBBOT(URL, WHAT)
displays only specific links. WHAT is a string:
'all_links': displays all links (default).
'page_links': displays all links to an html web page*.
'local_links': displays all local links on the server*.
'external_links': displays all links to external websites.
'image_links': displays all links to an image file**.
'image_tags': displays all image tags <img src="xxx">.
'.xxx.yyyy.zz': displays all links to files with any of the listed extensions; case is ignored ('zip' will find 'ZiP'); e.g. '.zip.gz.gzip.tar.Z'.
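The extension form of WHAT can be pictured as a case-insensitive match on the end of each link. The sketch below is an assumption about how such a filter could be written, not the code WEBBOT uses; the `links` list is invented, and `strjoin` requires a reasonably recent MATLAB release.

```matlab
what  = '.zip.gz.gzip.tar.Z';
links = {'a/b/data.ZiP', 'page.html', 'src.tar', 'notes.gz', 'zip/readme.txt'};

% Split the specifier on dots and drop the empty leading piece.
exts = regexp(what, '\.', 'split');
exts = exts(~cellfun('isempty', exts));

% Build a pattern such as \.(zip|gz|gzip|tar|Z)$ and apply it ignoring case.
pattern = ['\.(' strjoin(exts, '|') ')$'];
hits    = regexpi(links, pattern, 'match', 'once');
matched = links(~cellfun('isempty', hits));

disp(matched(:));
```

Anchoring the pattern at the end of the string keeps a link like 'zip/readme.txt' from matching just because 'zip' appears in its path.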

WEBBOT(URL, WHAT, ACT)
performs an action on found links. ACT is a string:
'noaction': just display links (default)
'download': downloads all found links to the local machine.
'cartoons': downloads all image tags found on linked pages. This is useful for cartoon websites where each cartoon (e.g. "01.gif") is on its own html page (e.g. "c01.html").
'follow.x': follows links to html pages and recursively performs the same action on the resulting page. 'x' is an integer indicating the recursion depth (0 is equivalent to 'noaction').
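To make the depth semantics of 'follow.x' concrete, here is a small sketch of a depth-limited traversal over an in-memory "site". Everything in it is hypothetical (the `site` map, the page names, the loop structure); it only illustrates that depth 0 processes the start page alone and each extra level follows links one step further.

```matlab
act = 'follow.2';

% Read the integer after 'follow.' (format taken from the description above).
depth = sscanf(act, 'follow.%d');

% Hypothetical in-memory site: page name -> links found on that page.
site = containers.Map();
site('a.html') = {'b.html', 'c.html'};
site('b.html') = {'d.html'};
site('c.html') = cell(1, 0);
site('d.html') = {'e.html'};
site('e.html') = cell(1, 0);

frontier = {'a.html'};   % pages to process at the current level
visited  = {};           % pages already processed, in visiting order
for level = 0:depth
    next = {};
    for k = 1:numel(frontier)
        page = frontier{k};
        if any(strcmp(page, visited)), continue; end   % skip repeats
        visited{end+1} = page;                         %#ok<SAGROW>
        next = [next, site(page)];                     %#ok<AGROW>
    end
    frontier = next;     % links found at this level feed the next one
end

disp(visited(:));
```

With depth 2 the page 'e.html' is never processed, because it is three links away from the start page.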

lks = WEBBOT(URL, ...)
returns a cell array with the links of URL{end}.

Notes: * Links explicitly pointing to a .htm or .html URL.
** Image links are recognized by the following file types:
.jpg .jpeg .gif .pict .bmp .tif .tiff .ras .png (.giff)

Try it with:
webbot('http://www.unitedmedia.com/comics/dilbert/archive/', ...
'local_links', 'cartoons');

Written by L.Cavin, 28.09.2003, (c) CSE
This code is free to use and modify for non-commercial purposes.
Web address: http://ltcmail.ethz.ch/cavin/CSEDBLib.html#WEBBOT

Cite As

Laurent Cavin (2024). webbot (https://www.mathworks.com/matlabcentral/fileexchange/4023-webbot), MATLAB Central File Exchange. Retrieved .

MATLAB Release Compatibility
Created with R13
Compatible with any release
Platform Compatibility
Windows macOS Linux
Categories
Find more on Call Web Services from MATLAB Using HTTP in Help Center and MATLAB Answers

Version Published Release Notes
1.0.0.0

Major update:
Much, much faster downloads with the MathWorks object "com.mathworks.mlwidgets.io.InterruptibleStreamCopier".
The old code using "java.net.URL" is still included for demonstration purposes.
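
For readers unfamiliar with the stream copier mentioned above, the following sketch shows the commonly seen calling pattern on in-memory streams. It is not taken from WEBBOT's source. The class is undocumented and internal to MATLAB, so whether it exists and behaves this way depends on the release; the `payload` data and variable names are invented for the example.

```matlab
% Source bytes standing in for a network stream (hypothetical data).
payload   = 'example payload';
inStream  = java.io.ByteArrayInputStream(int8(payload));
outStream = java.io.ByteArrayOutputStream();

% Undocumented MathWorks class: availability depends on the MATLAB release.
copier = com.mathworks.mlwidgets.io.InterruptibleStreamCopier.getInterruptibleStreamCopier();
copier.copyStream(inStream, outStream);

inStream.close();
outStream.close();

% toByteArray returns an int8 column; reinterpret as uint8, then make a row of chars.
copied = char(typecast(outStream.toByteArray(), 'uint8'))';

disp(copied);
```

For an actual download, the input would presumably come from `java.net.URL(address).openStream()` and the output would be a `java.io.FileOutputStream`; both are standard Java classes, but that wiring is an assumption about usage rather than a description of the submission's code.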