How to access itemprop="name" from within a data structure in HTML code using MATLAB?
HTML code
<div class="itemName largestFont" itemprop="name"> Information which I want to extract </div>
<div class="itemCategory largeFont"><a href="/somerandomwebsitelink"> Information which I dont need </a></div>
I want to extract only the information from the element with itemprop="name".
Using the selector feature of the Text Analytics Toolbox,
I can set selector = "DIV.itemHeader".
itemHeader is the class of the element that contains both of those div elements, so the text inside both divs is extracted.
I only want the information from itemprop="name".
How do I go about doing that?
3 Comments
Walter Roberson
2019-3-26
Are you using the Text Analytics Toolbox? https://www.mathworks.com/help/textanalytics/ug/parse-html-and-extract-text-content.html
N/A
2019-3-26
Walter Roberson
2019-3-26
Unfortunately I do not have that toolbox to test with.
My own implementation would probably be to use regexp with named tokens and the 'names' option.
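A minimal sketch of that regexp idea, using only base MATLAB. It assumes the target div holds plain text with no nested tags and that attribute values are double-quoted. The pattern and the variable names (html, expr, tokens, names) are illustrative, not taken from the thread, and a regular expression is only a rough stand-in for a real HTML parser.

```matlab
% Sample markup copied from the question; in practice it would come from fileread or webread.
html = ['<div class="itemName largestFont" itemprop="name"> Information which I want to extract </div>' newline ...
        '<div class="itemCategory largeFont"><a href="/somerandomwebsitelink"> Information which I dont need </a></div>'];

% The named token "name" captures the text of any <div> whose opening tag
% carries itemprop="name". [^>]* keeps the attribute search inside one tag,
% and [^<]* stops the capture at the next tag (so nested markup is not handled).
expr = '<div\s[^>]*itemprop\s*=\s*"name"[^>]*>(?<name>[^<]*)</div>';

% The 'names' option returns a struct array with one element per match
% and one field per named token.
tokens = regexp(html, expr, 'names', 'ignorecase');

% Strip the padding spaces around the captured text.
names = strtrim({tokens.name});
disp(names{1})
```

The second div has no itemprop attribute, so it never matches and its link text is left out.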
Accepted Answer
More Answers (1)
Sean de Wolski
2019-3-28
Edited: Sean de Wolski
2019-3-28
Using htmlTree, this is trivial:
tree = htmlTree(fileread('yourfile.html'))
div = tree.findElement('div')
item = div.getAttribute("itemprop")
names = item == "name"
div(names).extractHTMLText
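For trying the same steps without a file on disk, here is a self-contained sketch that feeds the markup from the question to htmlTree as a string. It requires the Text Analytics Toolbox. The wrapping div with class "itemHeader" is an assumption based on the question's description, and the variable names are illustrative.

```matlab
% Requires Text Analytics Toolbox (htmlTree, findElement, getAttribute, extractHTMLText).
% The outer "itemHeader" div is assumed from the question's description of the page.
code = "<html><body><div class=""itemHeader"">" + ...
    "<div class=""itemName largestFont"" itemprop=""name""> Information which I want to extract </div>" + ...
    "<div class=""itemCategory largeFont""><a href=""/somerandomwebsitelink""> Information which I dont need </a></div>" + ...
    "</div></body></html>";

tree = htmlTree(code);                     % parse the HTML string
divs = findElement(tree, "div");           % every <div>, including the wrapper
props = getAttribute(divs, "itemprop");    % <missing> where the attribute is absent
isName = props == "name";                  % missing entries compare as false
nameText = extractHTMLText(divs(isName));  % text of the matching div only
disp(nameText)
```

Filtering on the attribute value rather than on the enclosing class is what keeps the category link out of the result.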
4 Comments
N/A
2019-3-28
TADA
2019-3-28
Neither Walter Roberson nor I (as far as I know) work for MathWorks... I'd gladly take that raise though :)
Sean de Wolski
2019-3-29
@TADA, we're always hiring into MathWorks and have a distributor in Israel who may or may not be looking for MATLAB users.
@Shivam, this returns exactly what you want from your comment above:
s = string(webread("https://beta.trollandtoad.com/yugioh/invasion-of-chaos-ioc-unlimited-singles/manticore-of-darkness-ioc-067-ultra-rare-unlimited/1155511", weboptions('Timeout', 15)));
%%
tree = htmlTree(s)
%%
div = tree.findElement('div')
%%
item = div.getAttribute("itemprop")
%%
names = item == "name"
%%
div(names).extractHTMLText
ans =
"Manticore of Darkness - IOC-067 - Ultra Rare Unlimited"