string
Convert parsed HTML tree to string
Syntax
str = string(tree)
Description
str = string(tree) converts the htmlTree object tree to a string.
Tip
Use the string function to help inspect the underlying HTML code of htmlTree objects. To navigate elements of htmlTree objects, use the findElement function.
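The division of labor described in the tip can be sketched as follows: findElement selects nodes, and string shows the HTML code of the selected nodes. The inline HTML and the variable names are hypothetical and serve only as an illustration. The snippet assumes Text Analytics Toolbox is installed.

```matlab
% Hypothetical inline HTML used only for illustration.
code = "<html><body><p class=""intro"">Hello <a href=""#top"">world</a></p></body></html>";
tree = htmlTree(code);

% Navigate to the hyperlink elements with findElement.
links = findElement(tree,"A");

% Inspect the underlying HTML code of the selected elements with string.
linkCode = string(links);
disp(linkCode)
```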
Examples
Convert Parsed HTML Code to String
Read HTML code from the URL https://www.mathworks.com/help/textanalytics using the webread function.
url = "https://www.mathworks.com/help/textanalytics";
code = webread(url);
Parse the HTML code using the htmlTree function.
tree = htmlTree(code);
Find all the paragraphs in the HTML tree using the findElement function. The paragraphs are the nodes with element name "P".
subtrees = findElement(tree,"P");
Convert the subtrees to strings using the string function.
str = string(subtrees)
str = 18×1 string
"<P class="h1">↵ <A href="../index.html">Help Center</A>↵</P>"
"<P>Text Analytics Toolbox™ provides algorithms and visualizations for preprocessing, analyzing, and modeling text data. Models created with the toolbox can be used in applications such as sentiment analysis, predictive maintenance, and topic modeling.</P>"
"<P>Text Analytics Toolbox includes tools for processing raw text from sources such as equipment logs, news feeds, surveys, operator reports, and social media. You can extract text from popular file formats, preprocess raw text, extract individual words, convert text into numerical representations, and build statistical models.</P>"
"<P>Using machine learning techniques such as LSA, LDA, and word embeddings, you can find clusters and create features from high-dimensional text datasets. Features created with Text Analytics Toolbox can be combined with features from other data sources to build machine learning models that take advantage of textual, numeric, and other types of data.</P>"
"<P class="category_desc">Learn the basics of Text Analytics Toolbox</P>"
"<P class="category_desc">Import text data into MATLAB<SUP>®</SUP> and preprocess it for analysis</P>"
"<P class="category_desc">Develop predictive models using topic models and word embeddings</P>"
"<P class="category_desc">Visualize text data and models using word clouds and text scatter plots</P>"
"<P class="category_desc">Information on language support in Text Analytics Toolbox</P>"
"<P>You clicked a link that corresponds to this MATLAB command:</P>"
"<P>Run the command by entering it in the MATLAB Command Window. Web browsers do not support MATLAB commands.</P>"
"<P class="h1 icon-globe icon_color_secondary" id="country-unselected-title">Select a Web Site</P>"
"<P>Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: <STRONG class="recommended-country"/>.</P>"
"<P>You can also select a web site from the following list:</P>"
"<P>Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.</P>"
"<P class="text-center">↵ <A href="#" class="worldwide_link">Contact your local office</A>↵</P>"
"<P class="copyright" translate="no">© 1994-2024 The MathWorks, Inc.</P>"
"<P>↵ <EM>Join the conversation</EM>↵</P>"
Input Arguments
tree — HTML tree
htmlTree array
HTML tree, specified as an htmlTree array.
Output Arguments
str — String
string
String, returned as a string array with the same size as tree.
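As a minimal sketch of the size relationship between input and output, the following converts an htmlTree array with several elements and compares the sizes. The inline HTML and variable names are hypothetical.

```matlab
% Hypothetical inline HTML used only for illustration.
code = "<ul><li>one</li><li>two</li><li>three</li></ul>";
tree = htmlTree(code);

% findElement returns an htmlTree array with one element per match.
items = findElement(tree,"LI");

% string returns one string per htmlTree element.
str = string(items);

% Compare the size of the output with the size of the input.
sameSize = isequal(size(str),size(items))
```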
Version History
Introduced in R2018b
R2021a: string function for htmlTree objects uses two spaces for indentation
The output of the string function for htmlTree objects is automatically indented for readability. Starting in R2021a, the function indents HTML code using two spaces. In previous releases, the function indented HTML code using four spaces.
This change affects code that parses the HTML string directly. To parse and navigate HTML code, use htmlTree objects.
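One way to avoid depending on the indentation of the string output is to select elements from the tree and extract their text, as in this sketch. The inline HTML and variable names are hypothetical, and extractHTMLText is used here as one possible way to read element text.

```matlab
% Hypothetical inline HTML used only for illustration.
code = "<div><p>First paragraph.</p><p>Second <em>paragraph</em>.</p></div>";
tree = htmlTree(code);

% Select elements by navigating the tree rather than by matching
% indentation or line breaks in the output of string.
paragraphs = findElement(tree,"P");

% Extract the text content of each selected paragraph.
paragraphText = extractHTMLText(paragraphs);
disp(paragraphText)
```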
R2021a: string function for htmlTree objects returns attributes in different order
When creating an htmlTree object, the software automatically parses the HTML element attributes of the input HTML code. Starting in R2021a, the software uses an updated algorithm to parse the HTML element attributes. This change can result in the string function returning HTML code with the attributes in a different order.
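Code that needs attribute values can read them by name instead of relying on their position in the string output. The following sketch assumes the getAttribute function from Text Analytics Toolbox. The inline HTML and variable names are hypothetical.

```matlab
% Hypothetical inline HTML used only for illustration.
code = "<a id=""home"" class=""nav"" href=""index.html"">Home</a>";
tree = htmlTree(code);
link = findElement(tree,"A");

% Look up attributes by name, which does not depend on the order in
% which string prints them.
href = getAttribute(link,"href");
cls = getAttribute(link,"class");
```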