An HTMLImportOptions
object enables you to specify how
MATLAB® imports structured, tabular data from HTML files. The object contains properties
that control the data import process, including the handling of errors and missing
data.
You can create an HTMLImportOptions
object using either the
htmlImportOptions
function (described here) or the detectImportOptions
function:
Use htmlImportOptions
to define the import properties based on
your import requirements.
Use detectImportOptions
to detect and populate the import
properties based on the contents of the HTML file specified in
filename
.
opts = detectImportOptions(filename)
opts = htmlImportOptions
creates an
HTMLImportOptions
object with one variable.
opts = htmlImportOptions('NumVariables',numVars)
creates the object with the number of variables specified in
numVars.
opts = htmlImportOptions(___,Name,Value)
specifies additional properties for an
HTMLImportOptions
object using one or more name-value
arguments.
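The creation syntaxes above can be sketched as follows. This is a minimal illustration rather than a recommended configuration; the choice of three variables and of rows 1 and 2 is an assumption made for the example.

```matlab
% Default object: one variable.
optsDefault = htmlImportOptions;

% Three variables, with variable names taken from row 1 of the table
% and data starting at row 2 (illustrative values only).
opts = htmlImportOptions('NumVariables',3, ...
    'VariableNamesRow',1, ...
    'DataRows',2);

% Number of variables defined in each object.
disp(numel(optsDefault.VariableNames))
disp(numel(opts.VariableNames))
```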
numVars
— Number of variables
Number of variables, specified as a positive scalar integer.
VariableNames
— Variable names
Variable names, specified as a cell array of character vectors or string array. The
VariableNames
property contains the names to use when importing
variables.
If the data contains N variables, but no variable names are specified, then
the VariableNames property contains {'Var1','Var2',...,'VarN'}.
To support invalid MATLAB identifiers as variable names, such as variable names containing spaces
and non-ASCII characters, set the value of VariableNamingRule to 'preserve'.
Example: opts.VariableNames returns the current (detected) variable names.
Example: opts.VariableNames(3) = {'Height'} changes the name of the third variable to Height.
Data Types: char
| string
| cell
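As a small sketch of working with VariableNames, the following assigns names to a three-variable object and then renames the third one. The names LastName, Age, and Height are hypothetical and chosen only for illustration.

```matlab
opts = htmlImportOptions('NumVariables',3);

% Replace the default names Var1, Var2, Var3 (hypothetical names).
opts.VariableNames = {'LastName','Age','Var3'};

% Rename only the third variable.
opts.VariableNames(3) = {'Height'};

% Convert to a string array for display, whatever the stored type is.
names = string(opts.VariableNames);
disp(names)
```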
VariableNamingRule
— Flag to preserve variable names
"preserve" (default) | "modify"
Flag to preserve variable names, specified as either "modify" or "preserve".
"modify"
— Convert invalid variable names (as
determined by the isvarname
function) to
valid MATLAB identifiers.
"preserve"
— Preserve variable names that are not valid
MATLAB identifiers such as variable names that include spaces and
non-ASCII characters.
Starting in R2019b, variable names and row names can include any characters, including spaces and non-ASCII characters. Also, they can start with any characters, not just letters. Variable and row names do not have to be valid MATLAB identifiers (as determined by the isvarname
function). To preserve these variable names and row names, set the value of VariableNamingRule
to "preserve"
. Variable names are not refreshed when the value of VariableNamingRule
is changed from "modify"
to "preserve"
.
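For illustration, the sketch below keeps names that are not valid MATLAB identifiers by using "preserve". The names 'Last Name' and 'Height (cm)' are made-up examples, not names from any particular file.

```matlab
opts = htmlImportOptions('NumVariables',2, ...
    'VariableNamingRule','preserve');

% Names with spaces and parentheses are not valid MATLAB identifiers.
disp(isvarname('Last Name'))

% With the "preserve" rule, such names can be used as given.
opts.VariableNames = {'Last Name','Height (cm)'};
names = string(opts.VariableNames);
disp(names)
```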
VariableTypes
— Data types of variables
Data types of the variables, specified as a cell array of character vectors or a string array
containing a set of valid data type names. The VariableTypes
property
designates the data types to use when importing variables.
To update the VariableTypes
property, use the setvartype
function.
Example: opts.VariableTypes
returns the current variable data
types.
Example: opts = setvartype(opts,'Height',{'double'}) changes the
data type of the variable Height to double.
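A minimal sketch of updating VariableTypes through setvartype follows. The variable names are hypothetical; the default type of every variable in a newly created object is assumed to be char, as shown in the object display later on this page.

```matlab
opts = htmlImportOptions('NumVariables',3);
opts.VariableNames = {'LastName','Height','Smoker'};   % hypothetical names

% Change the type of one variable at a time, selected by name.
opts = setvartype(opts,'Height','double');
opts = setvartype(opts,'Smoker','logical');

types = string(opts.VariableTypes);
disp(types)
```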
SelectedVariableNames
— Subset of variables to import
Subset of variables to import, specified as a character vector, string scalar, cell array of character vectors, string array, or an array of numeric indices.
SelectedVariableNames
must be a subset of
names contained in the VariableNames
property.
By default, SelectedVariableNames
contains all
the variable names from the VariableNames
property,
which means that all variables are imported.
Use the SelectedVariableNames
property to
import only the variables of interest. Specify a subset of variables
using the SelectedVariableNames
property and use readtable
to import only that subset.
To support invalid MATLAB identifiers as variable names, such as variable names
containing spaces and non-ASCII characters, set the value of
VariableNamingRule
to
'preserve'
.
Example: opts.SelectedVariableNames = {'Height','LastName'} selects
only two variables, Height and LastName, for the import operation.
Example: opts.SelectedVariableNames = [1 5] selects only two variables, the first variable and
the fifth variable, for the import operation.
Example: T = readtable(filename,opts) returns
a table containing only the variables specified in the SelectedVariableNames property
of the opts object.
Data Types: uint16
| uint32
| uint64
| char
| string
| cell
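The two ways of selecting variables, by name and by numeric index, can be sketched as follows. The five variable names are hypothetical. The sketch assumes that numeric indices are reported back as the corresponding names when the property is queried.

```matlab
opts = htmlImportOptions('NumVariables',5);
opts.VariableNames = {'LastName','Age','Height','Weight','Smoker'};

% Select by name.
opts.SelectedVariableNames = {'Height','LastName'};
byName = string(opts.SelectedVariableNames);

% Select by position: the first and fifth variables.
opts.SelectedVariableNames = [1 5];
byIndex = string(opts.SelectedVariableNames);

disp(byName)
disp(byIndex)
```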
VariableOptions
— Type-specific variable import options
Type-specific variable import options, returned as an array
of variable import options objects. The array contains an object corresponding
to each variable specified in the VariableNames
property.
Each object in the array contains properties that support the importing
of data with a specific data type.
Variable options support these data types: numeric, text, logical, datetime,
or categorical.
To query the current (or detected) options for a variable, use
the getvaropts function.
To set and customize options for a variable, use the setvaropts function.
Example: opts.VariableOptions returns a collection
of VariableImportOptions objects, one corresponding
to each variable in the data.
Example: getvaropts(opts,'Height') returns
the VariableImportOptions object for the Height variable.
Example: opts = setvaropts(opts,'Height','FillValue',0) sets
the FillValue property for the variable Height to 0.
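The query-and-set workflow described above might look like the following sketch. The variable names are hypothetical, and Height is first converted to double on the assumption that a numeric FillValue requires a numeric variable type.

```matlab
opts = htmlImportOptions('NumVariables',2);
opts.VariableNames = {'LastName','Height'};   % hypothetical names

% Make Height numeric so that a numeric fill value is appropriate.
opts = setvartype(opts,'Height','double');

% Set the fill value for Height, then read it back.
opts = setvaropts(opts,'Height','FillValue',0);
heightOpts = getvaropts(opts,'Height');
disp(heightOpts.FillValue)
```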
TableSelector
— Table data XPath expression
Table data XPath expression, specified as a character vector or string scalar that
the reading function uses to select the output table data. You must specify
TableSelector
as a valid XPath version 1.0 expression.
This table shows some example XPath expressions for selecting tables in HTML files.
Description | TableSelector |
---|---|
Table containing the text "Cash dividends" | "//TABLE[contains(.,'Cash dividends')]" |
Second table with more than 10 rows | "//TABLE[count(TR)>10][2]" |
Table with a header cell exactly matching "Description" | "//TABLE[.//TH='Description']" |
Table with caption containing "Report" | "//TABLE[contains(CAPTION,'Report')]" |
First table after section header "Summary" | "H1[.='Summary']/following-sibling::TABLE[1]" |
Table with id attribute matching 'income' | "//TABLE[@id='income']" |
Example: 'TableSelector',"//TABLE[contains(.,'Cash dividends')]"
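As a rough, self-contained sketch of selecting a table by its id attribute, the following writes a small HTML file to a temporary folder and reads only one of its two tables. The file name, table ids, and cell contents are all invented for the example, and the exact shape of the result may depend on the MATLAB release.

```matlab
% Write a small HTML file containing two tables (made-up sample data).
htmlFile = fullfile(tempdir,'sample_tables.html');
fid = fopen(htmlFile,'w');
fprintf(fid,'%s\n', ...
    '<html><body>', ...
    '<table id="expenses"><tr><td>Rent</td><td>1200</td></tr></table>', ...
    '<table id="income">', ...
    '<tr><td>Salary</td><td>3000</td></tr>', ...
    '<tr><td>Interest</td><td>40</td></tr>', ...
    '</table>', ...
    '</body></html>');
fclose(fid);

% Select the table whose id attribute is 'income'.
opts = htmlImportOptions('NumVariables',2, ...
    'TableSelector',"//TABLE[@id='income']");
T = readtable(htmlFile,opts);
disp(T)
```

Both variables keep the default char type here, so the numbers are imported as text unless the type is changed with setvartype.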
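DataRows
— Data location
Data location, specified as a positive scalar integer or an N-by-2
array of positive scalar integers. Specify DataRows
using one of these forms.
Specify as | Description |
---|---|
n | Specify the first row that contains the data. Specifying the value using n sets the DataRows property to [n Inf]. |
[n1 n2] | Specify the row range that contains the data. Values in the array [n1 n2] indicate the first and last rows that contain the data. |
[n1 n2; n3 n4; ...] | Specify multiple row ranges to read with an N-by-2 array containing N row ranges. A valid array of multiple row ranges must list the ranges in increasing order and contain only nonoverlapping ranges. When specifying multiple row ranges, use Inf only in the last range, for example [1 3; 5 6; 8 Inf]. |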
Example: opts.DataRows = 5 sets the DataRows
property to the value [5 inf]. Read all rows of data starting from row 5
to the end of the file.
Example: opts.DataRows = [2 6] sets the property to read rows 2 through 6.
Example: opts.DataRows = [1 3; 5 6; 8 inf] sets the property to read rows 1, 2, 3, 5, 6,
and all rows from row 8 to the end of the file.
Data Types: single
| double
| int8
| int16
| int32
| int64
| uint8
| uint16
| uint32
| uint64
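The forms of DataRows can be tried out directly on an options object. The sketch below only sets and queries the property and does not read a file; the row numbers are the ones used in the examples above.

```matlab
opts = htmlImportOptions;

% A scalar start row is stored as a range that runs to the end of the table.
opts.DataRows = 5;
fromRowFive = opts.DataRows;
disp(fromRowFive)

% Several ranges: rows 1-3, rows 5-6, and row 8 onward.
opts.DataRows = [1 3; 5 6; 8 Inf];
ranges = opts.DataRows;
disp(ranges)
```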
RowNamesColumn
— Row names location
0 (default) | positive scalar integer
Row names location, specified as 0 or a positive scalar integer.
The RowNamesColumn
property specifies the location
of the column containing the row names.
If RowNamesColumn
is specified as 0, then
do not import the row names. Otherwise, import the row names from
the specified column.
Example: opts.RowNamesColumn = 2;
Data Types: single
| double
| uint8
| uint16
| uint32
| uint64
VariableNamesRow
— Row containing variable names
0 (default) | nonnegative integer
Row containing variable names, specified as a nonnegative integer. The
VariableNamesRow
property specifies the row number where variable
names are located.
If VariableNamesRow
is 0
, then do not import the
variable names. Otherwise, import the variable names from the specified row.
Example: opts.VariableNamesRow = 6;
Data Types: single
| double
| int8
| int16
| int32
| int64
| uint8
| uint16
| uint32
| uint64
VariableUnitsRow
— Row containing variable units
0 (default) | nonnegative integer
Row containing variable units, specified as a nonnegative integer.
If VariableUnitsRow
is 0
, then the software does
not import the variable units. Otherwise, the software imports the variable units from
the specified row.
Data Types: single
| double
| int8
| int16
| int32
| int64
| uint8
| uint16
| uint32
| uint64
VariableDescriptionsRow
— Row containing variable descriptions
0 (default) | nonnegative integer
Row containing variable descriptions, specified as a nonnegative integer.
If VariableDescriptionsRow
is 0
, then the
software does not import the variable descriptions. Otherwise, the software imports the
variable descriptions from the specified row.
Data Types: single
| double
| int8
| int16
| int32
| int64
| uint8
| uint16
| uint32
| uint64
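The location properties are often set together. The following sketch assumes a hypothetical table layout in which row 1 holds names, row 2 holds units, row 3 holds descriptions, data starts at row 4, and the first column holds row names.

```matlab
opts = htmlImportOptions('NumVariables',3);

% Hypothetical layout of the HTML table.
opts.VariableNamesRow = 1;          % names in row 1
opts.VariableUnitsRow = 2;          % units in row 2
opts.VariableDescriptionsRow = 3;   % descriptions in row 3
opts.DataRows = 4;                  % data from row 4 onward
opts.RowNamesColumn = 1;            % row names in the first column

disp(opts.DataRows)
```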
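MissingRule
— Procedure to manage missing data
'fill' (default) | 'error' | 'omitrow' | 'omitvar'
Procedure to manage missing data, specified as one of the values in this table.
Missing Rule | Behavior |
---|---|
'fill' | Replace missing data with the contents of the FillValue property. The FillValue property is specified in the VariableImportOptions object of the variable being imported. To query or set the FillValue property, use getvaropts and setvaropts. |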
'error' | Stop importing and display an error message showing the missing record and field. |
'omitrow' | Omit rows that contain missing data. |
'omitvar' | Omit variables that contain missing data. |
Example: opts.MissingRule = 'omitrow';
Data Types: char
| string
EmptyRowRule
— Procedure to handle empty rows
"skip" (default) | "read" | "error"
Procedure to handle empty rows in the data, specified as "skip",
"read", or "error". The importing function
interprets white space as empty.
Empty Row Rule | Behavior |
---|---|
"skip" | Skip the empty rows. |
"read" | Import the empty rows. The importing function parses the empty row using the values specified in VariableOptions, MissingRule, and other relevant properties. |
"error" | Display an error message and abort the import operation. |
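ImportErrorRule
— Procedure to handle import errors
'fill' (default) | 'error' | 'omitrow' | 'omitvar'
Procedure to handle import errors, specified as one of the values in this table.
Import Error Rule | Behavior |
---|---|
'fill' | Replace the data where the error occurred with the contents of the FillValue property. The FillValue property is specified in the VariableImportOptions object of the variable being imported. To query or set the FillValue property, use getvaropts and setvaropts. |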
'error' | Stop importing and display an error message showing the error-causing record and field. |
'omitrow' | Omit rows where errors occur. |
'omitvar' | Omit variables where errors occur. |
Example: opts.ImportErrorRule = 'omitvar';
Data Types: char
| string
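MissingRule and ImportErrorRule work together with the per-variable FillValue. The sketch below, which uses hypothetical variable names and an arbitrary fill value of -1, drops rows with missing data and fills fields that fail conversion.

```matlab
opts = htmlImportOptions('NumVariables',2);
opts.VariableNames = {'LastName','Height'};   % hypothetical names
opts = setvartype(opts,'Height','double');

% Value used by the 'fill' rule for the Height variable (arbitrary choice).
opts = setvaropts(opts,'Height','FillValue',-1);

% Omit rows with missing data; fill fields that cannot be converted.
opts.MissingRule = 'omitrow';
opts.ImportErrorRule = 'fill';

disp(opts.MissingRule)
disp(opts.ImportErrorRule)
```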
ExtraColumnsRule
— Procedure to handle extra columns
'addvars' (default) | 'ignore' | 'wrap' | 'error'
Procedure to handle extra columns in the data, specified as one of the values in this table.
Extra Columns Rule | Behavior |
---|---|
'addvars' | To import extra columns, create new variables. If there are N extra columns, then the new variables are named 'ExtraVar1', 'ExtraVar2', ..., 'ExtraVarN'. |
'ignore' | Ignore the extra columns of data. |
'wrap' | Wrap the extra columns of data to new records. This action does not change the number of variables. |
'error' | Display an error message and abort the import operation. |
Data Types: char
| string
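As a brief sketch, the following options object imports empty rows instead of skipping them and discards any columns beyond the two defined variables. The choice of two variables is arbitrary.

```matlab
opts = htmlImportOptions('NumVariables',2);

% Import empty rows rather than skipping them.
opts.EmptyRowRule = 'read';

% Drop columns beyond the two defined variables instead of adding ExtraVar variables.
opts.ExtraColumnsRule = 'ignore';

disp(opts.EmptyRowRule)
disp(opts.ExtraColumnsRule)
```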
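MergedCellColumnRule
— Procedure to handle cells with merged columns
"placeleft" (default) | "placeright" | "duplicate" | "omitrow" | "error"
Procedure to handle cells with merged columns, specified as one of the values in this table.
Import Rule | Behavior |
---|---|
"placeleft" | Place the data in the left-most cell and fill the remaining cells with the contents of the FillValue property. The FillValue property is specified in the VariableImportOptions object of the variable being imported. To query or set the FillValue property, use getvaropts and setvaropts. |
"placeright" | Place the data in the right-most cell and fill the remaining cells with the contents of the FillValue property. The FillValue property is specified in the VariableImportOptions object of the variable being imported. To query or set the FillValue property, use getvaropts and setvaropts. |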
"duplicate" | Duplicate the data in all cells. |
"omitrow" | Omit rows where merged cells occur. |
"error" | Display an error message and abort the import operation. |
Example: "MergedCellColumnRule","placeright"
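MergedCellRowRule
— Procedure to handle cells with merged rows
"placetop" (default) | "placebottom" | "duplicate" | "omitvar" | "error"
Procedure to handle cells with merged rows, specified as one of the values in this table.
Import Rule | Behavior |
---|---|
"placetop" | Place the data in the top cell and fill the remaining cells with the contents of the FillValue property. The FillValue property is specified in the VariableImportOptions object of the variable being imported. To query or set the FillValue property, use getvaropts and setvaropts. |
"placebottom" | Place the data in the bottom cell and fill the remaining cells with the contents of the FillValue property. The FillValue property is specified in the VariableImportOptions object of the variable being imported. To query or set the FillValue property, use getvaropts and setvaropts. |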
"duplicate" | Duplicate the data in all cells. |
"omitvar" | Omit variables where merged rows occur. |
"error" | Display an error message and abort the import operation. |
Example: "MergedCellRowRule","duplicate"
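Both merged-cell rules can be supplied as name-value arguments when the object is created. This sketch uses the two values from the examples above and does not read a file.

```matlab
% Put column-merged data in the right-most cell and copy row-merged data
% into every cell that the merge spans.
opts = htmlImportOptions( ...
    'MergedCellColumnRule','placeright', ...
    'MergedCellRowRule','duplicate');

disp(opts.MergedCellColumnRule)
disp(opts.MergedCellRowRule)
```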
Create import options for an HTML file, specify the table to import, and then read the data.
Create an HTMLImportOptions
object. Read from the first table containing the word "readtable" using the XPath query "//TABLE[contains(.,'readtable')]".
opts = htmlImportOptions('TableSelector',"//TABLE[contains(.,'readtable')]")
opts =
  HTMLImportOptions with properties:

   Replacement Properties:
                MissingRule: "fill"
            ImportErrorRule: "fill"
               EmptyRowRule: "skip"
       MergedCellColumnRule: "placeleft"
          MergedCellRowRule: "placetop"
           ExtraColumnsRule: "addvars"

   Variable Import Properties: Set types by name using setvartype
              VariableNames: "Var1"
              VariableTypes: "char"
      SelectedVariableNames: "Var1"
            VariableOptions: Show all 1 VariableOptions
        Access VariableOptions sub-properties using setvaropts/getvaropts
         VariableNamingRule: "preserve"

   Location Properties:
              TableSelector: "//TABLE[contains(.,'readtable')]"
                   DataRows: [1 Inf]
           VariableNamesRow: 0
           VariableUnitsRow: 0
    VariableDescriptionsRow: 0
             RowNamesColumn: 0
Read the table from the URL https://www.mathworks.com/help/matlab/text-files.html
using the readtable
function with the options object.
url = "https://www.mathworks.com/help/matlab/text-files.html";
T = readtable(url,opts)
T=4×2 table
           Var1                     ExtraVar1
    __________________    ______________________________
    {'readtable'     }    {'Create table from file'    }
    {'writetable'    }    {'Write table to file'       }
    {'readtimetable' }    {'Create timetable from file'}
    {'writetimetable'}    {'Write timetable to file'   }
Introduced in R2021b