htmlImportOptions

Import options object for HTML files

Since R2021b

Description

An HTMLImportOptions object enables you to specify how MATLAB^® imports structured, tabular data from HTML files. The object contains properties that control the data import process, including the handling of errors and missing data.

Creation

You can create an HTMLImportOptions object using either the htmlImportOptions function (described here) or the detectImportOptions function:

Use htmlImportOptions to define the import properties based on your import requirements.
Use detectImportOptions to detect and populate the import properties based on the contents of the HTML file specified in filename.
```
opts = detectImportOptions(filename)
```

Syntax

opts = htmlImportOptions

opts = htmlImportOptions('NumVariables',numVars)

opts = htmlImportOptions(___,Name,Value)

Description

opts = htmlImportOptions creates an HTMLImportOptions object with one variable.

opts = htmlImportOptions('NumVariables',numVars) creates the object with the number of variables specified in numVars.

opts = htmlImportOptions(___,Name,Value) specifies additional properties for an HTMLImportOptions object using one or more name-value arguments.

example

Input Arguments

expand all

`numVars` — Number of variables
positive scalar integer

Number of variables, specified as a positive scalar integer.

Properties

expand all

Variable Properties

`VariableNames` — Variable names
cell array of character vectors | string array

Variable names, specified as a cell array of character vectors or string array. The VariableNames property contains the names to use when importing variables.

If the data contains N variables, but no variable names are specified, then the VariableNames property contains {'Var1','Var2',...,'VarN'}.

To support invalid MATLAB identifiers as variable names, such as variable names containing spaces and non-ASCII characters, set the value of VariableNamingRule to 'preserve'.

Example: opts.VariableNames returns the current (detected) variable names.

Example: opts.VariableNames(3) = {'Height'} changes the name of the third variable to Height.

Data Types: char | string | cell

`VariableNamingRule` — Flag to preserve variable names
`"preserve"` (default) | `"modify"`

Flag to preserve variable names, specified as either "modify" or "preserve".

"modify" — Convert invalid variable names (as determined by the isvarname function) to valid MATLAB identifiers.
"preserve" — Preserve variable names that are not valid MATLAB identifiers such as variable names that include spaces and non-ASCII characters.

Starting in R2019b, variable names and row names can include any characters, including spaces and non-ASCII characters. Also, they can start with any characters, not just letters. Variable and row names do not have to be valid MATLAB identifiers (as determined by the isvarname function). To preserve these variable names and row names, set the value of VariableNamingRule to "preserve". Variable names are not refreshed when the value of VariableNamingRule is changed from "modify" to "preserve".

`VariableTypes` — Data types of variable
cell array of character vectors | string array

Data type of variable, specified as a cell array of character vectors, or string array containing a set of valid data type names. The VariableTypes property designates the data types to use when importing variables.

To update the VariableTypes property, use the setvartype function.

Example: opts.VariableTypes returns the current variable data types.

Example: opts = setvartype(opts,'Height',{'double'}) changes the data type of the variable Height to double.

`SelectedVariableNames` — Subset of variables to import
character vector | string scalar | cell array of character vectors | string array | array of numeric indices

Subset of variables to import, specified as a character vector, string scalar, cell array of character vectors, string array or an array of numeric indices.

SelectedVariableNames must be a subset of names contained in the VariableNames property. By default, SelectedVariableNames contains all the variable names from the VariableNames property, which means that all variables are imported.

Use the SelectedVariableNames property to import only the variables of interest. Specify a subset of variables using the SelectedVariableNames property and use readtable to import only that subset.

To support invalid MATLAB identifiers as variable names, such as variable names containing spaces and non-ASCII characters, set the value of VariableNamingRule to 'preserve'.

Example: opts.SelectedVariableNames = {'Height','LastName'} selects only two variables, Height and LastName, for the import operation.

Example: opts.SelectedVariableNames = [1 5] selects only two variables, the first variable and the fifth variable, for the import operation.

Example: T = readtable(filename,opts) returns a table containing only the variables specified in the SelectedVariableNames property of the opts object.

`VariableOptions` — Type specific variable import options
array of variable import options objects

Type specific variable import options, returned as an array of variable import options objects. The array contains an object corresponding to each variable specified in the VariableNames property. Each object in the array contains properties that support the importing of data with a specific data type.

Variable options support these data types: numeric, text, logical, datetime, or categorical.

To query the current (or detected) options for a variable, use the getvaropts function.

To set and customize options for a variable, use the setvaropts function.

Example: opts.VariableOptions returns a collection of VariableImportOptions objects, one corresponding to each variable in the data.

Example: getvaropts(opts,'Height') returns the VariableImportOptions object for the Height variable.

Example: opts = setvaropts(opts,'Height','FillValue',0) sets the FillValue property for the variable Height to 0.

Location Properties

`TableSelector` — Table data XPath expression
character vector | string scalar

Table data XPath expression, specified as a character vector or string scalar that the reading function uses to select the output table data. You must specify TableSelector as a valid XPath version 1.0 expression.

This table shows some example XPath expressions for selecting tables in HTML files.

Description	`TableSelector`
Table containing the text "Cash dividends"	`"//TABLE[contains(.,'Cash dividends')]"`
Second table with more than 10 rows	`"//TABLE[count(TR)>10][2]"`
Table with a header cell exactly matching "Description"	`"//TABLE[.//TH='Description']"`
Table with caption containing "Report"	`"//TABLE[contains(CAPTION,'Report')]"`
First table after section header "Summary"	`"H1[.='Summary']/following-sibling::TABLE[1]"`
Table with `id` attribute matching `'income'`	`"//TABLE[@id='income']"`

Example: 'TableSelector',"//TABLE[contains(.,'Cash dividends')]"

`DataRows` — Data location
positive scalar integer | array of positive scalar integers

Data location, specified as a positive scalar integer or a N-by-2 array of positive scalar integers. Specify DataRows using one of these forms.

Specify as	Description
`n`	Specify the first row that contains the data. Specifying the value using `n` sets the value of `DataRows` property to `[n inf]`. The importing function reads all rows between `n` and the end-of-file. `n` must be a positive integer greater than zero.
`[n1 n2]`	Specify the row range that contains the data. `n1` is the first row that contains the data and the `n2` is the last row that contains the data. Values in the array `[n1 n2]` must be nonzero positive integers and `n2` must be greater than `n1`.
`[n1 n2; n3 n4;...]`	Specify multiple row ranges to read with an `N`-by-`2` array containing `N` different row ranges. A valid array of multiple row ranges must: Specify row ranges in an increasing order, that is the first row range specified in the array appears in the file before the other row ranges. Contain only nonoverlapping row ranges. When specifying multiple row ranges, use `Inf` only when specifying the end of the last row range in the array. For example, `[1 3; 5 6; 8 Inf]`.

Specify as

Description

n

Specify the first row that contains the data. Specifying the value using n sets the value of DataRows property to [n inf]. The importing function reads all rows between n and the end-of-file.

n must be a positive integer greater than zero.

[n1 n2]

Specify the row range that contains the data. n1 is the first row that contains the data and the n2 is the last row that contains the data.

Values in the array [n1 n2] must be nonzero positive integers and n2 must be greater than n1.

[n1 n2; n3 n4;...]

Specify multiple row ranges to read with an N-by-2 array containing N different row ranges.

A valid array of multiple row ranges must:

Specify row ranges in an increasing order, that is the first row range specified in the array appears in the file before the other row ranges.
Contain only nonoverlapping row ranges.

When specifying multiple row ranges, use Inf only when specifying the end of the last row range in the array. For example, [1 3; 5 6; 8 Inf].

Example: opts.DataRows = 5 sets the DataRows property to the value [5 inf]. Read all rows of data starting from row 5 to the end-of-file.

Example: opts.DataRows = [2 6] sets the property to read rows 2 through 6.

Example: opts.DataRows = [1 3; 5 6; 8 inf] sets the property to read rows 1, 2, 3, 5, 6, and all rows between 8, and the end-of-file.

`RowNamesColumn` — Row names location
`0` (default) | positive scalar integer

Row names location, specified as a positive scalar integer. The RowNamesColumn property specifies the location of the column containing the row names.

If RowNamesColumn is specified as 0, then do not import the row names. Otherwise, import the row names from the specified column.

Example: opts.RowNamesColumn = 2;

`VariableNamesRow` — Row containing variable names
`0` (default) | nonnegative integer

Row containing variable names, specified as a nonnegative integer. The VariableNamesRow property specifies the row number where variable names are located.

If VariableNamesRow is 0, then do not import the variable names. Otherwise, import the variable names from the specified row.

Example: opts.VariableNamesRow = 6;

`VariableUnitsRow` — Row containing variable units
`0` (default) | nonnegative integer

Row containing variable units, specified as a nonnegative integer.

If VariableUnitsRow is 0, then the software does not import the variable units. Otherwise, the software imports the variable units from the specified row.

`VariableDescriptionsRow` — Row containing variable descriptions
`0` (default) | nonnegative integer

Row containing variable descriptions, specified as a nonnegative integer.

If VariableDescriptionsRow is 0, then the software does not import the variable descriptions. Otherwise, the software imports the variable descriptions from the specified row.

Replacement Rules

`MissingRule` — Procedure to manage missing data
`'fill'` (default) | `'error'` | `'omitrow'` | `'omitvar'`

Procedure to manage missing data, specified as one of the values in this table.

Missing Rule	Behavior
`'fill'`	Replace missing data with the contents of the `FillValue` property. The `FillValue` property is specified in the `VariableImportOptions` object of the variable being imported. For more information on accessing the `FillValue` property, see `setvaropts`.
`'error'`	Stop importing and display an error message showing the missing record and field.
`'omitrow'`	Omit rows that contain missing data.
`'omitvar'`	Omit variables that contain missing data.

Example: opts.MissingRule = 'omitrow';

Data Types: char | string

`EmptyRowRule` — Procedure to handle empty rows
`"skip"` (default) | `"read"` | `"error"`

Procedure to handle empty rows in the data, specified as "skip", "read", or "error". The importing function interprets white space as empty.

Empty Row Rule	Behavior
`"skip"`	Skip the empty rows.
`"read"`	Import the empty rows. The importing function parses the empty row using the values specified in `VariableOptions`, `MissingRule`, and other relevant properties.
`"error"`	Display an error message and abort the import operation.

`ImportErrorRule` — Procedure to handle import errors
`'fill'` (default) | `'error'` | `'omitrow'` | `'omitvar'`

Procedure to handle import errors, specified as one of the values in this table.

Import Error Rule	Behavior
`'fill'`	Replace the data where the error occurred with the contents of the `FillValue` property. The `FillValue` property is specified in the `VariableImportOptions` object of the variable being imported. For more information on accessing the `FillValue` property, see `setvaropts`.
`'error'`	Stop importing and display an error message showing the error-causing record and field.
`'omitrow'`	Omit rows where errors occur.
`'omitvar'`	Omit variables where errors occur.

Example: opts.ImportErrorRule = 'omitvar';

Data Types: char | string

`ExtraColumnsRule` — Procedure to handle extra columns
`'addvars'` | `'ignore'` | `'wrap'` | `'error'`

Procedure to handle extra columns in the data, specified as one of the values in this table.

Extra Columns Rule	Behavior
`'addvars'`	To import extra columns, create new variables. If there are `N` extra columns, then import new variables as `'ExtraVar1', 'ExtraVar2',..., 'ExtraVarN'`. Extra columns of data are imported as if their `VariableTypes` are `char`.
`'ignore'`	Ignore the extra columns of data.
`'wrap'`	Wrap the extra columns of data to new records. This action does not change the number of variables.
`'error'`	Display an error message and abort the import operation.

Data Types: char | string

`MergedCellColumnRule` — Procedure to handle cells with merged columns
`"placeleft"` (default) | `"placeright"` | `"duplicate"` | `"omitrow"` | `"error"`

Procedure to handle cells with merged columns, specified as one of the values in this table.

Import Rule	Behavior
`"placeleft"`	Place the data in the left-most cell and fill the remaining cells with the contents of the `FillValue` property. The `FillValue` property is specified in the `VariableImportOptions` object of the variable being imported. For more information on accessing the `FillValue` property, see `getvaropts`.
`"placeright"`	Place the data in the right-most cell and fill the remaining cells with the contents of the `FillValue` property. The `FillValue` property is specified in the `VariableImportOptions` object of the variable being imported. For more information on accessing the `FillValue` property, see `getvaropts`.
`"duplicate"`	Duplicate the data in all cells.
`"omitrow"`	Omit rows where merged cells occur.
`"error"`	Display an error message and abort the import operation.

Example: "MergedCellColumnRule","placeright"

`MergedCellRowRule` — Procedure to handle cells with merged rows
`"placetop"` (default) | `"placebottom"` | `"duplicate"` | `"omitvar"` | `"error"`

Procedure to handle cells with merged rows, specified as one of the values in this table.

Import Rule	Behavior
`"placetop"`	Place the data in the top cell and fill the remaining cells with the contents of the `FillValue` property. The `FillValue` property is specified in the `VariableImportOptions` object of the variable being imported. For more information on accessing the `FillValue` property, see `getvaropts`.
`"placebottom"`	Place the data in the bottom cell and fill the remaining cells with the contents of the `FillValue` property. The `FillValue` property is specified in the `VariableImportOptions` object of the variable being imported. For more information on accessing the `FillValue` property, see `getvaropts`.
`"duplicate"`	Duplicate the data in all cells.
`"omitvar"`	Omit variables where merged rows occur.
`"error"`	Display an error message and abort the import operation.

Example: "MergedCellRowRule","duplicate"

Object Functions

Examples

collapse all

Create Import Options for HTML File

Open Live Script

Create import options for an HTML file, specify the table to import, and then read the data.

Create an HTMLDocumentImportOptions object. Read from the first table containing the word "readtable" using the XPath query "//TABLE[contains(.,'readtable')]".

opts = htmlImportOptions('TableSelector',"//TABLE[contains(.,'readtable')]")

opts = 
  HTMLImportOptions with properties:

   Replacement Properties:
                MissingRule: "fill"
            ImportErrorRule: "fill"
               EmptyRowRule: "skip"
       MergedCellColumnRule: "placeleft"
          MergedCellRowRule: "placetop"
           ExtraColumnsRule: "addvars"

   Variable Import Properties: Set types by name using setvartype
              VariableNames: "Var1"
              VariableTypes: "char"
      SelectedVariableNames: "Var1"
            VariableOptions: Show all 1 VariableOptions 
	Access VariableOptions sub-properties using setvaropts/getvaropts
         VariableNamingRule: "preserve"

   Location Properties:
              TableSelector: "//TABLE[contains(.,'readtable')]"
                   DataRows: [1 Inf]
           VariableNamesRow: 0
           VariableUnitsRow: 0
    VariableDescriptionsRow: 0
             RowNamesColumn: 0

Read the table from the URL https://www.mathworks.com/help/matlab/text-files.html using the readtable function with the options object.

url = "https://www.mathworks.com/help/matlab/text-files.html";
T = readtable(url,opts)

T=4×2 table
           Var1                     ExtraVar1           
    __________________    ______________________________

    {'readtable'     }    {'Create table from file'    }
    {'writetable'    }    {'Write table to file'       }
    {'readtimetable' }    {'Create timetable from file'}
    {'writetimetable'}    {'Write timetable to file'   }

Version History

Introduced in R2021b

htmlImportOptions

Description

Creation

Syntax

Description

Input Arguments

`numVars` — Number of variables
positive scalar integer

Properties

Variable Properties

`VariableNames` — Variable names
cell array of character vectors | string array

`VariableNamingRule` — Flag to preserve variable names
`"preserve"` (default) | `"modify"`

`VariableTypes` — Data types of variable
cell array of character vectors | string array

`SelectedVariableNames` — Subset of variables to import
character vector | string scalar | cell array of character vectors | string array | array of numeric indices

`VariableOptions` — Type specific variable import options
array of variable import options objects

Location Properties

`TableSelector` — Table data XPath expression
character vector | string scalar

`DataRows` — Data location
positive scalar integer | array of positive scalar integers

`RowNamesColumn` — Row names location
`0` (default) | positive scalar integer

`VariableNamesRow` — Row containing variable names
`0` (default) | nonnegative integer

`VariableUnitsRow` — Row containing variable units
`0` (default) | nonnegative integer

`VariableDescriptionsRow` — Row containing variable descriptions
`0` (default) | nonnegative integer

Replacement Rules

`MissingRule` — Procedure to manage missing data
`'fill'` (default) | `'error'` | `'omitrow'` | `'omitvar'`

`EmptyRowRule` — Procedure to handle empty rows
`"skip"` (default) | `"read"` | `"error"`

`ImportErrorRule` — Procedure to handle import errors
`'fill'` (default) | `'error'` | `'omitrow'` | `'omitvar'`

`ExtraColumnsRule` — Procedure to handle extra columns
`'addvars'` | `'ignore'` | `'wrap'` | `'error'`

`MergedCellColumnRule` — Procedure to handle cells with merged columns
`"placeleft"` (default) | `"placeright"` | `"duplicate"` | `"omitrow"` | `"error"`

`MergedCellRowRule` — Procedure to handle cells with merged rows
`"placetop"` (default) | `"placebottom"` | `"duplicate"` | `"omitvar"` | `"error"`

Object Functions

Examples

Create Import Options for HTML File

Version History

See Also

Topics

htmlImportOptions

Description

Creation

Syntax

Description

Input Arguments

numVars — Number of variables positive scalar integer

Properties

Variable Properties

VariableNames — Variable names cell array of character vectors | string array

VariableNamingRule — Flag to preserve variable names "preserve" (default) | "modify"

VariableTypes — Data types of variable cell array of character vectors | string array

SelectedVariableNames — Subset of variables to import character vector | string scalar | cell array of character vectors | string array | array of numeric indices

VariableOptions — Type specific variable import options array of variable import options objects

Location Properties

TableSelector — Table data XPath expression character vector | string scalar

DataRows — Data location positive scalar integer | array of positive scalar integers

RowNamesColumn — Row names location 0 (default) | positive scalar integer

VariableNamesRow — Row containing variable names 0 (default) | nonnegative integer

VariableUnitsRow — Row containing variable units 0 (default) | nonnegative integer

VariableDescriptionsRow — Row containing variable descriptions 0 (default) | nonnegative integer

Replacement Rules

MissingRule — Procedure to manage missing data 'fill' (default) | 'error' | 'omitrow' | 'omitvar'

EmptyRowRule — Procedure to handle empty rows "skip" (default) | "read" | "error"

ImportErrorRule — Procedure to handle import errors 'fill' (default) | 'error' | 'omitrow' | 'omitvar'

ExtraColumnsRule — Procedure to handle extra columns 'addvars' | 'ignore' | 'wrap' | 'error'

MergedCellColumnRule — Procedure to handle cells with merged columns "placeleft" (default) | "placeright" | "duplicate" | "omitrow" | "error"

MergedCellRowRule — Procedure to handle cells with merged rows "placetop" (default) | "placebottom" | "duplicate" | "omitvar" | "error"

Object Functions

Examples

Create Import Options for HTML File

Version History

See Also

Topics

`numVars` — Number of variables
positive scalar integer

`VariableNames` — Variable names
cell array of character vectors | string array

`VariableNamingRule` — Flag to preserve variable names
`"preserve"` (default) | `"modify"`

`VariableTypes` — Data types of variable
cell array of character vectors | string array

`SelectedVariableNames` — Subset of variables to import
character vector | string scalar | cell array of character vectors | string array | array of numeric indices

`VariableOptions` — Type specific variable import options
array of variable import options objects

`TableSelector` — Table data XPath expression
character vector | string scalar

`DataRows` — Data location
positive scalar integer | array of positive scalar integers

`RowNamesColumn` — Row names location
`0` (default) | positive scalar integer

`VariableNamesRow` — Row containing variable names
`0` (default) | nonnegative integer

`VariableUnitsRow` — Row containing variable units
`0` (default) | nonnegative integer

`VariableDescriptionsRow` — Row containing variable descriptions
`0` (default) | nonnegative integer

`MissingRule` — Procedure to manage missing data
`'fill'` (default) | `'error'` | `'omitrow'` | `'omitvar'`

`EmptyRowRule` — Procedure to handle empty rows
`"skip"` (default) | `"read"` | `"error"`

`ImportErrorRule` — Procedure to handle import errors
`'fill'` (default) | `'error'` | `'omitrow'` | `'omitvar'`

`ExtraColumnsRule` — Procedure to handle extra columns
`'addvars'` | `'ignore'` | `'wrap'` | `'error'`

`MergedCellColumnRule` — Procedure to handle cells with merged columns
`"placeleft"` (default) | `"placeright"` | `"duplicate"` | `"omitrow"` | `"error"`

`MergedCellRowRule` — Procedure to handle cells with merged rows
`"placetop"` (default) | `"placebottom"` | `"duplicate"` | `"omitvar"` | `"error"`