Import HDF5 Files
Overview
Hierarchical Data Format, Version 5 (HDF5) is a general-purpose, machine-independent standard for storing scientific data in files, developed by the National Center for Supercomputing Applications (NCSA). HDF5 is used in a wide range of engineering and scientific fields that want a standard way to store data so that it can be shared. For more information about the HDF5 file format, read the HDF5 documentation available at The HDF Group website (https://www.hdfgroup.org).
MATLAB® provides two methods to import data from an HDF5 file:
High-level functions that make it easy to import data when working with numeric data sets
Low-level functions that enable more complete control over the importing process, by providing access to the routines in the HDF5 C library
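The following minimal sketch shows the difference by reading the same data set both ways. It reads /g2/dset2.1 from the sample file example.h5 and assumes that example.h5 is on the MATLAB path. The low-level call receives the file's full path (obtained with which), in case H5F.open does not search the path itself.

```matlab
% High-level: one call opens the file, reads the data set, and closes the file.
hiData = h5read('example.h5','/g2/dset2.1');

% Low-level: each call maps to an HDF5 C library routine, and you manage
% the file and data set identifiers yourself. The full path from which()
% is passed in case H5F.open does not search the MATLAB path.
fname  = which('example.h5');
fileId = H5F.open(fname,'H5F_ACC_RDONLY','H5P_DEFAULT');
dsetId = H5D.open(fileId,'/g2/dset2.1');
% Read the whole data set; 'H5ML_DEFAULT' lets MATLAB pick the memory type
loData = H5D.read(dsetId,'H5ML_DEFAULT','H5S_ALL','H5S_ALL','H5P_DEFAULT');
H5D.close(dsetId);
H5F.close(fileId);

% Both approaches return the same values
isequal(hiData(:), loData(:))
```

The high-level call opens and closes the file for you. The low-level version exposes each identifier, which is what makes finer control possible.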
Note
For information about importing HDF4 files, which use a separate, incompatible format, see Import HDF4 Files Programmatically.
Import Data Using High-Level HDF5 Functions
MATLAB includes several functions that you can use to examine the contents of an HDF5 file and import data from the file into the MATLAB workspace.
Note
You can use the high-level functions only to read numeric data sets or attributes. To read non-numeric data sets or attributes, you must use the low-level interface.
h5disp: View the contents of an HDF5 file.
h5info: Create a structure that contains all the metadata defining an HDF5 file.
h5read: Read data from a data set in an HDF5 file.
h5readatt: Read data from an attribute associated with a data set in an HDF5 file, or with the file itself (a global attribute).
For details about how to use these functions, see their reference pages, which include examples. The following sections illustrate some common usage scenarios.
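For example, the following minimal sketch reads the global attribute attr2 with h5readatt. The h5disp listing in the next section reports attr2 as a 2x2 H5T_INTEGER attribute of the root group. The location '/' refers to the root group, that is, the file itself. The sketch assumes that example.h5 is on the MATLAB path.

```matlab
% Read the global (root-group) attribute 'attr2' from the sample file
attr2 = h5readatt('example.h5','/','attr2');

% h5disp reports this attribute as 2x2 H5T_INTEGER
disp(size(attr2))
disp(class(attr2))
```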
Determine Contents of HDF5 File
HDF5 files can contain data and metadata, called attributes. HDF5 files organize the data and metadata in a hierarchical structure similar to the hierarchical structure of a UNIX® file system.
In an HDF5 file, the directories in the hierarchy are called groups. A group can contain other groups, data sets, attributes, links, and data types. A data set is a collection of data, such as a multidimensional numeric array or string. An attribute is any data that is associated with another entity, such as a data set. A link is similar to a UNIX file system symbolic link. Links are a way to reference objects without having to make a copy of the object.
A data type describes the data in a data set or attribute and specifies how to interpret that data.
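To see how a data type is described in practice, you can pass a data set path to h5info and inspect the Datatype field of the result. This sketch assumes that the Datatype structure has Class and Size fields, as in recent MATLAB releases. h5disp reports /g2/dset2.1 as H5T_IEEE_F32BE, so its class should be H5T_FLOAT and its size 4 bytes.

```matlab
% Query the metadata of one data set instead of the whole file
dsInfo = h5info('example.h5','/g2/dset2.1');
dt = dsInfo.Datatype;

% Class is the HDF5 type class; Size is the element size in bytes
fprintf('Class: %s, size: %d bytes\n', dt.Class, dt.Size);
```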
To get a quick view into the contents of an HDF5 file, use the h5disp function.
```
h5disp('example.h5')

HDF5 example.h5 
Group '/' 
    Attributes:
        'attr1':  97 98 99 100 101 102 103 104 105 0 
        'attr2':  2x2 H5T_INTEGER
    Group '/g1' 
        Group '/g1/g1.1' 
            Dataset 'dset1.1.1' 
                Size:  10x10
                MaxSize:  10x10
                Datatype:   H5T_STD_I32BE (int32)
                ChunkSize:  []
                Filters:  none
                Attributes:
                    'attr1':  49 115 116 32 97 116 116 114 105 ... 
                    'attr2':  50 110 100 32 97 116 116 114 105 ... 
            Dataset 'dset1.1.2' 
                Size:  20
                MaxSize:  20
                Datatype:   H5T_STD_I32BE (int32)
                ChunkSize:  []
                Filters:  none
        Group '/g1/g1.2' 
            Group '/g1/g1.2/g1.2.1' 
                Link 'slink' 
                    Type:  soft link
    Group '/g2' 
        Dataset 'dset2.1' 
            Size:  10
            MaxSize:  10
            Datatype:   H5T_IEEE_F32BE (single)
            ChunkSize:  []
            Filters:  none
        Dataset 'dset2.2' 
            Size:  5x3
            MaxSize:  5x3
            Datatype:   H5T_IEEE_F32BE (single)
            ChunkSize:  []
            Filters:  none
            .
            .
            .
```
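h5disp also accepts a location, so you can display one part of the hierarchy instead of the whole file. This sketch first shows only /g2. It then uses evalc to capture the listing as text, which can help when searching large files. It also uses the 'min' display mode, which lists only group and data set names. Check the h5disp reference page for the modes your release supports.

```matlab
% Display only the /g2 group and its contents
h5disp('example.h5','/g2')

% Display only group and data set names for the whole file
h5disp('example.h5','/','min')

% Capture the /g2 listing as text so that it can be searched
txt = evalc('h5disp(''example.h5'',''/g2'')');
contains(txt,'dset2.2')
```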
To explore the hierarchical organization of an HDF5 file, use the h5info function. h5info returns a structure that contains various information about the HDF5 file, including the name of the file.
```
info = h5info('example.h5')

info = 

      Filename: 'matlabroot\matlab\toolbox\matlab\demos\example.h5'
          Name: '/'
        Groups: [4x1 struct]
      Datasets: []
     Datatypes: []
         Links: []
    Attributes: [2x1 struct]
```
By looking at the Groups and Attributes fields, you can see that the root group contains four groups and two attributes. The Datasets, Datatypes, and Links fields are all empty, indicating that the root group does not contain any data sets, data types, or links. To explore the contents of the sample HDF5 file further, examine one of the structures in Groups. The following example shows the contents of the second structure in this field.
```
level2 = info.Groups(2)

level2 = 

          Name: '/g2'
        Groups: []
      Datasets: [2x1 struct]
     Datatypes: []
         Links: []
    Attributes: []
```
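Examining Groups one level at a time works for small files. For larger files, a short loop can walk the whole structure that h5info returns. The following sketch collects the full path of every data set. It uses an explicit stack instead of recursion, so it runs as a plain script. Releases may differ in whether a nested data set's Name field holds a relative name (such as dset2.1) or a full path (such as /g2/dset2.1), so the sketch handles both.

```matlab
% Walk the group hierarchy returned by h5info and collect the full path of
% every data set, using a stack of groups still to visit.
info = h5info('example.h5');
stack = {info};          % start at the root group '/'
datasetPaths = {};       % full paths of the data sets found so far

while ~isempty(stack)
    grp = stack{end};    % pop the most recently added group
    stack(end) = [];
    for k = 1:numel(grp.Datasets)
        name = grp.Datasets(k).Name;
        if ~startsWith(name,'/')
            % Relative name: prefix the group path, avoiding '//' at the root
            if strcmp(grp.Name,'/')
                name = ['/' name];
            else
                name = [grp.Name '/' name];
            end
        end
        datasetPaths{end+1,1} = name; %#ok<AGROW>
    end
    for k = 1:numel(grp.Groups)
        stack{end+1} = grp.Groups(k); %#ok<AGROW>
    end
end

disp(datasetPaths)
```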
In the sample file, the group named /g2 contains two data sets. The following figure illustrates this part of the sample HDF5 file organization.
To get information about a data set, such as its name, dimensions, and data type, look at either of the structures returned in the Datasets field.
```
dataset1 = level2.Datasets(1)

dataset1 = 

      Filename: 'matlabroot\example.h5'
          Name: '/g2/dset2.1'
          Rank: 1
      Datatype: [1x1 struct]
          Dims: 10
       MaxDims: 10
        Layout: 'contiguous'
    Attributes: []
         Links: []
     Chunksize: []
     Fillvalue: []
```
Import Data from HDF5 File
To read data from an HDF5 data set, use the h5read function. As arguments, specify the name of the HDF5 file and the name of the data set. (To read the value of an attribute, use h5readatt instead.)
To illustrate, this example reads the data set /g2/dset2.1 from the HDF5 sample file example.h5.
```
data = h5read('example.h5','/g2/dset2.1')

data =

    1.0000
    1.1000
    1.2000
    1.3000
    1.4000
    1.5000
    1.6000
    1.7000
    1.8000
    1.9000
```
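h5read can also read part of a data set. In the following sketch, the optional start and count arguments (start is 1-based) select a contiguous block, and the optional stride argument skips elements. The values noted in the comments follow from the full data set shown above.

```matlab
% Start at element 3 and read 4 elements: 1.2, 1.3, 1.4, 1.5
part = h5read('example.h5','/g2/dset2.1',3,4)

% Start at element 1, read 5 elements, taking every second one:
% 1.0, 1.2, 1.4, 1.6, 1.8
every2nd = h5read('example.h5','/g2/dset2.1',1,5,2)
```

Reading only the elements you need avoids loading a large data set into memory in full.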
Map HDF5 Data Types to MATLAB Data Types
When the h5read function reads data from an HDF5 file into the MATLAB workspace, it maps HDF5 data types to MATLAB data types, as shown in the table below.
HDF5 Data Type | h5read Returns |
---|---|
Bit-field | Array of packed 8-bit integers |
Float | MATLAB single and double types, provided that they occupy 64 bits or fewer |
Integer types, signed and unsigned | Equivalent MATLAB integer types, signed and unsigned |
Opaque | Array of uint8 values |
Reference | The actual data pointed to by the reference, not the value of the reference |
Strings, fixed-length and variable-length | String arrays |
Enums | Cell array of character vectors, where each enumerated value is replaced by the corresponding member name |
Compound | 1-by-1 struct array; the dimensions of the data set are expressed in the fields of the structure |
Arrays | Array of values using the same data type as the HDF5 array. For example, if the array is of signed 32-bit integers, the MATLAB array is of type int32. |
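A quick way to confirm these mappings for a particular file is to check the class of what h5read returns. This sketch uses three data sets from example.h5. The types of /g2/dset2.1 and /g1/g1.1/dset1.1.1 appear in the h5disp listing earlier in this topic, and /g3/compound2D is the compound data set described below. The expected classes in the comments follow from the table.

```matlab
f32 = h5read('example.h5','/g2/dset2.1');        % H5T_IEEE_F32BE -> single
i32 = h5read('example.h5','/g1/g1.1/dset1.1.1'); % H5T_STD_I32BE  -> int32
cmp = h5read('example.h5','/g3/compound2D');     % H5T_COMPOUND   -> struct

fprintf('%s, %s, %s\n', class(f32), class(i32), class(cmp));
```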
The example HDF5 file that ships with MATLAB contains examples of all these data types. For example, the data set /g3/string contains string data.
```
h5disp('example.h5','/g3/string')

HDF5 example.h5 
Dataset 'string' 
    Size:  2
    MaxSize:  2
    Datatype:   H5T_STRING
        String Length: 3
        Padding: H5T_STR_NULLTERM
        Character Set: H5T_CSET_ASCII
        Character Type: H5T_C_S1
    ChunkSize:  []
    Filters:  none
    FillValue:  ''
```
Now read the data from the file. MATLAB returns it as a cell array of character vectors.
```
s = h5read('example.h5','/g3/string')

s = 

    'ab '
    'de '

whos s
  Name      Size            Bytes  Class    Attributes

  s         2x1               236  cell
```
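The table above lists string arrays, while this example returns a cell array. If your code needs a consistent type regardless of which form h5read returns, one minimal approach is to convert the result with string. You can then strip the fixed-length padding with deblank, which removes trailing whitespace and null characters.

```matlab
% string() accepts a cell array of character vectors or a string array,
% so the result is a string array either way; deblank strips the padding.
s = deblank(string(h5read('example.h5','/g3/string')))
```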
Compound data types are always returned as a 1-by-1 struct. The dimensions of the data set are expressed in the fields of the struct. For example, the data set /g3/compound2D has a compound data type.
```
h5disp('example.h5','/g3/compound2D')

HDF5 example.h5 
Dataset 'compound2D' 
    Size:  2x3
    MaxSize:  2x3
    Datatype:   H5T_COMPOUND
        Member 'a':  H5T_STD_I8LE (int8)
        Member 'b':  H5T_IEEE_F64LE (double)
    ChunkSize:  []
    Filters:  none
    FillValue:  H5T_COMPOUND
```
Now read the data from the file. MATLAB returns it as a 1-by-1 struct.
```
data = h5read('example.h5','/g3/compound2D')

data = 

    a: [2x3 int8]
    b: [2x3 double]
```
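Each field holds one member for every element of the data set, so element (row, col) corresponds to data.a(row,col) and data.b(row,col). If you prefer one struct per element, the following sketch converts each field with num2cell so that struct builds a 2x3 struct array.

```matlab
data = h5read('example.h5','/g3/compound2D');

% Access one element of the compound data set through both members
row = 2; col = 3;
fprintf('element (%d,%d): a = %d, b = %g\n', row, col, data.a(row,col), data.b(row,col));

% One struct per element: struct() expands cell arrays of equal size
% into a struct array of that size (2x3 here)
records = struct('a', num2cell(data.a), 'b', num2cell(data.b));
size(records)
```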
Import Data Using Low-Level HDF5 Functions
MATLAB provides low-level functions that correspond directly to dozens of functions in the HDF5 C library. Through these functions, you can access features of the HDF5 library from MATLAB, such as reading and writing complex data types and using the HDF5 subsetting capabilities. For more information, see Export Data Using MATLAB Low-Level HDF5 Functions.
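As a small illustration, this sketch reads the root attribute attr2 with low-level calls and compares the result with h5readatt. The calls correspond to the HDF5 C routines H5Fopen, H5Gopen, H5Aopen, and H5Aread, plus the matching close calls. The sketch assumes that example.h5 is on the MATLAB path and passes the full path to H5F.open. The 'H5ML_DEFAULT' memory type asks MATLAB to choose a suitable type.

```matlab
fname  = which('example.h5');                 % full path to the sample file
fileId = H5F.open(fname,'H5F_ACC_RDONLY','H5P_DEFAULT');
grpId  = H5G.open(fileId,'/');                % root group, which owns attr2
attrId = H5A.open(grpId,'attr2');
val    = H5A.read(attrId,'H5ML_DEFAULT');     % MATLAB chooses the memory type

% Close identifiers in the reverse order of opening
H5A.close(attrId);
H5G.close(grpId);
H5F.close(fileId);

% Compare with the high-level equivalent
isequal(val, h5readatt('example.h5','/','attr2'))
```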
Read HDF5 Data Set Using Dynamically Loaded Filters
MATLAB supports reading and writing HDF5 data sets using dynamically loaded filters. The HDF Group maintains a list of registered filters on the Filters page of its website.
To read a data set that has been written using a user-defined, third-party filter, follow these steps:
Install the HDF5 filter plugin on your system as a shared library or DLL.
Set the HDF5_PLUGIN_PATH environment variable to the folder containing the installed plugin binary file. On a Windows® system, use the setenv command in MATLAB. On a Linux® or Mac system, perform this action in a terminal window before you start MATLAB.
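On Windows, the second step might look like the following sketch. The folder name is hypothetical, so replace it with the folder that contains your installed plugin.

```matlab
% Hypothetical plugin folder; replace with your actual installation folder
pluginDir = 'C:\hdf5\plugins';
setenv('HDF5_PLUGIN_PATH', pluginDir);

% Confirm that the variable is visible to this MATLAB session
getenv('HDF5_PLUGIN_PATH')

% On Linux or Mac, set the variable in the terminal before starting MATLAB,
% for example (hypothetical path): export HDF5_PLUGIN_PATH=/opt/hdf5/plugins
```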
After you complete these steps, you can use the high-level or low-level MATLAB HDF5 functions to read and access data sets that have been compressed using the third-party filter. For more information, see HDF5 Dynamically Loaded Filters on The HDF Group website.
Linux Users Only: Rebuild Filter Plugins Using MATLAB HDF5 Shared Library
Starting in R2021b, in certain cases, Linux users whose filter plugin makes callbacks to core HDF5 library functions must rebuild the plugin using the HDF5 shared library that ships with MATLAB, /matlab/bin/glnxa64/libhdf5.so.x.x.x. If you do not rebuild the plugin using this version of the shared library, you might experience issues ranging from undefined behavior to crashes. For more information, see Build HDF5 Filter Plugins on Linux Using MATLAB HDF5 Shared Library or GNU Export Map.
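To find the shipping library on a Linux system, a sketch like the following lists the matching files in the bin/glnxa64 folder mentioned above. It assumes that matlabroot points at your MATLAB installation folder. The version suffix of the file name varies by release.

```matlab
% Folder that holds the HDF5 shared library shipped with MATLAB on Linux
libDir = fullfile(matlabroot,'bin','glnxa64');
libs = dir(fullfile(libDir,'libhdf5.so*'));

% Print the full path of each match; the loop body does not run
% when nothing matches (for example, on Windows or macOS)
for k = 1:numel(libs)
    disp(fullfile(libs(k).folder, libs(k).name));
end
```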