h5create

Create HDF5 dataset

Description

h5create(filename,ds,sz) creates a dataset of size sz in the HDF5 file filename. The dataset name ds specifies the full path of the dataset within the file.

h5create(filename,ds,sz,Name=Value) specifies options using one or more name-value arguments. For example, ChunkSize=[5 5] specifies 5-by-5 chunks of the dataset that can be stored individually in the HDF5 file.

Examples

Create a fixed-size 100-by-200-by-300 dataset myDataset with full path /g1/g2/myDataset.

h5create("myFile.h5","/g1/g2/myDataset",[100 200 300])

Write data to myDataset. Because the dimensions of myDataset are fixed, the amount of data to be written must match the size of the dataset.

myData = ones(100,200,300);
h5write("myFile.h5","/g1/g2/myDataset",myData)
h5disp("myFile.h5")
HDF5 myFile.h5 
Group '/' 
    Group '/g1' 
        Group '/g1/g2' 
            Dataset 'myDataset' 
                Size:  100x200x300
                MaxSize:  100x200x300
                Datatype:   H5T_IEEE_F64LE (double)
                ChunkSize:  []
                Filters:  none
                FillValue:  0.000000

Create two HDF5 files, each containing a 1000-by-2000 dataset. Use the deflate filter with maximum compression for the first dataset, and use the SZIP filter with entropy encoding for the second. You must specify a chunk size when applying compression filters.

h5create("myFileDeflate.h5","/myDatasetDeflate",[1000 2000], ...
         ChunkSize=[50 80],Deflate=9)
h5create("myFileSZIP.h5","/myDatasetSZIP",[1000 2000], ...
         ChunkSize=[50 80],SZIPEncodingMethod="entropy")

Display the contents of the two files and observe the different filters.

h5disp("myFileDeflate.h5")
HDF5 myFileDeflate.h5 
Group '/' 
    Dataset 'myDatasetDeflate' 
        Size:  1000x2000
        MaxSize:  1000x2000
        Datatype:   H5T_IEEE_F64LE (double)
        ChunkSize:  50x80
        Filters:  deflate(9)
        FillValue:  0.000000
h5disp("myFileSZIP.h5")
HDF5 myFileSZIP.h5 
Group '/' 
    Dataset 'myDatasetSZIP' 
        Size:  1000x2000
        MaxSize:  1000x2000
        Datatype:   H5T_IEEE_F64LE (double)
        ChunkSize:  50x80
        Filters:  szip
        FillValue:  0.000000

Write randomized data to each dataset.

myData = rand([1000 2000]);
h5write("myFileDeflate.h5","/myDatasetDeflate",myData)
h5write("myFileSZIP.h5","/myDatasetSZIP",myData)

Compare the compression filters by examining the sizes of the resulting files. For this data, the deflate filter provides greater compression.

deflateListing = dir("myFileDeflate.h5");
SZIPListing = dir("myFileSZIP.h5");
deflateFileSize = deflateListing.bytes
deflateFileSize = 
15117631
SZIPFileSize = SZIPListing.bytes
SZIPFileSize = 
16027320
sizeRatio = deflateFileSize/SZIPFileSize
sizeRatio = 
0.9432

Create a two-dimensional dataset myDataset3 that is unlimited along the second dimension. You must specify the ChunkSize name-value argument when setting any dimension of the dataset to Inf.

h5create("myFile.h5","/myDataset3",[200 Inf],ChunkSize=[20 20])

Write data to myDataset3. You can write data of any size along the second dimension because this dimension is unlimited. Additionally, because one dimension of the dataset is unlimited, you must specify the start and count arguments when writing data to the dataset.

myData = rand(200,500);
h5write("myFile.h5","/myDataset3",myData,[1 1],[200 500])

Display the entire contents of the HDF5 file.

h5disp("myFile.h5")
HDF5 myFile.h5 
Group '/' 
    Dataset 'myDataset3' 
        Size:  200x500
        MaxSize:  200xInf
        Datatype:   H5T_IEEE_F64LE (double)
        ChunkSize:  20x20
        Filters:  none
        FillValue:  0.000000

Input Arguments

Name of the HDF5 file, specified as a string scalar or character vector. If filename does not already exist, then the h5create function creates the file.

Depending on the location to which you are writing, filename can take one of these forms.

Location

Form

Current folder

To write to the current folder, specify the name of the file in filename.

Example: "myFile.h5"

Other folders

To write to a folder different from the current folder, specify the full or relative path name in filename.

Example: "C:\myFolder\myFile.h5"

Example: "/myFolder/myFile.h5"

Remote location

To write to a remote location, specify filename as a uniform resource locator (URL) of the form:

scheme_name://path_to_file/my_file.h5

Based on the remote location, scheme_name can be one of the values in this table.

Remote Location | scheme_name
Amazon S3™ | s3
Windows Azure® Blob Storage | wasb, wasbs

For more information, see Work with Remote Data.

Example: "s3://my_bucket/my_path/my_file.h5"

Dataset name, specified as a string scalar or character vector containing the full pathname of the dataset to be created. If you specify a dataset that does not currently exist, then the h5create function creates the dataset. Additionally, if you specify intermediate groups that do not currently exist, then the h5create function creates those groups.

Example: "/myDataset"

Example: "/g1/g2/myNestedDataset"

Dataset size, specified as a scalar or row vector. To specify an unlimited dimension, specify the corresponding element of sz as Inf. In this case, you must also specify ChunkSize.

Example: 50

Example: [2000 1000]

Example: [100 200 Inf]

Data Types: double

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: h5create("myFile.h5","/dataset1",[1000 2000],ChunkSize=[50 80],CustomFilterID=307,CustomFilterParameters=6) creates the 1000-by-2000 dataset dataset1 in the HDF5 file myFile.h5 using 50-by-80 chunks, the registered bzip2 filter (identifier 307), and a compression block size of 6.

Data type of the dataset, specified as one of these values, representing MATLAB® data types:

  • "double"

  • "single"

  • "uint64"

  • "int64"

  • "uint32"

  • "int32"

  • "uint16"

  • "int16"

  • "uint8"

  • "int8"

  • "string"

Data Types: string | char
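
As a minimal sketch (the file and dataset names here are illustrative, not from the examples above), Datatype lets you store data more compactly than the default double, for example 8-bit image data:

```matlab
% Create a uint8 dataset and write matching uint8 data to it.
h5create("myImages.h5","/frame1",[480 640],Datatype="uint8")
h5write("myImages.h5","/frame1",zeros(480,640,"uint8"))
```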

Chunk size, specified as a scalar or row vector containing the dimensions of the chunk. If any entry of sz is Inf, then you must specify ChunkSize. The length of ChunkSize must equal the length of sz, and each entry of ChunkSize must be less than or equal to the corresponding entry of sz.

Example: 10

Example: [20 10 100]

Data Types: double

Deflate compression level, specified as an integer scalar value from 0 to 9. The default value of 0 indicates no compression. A value of 1 indicates the least compression, and a value of 9 indicates the most. If you specify Deflate, you must also specify ChunkSize.

You cannot specify both Deflate and SZIPEncodingMethod in the same function call.

Data Types: double

Fill value for missing data in numeric datasets, specified as a numeric value.

Data Types: double | single | uint8 | uint16 | uint32 | uint64 | int8 | int16 | int32 | int64
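
As a minimal sketch (file and dataset names are illustrative), elements that have not yet been written read back as the fill value:

```matlab
% Create a dataset with a custom fill value and read it before writing.
h5create("myFill.h5","/myDataset",[10 10],FillValue=-1)
data = h5read("myFill.h5","/myDataset");   % unwritten elements read as -1
```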

32-bit Fletcher checksum filter, specified as a numeric or logical 1 (true) or 0 (false). A Fletcher checksum filter verifies that the transferred data in a file is error-free. If you specify Fletcher32, you must also specify ChunkSize.

Data Types: logical | double
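
A minimal sketch (file and dataset names are illustrative) of enabling the checksum filter on a chunked dataset:

```matlab
% Enable the Fletcher-32 checksum filter; ChunkSize is required.
h5create("myChecksum.h5","/myDataset",[500 500], ...
         ChunkSize=[50 50],Fletcher32=true)
```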

Shuffle filter, specified as a numeric or logical 1 (true) or 0 (false). A shuffle filter improves the compression ratio by rearranging the byte order of data stored in memory. If you specify Shuffle, you must also specify ChunkSize.

Data Types: logical | double
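
Because the shuffle filter only rearranges bytes, it is typically paired with a compression filter such as deflate. A minimal sketch (file and dataset names are illustrative):

```matlab
% Shuffle bytes before deflate compression to improve the ratio.
h5create("myShuffled.h5","/myDataset",[1000 2000], ...
         ChunkSize=[100 100],Shuffle=true,Deflate=5)
```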

Text encoding, specified as one of these values:

  • "UTF-8" — Represent characters using UTF-8 encoding.

  • "system" — Represent characters as bytes using the system encoding (not recommended).

Data Types: string | char
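
A minimal sketch (file and dataset names are illustrative) of creating a string dataset with UTF-8 text encoding:

```matlab
% Store string data using UTF-8 encoding.
h5create("myStrings.h5","/labels",[3 1],Datatype="string",TextEncoding="UTF-8")
h5write("myStrings.h5","/labels",["alpha";"beta";"gamma"])
```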

Filter identifier for the registered filter plugin assigned by The HDF Group, specified as a positive integer. For a list of registered filters, see the Filters page on The HDF Group website.

If you do not specify a value for CustomFilterID, then the dataset does not use dynamically loaded filters for compression.

If you specify CustomFilterID, you must also specify ChunkSize.

Data Types: double | single | uint8 | uint16 | uint32 | uint64 | int8 | int16 | int32 | int64

Filter parameters for third-party filters, specified as a numeric scalar or numeric row vector. If you specify CustomFilterID without also specifying this argument, then the h5create function passes an empty vector to the HDF5 library and the filter uses default parameters.

This name-value argument corresponds to the cd_values argument of the H5Pset_filter function in the HDF5 library.

If you specify CustomFilterParameters, you must also specify CustomFilterID.

Data Types: double | single | uint8 | uint16 | uint32 | uint64 | int8 | int16 | int32 | int64

Since R2024b

Encoding method for SZIP compression, specified as "entropy" or "nearestneighbor". The entropy method is best suited for data that has already been processed; the nearestneighbor method preprocesses the data and then applies the entropy method. If you specify SZIPEncodingMethod, you must also specify ChunkSize.

You cannot specify both SZIPEncodingMethod and Deflate in the same function call.

Data Types: string | char

Since R2024b

Number of pixels (HDF5 data elements) per block for SZIP compression, specified as an even integer from 2 to 32. If you specify SZIPPixelsPerBlock, you must also specify SZIPEncodingMethod. The value of SZIPPixelsPerBlock must be less than or equal to the number of elements in each dataset chunk.

Example: 32

Data Types: double | single | uint8 | uint16 | uint32 | uint64 | int8 | int16 | int32 | int64
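
A minimal sketch (file and dataset names are illustrative): a 50-by-80 chunk holds 4000 elements, well above the SZIPPixelsPerBlock maximum of 32, so the constraint is satisfied:

```matlab
% SZIP compression with the maximum block size of 32 pixels.
h5create("mySZIP.h5","/myDataset",[1000 2000],ChunkSize=[50 80], ...
         SZIPEncodingMethod="nearestneighbor",SZIPPixelsPerBlock=32)
```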

More About

Chunk Storage in HDF5

Chunk storage refers to a method of storing a dataset in an HDF5 file by dividing it into smaller pieces of data known as chunks. Chunking a dataset can improve performance when operating on a subset of the dataset, because each chunk can be read from and written to the HDF5 file individually.
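
For example, reading a small block of the chunked dataset /myDataset3 created in the examples above touches only the chunks that cover that block, not the entire dataset (a sketch assuming that file exists):

```matlab
% Read one 20-by-20 block starting at element (1,1).
block = h5read("myFile.h5","/myDataset3",[1 1],[20 20]);
```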

Version History

Introduced in R2011a
