partition

Class: matlab.io.datastore.Partitionable
Namespace: matlab.io.datastore

Partition a datastore

Syntax

subds = partition(ds,n,index)

Description

subds = partition(ds,n,index) partitions datastore ds into the number of parts specified by n and returns the partition corresponding to the index index. The partitioned datastore subds is of the same type as the input datastore ds.
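
As a rough illustration of the call, the sketch below splits a small in-memory datastore into two parts. It assumes arrayDatastore (available in R2020b or later) as a stand-in for any partitionable datastore; the variable names are illustrative only.

```matlab
% Assumption: arrayDatastore stands in for any partitionable datastore.
ds = arrayDatastore((1:10)','OutputType','same');

% Split ds into 2 parts and keep the first part.
subds = partition(ds,2,1);

% The partition has the same class as the input datastore.
sameType = isequal(class(subds),class(ds));

% Read both parts; together they cover every row of the original data.
part1 = readall(subds);
part2 = readall(partition(ds,2,2));
allRows = sort([part1; part2]);
```

Each call to partition leaves ds itself unchanged, so the same datastore can be partitioned repeatedly with different index values.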

Input Arguments

`ds` — Input datastore
`matlab.io.Datastore` object

Input datastore, specified as a matlab.io.Datastore object. To create a Datastore object, see matlab.io.Datastore.

`n` — Number of partitions
positive integer

Number of partitions, specified as a positive integer. To get a reasonable value for n, use the numpartitions function.
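
One way to pick n is sketched below: query numpartitions first, then visit every partition in turn. The datastore here is an assumed arrayDatastore (R2020b or later) used only for illustration; any partitionable datastore could take its place.

```matlab
% Assumption: arrayDatastore stands in for any partitionable datastore.
ds = arrayDatastore((1:10)','OutputType','same');

% Ask the datastore for a default number of partitions.
n = numpartitions(ds);

% Visit each partition and count the rows it yields.
rowsRead = 0;
for ii = 1:n
    subds = partition(ds,n,ii);
    while hasdata(subds)
        data = read(subds);
        rowsRead = rowsRead + size(data,1);
    end
end
```

Because n comes from numpartitions, every index from 1 to n refers to a valid partition of ds.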

When you specify a value of n that is not in the range of partitions available for the datastore, the partition method returns an empty datastore. For more information, see Empty Datastores. For instance, if a datastore can hold up to 10 partitions, then the output of the partition method depends on the value of n:

- If the specified value of n is less than or equal to 10, then the partition method returns the partition specified by the index. For example, partition(ds,10,1) returns a copy of the first partition of the original datastore ds.
- If the specified value of n is greater than 10, then the partition method returns an empty datastore. For example, partition(ds,100,11) returns an empty datastore.

Example: 3

Data Types: double

`index` — Index
positive integer

Index, specified as a positive integer.

Example: 1

Data Types: double

Attributes

Abstract: true

To learn about attributes of methods, see Method Attributes.

Examples

Build Datastore with Parallel Processing Support

Build a datastore with parallel processing support and use it to bring your custom or proprietary data into MATLAB®. Then, process the data in a parallel pool.

Create a .m class definition file that contains the code implementing your custom datastore. You must save this file in your working folder or in a folder that is on the MATLAB® path. The name of the .m file must be the same as the name of your object constructor function. For example, if you want your constructor function to have the name MyDatastorePar, then the name of the .m file must be MyDatastorePar.m. The .m class definition file must implement the following steps:

Step 1: Inherit from the datastore classes.
Step 2: Define the constructor and the required methods.
Step 3: Define your custom file reading function.

In addition to these steps, define any other properties or methods that you need to process and analyze your data.

%% STEP 1: INHERIT FROM DATASTORE CLASSES
classdef MyDatastorePar < matlab.io.Datastore & ...
        matlab.io.datastore.Partitionable
   
    properties(Access = private)
        CurrentFileIndex double
        FileSet matlab.io.datastore.DsFileSet
    end
    
    % Property to support saving, loading, and processing of
    % datastore on different file system machines or clusters.
    % In addition, define the methods get.AlternateFileSystemRoots()
    % and set.AlternateFileSystemRoots() in the methods section. 
    properties(Dependent)
        AlternateFileSystemRoots
    end
    
%% STEP 2: DEFINE THE CONSTRUCTOR AND THE REQUIRED METHODS
    methods
        % Define your datastore constructor
        function myds = MyDatastorePar(location,altRoots)
            myds.FileSet = matlab.io.datastore.DsFileSet(location,...
                'FileExtensions','.bin', ...
                'FileSplitSize',8*1024);
            myds.CurrentFileIndex = 1;
             
            if nargin == 2
                 myds.AlternateFileSystemRoots = altRoots;
            end
            
            reset(myds);
        end
        
        % Define the hasdata method
        function tf = hasdata(myds)
            % Return true if more data is available
            tf = hasfile(myds.FileSet);
        end
        
        % Define the read method
        function [data,info] = read(myds)
            % Read data and information about the extracted data
            % See also: MyFileReader()
            if ~hasdata(myds)
                msgII = ['Use the reset method to reset the datastore ',... 
                         'to the start of the data.']; 
                msgIII = ['Before calling the read method, ',...
                          'check if data is available to read ',...
                          'by using the hasdata method.'];
                error('No more data to read.\n%s\n%s',msgII,msgIII);
            end
            
            fileInfoTbl = nextfile(myds.FileSet);
            data = MyFileReader(fileInfoTbl);
            info.Size = size(data);
            info.FileName = fileInfoTbl.FileName;
            info.Offset = fileInfoTbl.Offset;
            
            % Update CurrentFileIndex for tracking progress
            if fileInfoTbl.Offset + fileInfoTbl.SplitSize >= ...
                    fileInfoTbl.FileSize
                myds.CurrentFileIndex = myds.CurrentFileIndex + 1 ;
            end
        end
        
        % Define the reset method
        function reset(myds)
            % Reset to the start of the data
            reset(myds.FileSet);
            myds.CurrentFileIndex = 1;
        end

        % Define the partition method
        function subds = partition(myds,n,ii)
            subds = copy(myds);
            subds.FileSet = partition(myds.FileSet,n,ii);
            reset(subds);
        end
        
        % Getter for AlternateFileSystemRoots property
        function altRoots = get.AlternateFileSystemRoots(myds)
            altRoots = myds.FileSet.AlternateFileSystemRoots;
        end

        % Setter for AlternateFileSystemRoots property
        function set.AlternateFileSystemRoots(myds,altRoots)
            try
              % The DsFileSet object manages AlternateFileSystemRoots
              % for your datastore
              myds.FileSet.AlternateFileSystemRoots = altRoots;

              % Reset the datastore
              reset(myds);  
            catch ME
              throw(ME);
            end
        end
      
    end
    
    methods (Hidden = true)          
        % Define the progress method
        function frac = progress(myds)
            % Determine fraction (0 to 1) of data read from datastore
            if hasdata(myds) 
               frac = (myds.CurrentFileIndex-1)/...
                             myds.FileSet.NumFiles; 
            else 
               frac = 1;  
            end 
        end
    end
    
    methods(Access = protected)
        % If you use the FileSet property in the datastore,
        % then you must define the copyElement method. The
        % copyElement method allows methods such as readall
        % and preview to remain stateless.
        function dscopy = copyElement(ds)
            dscopy = copyElement@matlab.mixin.Copyable(ds);
            dscopy.FileSet = copy(ds.FileSet);
        end
        
        % Define the maxpartitions method
        function n = maxpartitions(myds)
            n = maxpartitions(myds.FileSet);
        end
    end
end

%% STEP 3: IMPLEMENT YOUR CUSTOM FILE READING FUNCTION
function data = MyFileReader(fileInfoTbl)
% create a reader object using FileName
reader = matlab.io.datastore.DsFileReader(fileInfoTbl.FileName);

% seek to the offset
seek(reader,fileInfoTbl.Offset,'Origin','start-of-file');

% read fileInfoTbl.SplitSize amount of data
data = read(reader,fileInfoTbl.SplitSize);

end

Your custom datastore is now ready. Use your custom datastore to read and process the data in a parallel pool.
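
A possible way to exercise the datastore is sketched below. It assumes that MyDatastorePar.m from this example is on the MATLAB path; the temporary folder and the sample .bin files are hypothetical test data created only for this sketch. The loop uses parfor, which runs on pool workers when Parallel Computing Toolbox is available and runs serially otherwise.

```matlab
% Assumption: MyDatastorePar.m from this example is on the MATLAB path.
% Hypothetical sample data: three small binary files in a temporary folder.
location = fullfile(tempdir,'mydatastorepar_demo');
if ~isfolder(location)
    mkdir(location);
end
for k = 1:3
    fid = fopen(fullfile(location,sprintf('sample%d.bin',k)),'w');
    fwrite(fid,uint8(1:100));
    fclose(fid);
end

% Create the datastore and choose the number of partitions.
% With an open pool, numpartitions(ds,gcp) can be used instead.
ds = MyDatastorePar(location);
n = numpartitions(ds);
elementsRead = zeros(1,n);

% Process each partition independently.
parfor ii = 1:n
    subds = partition(ds,n,ii);
    count = 0;
    while hasdata(subds)
        data = read(subds);
        count = count + numel(data);
    end
    elementsRead(ii) = count;
end
totalElements = sum(elementsRead);
```

Each iteration works on its own partition, so the iterations share no read state and the per-partition counts can simply be summed afterward.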

More About

Empty Datastores

An empty datastore is a datastore object that does not contain any records. For an empty datastore, your custom datastore methods must satisfy these conditions:

hasdata must return false.
read must throw an error.
numpartitions and maxpartitions must return 0.
partition must return an empty datastore.
preview and readall must return empty data that preserves the non-tall dimensions. For example, if the read method on a nonempty datastore returns data that is of size 5-by-15-by-25, then the preview and readall methods must return empty data of size 0-by-15-by-25.

Non-Tall Dimensions

Dimensions other than the first dimension of the array. For an array of size 5-by-15-by-25, the tall dimension is 5 and the non-tall dimensions are 15 and 25.
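
The shape requirement can be illustrated with plain arrays. The sketch below assumes a read output of size 5-by-15-by-25, as in the example above, and builds empty data that keeps the non-tall dimensions; the variable names are illustrative.

```matlab
% Stand-in for the output of read on a nonempty datastore.
data = rand(5,15,25);

% Build empty data with 0 rows and the same non-tall dimensions and type.
sz = size(data);
emptyData = zeros([0 sz(2:end)],'like',data);
emptySize = size(emptyData);

% Empty data of this shape concatenates cleanly along the tall dimension.
combined = vertcat(emptyData,data);
combinedSize = size(combined);
```

Keeping the non-tall dimensions means that results from empty and nonempty partitions can be concatenated without special handling.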

Tips

In your implementation of the partition method, you must include these steps:
- Before creating a partitioned datastore subds, create a deep copy of the original datastore ds.
- At the end of the partition method, reset the partitioned datastore subds.

For a sample implementation of the partition method, see Add Support for Parallel Processing.

When a partition of a datastore contains no readable record, the read method must return empty data. The non-tall dimensions of this empty data must match the non-tall dimensions of the read method output on a partition with readable records. This requirement ensures that the behavior of the readall method matches the behavior of the gather function.

Version History

Introduced in R2017b
