matlab.io.datastore.FileSet
Description
The matlab.io.datastore.FileSet object helps you process a large collection of files when moving through the files iteratively. Use the FileSet object together with the DsFileReader object to manage and read files from your datastore.
Creation
Syntax
fs = matlab.io.datastore.FileSet(location)
fs = matlab.io.datastore.FileSet(location,Name,Value)
Description
fs = matlab.io.datastore.FileSet(location) creates a FileSet object for a collection of files based on the specified location.
fs = matlab.io.datastore.FileSet(location,Name,Value) specifies the file extension, whether to include subfolders, or sets object properties. You can specify multiple name-value pairs. Enclose names in quotes.
Input Arguments
location — Files or folders to include
character vector | cell array of character vectors | string array | structure
Files or folders to include in the FileSet object, specified as a character vector, cell array of character vectors, string array, or a structure. If the files are not in the current folder, then location must be a full or relative path. Files within subfolders of the specified folder are not automatically included in the FileSet object.
Typically for a Hadoop® workflow, when you specify location as a structure, it must contain the fields FileName, Offset, and Size. This requirement enables you to use the location argument directly with the initializeDatastore method of the matlab.io.datastore.HadoopLocationBased class. For an example, see Add Support for Hadoop.
You can use the wildcard character (*) when specifying location. Specifying this character includes all matching files or all files in the matching folders in the file-set object.
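The forms of location described above can be compared side by side. The following is a minimal sketch, not an official example: it assumes a scratch folder created with tempname, and the file names a.txt, b.txt, and c.csv are hypothetical placeholders.

```matlab
% Build a throwaway folder with three small files (hypothetical names).
folder = tempname;
mkdir(folder);
names = {'a.txt','b.txt','c.csv'};
for k = 1:numel(names)
    fid = fopen(fullfile(folder,names{k}),'w');
    fprintf(fid,'sample %d\n',k);
    fclose(fid);
end

% location as a folder: every file directly inside the folder is included.
fsAll = matlab.io.datastore.FileSet(folder);

% location with a wildcard: only the matching files are included.
fsTxt = matlab.io.datastore.FileSet(fullfile(folder,'*.txt'));

% location as a cell array of explicit file paths.
fsPair = matlab.io.datastore.FileSet( ...
    {fullfile(folder,'a.txt'), fullfile(folder,'c.csv')});

% Report how many files each file-set picked up.
fprintf('%d %d %d\n',fsAll.NumFiles,fsTxt.NumFiles,fsPair.NumFiles);

% Remove the scratch folder.
rmdir(folder,'s');
```

Comparing the three NumFiles values is a quick way to confirm that a wildcard pattern selects the files you expect before you start iterating.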
If the files are not available locally, then the full path of the files or folders must be a uniform resource locator (URL), such as hdfs://hostname:portnumber/path_to_file.
Data Types: char | cell | string | struct
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: fs = matlab.io.datastore.FileSet(location,'IncludeSubfolders',true)
IncludeSubfolders — Subfolder inclusion flag
0 or false (default) | 1 or true
Subfolder inclusion flag, specified as a numeric or logical 1 (true) or 0 (false). Specify true to include all files and subfolders within each folder or false to include only the files within each folder.
Example: 'IncludeSubfolders',true
FileExtensions — File extensions
character vector | cell array of character vectors | string array
File extensions, specified as a character vector, cell array of character vectors, or string array. You can use the empty quotes '' to represent files without extensions.
If 'FileExtensions' is not specified, then FileSet automatically includes all file extensions.
Example: 'FileExtensions','.jpg'
Example: 'FileExtensions',{'.txt','.csv'}
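To see how the two name-value arguments interact, the sketch below builds a small folder tree and counts the files each setting selects. It is an illustration under stated assumptions: the scratch folder comes from tempname, and the names nested, top.txt, top.csv, and deep.txt are hypothetical.

```matlab
% Create a root folder that contains one subfolder (hypothetical layout).
root = tempname;
sub = fullfile(root,'nested');
mkdir(sub);                          % also creates the parent folder

% Two files at the top level and one file inside the subfolder.
files = {fullfile(root,'top.txt'), ...
         fullfile(root,'top.csv'), ...
         fullfile(sub,'deep.txt')};
for k = 1:numel(files)
    fid = fopen(files{k},'w');
    fprintf(fid,'sample %d\n',k);
    fclose(fid);
end

% Default: only files directly inside root are included.
fsFlat = matlab.io.datastore.FileSet(root);

% Include files from subfolders as well.
fsDeep = matlab.io.datastore.FileSet(root,'IncludeSubfolders',true);

% Include subfolders, but keep only .txt files.
fsTxt = matlab.io.datastore.FileSet(root, ...
    'IncludeSubfolders',true,'FileExtensions','.txt');

fprintf('%d %d %d\n',fsFlat.NumFiles,fsDeep.NumFiles,fsTxt.NumFiles);

% Remove the scratch folder tree.
rmdir(root,'s');
```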
Properties
AlternateFileSystemRoots — Alternate file system root paths
string array | cell array
Alternate file system root paths, specified as a string array or a cell array. Use 'AlternateFileSystemRoots' when you create a datastore on a local machine, but need to access and process the data on another machine (possibly of a different operating system). Also, when processing data using the Parallel Computing Toolbox™ and the MATLAB® Parallel Server™, and the data is stored on your local machines with a copy of the data available on cloud or cluster machines of a different platform, you must use 'AlternateFileSystemRoots' to associate the root paths.
To associate a set of root paths that are equivalent to one another, specify 'AlternateFileSystemRoots' as a string array. For example:
["Z:\datasets","/mynetwork/datasets"]
To associate multiple sets of root paths that are equivalent for the datastore, specify 'AlternateFileSystemRoots' as a cell array containing multiple rows where each row represents a set of equivalent root paths. Specify each row in the cell array as either a string array or a cell array of character vectors. For example:
Specify 'AlternateFileSystemRoots' as a cell array of string arrays.
{["Z:\datasets", "/mynetwork/datasets"];...
 ["Y:\datasets", "/mynetwork2/datasets","S:\datasets"]}
Alternatively, specify 'AlternateFileSystemRoots' as a cell array of cell arrays of character vectors.
{{'Z:\datasets','/mynetwork/datasets'};...
 {'Y:\datasets', '/mynetwork2/datasets','S:\datasets'}}
The value of 'AlternateFileSystemRoots' must satisfy these conditions:
Contains one or more rows, where each row specifies a set of equivalent root paths.
Each row specifies multiple root paths and each root path must contain at least two characters.
Root paths are unique and are not subfolders of one another.
Contains at least one root path entry that points to the location of the files.
For more information, see Set Up Datastore for Processing on Different Machines or Clusters.
Example: ["Z:\datasets","/mynetwork/datasets"]
Data Types: string | cell
NumFiles — Number of files
numeric scalar
This property is read-only.
Number of files in the file-set object, specified as a numeric scalar.
Example: fs.NumFiles
Data Types: double
NumFilesRead — Number of files read
numeric scalar
This property is read-only.
Number of files read from the FileSet object, specified as a numeric scalar.
Example: fs.NumFilesRead
Data Types: double
FileInfo — Information about files
matlab.io.datastore.FileInfo object
This property is read-only.
Information about files in the matlab.io.datastore.FileSet object, returned as a matlab.io.datastore.FileInfo object with these properties:
Filename — Name of the file in the FileSet object. The name contains the full path of the file.
FileSize — Size of the file in number of bytes.
For information about a specific file, specify the file index. For example, fs.FileInfo(2) returns the file name and file size for the second file. If you call fs.FileInfo specifying (:) or without specifying an index, it returns information for all of the files.
Example: fs.FileInfo(2)
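Indexing into FileInfo can be sketched as follows. The example assumes two scratch files with hypothetical names (first.txt and second.txt) written with known byte counts, so that the reported FileSize values are easy to check by eye.

```matlab
% Write two files with known sizes: 5 bytes and 10 bytes.
folder = tempname;
mkdir(folder);
paths = {fullfile(folder,'first.txt'), fullfile(folder,'second.txt')};
contents = {'12345','1234567890'};
for k = 1:numel(paths)
    fid = fopen(paths{k},'w');
    fprintf(fid,'%s',contents{k});
    fclose(fid);
end

% Pass the files explicitly so that their order in the file-set is known.
fs = matlab.io.datastore.FileSet(paths);

% Query individual files by index without advancing the file-set.
first = fs.FileInfo(1);
second = fs.FileInfo(2);

% Filename holds the full path; FileSize is the size in bytes.
disp(second.Filename)
disp([first.FileSize second.FileSize])

% Remove the scratch folder.
rmdir(folder,'s');
```

Unlike nextfile, querying FileInfo by index is a plain property lookup, so it is a reasonable choice when you need details about one file and do not want to step through the set.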
Object Functions
hasNextFile | Determine if file-set has another file |
nextfile | Information on next file or file chunk |
hasPreviousFile | Determine if a file-set has a previous file |
previousfile | Information on previous file in file-set |
progress | Determine how many blocks or files have been read |
maxpartitions | Maximum number of partitions |
partition | Partition file-set object |
subset | Create subset of datastore or FileSet |
reset | Reset the file-set object |
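A typical way to combine hasNextFile, nextfile, and reset is a loop that visits each file once and then rewinds the file-set. The sketch below is illustrative only; the scratch folder and the file names part1.txt through part3.txt are hypothetical, and each file is written with exactly 4 bytes so the byte total is predictable.

```matlab
% Create three 4-byte files in a scratch folder (hypothetical names).
folder = tempname;
mkdir(folder);
for k = 1:3
    fid = fopen(fullfile(folder,sprintf('part%d.txt',k)),'w');
    fprintf(fid,'%s','abcd');
    fclose(fid);
end

fs = matlab.io.datastore.FileSet(folder);

% Visit every file once and accumulate the total size in bytes.
totalBytes = 0;
while hasNextFile(fs)
    info = nextfile(fs);             % FileInfo for the next file
    totalBytes = totalBytes + info.FileSize;
end
readCount = fs.NumFilesRead;         % number of files visited so far

% Rewind the file-set so that it can be traversed again.
reset(fs);
afterReset = fs.NumFilesRead;

fprintf('%d bytes in %d files\n',totalBytes,readCount);

% Remove the scratch folder.
rmdir(folder,'s');
```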
Examples
Create a File-Set and Get Information on All Files
Create a file-set and query information for specific files in the file-set.
Create a file-set fs for a collection of files.
folder = {'accidents.mat','airlineResults.mat','census.mat','earth.mat'}
folder = 1x4 cell
    {'accidents.mat'}    {'airlineResults.mat'}    {'census.mat'}    {'earth.mat'}
fs = matlab.io.datastore.FileSet(folder)
fs =
  FileSet with properties:
                    NumFiles: 4
                NumFilesRead: 0
                    FileInfo: FileInfo for all 4 files
    AlternateFileSystemRoots: {}
Obtain information for specific files using either the nextfile function or by querying the FileInfo property and specifying an index. Obtain information for consecutive files using nextfile. For example, obtain information for the first two files in the set.
file1 = nextfile(fs)
file1 =
  1x1 FileInfo
    Filename                                                                                                             FileSize
    "/mathworks/devel/bat/filer/batfs2566-0/Bdoc24b.2725827/build/runnable/matlab/toolbox/matlab/demos/accidents.mat"    7343
file2 = nextfile(fs)
file2 =
  1x1 FileInfo
    Filename                                                                          FileSize
    "/tmp/Bdoc24b_2725827_1865375/tp2552d238/matlab-ex98758341/airlineResults.mat"    1.5042e+05
Query the FileInfo property to get information about the last file in the set.
lastfile = fs.FileInfo(4)
lastfile =
  1x1 FileInfo
    Filename                                                                                                         FileSize
    "/mathworks/devel/bat/filer/batfs2566-0/Bdoc24b.2725827/build/runnable/matlab/toolbox/matlab/demos/earth.mat"    32522
Version History
Introduced in R2020a