Large Files and Big Data
Large data sets can be in the form of large files that do not fit into available memory or files that take a long time to process. A large data set also can be a collection of numerous small files. There is no single approach to working with large data sets, so MATLAB® includes a number of tools for accessing and processing large data.
Begin by creating a datastore that can access small portions of the data at a time. You can use the datastore to manage incremental import of the data. To analyze the data using common MATLAB functions, such as `mean` and `histogram`, create a tall array on top of the datastore. For more complex problems, you can write a MapReduce algorithm that defines the chunking and reduction of the data.
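As a minimal sketch of the datastore-plus-tall-array workflow described above (the file name `bigdata.csv` and the column `ArrDelay` are hypothetical placeholders for your own data):

```matlab
% Create a datastore that reads a large delimited text file in chunks.
ds = tabularTextDatastore('bigdata.csv', 'TreatAsMissing', 'NA');

% Incremental import: read and process one chunk at a time.
while hasdata(ds)
    chunk = read(ds);   % table containing the next block of rows
    % ...process chunk here...
end

% Alternatively, create a tall array on top of the datastore and use
% familiar functions. Operations on tall arrays are deferred (lazy).
reset(ds);
tt = tall(ds);
avgDelay = mean(tt.ArrDelay, 'omitnan');

% gather triggers the deferred computation and returns an in-memory result.
avgDelay = gather(avgDelay);
```

Because tall-array operations are deferred, chaining several computations before a single `gather` call lets MATLAB combine them into fewer passes over the data.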
Categories
- Datastore: Read large collections of data
- Tall Arrays: Arrays with more rows than fit in memory
- MapReduce: Programming technique for analyzing data sets that do not fit in memory
- Large MAT-Files: Access and change variables without loading them into memory
- Parquet Files: Read and write Parquet files
- Memory Mapping: Map file data to memory for faster access
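The MapReduce technique can be illustrated with a minimal sketch that computes an overall mean in two phases: map functions emit partial sums per chunk, and a reduce function combines them. The file name `bigdata.csv` and the column `ArrDelay` are hypothetical placeholders:

```matlab
% Map function: called once per chunk; emit the chunk's partial sum and count.
function meanMapper(data, info, intermKVStore)
    x = data.ArrDelay(~isnan(data.ArrDelay));
    add(intermKVStore, 'partial', [sum(x), numel(x)]);
end

% Reduce function: combine all partial results into a single mean.
function meanReducer(key, intermValIter, outKVStore)
    total = 0;
    count = 0;
    while hasnext(intermValIter)
        v = getnext(intermValIter);
        total = total + v(1);
        count = count + v(2);
    end
    add(outKVStore, 'mean', total / count);
end

% Run the algorithm over a datastore; the output is itself a datastore.
ds = tabularTextDatastore('bigdata.csv', 'SelectedVariableNames', 'ArrDelay');
outds = mapreduce(ds, @meanMapper, @meanReducer);
result = readall(outds);
```

This is a sketch of the standard `mapreduce` calling pattern; because the map function sees one chunk at a time and the reduce function sees only key-value pairs, the full data set never needs to fit in memory.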