What is the most efficient way to store a data set (made up of multiple txt files)?

I have to import and store txt files containing about 150 runs of data. Each data set is a fixed width txt file, with header lines describing each channel on row 6.
Each txt file has about 80 'channels' of data, to which I need to apply various processing techniques such as filtering, FFT, etc. The data is sampled at 100 Hz, and each run usually lasts 3-10 minutes.
So far I have been able to store each run file as a Table with a shortened file name which I can use to index, and each channel name has been added as a variable name for the table, appearing at the top of each column (handy).
This method is functional, but I'm not sure if it's optimal for working with large sets of signal data. Low-pass filtering, or any method that requires rewriting large sets of modified data, seems particularly slow. MATLAB also requires converting from a table to an array for functions like lowpass, which I am guessing adds to the processing time.
So, considering my code below, is there a more efficient way of storing this data (e.g. structure arrays), or is my current approach OK?
(I know the way I have done the fileName assignment is a bit strange given you can do it automatically but just ignore that for now)
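% Detect the import options once from a representative file and reuse them.
opts = detectImportOptions('XXX Slow Channels_Run_198.txt');
opts.VariableNamesLine = 6;   % channel names are on line 6 of each file
runNumbers = [51:55 57:61 63:95 97:123 125:139 141:145 147:214];
for j = runNumbers
    fileName = sprintf('XXX Slow Channels_Run_%d.txt', j);
    fileNameShort = sprintf('Run%d', j);   % short name used as the struct field
    runFiles.rawDataTable.(fileNameShort) = readtable(fileName, opts);
end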

Accepted Answer

Walter Roberson 2019-12-4
It turns out that MATLAB stores each variable of a table in a cell array entry, so the storage requirements are
  • overhead for storing the properties of the table such as variable names
  • one cell row vector with one entry per variable
Thus the storage requirements are not much more than if you had used a cell array yourself.
Also, the process of extracting an individual variable from a table, which can look like
tablename{:,variable_number}
ends up being equivalent to
internal_cell{1,variable_number}
which is a direct access to the storage without any inherent copying needed. Any slowness you would see in access would come from the overhead of going through the object's subsref method to work out what needs to be done. In the case of accessing by variable number (instead of variable name), the overhead is effectively constant. That implies the overhead is proportionally higher for tables with a small number of rows, and should not amount to much proportionally for a large number of rows.
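As a rough illustration of the access pattern described above, here is a minimal sketch on synthetic data. The sizes (10 minutes at 100 Hz, 80 channels) follow the question, while the channel names `ch1`, `ch2`, ... and the choice of column 3 are made up for the example. It reads one channel by variable number and by name, checks that both give the same column, and uses `timeit` to get a rough feel for the per-access overhead compared with indexing a plain matrix. Timings will vary by machine and release, so none are quoted here.

```matlab
% Synthetic stand-in for one run: 10 min at 100 Hz, 80 channels (assumed sizes).
nSamples  = 10 * 60 * 100;
nChannels = 80;
data = randn(nSamples, nChannels);

% Hypothetical channel names ch1..ch80, used as the table variable names.
names = arrayfun(@(k) sprintf('ch%d', k), 1:nChannels, 'UniformOutput', false);
T = array2table(data, 'VariableNames', names);

% Read one channel by variable number and by variable name.
byNumber = T{:, 3};
byName   = T.ch3;

% Rough timing of each access style against a plain matrix column.
tNumber = timeit(@() T{:, 3});
tName   = timeit(@() T.ch3);
tMatrix = timeit(@() data(:, 3));
fprintf('by number: %.3g s, by name: %.3g s, matrix column: %.3g s\n', ...
    tNumber, tName, tMatrix);
```

If the table overhead does turn out to matter for a given workload, one option is to pull the channels needed out of the table once, before a processing loop, rather than indexing into the table on every iteration.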
struct arrays are potentially slightly less overhead in accessing an individual variable -- although the theoretical overhead for accessing an entry by name is more than the theoretical overhead for accessing a cell array by column number, struct arrays have been around a long time and are well optimized in internal code.
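For comparison, a struct-based layout might look like the sketch below. This is one possible arrangement and not a recommendation from the answer: each run is a field of a scalar struct (mirroring the `Run%d` naming in the question) holding a numeric matrix plus a cell array of channel names. The channel names, run numbers, and run length are invented for the example, and a 5-point moving average applied with the base `filter` function stands in for the real low-pass step so that the sketch needs no toolbox.

```matlab
% Synthetic stand-in for two runs (names, sizes, and run numbers are assumptions).
fs = 100;                                    % sample rate in Hz, as in the question
channelNames = {'speed', 'pressure', 'temp'};
runs = struct();
for j = [51 52]
    runName = sprintf('Run%d', j);
    runs.(runName).names = channelNames;
    runs.(runName).data  = randn(3 * 60 * fs, numel(channelNames));
end

% Look up one channel by name, then take its column from the numeric matrix.
col = find(strcmp(runs.Run51.names, 'pressure'));
pressure = runs.Run51.data(:, col);

% Stand-in for the low-pass step: a 5-point moving average applied to every
% column in one call (filter works down the columns of a matrix).
b = ones(1, 5) / 5;
smoothed = filter(b, 1, runs.Run51.data);
```

Keeping each run as a plain numeric matrix means functions that expect arrays can be called on all channels at once, at the cost of handling the name-to-column lookup yourself.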

Release

R2019a
