Most appropriate data structure for multi-level nested dataset
I have 720 Excel files (24 participants x 30 conditions) with 8 sheets (subconditions) each. Every sheet contains 14 variables, and each variable is further divided into 8 cells of double data.
I read them into a nested cell array using 4 nested for-loops that write each value to the appropriate cell index (see the attached file; I had to crop out the last 6 columns in order to upload it, but the rest of the array has the same layout).
However, I now need to analyze the data by aggregating over different levels of this hierarchy (e.g. the mean of segment 2 over all participants and all conditions). With my current data storage, I cannot see an efficient way to perform such an operation, because segment is stored on the lowest level.
I was therefore wondering how my way of storing the data could be improved, or whether there is a way to apply the needed functions over different levels of the hierarchy.
Accepted Answer
Jeff Miller
2018-6-26
Maybe your best bet is to use a much simpler table data structure. Each row in the table would correspond to one combination of distinguishable conditions, and certain columns would label the conditions. From your description, you would need label columns for at least the participant, the condition, and the subcondition. There might also be another column or two labelling cell, etc. Other table columns hold the data values for each condition; maybe there are 14 in your case.
Then you can select arbitrary sets of rows by specifying the relevant values in the label columns, and do what you want with them (e.g., average).
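A minimal sketch of that layout and of label-based row selection might look like the following. The column names (`Participant`, `Condition`, `Subcondition`, `Segment`, `Var1`, `Var2`), the reduced sizes, and the random placeholder values are all assumptions made for illustration; they are not taken from the attached file.

```matlab
% Hypothetical long-format layout: one row per participant x condition x
% subcondition x segment. Var1 and Var2 stand in for the 14 measured variables.
[p, c, s, g] = ndgrid(1:3, 1:2, 1:2, 1:4);    % toy sizes, not 24 x 30 x 8 x 8
T = table(p(:), c(:), s(:), g(:), 'VariableNames', ...
    {'Participant', 'Condition', 'Subcondition', 'Segment'});
rng(0);                                        % reproducible placeholder data
T.Var1 = randn(height(T), 1);
T.Var2 = randn(height(T), 1);

% Select rows by label value, then aggregate.
isSeg2 = T.Segment == 2;
meanSeg2 = mean(T.Var1(isSeg2));               % segment 2 over all participants and conditions

% Combine several label columns with logical operators.
rows = T.Segment == 2 & T.Condition == 1;
meanSeg2Cond1 = mean(T.Var1(rows));
```

Because segment is just another label column here, aggregating over it is no harder than aggregating over participant or condition.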
You might be able to use a lot of the RawRT functionality, even though that toolkit was written for a very specific sort of data that is probably not what you have.
3 Comments
Jeff Miller
2018-6-27
> Now I'm just wondering how I can unnest the cell data structure into such a flat format.
If the data are coming from external files, it might actually be easier just to read them in again, creating the appropriate labels for each row.
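The labelling step is the same whether each block comes from a freshly read sheet or from the existing nested cell array. The sketch below uses a toy in-memory stand-in so that it runs without the Excel files. The indexing order `nested{p}{c}{s}`, the 8-by-14 (segment-by-variable) block shape, and all names are assumptions and may not match the real data.

```matlab
% Toy stand-in for the data; sizes are reduced for illustration.
nPart = 2; nCond = 3; nSub = 2; nSeg = 8; nVar = 14;
nested = cell(nPart, 1);
for p = 1:nPart
    nested{p} = cell(nCond, 1);
    for c = 1:nCond
        nested{p}{c} = cell(nSub, 1);
        for s = 1:nSub
            nested{p}{c}{s} = rand(nSeg, nVar);   % placeholder values
        end
    end
end

% Hypothetical variable names Var1..Var14.
varNames = arrayfun(@(v) sprintf('Var%d', v), 1:nVar, 'UniformOutput', false);

% Flatten: one labelled block per (participant, condition, subcondition).
blocks = cell(nPart * nCond * nSub, 1);
k = 0;
for p = 1:nPart
    for c = 1:nCond
        for s = 1:nSub
            % When re-reading the files, this block could instead come from
            % readtable(fname, 'Sheet', s), with fname built from p and c.
            B = array2table(nested{p}{c}{s}, 'VariableNames', varNames);
            n = height(B);
            B.Participant  = repmat(p, n, 1);
            B.Condition    = repmat(c, n, 1);
            B.Subcondition = repmat(s, n, 1);
            B.Segment      = (1:n)';
            k = k + 1;
            blocks{k} = B;
        end
    end
end
T = vertcat(blocks{:});   % concatenate once instead of growing T inside the loop
```

Collecting the blocks in a cell array and concatenating at the end avoids repeatedly reallocating the table as it grows.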
> In order to perform the analysis, I would probably split it up by variable of interest
There are two advantages to having all the variables in a single table: (a) you only need to create the indicators once, and (b) you have the different variables together in case you want to look at them together (e.g., check for correlations).
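As a small illustration of both points, the sketch below builds one row indicator and reuses it for two data columns when computing a correlation. The table, its column names, and the deliberately linear placeholder values are hypothetical.

```matlab
% Illustrative table: two data columns share the same label columns.
[p, g] = ndgrid(1:6, 1:8);
T = table(p(:), g(:), 'VariableNames', {'Participant', 'Segment'});
T.Var1 = T.Participant + T.Segment;      % placeholder values
T.Var2 = 2 * T.Var1 + 1;                 % linear in Var1 by construction

rows = T.Segment == 2;                   % indicator built once, reused for every variable
R = corrcoef(T.Var1(rows), T.Var2(rows));
r = R(1, 2);                             % correlation between Var1 and Var2 within segment 2
```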
There are also some MATLAB built-ins that apply functions to various subsets of tables (e.g., splitapply), but I haven't used them.
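For reference, a minimal sketch of grouped means with `findgroups` and `splitapply` on a long-format table could look like this. The table and its column names are illustrative, and the values are deterministic placeholders.

```matlab
% Toy long-format table.
[p, c, g] = ndgrid(1:4, 1:3, 1:8);
T = table(p(:), c(:), g(:), 'VariableNames', {'Participant', 'Condition', 'Segment'});
T.Var1 = T.Segment + 0.1 * T.Participant;      % placeholder values

% Mean of Var1 per segment, pooled over participants and conditions.
[G, segment] = findgroups(T.Segment);
segMean = splitapply(@mean, T.Var1, G);
bySegment = table(segment, segMean);

% Mean per condition-by-segment combination.
[G2, keys] = findgroups(T(:, {'Condition', 'Segment'}));
keys.MeanVar1 = splitapply(@mean, T.Var1, G2);
```

In R2018a or later, `groupsummary(T, 'Segment', 'mean', 'Var1')` should give a similar per-segment summary in a single call.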