Most appropriate data structure for multi-level nested dataset

Question

0 votes

finalcell_cropped.mat

I have 720 Excel files (24 participants x 30 conditions) with 8 sheets (subconditions) each. Each sheet contains 14 variables, which are further divided into 8 cells of double data.

I read them into a nested cell array, using 4 nested for-loops to write each value to the appropriate cell index (see the attached file; I had to crop out the last 6 columns in order to upload it, but the rest of the array has the same structure).

However, I now need to analyze the data by aggregating over different levels of this hierarchy (e.g. the mean of segment 2 over all participants and all conditions). With my current data storage, I cannot see a way of performing such an operation efficiently when segment is stored on the lowest level.

Therefore I was wondering how my way of storing the data could be improved, or whether there is a way to apply the needed functions over different levels of the hierarchy.

Accepted Answer

Jeff Miller 2018-6-26

1 vote

Maybe your best bet is to use a much simpler table data structure. Each row in the table would correspond to one combination of distinguishable conditions, and certain columns would label the conditions. From your description, you would need label columns for at least the participant, the condition, and the subcondition. There might also be another column or two labelling cell, etc. Other table columns hold the data values for each condition; maybe there are 14 in your case.

Then you can select arbitrary sets of rows by specifying the relevant values in the label columns, and do what you want with them (e.g., average).
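As a rough illustration of the row-selection idea above, here is a minimal sketch with a small dummy table. The column names (`Participant`, `Condition`, `Subcondition`, `Segment`, `Var1`), the reduced sizes, and the placeholder values are all assumptions made for the example, not taken from the attached file.

```matlab
% Hypothetical sizes, reduced from the 24 x 30 x 8 x 8 design for brevity.
nPart = 3; nCond = 2; nSub = 2; nSeg = 4;

% One row per combination of participant, condition, subcondition, segment.
[p, c, s, g] = ndgrid(1:nPart, 1:nCond, 1:nSub, 1:nSeg);
T = table(p(:), c(:), s(:), g(:), ...
    'VariableNames', {'Participant', 'Condition', 'Subcondition', 'Segment'});

% Deterministic placeholder data; the real table would hold the 14 variables.
T.Var1 = 10 * T.Segment + T.Participant;

% Mean of Var1 for segment 2 over all participants and conditions.
isSeg2 = T.Segment == 2;
meanSeg2 = mean(T.Var1(isSeg2));

% Label columns combine freely, e.g. segment 2 within condition 1 only.
rows = T.Segment == 2 & T.Condition == 1;
meanSeg2Cond1 = mean(T.Var1(rows));
```

Because every level of the hierarchy is an ordinary column, aggregating over a different level only means changing the logical condition, not the loops.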

You might be able to use a lot of the RawRT functionality, even though that toolkit was written for a very specific sort of data that is probably not what you have.

3 comments

Jeff Miller 2018-6-27

> Now I'm just wondering how I can unnest the cell data structure into such a flat format.

If the data are coming from external files, it might actually be easier just to read them in again, creating the appropriate labels for each row.

> In order to perform the analysis, I would probably split it up by variable of interest

There are two advantages to having all the variables in a single table: (a) you only need to create the indicators once, and (b) you have the different variables together in case you want to look at them together (e.g., check for correlations).

There are also some MATLAB built-ins that apply functions to various subsets of tables (e.g., splitapply), but I haven't used them.
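For what it's worth, a minimal sketch of how `findgroups` and `splitapply` might be combined on such a table is shown here. The table contents and column names are dummy values assumed for the example.

```matlab
% Small dummy table in the long format (hypothetical column names).
[p, g] = ndgrid(1:3, 1:4);                 % 3 participants x 4 segments
T = table(p(:), g(:), 'VariableNames', {'Participant', 'Segment'});
T.Var1 = 10 * T.Segment + T.Participant;   % deterministic placeholder data

% Assign a group number to each row according to its segment.
[grp, segID] = findgroups(T.Segment);

% Apply mean to Var1 within each group: one value per segment.
segMean = splitapply(@mean, T.Var1, grp);

% Collect the result in a small summary table.
result = table(segID, segMean, 'VariableNames', {'Segment', 'MeanVar1'});
```

Passing several label columns to `findgroups` (e.g. `findgroups(T.Participant, T.Segment)`) should give one group per combination, which covers aggregation at other levels of the hierarchy.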

Dom Janetzko 2018-6-27

> If the data are coming from external files, it might actually be easier just to read them in again, creating the appropriate labels for each row.

They came from 720 external Excel files, which I didn't want to read in again. Therefore I was looking for a solution to rearrange the data in the cell array. I actually did find a way, using nested for-loops (again) to write to the respective cells. Now my data is nicely ordered.
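The exact loops are not shown in the thread, so the following is only a sketch of what such a rearrangement could look like. It assumes a hypothetical nesting `C{p, c}{s}{v, g}` (participant, condition, subcondition, variable, segment) with one scalar per innermost cell; the real layout in finalcell_cropped.mat may differ, and the sizes are reduced to keep the example small.

```matlab
% Hypothetical nesting, NOT necessarily the layout of the attached file:
% C{p, c}{s}{v, g} holds one scalar for participant p, condition c,
% subcondition s, variable v, segment g.
nPart = 2; nCond = 2; nSub = 2; nVar = 3; nSeg = 2;

% Build dummy nested data so the sketch runs on its own.
C = cell(nPart, nCond);
for p = 1:nPart
    for c = 1:nCond
        C{p, c} = cell(1, nSub);
        for s = 1:nSub
            C{p, c}{s} = num2cell(rand(nVar, nSeg));
        end
    end
end

% Preallocate one row per (participant, condition, subcondition, segment):
% four label columns followed by one column per variable.
nRows = nPart * nCond * nSub * nSeg;
M = zeros(nRows, 4 + nVar);
r = 0;
for p = 1:nPart
    for c = 1:nCond
        for s = 1:nSub
            for g = 1:nSeg
                r = r + 1;
                vals = cell2mat(C{p, c}{s}(:, g))';   % 1-by-nVar data row
                M(r, :) = [p, c, s, g, vals];
            end
        end
    end
end

% Name the columns and convert the numeric matrix to a table.
dataNames = arrayfun(@(v) sprintf('Var%d', v), 1:nVar, 'UniformOutput', false);
varNames = [{'Participant', 'Condition', 'Subcondition', 'Segment'}, dataNames];
T = array2table(M, 'VariableNames', varNames);
```

Filling a preallocated numeric matrix and converting it once at the end avoids growing a table row by row inside the loops.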

> There are two advantages to having all the variables in a single table: (a) you only need to create the indicators once, and (b) you have the different variables together in case you want to look at them together (e.g., check for correlations).

You were right! I stayed with all the variables in one table.

I was actually also looking for a way to transfer the data from MATLAB to R. With your approach I could just write the table to CSV and import it. Your data structure also led me to a neat little way of structuring data in R called "tidy data". My analysis now works perfectly. Thanks once again!
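A minimal sketch of that export step, assuming a small dummy table and a placeholder file name in the temporary folder (both hypothetical):

```matlab
% Small dummy long-format table (hypothetical column names).
T = table([1; 1; 2; 2], [1; 2; 1; 2], [0.5; 0.7; 0.4; 0.9], ...
    'VariableNames', {'Participant', 'Segment', 'Var1'});

% Write the table to a CSV file; the path is only a placeholder.
csvFile = fullfile(tempdir, 'tidy_data_example.csv');
writetable(T, csvFile);

% Read it back to check that the round trip preserves the contents.
T2 = readtable(csvFile);
```

On the R side, a file written this way can then be imported with something like `read.csv`.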
