How to create separate tables from a data set

5 次查看(过去 30 天)
Working with gait analysis data, the exported raw file separates the data into four "tables" on top of each other (Joints, Model outputs, Segments and Trajectories).
When I use the readtable function, it identifies the variables from the top table (Joints) but not the other tables and instead has 5 rows of NaN separating the tables.
How do I get my code to identify and separate the data into easier to read tables.
(Attached is the file as a .txt, the file was too large to attach as .csv which is what I've been working with)
  3 个评论
Jak
Jak 2021-2-10
Yeh you're right, that is much simpler. Thanks.
You can find it attached here
Image Analyst
Image Analyst 2021-2-10
Please explain what column numbers of that are supposed to be what table. And do you want an array of tables (one table for each patient) or do you want all patients in the same table with the patient ID as one of the columns (so there is just a single set of 4 tables)?

请先登录,再进行评论。

采纳的回答

dpb
dpb 2021-2-10
编辑:dpb 2021-2-11
Boy! is that a mess! Why people/vendors do such things is beyond ken.
Another start...
G=readcell('Level Reduced.csv'); % bring in whole thing as cell array
SECTIONS={'Joints','Model','Segments','Trajectories'}; % the looked for data sections
ich=find(cellfun(@ischar,G(:,1))); % the records that have char() data first column
isec=ich(contains(G(ich,1),SECTIONS)); % the locations of each section beginning
isec=[isec;size(G,1)+2]; % add last line for indexing sections first:last lines
for i=1:numel(SECTIONS)
vn=string(G(isec(i)+3,:));
in=find(ismissing(vn));
vn(in)="Var"+in;
cmd=sprintf('t%s=cell2table(G(isec(%d)+5:isec(%d)-2,:))',SECTIONS{i},i,i+1);
eval(cmd)
end
The above results in four tables (I cut the size of the sample file down even further to just three records per group) named per the SECTIONS array; normally I would never use eval to "poof" variables into the workplace, but this seems to be the unusual case where it does make some sense. Code from here on would need to (and would presume would want to) use the explicit table names as it appears each group is different-enough as that code would not be duplicated.
tJoints =
3×25 table
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13 Var14 Var15 Var16 Var17 Var18 Var19 Var20 Var21 Var22 Var23 Var24 Var25
______ ____ _____ _____________ ______ _____ ______ _____ _____________ _____ ______ _____ ______ ______ _____ _____ _____ _____ _____ ______ _____ _____ ______ ______ _______
123.00 0.00 41.07 [1×1 missing] -23.61 -0.50 -52.06 11.76 [1×1 missing] 33.19 -23.97 26.34 -24.32 -19.67 0.78 -9.17 -4.21 3.50 23.78 -70.89 19.20 48.85 897.59 211.86 NaN
124.00 0.00 40.01 [1×1 missing] -23.56 -0.31 -51.82 11.77 [1×1 missing] 33.21 -23.97 26.64 -24.50 -19.55 0.99 -9.05 -4.14 3.77 23.71 -70.82 18.92 49.97 896.71 211.33 -106.03
125.00 0.00 38.65 [1×1 missing] -23.43 -0.10 -51.42 11.78 [1×1 missing] 33.20 -23.99 26.82 -24.56 -19.54 1.27 -8.98 -4.06 4.09 23.83 -70.81 18.81 50.99 896.06 210.57 -136.10
tModel =
3×25 table
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13 Var14 Var15 Var16 Var17 Var18 Var19 Var20 Var21 Var22 Var23 Var24 Var25
______ ____ ____ ____ ____ ____ _____ ____ _____________ _____________ _____________ _____________ _____________ _____________ _____________ _____________ _____________ _____ ______ ______ _____ ______ ______ ______ ______
123.00 0.00 8.94 0.00 0.00 8.97 -0.53 4.86 [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] 7.14 -10.27 -11.83 79.67 314.58 492.79 400.52 400.52
124.00 0.00 8.97 0.00 0.00 9.00 -0.47 4.71 [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] 7.11 -10.41 -11.56 79.93 314.35 493.12 400.61 400.61
125.00 0.00 8.93 0.00 0.00 8.96 -0.45 4.66 [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] 7.06 -10.53 -11.23 80.12 313.99 493.47 400.64 400.64
tSegments =
3×25 table
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13 Var14 Var15 Var16 Var17 Var18 Var19 Var20 Var21 Var22 Var23 Var24 Var25
______ ____ _____ _____ _____ ______ ______ ______ _____ ______ _____ _______ ______ ______ _____ _____ _____ _____ ______ ______ _____ ______ _____ _____ ______
123.00 0.00 59.21 54.50 77.89 -78.23 302.03 892.00 37.75 100.37 28.41 -122.44 343.55 135.89 89.18 85.27 58.57 41.15 322.18 500.28 37.75 100.37 28.41 79.86 342.85
124.00 0.00 58.96 54.49 78.23 -79.49 301.43 890.79 37.24 99.29 29.23 -114.01 343.27 131.59 88.12 84.58 59.54 41.39 322.36 499.57 37.24 99.29 29.23 89.46 344.06
125.00 0.00 58.88 54.47 78.42 -80.55 300.64 889.76 36.97 98.03 30.11 -105.09 343.29 126.97 87.06 83.63 60.52 40.94 322.21 498.77 36.97 98.03 30.11 99.54 345.12
tTrajectories =
3×25 table
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13 Var14 Var15 Var16 Var17 Var18 Var19 Var20 Var21 Var22 Var23 Var24 Var25
______ ____ _____ ______ ______ _____ _____ ______ _______ ______ ______ _______ ______ ______ _____ ______ ______ _____ ______ ______ ______ ______ ______ ______ ______
123.00 0.00 50.81 359.27 970.07 42.66 46.04 946.57 -172.71 252.49 978.63 -173.44 164.97 976.32 56.60 392.62 730.53 92.55 379.65 502.24 -20.02 369.62 289.29 -75.38 361.01
124.00 0.00 50.13 358.90 970.59 42.10 45.82 946.35 -173.70 252.02 978.65 -174.31 164.49 976.29 55.85 392.45 730.88 92.50 379.48 502.53 -15.05 369.16 286.28 -67.52 361.31
125.00 0.00 49.65 358.41 971.07 41.52 45.60 946.12 -174.70 251.56 978.75 -175.16 164.03 976.30 55.03 392.30 731.20 92.30 379.21 502.81 -10.04 368.90 283.43 -59.52 361.58
>>
I tried to use the records beginning with 'Frame' as variable names, but that's fraught with issues -- and didn't have time to sort them all out...that's what the code for variable vn in the loop does; one would use
tName.Properties.VariableNames=vn;
% or
,'VariableNames',vn % add to argument list of |cell2table| for variable names
on creation, but
>> tTrajectories.Properties.VariableNames=vn;
Duplicate table variable name: 'X'.
>>
so one will have have some naming logic to build a unique set of names for each group of repeated names for each table. I didn't have the time to create that code at the moment.
The above code for vn does create a default VarN name for missing values that occur in at least the first section, but the duplicates was where I then had to resign from the fray somewhat bloodied but mostly just out of time to spare just now.
  3 个评论
Jak
Jak 2021-6-10
Hi again,
The code you sent was brilliant, and worked wonders on the dataset i sent. However it seems to be struggling with this other dataset and I'm not sure why.
I'm getting the error
Index in position 1 exceeds array bounds (must not exceed 12895).
Error in Raw_Data_Split (line 8)
vn=string(G(isec(i)+3,:));
I figure its to do with the length of the tables they are creating as the structure of the dataset is the same the previous one. Whilst probably not ideal, I usually use the "try" function when the array bounds are exceeded, however in this instance it means only the first table (tJoints) is created.
Attached is the new dataset for reference. It has been reduced though.
dpb
dpb 2021-6-10
The file you attached ran to completion here...created the four expected tables.
Wherever the problem is, it must be in the part you didn't include in the posting.

请先登录,再进行评论。

更多回答(1 个)

Mathieu NOE
Mathieu NOE 2021-2-10
hello
here below my suggestion to split the large csv file into smaller txt files
they may be smarter use of regexp but this is what I can offer quick and dirty
seems you can now easily do a second loop to load these files with readtable and generate your multiples tables from there;
hope it helps, even if it can certainly be further refined
% Tell it what folder you want to put the files in
outdir = cd;
% Read the initial file in all at once
filename = 'Level Reduced.csv';
fid = fopen(filename, 'r');
data = fread(fid, '*char').';
fclose(fid);
% Break it into pieces based upon ',,,,,,,,,,,,,,,,,,,,,,,,' lines
pieces = regexp(data, '\n\s*\n', 'split');
[match,piece,startIndex,endIndex] = regexp(data,',,,,,,,,,,,,,,,,,,,,,,,,','match','split');
% Now loop through and save each one
n = 0;
for k = 1:numel(piece)
if size(piece{k},2) > 100 % so avoid storing empty files or single line (title)
n = n + 1;
filename = fullfile(outdir, ['out' num2str(n), '.txt']);
% Now write the piece to the file
fid = fopen(filename, 'w');
fwrite(fid, piece{k});
fclose(fid);
end
end

类别

Help CenterFile Exchange 中查找有关 Tables 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by