Labelling columns of large array in a searchable way?

I'm looking for some advice on working with large datasets.
I am trying to label individual strips of continuous data so that the label can then be used to group or sort by a specific tag: 1, 2 or 3.
Currently, for each dataset I reshape the data into an array (800x43200), generate variable names containing the tag (1x43200 cell), make a table and save it as a text file.
Then, if I need all the columns tagged 1 from all datasets, I have to read each table, use a for-loop and str2num on the variable names to parse out the tag, and use that to gather the correct columns.
This doesn't seem like the best way of doing it. I thought perhaps I should be using tabularText datastores or tall tables, but these don't seem to help with sorting/averaging by specific tags.
Any advice you can offer to point me in the right direction will be greatly appreciated.
3 Comments
Cris LaPierre 2021-3-20
Perhaps I don't quite get your naming scheme, but with a table, if you know the variable name you want to load, you shouldn't have to use a for loop and str2num to get it. See how to access data in tables. Syntax depends on if you want a table returned or an array, but there are various ways you can use the variable name as is to return either.
load patients.mat
T = table(Age,Smoker,Height,Weight,Systolic,Diastolic);
T.Height
ans = 100×1
    71
    69
    64
    67
    64
    68
    64
    68
    68
    66
T(:,["Height","Weight"])
ans = 100×2 table
    Height    Weight
    ______    ______
      71       176
      69       163
      64       131
      67       133
      64       119
      68       142
      64       142
      68       180
      68       183
      66       132
      68       128
      66       137
      71       174
      72       202
      65       129
      71       181
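Applied to the "<index>_<tag>" naming scheme from the question, the same idea can be sketched as follows. This is only an illustration on a small stand-in array: the data, the tag values and the names isTag1, T1 and m1 are made up here, and it assumes a release (R2019b or later) in which table variable names do not have to be valid identifiers.

```matlab
% Small stand-in for one dataset: 4 rows, 6 strips (columns).
data = reshape(1:24, 4, 6);
tag  = [1 3 1 2 3 1];                         % assumed tags, one per column

% Names follow the "<index>_<tag>" scheme: "1_1", "2_3", ...
vnames = string(1:6) + "_" + string(tag);
T = array2table(data, 'VariableNames', vnames);

% Match on the variable names directly, without a loop or str2num.
isTag1 = endsWith(T.Properties.VariableNames, "_1");  % logical mask over columns
T1 = T(:, isTag1);             % parentheses return a sub-table of the tag-1 strips
m1 = mean(T{:, isTag1}, 2);    % braces return a numeric array; row-wise mean
```

Parentheses keep the result as a table, while braces extract the underlying numeric data, which is what mean needs in releases where it does not accept a table.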
Jack Bray 2021-3-20
Edited: Jack Bray 2021-3-20
Thanks for the quick replies. Sorry I wasn't clearer; I'll try to explain what I mean in MATLAB:
% Each dataset starts as one long column; I have over 1000 datasets.
% Currently it looks something like this:
for kk = 1:numel(datasets)
    rawdata = load(datasets{kk});          % rawdata = 1x34560000 double
    epcdata = reshape(rawdata,800,43200);
    tag = zeros(1,43200);                  % tag = 1x43200 double
    for ii = 1:43200
        % getTag is a hypothetical placeholder: use data to get tag: 1, 2 or 3
        tag(ii) = getTag(epcdata(:,ii));
    end
    vnames = cell(1,43200);
    for ii = 1:43200
        vnames{ii} = [num2str(ii) '_' num2str(tag(ii))];
    end
    T = array2table(epcdata,'VariableNames',vnames);
    writetable(T)
end
% This is all so I can search each dataset for specific tags like this for tag = 1:
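for kk = 1:numel(Tables)
    T = readtable(Tables{kk});
    tag = zeros(1,43200);
    for ii = 1:43200
        tag(ii) = str2double(T.Properties.VariableNames{ii}(end));
    end
    % Brace indexing extracts numeric data; the result is no longer named "ones" (a built-in).
    tag1mean(:,kk) = mean(T{:,tag == 1},2);
end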
This method of using the variable name of the table as a label to search for seems silly but I can't figure out how something like this should be done.


Accepted Answer

Seth Furman 2021-3-22
table supports custom metadata properties.
In your case, you could add a "tag" custom variable property to T as in the following example.
rng default
rawdata = randi(100,1,34560000); % rawdata = 1x34560000 double
epcdata = reshape(rawdata,800,43200);
vnames = string(1:43200);
T = array2table(epcdata,'VariableNames',vnames);
tag = randi(3,1,43200);
T = addprop(T,"tag","variable");
T.Properties.CustomProperties.tag = tag;
Now the "tag" and variable name properties are distinct:
>> T(1:5,1:5)
ans =
5×5 table
1 2 3 4 5
__ __ __ __ __
82 69 68 82 25
91 14 44 19 39
13 73 70 13 44
92 12 26 83 84
64 12 1 64 83
>> T.Properties.CustomProperties.tag(1:5)
ans =
3 3 1 1 1
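With the tags stored as a per-variable custom property, selecting and averaging the strips for one tag becomes a logical-indexing operation. The following is a minimal sketch on a small stand-in table; the data, the tag values and the names isTag1, tag1mean and T1 are assumptions for illustration. To my understanding, per-variable custom properties are subset along with the variables when you index into the table, which the last lines rely on.

```matlab
% Small stand-in table: 4 rows, 6 strips, with a per-variable "tag" property.
data = reshape(1:24, 4, 6);
T = array2table(data, 'VariableNames', string(1:6));
T = addprop(T, "tag", "variable");
T.Properties.CustomProperties.tag = [1 3 1 2 3 1];   % assumed tags

% Logical mask with one entry per column, then a row-wise mean over those strips.
isTag1   = T.Properties.CustomProperties.tag == 1;
tag1mean = mean(T{:, isTag1}, 2);

% Indexing the table keeps the matching tags with the selected columns.
T1 = T(:, isTag1);
subTags = T1.Properties.CustomProperties.tag;
```

No parsing of variable names is needed, so the names stay free for whatever identifies the strip itself.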
Note that you will have to write your table to a MAT file instead of a text file in order to preserve the custom property you added.
save T.mat T
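Gathering one tag across many saved datasets could then look roughly like this. It is a sketch under stated assumptions: the two tiny tables, the temporary folder, the file names T1.mat and T2.mat, and the variable tag1mean are all invented for the example, and every file is assumed to hold a table named T with the same number of rows.

```matlab
% Create two small example files in a temporary folder (hypothetical names).
folder = tempname;
mkdir(folder);
tags = {[1 3 1 2], [2 1 1 3]};
for kk = 1:2
    T = array2table(kk * ones(5, 4), 'VariableNames', string(1:4));
    T = addprop(T, "tag", "variable");
    T.Properties.CustomProperties.tag = tags{kk};
    save(fullfile(folder, "T" + kk + ".mat"), "T");
end

% Collect the row-wise mean of all tag-1 strips from every file.
files = dir(fullfile(folder, "*.mat"));
tag1mean = zeros(5, numel(files));
for kk = 1:numel(files)
    S = load(fullfile(files(kk).folder, files(kk).name), "T");
    isTag1 = S.T.Properties.CustomProperties.tag == 1;
    tag1mean(:, kk) = mean(S.T{:, isTag1}, 2);
end
```

Each file still has to be loaded in full, but the tag lookup itself is a single comparison rather than a loop over variable names.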
Please let me know if this meets your use case.
1 Comment
Jack Bray 2021-3-22
Thank you for your answer, I think I can use this approach to make my code much more efficient.
You saved me a lot of hassle, as I was just about to attempt to convert it all into HDF5 and use the attributes as custom tags, but using tables will be much more convenient.
Thanks again!


More Answers (1)

Jan 2021-3-20
vnames{ii} = [num2str(ii) '_' num2str(tag(ii))];
This hides the tags in the names of the variables. This complicated method requires even more complicated methods to access the tags later.
Store the tags as numbers, e.g. as an additional column.
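One possible variation on this suggestion is to keep the tags as a separate numeric vector next to the data in a MAT file, rather than as an extra column inside the array. The sketch below assumes that layout; the file location and the variable names epcdata, tag and tag1mean are illustrative. Because load can read selected variables, the small tag vector can be inspected before the large array is read.

```matlab
% Stand-in for one dataset: the real array would be 800x43200.
epcdata = reshape(1:24, 4, 6);
tag     = [1 3 1 2 3 1];               % numeric tag per column (assumed values)
fname   = [tempname '.mat'];           % hypothetical file location
save(fname, 'epcdata', 'tag');

% Read only the tag vector first; load the data only if the tag occurs.
S = load(fname, 'tag');
if any(S.tag == 1)
    D = load(fname, 'epcdata');
    tag1mean = mean(D.epcdata(:, S.tag == 1), 2);   % row-wise mean of tag-1 strips
end
```

Datasets that contain no strips with the requested tag are skipped without reading their data at all.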
1 Comment
Jack Bray 2021-3-20
Thanks for this, I had thought about adding an extra row for tags and just using the numbers but it still requires reading each table in order to search for specific tags. Maybe this is just the easiest way to do it.
I just imagined there might be an easier way of organising/searching this kind of columnar data, perhaps using datastores or tall tables, etc.
