Is it possible to code my 'data loading' in a less sloppy / more automated way?

Hi everyone, I am currently loading Excel files into my workspace and then saving them to .mat files for later use. The amount of data is very large, and I will very likely have to do this often, because it needs to be done for every new set of measurements. That is why I am wondering whether there is a shorter or more automated way to write it. I have to repeat this for 13 patches for every inclusion. I already have a loop at the end that saves everything. Thanks in advance to anyone who can help, if it is possible!
Note that this patch has only two measurements; sometimes there are 9 measurements, leading to a total of 36 files to load.
At the moment I do something like this:
%% patch 6
f_6 = ["S6_M1_1039_O2norm.xlsx", "S6_M2_1147_O2norm.xlsx",...
"S6_M1_1039_O20.xlsx", "S6_M2_1147_O20.xlsx"] ;
%O2norm
S1_Patch6M1_O2norm_630nm = zeros(4000,5);
S1_Patch6M2_O2norm_630nm = zeros(4000,5);
S1_Patch6M1_O2norm_670nm = zeros(4000,5);
S1_Patch6M2_O2norm_670nm = zeros(4000,5);
for i = 2:1:6
S1_Patch6M1_O2norm_630nm(:,i-1) = xlsread(f_6(1), i, 'A2:A4001');
S1_Patch6M2_O2norm_630nm(:,i-1) = xlsread(f_6(2), i, 'A2:A4001');
S1_Patch6M1_O2norm_670nm(:,i-1) = xlsread(f_6(1), i, 'B2:B4001');
S1_Patch6M2_O2norm_670nm(:,i-1) = xlsread(f_6(2), i, 'B2:B4001');
end
%O20
S1_Patch6M1_O20_630nm = zeros(4000,5);
S1_Patch6M2_O20_630nm = zeros(4000,5);
S1_Patch6M1_O20_670nm = zeros(4000,5);
S1_Patch6M2_O20_670nm = zeros(4000,5);
for i = 2:1:6
S1_Patch6M1_O20_630nm(:,i-1) = xlsread(f_6(3), i, 'A2:A4001');
S1_Patch6M2_O20_630nm(:,i-1) = xlsread(f_6(4), i, 'A2:A4001');
S1_Patch6M1_O20_670nm(:,i-1) = xlsread(f_6(3), i, 'B2:B4001');
S1_Patch6M2_O20_670nm(:,i-1) = xlsread(f_6(4), i, 'B2:B4001');
end
4 comments
dpb on 12 May 2022
Edited: dpb on 12 May 2022
Carrying on from the above...
%% patch 6
f_6 = ["S6_M1_1039_O2norm.xlsx", "S6_M2_1147_O2norm.xlsx",...
"S6_M1_1039_O20.xlsx", "S6_M2_1147_O20.xlsx"] ;
Start by making sure your file-naming convention is searchable by wildcard matching. It appears it may already be, if the "S6_" prefix above is the ID for set 6. If that is the case, I'd strongly suggest zero-padding the number to a fixed width, e.g. "S00006_" written with a "%05d" format, to handle a large number of measurement sets. The files will then also sort contiguously by number, which can be a big help. You might also want to consider some sort of directory structure for storing the files so that you don't end up with 10,000 files in one folder. By week or month could be one way, but any organizing method works as long as it is logical and sorts (which means that if you use dates, use numeric YYMMDD order with leading zeros, not MMM/DD/YY, for example).
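As a minimal sketch of the naming advice above: the set numbers and the date below are made up for illustration, and the five-digit width is just one possible choice.

```matlab
% Hypothetical set number, used only to illustrate zero padding
setNumber = 6;
prefix = sprintf("S%05d_", setNumber);     % "S00006_"

% Hypothetical measurement date, formatted as a sortable YYMMDD tag
measDate = datetime(2022, 5, 12);
dateTag = string(char(measDate, 'yyMMdd'));

% Zero-padded names come out in numeric order from a plain text sort
paddedNames = compose("S%05d_", [10; 2; 1]);
sortedNames = sort(paddedNames);
```

Without the padding, a text sort places "S10_" before "S1_" and "S2_", which is why the fixed width matters once the set count grows.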
With that, you can begin to turn the patch 6 code into a generic function by using
Set=6; % set/ask user for measurement set of interest
ROOT_DIR="C:\BaseDataPath"; % fixed path, or ask the user with uigetdir
DATASET="S"+Set; % MATLAB string syntax converts and appends the number: "S6"
WILDSTR="_*.xlsx"; % the wildcard pattern to match file naming convention
d=dir(fullfile(ROOT_DIR,strcat(DATASET,WILDSTR))); % return all matching files
for i=1:numel(d)
    tData=readtable(fullfile(d(i).folder,d(i).name)); % read the file
    % ...do whatever with each file here...
end
I would need more details on what the content is and what you need going forward, but either add the metadata as data in the table, or use a struct or some similar organizing storage, instead of creating files with such data in the names.
Yasmin Ben Azouz on 18 May 2022
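One way to picture "metadata as data" is to parse the file names from the question into a table, one row per file. This is only a sketch: the column names are illustrative assumptions, and the meaning of the third name part (e.g. "1039") is not stated in the thread, so it is kept as an opaque tag.

```matlab
% File names copied from the question; only the names are parsed, no file is read
files = ["S6_M1_1039_O2norm.xlsx"; "S6_M2_1147_O2norm.xlsx"; ...
         "S6_M1_1039_O20.xlsx";    "S6_M2_1147_O20.xlsx"];

% Split e.g. "S6_M1_1039_O2norm" into its four underscore-separated parts
parts = split(erase(files, ".xlsx"), "_");   % 4x4 string array

% One row per file; the column names are assumptions made for this sketch
meta = table(files, parts(:,1), parts(:,2), parts(:,3), parts(:,4), ...
    'VariableNames', {'File', 'Patch', 'Measurement', 'Tag', 'Condition'});

% Select files by metadata instead of by variable name
o20Files = meta.File(meta.Condition == "O20");
```

The loaded data could then be stored in a further column of the same table, so that filtering by patch, measurement or condition is an ordinary row selection.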
@Stephen23 @Jan @dpb Thank you for the replies!! I am slowly learning the disadvantages of storing data in variable names and am starting to understand the advantages of saving data in structures. So again, thank you, and Stephen, thanks for the tutorial on why not to name variables dynamically (it is of great, great help).
I now create a 1x(number of measurements)x5 ND cell array for each patch, and I then put the patches for a subject in a struct whose field names are the patch names. The data itself is stored as 1x5 cells in the 5th page of the 3rd dimension. Each of these cells contains tables, one per worksheet, with a column for each wavelength.
I know this is still far from perfect, as I am still struggling with which structures to use. Additionally, the data I am working with is extremely messy, which rules out many easier and more organised options.
Just thought I would get back to you guys! Thanks again. Below is my code.
%% Get directory
ROOT_DIR = uigetdir; % ask the user for the folder containing the .xlsx files
%% Definitions
patches = ["S1","S2","S3","S4","S5","S6","S7","S8","S9","S10","S11","S12","BGS"];
data = cell(1,5); % one table per worksheet (sheets 2 to 6)
range = 'A1:B4001';
final = struct('S1',[],'S2',[],'S3',[],'S4',[],'S5',[],...
    'S6',[],'S7',[],'S8',[],'S9',[],'S10',[],'S11',[],'S12',[],'BGS',[]);
%% Loop to fill patches struct with ND-arrays.
for p=1:numel(patches)
    DATASET = patches(p);
    WILDSTR = "_*.xlsx"; % the wildcard pattern to match file naming convention
    d = dir(fullfile(ROOT_DIR,strcat(DATASET,WILDSTR))); % return all matching files
    newname = split(erase({d.name},'.xlsx'),'_'); % file name parts along the 3rd dimension
    for i=1:numel(d)
        for k=2:1:6
            data{k-1} = readtable(fullfile(d(i).folder,d(i).name),'Sheet',k,...
                'VariableNamingRule','preserve','Range',range);
            % rename the acquisition channels to the wavelengths they record
            data{k-1}.Properties.VariableNames{'Dev1/ai0'} = 'nm630';
            data{k-1}.Properties.VariableNames{'Dev1/ai1'} = 'nm670';
        end
        newname(:,i,5) = {data}; % store the five tables on page 5
    end
    final.(patches(p)) = newname;
end
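To make the resulting layout easier to follow, here is a small sketch of how one value could be read back out. It builds a mock entry with random numbers instead of reading any Excel file, so the data and the two measurement names are placeholders, not real results.

```matlab
% Mock of one patch entry with the layout described above (values are made up)
nameParts = split({'S6_M1_1039_O2norm', 'S6_M2_1147_O2norm'}, '_');  % 1x2x4 cell
sheets = cell(1, 5);
for k = 1:5
    sheets{k} = table(rand(3,1), rand(3,1), 'VariableNames', {'nm630', 'nm670'});
end
nameParts(:, 1, 5) = {sheets};   % page 5 holds the 1x5 cell of tables
nameParts(:, 2, 5) = {sheets};
final = struct('S6', {nameParts});

% Measurement 2 of patch S6: name part on page 2, data on page 5
measurementId = final.S6{1, 2, 2};       % 'M2'
trace = final.S6{1, 2, 5}{3}.nm630;      % third table, 630 nm column
```

Pages 1 to 4 carry the pieces of the file name and page 5 carries the data, so every lookup needs three levels of indexing; a table or struct array with named fields may read more clearly if the layout is revisited.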

Release: R2020b