Indexing geochemical data arrays with different numbers of elements

I have a table of data collected from an instrument that makes 6 measurements for each sample. At the end of the analysis I'm left with a CSV file containing 6 rows of data for every sample. For example, if I analyze 100 samples, I have a CSV file with 600 rows. I have written code to process the data, and I use only the last three measurements (the rows where the array 'injection' is 4-6) for each sample. Here's how I read the data and create the arrays:
T = readtable("data", 'VariableNamingRule','preserve');
%define the variables
line = table2array(T(:,1));
d18O_raw = table2array(T(:,4));
port = table2array(T(:,2));
injection = table2array(T(:,3));
%select the last three measurements
%use only injections 4-6 for each sample
line = line(injection>3);
d18O_raw = d18O_raw(injection>3);
port = port(injection>3);
injection = injection(injection>3);
I average the three measurements for each sample so that I am left with one value per variable per sample. Importantly, I also reshape the "port" variable, which helps me identify each sample (so I can match each variable with the corresponding sample later).
d18O_raw = reshape(d18O_raw, [3, numel(line)/3]);
Error using reshape
Size arguments must be real integers.
average_d18O = transpose(mean(d18O_raw));
port_reshaped = port(1:3:end,:);
Here's where my issue arises. Sometimes the machine has an error and measures a sample only 5 times instead of 6. In the sample data included, the first sample has been measured only 5 times, but in theory this could happen at any point in the analysis. Currently I have to manually fix the file (or change my code) whenever a sample has been measured only 5 times. I want my code to handle a sample that has EITHER 5 or 6 measurements: automatically select the last 2 or 3 measurements (i.e., always skip the first 3), average those 2 or 3 measurements, and index the corresponding port whether there are 2 or 3 measurements.
My current way of handling this issue is clunky and doesn't make the script easy to share with others, which is the goal.
Thank you in advance for your help.
2 Comments
Siddharth Bhutiya 2023-4-27
Star Strider has already answered the question below. But I'll just mention this. For the lines of code that are doing the following:
line = table2array(T(:,1));
A simpler way to extract an entire variable is to use dot indexing, as follows:
line = T.Line;
% OR
line = T.(1);
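Because the table is read with 'VariableNamingRule','preserve', two of the column names shown in the data ("Inj Nr" and "d(18_16)Mean") are not valid MATLAB identifiers, so plain dot indexing does not apply to them directly. A minimal sketch of how those columns could be extracted, assuming R2019b or later and using a small toy table in place of the real file:

```matlab
% Toy table standing in for the CSV; variable names copied from the question's data.
T = table((1:5).', ones(5,1), (1:5).', [-39; -39.973; -39.527; -40.579; -40.9], ...
    'VariableNames', {'Line', 'Port', 'Inj Nr', 'd(18_16)Mean'});

line      = T.Line;              % valid identifier: plain dot indexing
port      = T.Port;
injection = T.('Inj Nr');        % name contains a space: parentheses plus quotes
d18O_raw  = T.('d(18_16)Mean');  % name contains parentheses: same syntax
```

Positional indexing such as T.(3) also works, but it silently breaks if the column order of the CSV ever changes, so indexing by name is probably the safer choice for a script that will be shared.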

Accepted Answer

Star Strider 2023-4-26
Edited: Star Strider 2023-4-26
This can be done in a relatively straightforward way by first separating the sub-matrices for each port into individual cells using the accumarray function, and then using the cellfun function to calculate the mean of rows (4:end) of column 4 in each cell, where ‘end’ (the last row of the sub-matrix) adapts to however many measurements that sample has.
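To see the mechanism in isolation, here is a small sketch on a made-up numeric matrix (the values and the names M, rows, C and means are illustrative, not taken from the data file). The first group has 5 rows and the second has 6, and 4:end picks up 2 or 3 rows accordingly:

```matlab
% Toy data: column 1 = row number, column 2 = group (port), column 3 = value.
M = [ 1 1 10;  2 1 20;  3 1 30;  4 1 40;  5 1 50; ...
      6 2  1;  7 2  2;  8 2  3;  9 2  4; 10 2  5; 11 2  6];

G    = findgroups(M(:,2));     % group number for every row
rows = (1:size(M,1)).';        % explicit row indices to accumulate

% Collect the rows of each group into one cell; sort() keeps the original row order.
C = accumarray(G, rows, [], @(r) {M(sort(r),:)});

% Mean of rows 4:end of column 3 in each cell, whatever the cell's height.
means = cellfun(@(x) mean(x(4:end,3)), C);   % -> [45; 5]
```

Passing explicit row indices is a small variation on the code below, which uses the 'Line' column for the same purpose.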
T = readtable("data", 'VariableNamingRule','preserve')
T = 53×4 table
    Line    Port    Inj Nr    d(18_16)Mean
    ____    ____    ______    ____________
      1      1        1           -39
      2      1        2       -39.973
      3      1        3       -39.527
      4      1        4       -40.579
      5      1        5         -40.9
      6      2        1       -33.315
      7      2        2       -33.008
      8      2        3       -32.989
      9      2        4       -33.028
     10      2        5        -33.03
     11      2        6       -33.021
     12      3        1           NaN
     13      3        2       -10.256
     14      3        3        -9.766
     15      3        4        -9.658
     16      3        5        -9.644
%define the variables
line = table2array(T(:,1));
d18O_raw = table2array(T(:,4));
port = table2array(T(:,2));
injection = table2array(T(:,3));
% %select the last three measurements
% %use only injections 4-6 for each sample
% line = line(injection>3);
% d18O_raw = d18O_raw(injection>3);
% port = port(injection>3);
% injection = injection(injection>3);
[G,ID] = findgroups(T.Port); % Use 'Port' To Define The Groups
A = accumarray(G, T{:,1}, [], @(x){T{x,:}}) % Accumulate Sub-Matrices According To 'G' (T{:,1} Is 'Line', Used Here As The Row Indices)
A = 9×1 cell array
    {5×4 double}
    {6×4 double}
    {6×4 double}
    {6×4 double}
    {6×4 double}
    {6×4 double}
    {6×4 double}
    {6×4 double}
    {6×4 double}
A{1} % Display Intermediate Results (Optional)
ans = 5×4
    1.0000    1.0000    1.0000  -39.0000
    2.0000    1.0000    2.0000  -39.9730
    3.0000    1.0000    3.0000  -39.5270
    4.0000    1.0000    4.0000  -40.5790
    5.0000    1.0000    5.0000  -40.9000
A{end} % Display Intermediate Results (Optional)
ans = 6×4
   48.0000    9.0000    1.0000   -6.6190
   49.0000    9.0000    2.0000   -6.5460
   50.0000    9.0000    3.0000   -6.5450
   51.0000    9.0000    4.0000   -6.6010
   52.0000    9.0000    5.0000   -6.6250
   53.0000    9.0000    6.0000   -6.6470
Outc = cellfun(@(x)mean(x(4:end,4)), A, 'Unif',0) % Calculate The 'mean' Of Rows 4:end In Each Sub-Matrix
Outc = 9×1 cell array
    {[-40.7395]}
    {[-33.0263]}
    {[ -9.6240]}
    {[  0.3067]}
    {[ -9.4157]}
    {[ -7.2603]}
    {[ -5.6343]}
    {[ -6.4563]}
    {[ -6.6243]}
Outn = cell2mat(Outc) % Convert The 'cell' Array To A Numeric Array
Outn = 9×1
  -40.7395
  -33.0263
   -9.6240
    0.3067
   -9.4157
   -7.2603
   -5.6343
   -6.4563
   -6.6243
% The 'Check' Variable Can Be Deleted, Since It Simply Shows How The Code Works, And Checks The Results
Check = [mean(A{1}(4:end,4)) mean(A{2}(4:end,4)) mean(A{3}(4:end,4)) mean(A{4}(4:end,4)) mean(A{5}(4:end,4)) mean(A{6}(4:end,4)) mean(A{7}(4:end,4)) mean(A{8}(4:end,4)) mean(A{9}(4:end,4))].'
Check = 9×1
  -40.7395
  -33.0263
   -9.6240
    0.3067
   -9.4157
   -7.2603
   -5.6343
   -6.4563
   -6.6243
EDIT (26 Apr 2023 at 21:48)
Changed the second accumarray argument to choose the correct data. (I did not catch that earlier.)
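For comparison, the same per-port averages can probably be obtained without building a cell array, by masking on the injection number first and then grouping. This is only a sketch of an alternative, not the method used above. The variable names are made up, the values are the first two ports from the table shown earlier, and it assumes each port number identifies exactly one sample within a file:

```matlab
% Toy columns mirroring 'Port', 'Inj Nr' and 'd(18_16)Mean' for ports 1 and 2.
Port = [1 1 1 1 1 2 2 2 2 2 2].';
Inj  = [1 2 3 4 5 1 2 3 4 5 6].';
d18O = [-39 -39.973 -39.527 -40.579 -40.9 ...
        -33.315 -33.008 -32.989 -33.028 -33.03 -33.021].';

keep = Inj > 3;                          % always skip the first 3 injections

[G, portID] = findgroups(Port(keep));    % one group per port among the kept rows
avg = splitapply(@(x) mean(x, 'omitnan'), d18O(keep), G);  % mean of 2 or 3 values
n   = splitapply(@numel, d18O(keep), G); % how many injections went into each mean

result = table(portID, avg, n, 'VariableNames', {'Port', 'Mean_d18O', 'N'});
```

The N column makes it easy to spot which samples were averaged over 2 injections rather than 3, and 'omitnan' keeps a single NaN reading from turning the whole average into NaN. If a port number is reused for a different sample later in the same file, grouping by port alone would merge the two, so a run or block identifier would need to be added to the grouping.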