How to break data in to groups using while loop?

Question

Julia 2023-1-31

0
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/1904025-how-to-break-data-in-to-groups-using-while-loop

评论： dpb 2023-2-3

matrix = [1 50 60 70 50 40
NaN 10 20 10 10
NaN 20 NaN NaN NaN
NaN 60 30 40 50
10 20 10 20 NaN
30 20 40 NaN 50
NaN 50 50 NaN NaN]
matrix = 7×6
  50    60    70    50    40
 NaN    10    20    10    10
 NaN    20   NaN   NaN   NaN
 NaN    60    30    40    50
  10    20    10    20   NaN
  30    20    40   NaN    50
 NaN    50    50   NaN   NaN

The first column indicates to which group the rows belong to. "1, 2, 3" (row 1, 2, 3) are Group 1, "1, 2" (row 4, 5 ) are Group 2, the next "1, 2" ( row 6, 7) are Group 3.

I am trying to find the number of NaN in each group. The desired result would be:

ans =

5

2

4

Is it possible to do this using a while loop?

eg. In Column 1, while "the current element" is larger than "the element in the previous row", statement

I am sorry if this description is confusing.

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

请先登录，再回答此问题。

Answer 1

dpb 2023-1-31

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/1904025-how-to-break-data-in-to-groups-using-while-loop#answer_1160755

编辑：dpb 2023-1-31

在 MATLAB Online 中打开

Answer the Q? actually asked -- I'll post this as a separate answer for convenience but not remove the first.

OK, as @Stephen23 pointed out earlier, I didn't read the Q? carefully enough and just assumed the first column values were the group IDs. As the comment below says, I'd first build the grouping variable from the definition of the group; it'll bound to be useful later, anyway. (And, it illustrates another technique worth knowing...)

M= [1 50 60 70 50 40
    2 NaN 10 20 10 10
    3 NaN 20 NaN NaN NaN
    1 NaN 60 30 40 50
    2 10 20 10 20 NaN
    1 30 20 40 NaN 50
    2 NaN 50 50 NaN NaN];
g=nan(size(M,1),1);             % first build a grouping variable of right size; missing values
g(M(:,1)==1)=unique(M(:,1));    % populate the first of each group location with its ID/group
g=fillmissing(g,'previous');    % and fill in the rest
nNaN=sum(groupsummary(M,g,@(x)sum(isnan(x))),2)
nNaN = 3×1
     5
     2
     4

NOTA BENE: the whole M array can be passed here; the first column is immaterial to the count...

@the cyclist's use of splitapply will also work here as well

nNaN=splitapply(@(x)sum(isnan(x),'all'),M(:,2:end),g)
nNaN = 3×1
     5
     2
     4

Oh. NOTA BENE SECOND:

The above needs modification to build the grouping variable in general -- it's probably just coincidence there are three groups and a maximum of three rows in any one group. So, unique isn't the generic answer for the RHS of the assignment of the initial group indices to the grouping variable -- it would be more like

g=nan(size(M,1),1);   % first build a grouping variable of right size; missing values
ix=find(M(:,1)==1);   % the locations of each group start
g(ix)=1:numel(ix);    % populate the first of each group location with its ID/group

This will count the number of times the first index value occurs and where and generate that many groups irrespective of the number of records/group.

8 个评论
显示 6更早的评论隐藏 6更早的评论

dpb 2023-2-1

编辑：dpb 2023-2-1

在 MATLAB Online 中打开

Oh. That's a different Q? and makes some sense in terms of why one might possibly want to know... :)

Here's prime example of "don't try to do too much" in a single line of code -- make a precalculation first and then operate over that helper array to do the grouping. One could probably manage to wrap it all into one expression in the end, but trying to do it on the fly initially is going to be hard, indeed...

M= [1  50 60  70  50  40
    2 NaN 10  20  10  10
    3 NaN 20 NaN NaN NaN
    1 NaN 60  30  40  50
    2  10 20  10  20 NaN
    1  30 20  40 NaN  50
    2 NaN 50  50 NaN NaN];
g=nan(size(M,1),1);
ix=(M(:,1)==1);
g(ix)=[1:sum(ix)];
g=fillmissing(g,'previous');
MM=isnan(circshift(M,1,2))&isfinite(M)
MM = 7×6 logical array
   0   0   0   0   0   0
   0   0   1   0   0   0
   1   0   1   0   0   0
   0   0   1   0   0   0
   1   0   0   0   0   0
   0   0   0   0   0   1
   1   0   1   0   0   0
splitapply(@sum,MM(:,2:end),g)
ans = 3×5
     0     2     0     0     0
     0     1     0     0     0
     0     1     0     0     1

I truncated the first column above; if you do want/need the six-column array in the end, then clean out the first column in MM first...

MM(:,1)=0;
splitapply(@sum,MM,g)
ans = 3×6
     0     0     2     0     0     0
     0     0     1     0     0     0
     0     0     1     0     0     1

Julia 2023-2-3

在 MATLAB Online 中打开

Hi @dpb, may I ask your help for a follow up question?

If now, I want to find the number of elements that are NaN in one column but are non-NaN in its previous column, how should I modify the code?

I tried MM=isfinite(circshift(M,1,2))&isnan(M)

however, this code compares Column 2 (the 1st meaningful column) against Column 1 (the group indicator column), which does not give me my desired answer.

M= [1  50 60  70  50  40
NaN 10  20  10  10
NaN 20 NaN NaN NaN
NaN 60  30  40  50
10 20  10  20 NaN
30 20  40 NaN  50
NaN 50  50 NaN NaN];
MM=isfinite(circshift(M,1,2))&isnan(M)
MM = 7×6 logical array
 0   0   0   0   0
 1   0   0   0   0
 1   0   1   0   0
 1   0   0   0   0
 0   0   0   0   1
 0   0   0   1   0
 1   0   0   1   0

Instead of the result given by this line of code (above), would you please teach me how to obtain this result I want (below)?

0 0 0 0 0 0

0 0 1 0 0 0

0 0 0 0 0 0

0 0 0 0 0 1

0 0 1 0 0 0

dpb 2023-2-3

在 MATLAB Online 中打开

"...elements that are NaN in one column but are non-NaN in its previous column"

That's precisely the result of MM above; that the first column is in there is immaterial; it simply is a placeholder to match the size of M.

That's why the solution above uses either MM(:,2:end) or cleans out MM(:,1) when done; it is known that the first column isn't of interest. But, that doesn't affect the remaining columns.

Oh. Brain freeze -- it's the first two columns of the result that are invalid, not just the first. So, use

M= [1  50 60  70  50  40
NaN 10  20  10  10
NaN 20 NaN NaN NaN
NaN 60  30  40  50
10 20  10  20 NaN
30 20  40 NaN  50
NaN 50  50 NaN NaN];
MM=isfinite(circshift(M,1,2))&isnan(M);
MM(:,1:2)=false
MM = 7×6 logical array
 0   0   0   0   0
 0   0   0   0   0
 0   0   1   0   0
 0   0   0   0   0
 0   0   0   0   1
 0   0   0   1   0
 0   0   0   1   0

请先登录，再进行评论。

Answer 2

dpb 2023-1-31

2
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/1904025-how-to-break-data-in-to-groups-using-while-loop#answer_1160615

编辑：dpb 2023-1-31

在 MATLAB Online 中打开

No explicit looping construct needed; let MATLAB do it for you...I shortened your variable name to M...

M= [1 50 60 70 50 40
    2 NaN 10 20 10 10
    3 NaN 20 NaN NaN NaN
    1 NaN 60 30 40 50
    2 10 20 10 20 NaN
    1 30 20 40 NaN 50
    2 NaN 50 50 NaN NaN];
nNaN=sum(groupsummary(M(:,2:end),M(:,1),@(x)sum(isnan(x))),2)
nNaN = 3×1
     2
     5
     4

See <groupsummary> for the details on using it; in short, the first argument is the array to compute the summary over (in your case everything except the first column) while the second is the grouping variable (your first column). Then the function to apply can be a builtin such as 'mean' or as shown above, whatever you want it to be.

groupsummary applies the function to each column of the array by grouping variable so above we first add up how many True values are returned for each column by isnan, that returns an array of counts by column; The total for each group then is simply the sum of those by row (the second, optional argument, 2) applied to the resulting array.

ADDENDUM: (Answer the Q? actually asked)*

OK, as @Stephen23 points out below, I didn't read the Q? carefully enough and just assumed the first column values were the group IDs. As the comment below says, I'd first build the grouping variable from the definition of the group; it'll bound to be useful later, anyway. (And, it illustrates another technique worth knowing...)

g=nan(size(M,1),1);             % first build a grouping variable of right size; missing values
g(M(:,1)==1)=unique(M(:,1));    % populate the first of each group location with its ID/group
g=fillmissing(g,'previous');    % and fill in the rest
nNaN=sum(groupsummary(M(:,2:end),g,@(x)sum(isnan(x))),2)
nNaN = 3×1
     5
     2
     4

Same solution still works, just use the new grouping variable in place of the column values...

The above can/will work generically for other numbering sequences but one would need to save the result of unique and explicitly code for whatever was the indicator value for starting the new sequence other than 1.

*PS. I wondered why so many people were getting the wrong answer first time... :)

2 个评论
显示无隐藏无

Stephen23 2023-1-31

@dpb: I really like this solution, but note that the groups given in the matrix are not duplcaites of the same number, but are given in increasing runs of integers. A bit trickier.

dpb 2023-1-31

Oh. In skimming over the Q? text, I missed that. I'd probably make a new grouping variable, then, first...it'll probably be wanted/needed for subsequent analyses on the dataset anyway...

请先登录，再进行评论。

Answer 3

the cyclist 2023-1-31

1
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/1904025-how-to-break-data-in-to-groups-using-while-loop#answer_1160600

在 MATLAB Online 中打开

Here is one way:

matrix = [1 50 60 70 50 40
    2 NaN 10 20 10 10
    3 NaN 20 NaN NaN NaN
    1 NaN 60 30 40 50
    2 10 20 10 20 NaN
    1 30 20 40 NaN 50
    2 NaN 50 50 NaN NaN];
g = findgroups(matrix(:,1));
out = splitapply(@(x)sum(isnan(x(:))),matrix(:,2:end),g)
out = 3×1
     2
     5
     4

2 个评论
显示无隐藏无

dpb 2023-1-31

@the cyclist, this one suffers the same fate of my first solution below w/ groupsummary. findgroups considers each numeric value a group; one must build a new grouping variable here based on the occurrence of the first element in the subsequent sequence...

the cyclist 2023-1-31

Ah ... I misread OP's question (apparently in the exact same way you did). Thanks for pointing that out.

请先登录，再进行评论。

Answer 4

Benjamin Thompson 2023-1-31

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/1904025-how-to-break-data-in-to-groups-using-while-loop#answer_1160575

编辑：Jan 2023-2-1

在 MATLAB Online 中打开

Even better, use the vectorization features in Matlab to count NaNs in each row all at once.

nanCounts = sum(isnan(matrix),1); % This will count NaNs in each row.
groupCounts = zeros(size(unique(matrix),1),); % Creates a zero vector whose length is equal to number of groups
for i = 1:length(groupCounts)
    I = nanCounts(matrix(:,1) == i; % Creates an index vector for group i counts
    groupCounts(i) = sum(I);
end

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

Answer 5

Voss 2023-1-31

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/1904025-how-to-break-data-in-to-groups-using-while-loop#answer_1160590

在 MATLAB Online 中打开

matrix = [1 50 60 70 50 40
    2 NaN 10 20 10 10
    3 NaN 20 NaN NaN NaN
    1 NaN 60 30 40 50
    2 10 20 10 20 NaN
    1 30 20 40 NaN 50
    2 NaN 50 50 NaN NaN];
group_start_idx = find(matrix(:,1) == 1);
group_end_idx = find(diff([matrix(:,1); 1]) <= 0);
n_groups = numel(group_start_idx);
n_nans = zeros(n_groups,1);
for ii = 1:n_groups
    n_nans(ii) = nnz(isnan(matrix(group_start_idx(ii):group_end_idx(ii),:)));
end
disp(n_nans);
     5
     2
     4

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

Answer 6

Voss 2023-1-31

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/1904025-how-to-break-data-in-to-groups-using-while-loop#answer_1160620

在 MATLAB Online 中打开

"Is it possible to do this using a while loop?"

matrix = [1 50 60 70 50 40
    2 NaN 10 20 10 10
    3 NaN 20 NaN NaN NaN
    1 NaN 60 30 40 50
    2 10 20 10 20 NaN
    1 30 20 40 NaN 50
    2 NaN 50 50 NaN NaN];
n_nans = [];
row = 1;
n_rows = size(matrix,1);
while row <= n_rows
    group_start_row = row;
    while row < n_rows && matrix(row+1,1) > matrix(row,1)
        row = row+1;
    end
    group_end_row = row;
    n_nans(end+1,1) = nnz(isnan(matrix(group_start_row:group_end_row,:)));
    row = row+1;
end
disp(n_nans);
     5
     2
     4

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

Answer 7

Jan 2023-2-1

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/1904025-how-to-break-data-in-to-groups-using-while-loop#answer_1161555

在 MATLAB Online 中打开

matrix = [1 50 60 70 50 40; ...
          2 NaN 10 20 10 10; ...
          3 NaN 20 NaN NaN NaN; ...
          1 NaN 60 30 40 50; ...
          2 10 20 10 20 NaN; ...
          1 30 20 40 NaN 50; ...
          2 NaN 50 50 NaN NaN];
group = cumsum([1; diff(matrix(:, 1)) < 0]);  %  [1 1 1 2 2 3 3].'
data  = sum(isnan(matrix), 2);
result = accumarray(group, data)
result = 3×1
     5
     2
     4

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

dpb 2023-2-1

@Jan, cumsum is very clever for grouping variable computation. +1

I first thought of using the <0 for the breakpoint, but didn't have the flash on cumsum so got bogged down...

请先登录，再进行评论。

How to break data in to groups using while loop?

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

采纳的回答

8 个评论
显示 6更早的评论隐藏 6更早的评论

更多回答（6 个）

2 个评论
显示无隐藏无

2 个评论
显示无隐藏无

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

另请参阅

类别

标签

产品

版本

Community Treasure Hunt

How to break data in to groups using while loop?

0 个评论 显示 -2更早的评论隐藏 -2更早的评论

采纳的回答

8 个评论 显示 6更早的评论隐藏 6更早的评论

更多回答（6 个）

2 个评论 显示 无隐藏 无

2 个评论 显示 无隐藏 无

0 个评论 显示 -2更早的评论隐藏 -2更早的评论

0 个评论 显示 -2更早的评论隐藏 -2更早的评论

0 个评论 显示 -2更早的评论隐藏 -2更早的评论

1 个评论 显示 -1更早的评论隐藏 -1更早的评论

另请参阅

类别

标签

产品

版本

Community Treasure Hunt

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

8 个评论
显示 6更早的评论隐藏 6更早的评论

2 个评论
显示无隐藏无

2 个评论
显示无隐藏无

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

1 个评论
显示 -1更早的评论隐藏 -1更早的评论