Delete rows with bad data and surrounding rows
I would like to delete rows which contain ones, since ones indicate bad data (inclusion criterion 1). Moreover, I would like to remove rows that are surrounded by rows with bad data. The aim is to include rows only if they are part of a set of at least 3 consecutive good (all zeros) rows (inclusion criterion 2). I created a matrix B to explain my question:
B = [0 1 0 0 1 0 1;
0 0 0 0 0 0 0;
0 1 0 0 1 0 1;
0 1 0 0 0 1 0;
0 0 0 0 0 0 0;
0 1 0 1 1 0 1;
0 0 0 0 0 0 0;
0 0 0 0 0 0 0;
0 1 0 0 0 1 0;
0 1 0 0 0 1 0;
0 0 0 0 0 0 0;
0 0 0 0 0 0 0;
0 0 0 0 0 0 0;
0 1 0 0 1 1 0;
0 0 0 0 0 0 0;
0 0 0 0 0 0 0;
0 0 0 0 0 0 0;
1 0 0 0 1 0 0;
1 0 1 1 1 0 1];
In this 19x7 matrix, rows 1, 3, 4, 6, 9, 10, 14, 18 and 19 would be deleted by inclusion criterion 1. So far my loop (for multiple matrices like B) works. Regarding inclusion criterion 2, rows 2, 5, 7 and 8 must be deleted as well, since they are not part of a set of 3 or more all-zero rows. For inclusion criterion 2 I have to create an if structure in my existing loop.
% find or strcmp to look for the rows
% todelete = [] to eliminate these r
How can I delete rows that contain ones OR (||) are present in a set of less than 3 rows with all zeros?
2 Comments
madhan ravi
2019-7-26
Would you mind showing what your expected result should look like?
L Maas
2019-7-26
Accepted Answer
First, the easiest and fastest way to implement criterion 1 is:
todelete = any(B, 2);
For criterion 2, since you just want to look one row on either side, you can shift the above vector up and down:
todeleteall = todelete | [false; todelete(1:end-1)] | [todelete(2:end); false];
B(todeleteall, :) = []
Another way of implementing criterion 2, particularly if you want a window larger than one row on each side, is with a convolution:
halfwindow = 1; %up or down
todeleteall = conv(todelete, ones(2*halfwindow+1, 1), 'same') > 0;
B(todeleteall, :) = []
edit: or as shown by Andrei, you could also use imdilate. There are many ways you could implement that criterion 2. movsum would be another one (which would let you have different before and after good rows).
edit2: As per the cyclist's comment, the above is not quite right; see the later comment for the actual solution.
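The movsum option mentioned in the first edit can be sketched as follows. This is only an illustration under assumptions: the flag vector and the names nbefore and nafter are hypothetical, not taken from the question, and the sketch follows the same neighbour-based rule as the rest of this answer, so the caveat in edit2 applies to it too.

```matlab
% Illustrative flag vector (hypothetical, not the B from the question):
% only row 3 is bad.
todelete = logical([0 0 1 0 0 0 0]).';

nbefore = 1;   % rows above the current row included in its window (assumed name)
nafter  = 2;   % rows below the current row included in its window (assumed name)

% movsum(A, [kb kf]) sums A(i-kb : i+kf), shrinking the window at the ends.
% A row is flagged when any bad row falls inside its window.
todeleteall = movsum(double(todelete), [nbefore nafter]) > 0;

survivors = find(~todeleteall).';   % rows that are not flagged
```

With these values rows 1 to 4 are flagged and rows 5 to 7 survive, because the windows of rows 1 to 4 all contain the bad row 3.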
11 Comments
Andrei Bobrov
2019-7-26
+1
the cyclist
2019-7-26
The todeleteall algorithm isn't quite right. I believe it is insisting on a valid row above or below, but that is not required for the first or last row of a set.
Guillaume
2019-7-26
@the cyclist, I'm not sure what you mean. As far as I can tell, all the options I've proposed will only look at the rows below for the top row(s), and the rows above for the bottom row(s).
the cyclist
2019-7-26
Edited: the cyclist
2019-7-26
For example, your solution deletes rows 11 and 13, which are part of the valid set 11,12,13 (if I understand OP's directive properly).
As part of the check of row 11, it is checking for a valid row 10, but that is not necessary (since 11 is the first row of a set).
Oh indeed, you're right, that doesn't quite work.
OK, the easiest is to use the undocumented fact that strfind works on numeric vectors:
todelete = any(B, 2);
startrun = strfind(todelete', [0, 0, 0]); %need 3 consecutive zeros
tokeep = unique(startrun + [0; 1; 2]);
B = B(tokeep, :)
The downside of this method is that it relies on an undocumented feature, so it may not work in a future version.
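If relying on undocumented behaviour is a concern, the same result can probably be obtained with documented functions only. The following is a minimal sketch, not the method proposed above: it locates runs of good rows with diff and find. The names minrun, runstart, runend and longenough are hypothetical, and todelete is hard-coded to the value any(B, 2) takes for the B in the question.

```matlab
% Flags equal to any(B, 2) for the 19x7 matrix B in the question
todelete = logical([1 0 1 1 0 1 0 0 1 1 0 0 0 1 0 0 0 1 1]).';
minrun = 3;                              % minimum number of consecutive good rows

good = double(~todelete);
d = diff([0; good; 0]);                  % +1 where a good run starts, -1 just after it ends
runstart = find(d == 1);
runend   = find(d == -1) - 1;
longenough = (runend - runstart + 1) >= minrun;

tokeep = false(size(todelete));
for k = find(longenough).'               % mark every row of each sufficiently long run
    tokeep(runstart(k):runend(k)) = true;
end

keptrows = find(tokeep).';               % indices of the rows to keep
% B = B(tokeep, :) would then apply the selection to the matrix itself
```

For the question's matrix this keeps rows 11 to 13 and 15 to 17. Changing minrun adapts the sketch to a different minimum run length.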
Guillaume
2019-7-29
It's very unclear what criterion 1 / non-good value is in this case. In your question, you said: "good (all zeros) rows". The any(B, 2) will treat non-zero as non-good. If the criterion is now something else, you need to say.
Since you've now shown a proper example in the comment to another answer (but still haven't explained exactly what is a good row or a bad row), here are two options:
- A bad row is any row where there's a 1:
todelete = any(B == 1, 2);
- A bad row is any row made exclusively of 0s and 1s:
todelete = all(ismember(B, [0 1]), 2);
The rest of the code is unchanged.
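To make the difference between these definitions concrete, here is a small sketch on a made-up matrix (the matrix and the variable names are illustrative assumptions, not data from the thread). It compares the original any(B, 2) test with the two options above.

```matlab
% Hypothetical 4x3 example containing a value other than 0 and 1
B = [0 0 0;
     0 2 0;
     0 1 0;
     1 1 0];

bad_nonzero = any(B, 2);                    % bad = contains any non-zero value
bad_has_one = any(B == 1, 2);               % bad = contains at least one 1
bad_only01  = all(ismember(B, [0 1]), 2);   % bad = made exclusively of 0s and 1s
```

Row 2 is flagged only by the first test, while the all-zero row 1 is flagged only by the third test, so the choice of definition matters as soon as the data contains values other than 0 and 1.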
L Maas
2019-8-1
Guillaume
2019-8-1
The above can easily be changed to apply to just certain columns. If the criterion is that good rows have 0s in columns 1, 4, 5, 6, 9, 10 and 11, then
todelete = any(B(:, [1, 4, 5, 6, 9, 10, 11]), 2);
and then, as it got buried in all the comments, the simplest way to apply criterion 2 is:
startrun = strfind(todelete', [0, 0, 0]); %need 3 consecutive zeros
tokeep = unique(startrun + [0; 1; 2]);
B = B(tokeep, :)
L Maas
2019-8-2
Andrei Bobrov
2019-7-26
Edited: Andrei Bobrov
2019-7-26
ii holds the row indices with valid data (imdilate is a function from the Image Processing Toolbox):
ii = find(~imdilate(any(B,2),[1;1;1]));
Another variant:
lo = any(B,2) == 0;
ii_valid = unique(strfind(lo(:)',ones(1,3)) + (0:2)');
the cyclist
2019-7-26
Edited: the cyclist
2019-7-26
Here is one way.
N = size(B, 2); % number of columns in B
Bm2 = [ones(2,N); B(1:end-2,:)];
Bm1 = [ones(1,N); B(1:end-1,:)];
Bp1 = [B(2:end,:); ones(1,N)];
Bp2 = [B(3:end,:); ones(2,N)];
v = not(any(B, 2));
vm2 = not(any(Bm2,2));
vm1 = not(any(Bm1,2));
vp1 = not(any(Bp1,2));
vp2 = not(any(Bp2,2));
valid = (vm2 & vm1 & v) | (vm1 & v & vp1) | (v & vp1 & vp2);
The output variable valid is a logical vector with "true" at each valid row. Use
find(valid)
to get the indices of the valid rows.
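The three-term OR above is specific to sets of exactly 3 rows. A possible generalization to any minimum set length can be sketched with two convolutions. This is an assumption-laden illustration rather than part of the answer: minrun, good and starts are hypothetical names, the flags are hard-coded to the value any(B, 2) takes for the question's B, and the sketch assumes the matrix has at least minrun rows.

```matlab
% Flags equal to any(B, 2) for the 19x7 matrix B in the question
bad = logical([1 0 1 1 0 1 0 0 1 1 0 0 0 1 0 0 0 1 1]).';
minrun = 3;                               % minimum number of consecutive good rows

good = double(~bad);
w = ones(minrun, 1);

% starts(k) is true when rows k .. k+minrun-1 are all good
starts = conv(good, w, 'valid') == minrun;

% a row is valid when at least one all-good window covers it
valid = conv(double(starts), w) > 0;

validrows = find(valid).';
```

For the question's data this marks rows 11 to 13 and 15 to 17 as valid, matching the result of the shifted-matrix construction.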