Removing duplicate rows (not "unique")

Question

0 votes

I have a matrix with many (1e5+) rows and I want to remove both copies of all duplicate rows. Is there a fast way to do this? (This function needs to be run many times.)

4 comments

jgg 2016-5-4


You can use the other calling syntaxes of unique (its extra outputs) to get replicate counts.

 a = [1 2; 1 2; 2 3; 2 4; 2 5; 4 2; 4 2; 1 3; 1 3; 4 5];
 [C,ia,ic] = unique(a,'rows');        % ic(k) is the index in C of row k of a
 [count,key] = hist(ic,unique(ic));   % count(j): number of rows of a equal to C(key(j),:)

Then you can just select the keys with non-unit counts and drop them.
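
One way to finish this off, as a minimal sketch. It swaps hist for accumarray to do the counting (my substitution, not part of the comment above) and keeps only the rows whose group has a single member. The names counts and result are made up for illustration.

```matlab
a = [1 2; 1 2; 2 3; 2 4; 2 5; 4 2; 4 2; 1 3; 1 3; 4 5];

[~, ~, ic] = unique(a, 'rows');    % ic(k): group index of row k
counts = accumarray(ic, 1);        % counts(j): number of rows in group j
result = a(counts(ic) == 1, :);    % keep rows whose group occurs exactly once
```

Because the selection is a logical mask over the rows of a, the surviving rows stay in their original order.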

Michael Siebold 2016-5-4

Perfect and thanks a million! I kept messing with ia and ic, but just wasn't thinking histogram... Would you mind submitting this as an answer so I can accept it?


Answer 1 (accepted answer)

Roger Stafford 2016-5-5

Edited: Roger Stafford 2016-5-5

1 vote

Let A be your matrix.

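   [B,ix] = sortrows(A);   % sort so duplicate rows become adjacent; B = A(ix,:)
   % f holds the first and last position (in B) of each run of identical rows
   f = find(diff([false;all(diff(B,1,1)==0,2);false])~=0);
   s = ones(length(f)/2,1);
   f1 = f(1:2:end-1); f2 = f(2:2:end);   % f1: run starts, f2: run ends
   % t is 1 on every row of B that lies inside a run of duplicates, 0 elsewhere
   t = cumsum(accumarray([f1;f2+1],[s;-s],[size(B,1)+1,1]));
   A(ix(t(1:end-1)>0),:) = []; % <-- Corrected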
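
If the accumarray/cumsum bookkeeping is hard to follow, the same sort-then-compare-neighbours idea can be sketched more compactly. This is an alternative formulation, not the code posted above. The names same, dup, keep and R, and the example matrix, are made up for illustration.

```matlab
A = [1 2; 3 4; 1 2; 5 6];              % hypothetical example data

[B, ix] = sortrows(A);                 % duplicates become adjacent; B = A(ix,:)
same = all(diff(B, 1, 1) == 0, 2);     % same(k): rows k and k+1 of B are equal
dup  = [same; false] | [false; same];  % a row is duplicated if it matches a neighbour

keep = true(size(A, 1), 1);
keep(ix(dup)) = false;                 % map the flags back to the original row order
R = A(keep, :);                        % here R is [3 4; 5 6]
```

I have not timed this variant against the code above, so measure both on your own data before choosing.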

6 comments

Michael Siebold 2016-5-5

Edited: Michael Siebold 2016-5-5

And this solution is even faster than the first suggestion in the comments! Thanks for all the help!

saad sulaiman 2022-11-5

Greetings.

How could we apply this code to a mesh where we have the coordinate points for each triangle, so that we remove the internal edges, i.e. the edges shared by two triangles?

Thanks in advance.
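
A possible sketch for the mesh case, under the assumption that the triangles are available as a connectivity list T, with one row of three vertex indices per triangle. That layout, the tiny example mesh and the names E and boundaryEdges are assumptions of mine, not something stated in the thread. The idea is to list every triangle edge, sort each vertex pair so that shared edges become identical rows, and then drop every row that occurs more than once.

```matlab
% Hypothetical mesh: two triangles sharing the edge between vertices 2 and 3
T = [1 2 3; 2 4 3];

% The three edges of every triangle, as vertex-index pairs
E = [T(:,[1 2]); T(:,[2 3]); T(:,[3 1])];

% Sort each pair so that edge (3,2) and edge (2,3) compare as equal
E = sort(E, 2);

% Count the occurrences of each edge and keep the ones that occur once
[~, ~, ic] = unique(E, 'rows');
counts = accumarray(ic, 1);
boundaryEdges = E(counts(ic) == 1, :);   % here [1 2; 2 4; 3 4; 1 3]
```

If the triangles are stored as raw coordinates instead, the coordinates would first have to be mapped to vertex indices, for example with unique(...,'rows') on the stacked points.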


Answer 2

Azzi Abdelmalek 2016-5-4

Edited: Azzi Abdelmalek 2016-5-4

1 vote

A=randi(5,10^5,3);
tic
A=unique(A,'rows');
toc

The result

Elapsed time is 0.171778 seconds.

3 comments

Azzi Abdelmalek 2016-5-4

Edited: Azzi Abdelmalek 2016-5-4

You said that the unique function will leave a copy of duplicate rows. With this example, I show you that no duplicate rows are stored! It also doesn't take much time.

Mitsu 2021-8-3


I reckon your answer does not address OP's question because running the following:

A=[1 1 1;1 1 1;1 1 0];
tic
A=unique(A,'rows');
toc

Will yield:

A =  1     1     0
     1     1     1

Therefore, A still contains one instance of each row that was duplicated. I believe Michael wanted all instances of each row that appears multiple times to be removed.
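
To make the difference concrete, here is a small sketch on the same matrix. U is what unique returns, and R is what I understand the question to be asking for. The names U, R and counts are only for illustration.

```matlab
A = [1 1 1; 1 1 1; 1 1 0];

U = unique(A, 'rows');             % one representative per distinct row: [1 1 0; 1 1 1]

[~, ~, ic] = unique(A, 'rows');    % ic(k): group index of row k
counts = accumarray(ic, 1);        % number of rows in each group
R = A(counts(ic) == 1, :);         % every repeated row removed entirely: [1 1 0]
```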


Answer 3

GeeTwo 2022-8-16

0 votes

%Here's a much cleaner way to do it with 2019a or later!

[B,BG]=groupcounts(A);

A_reduced=BG(B==1); % or just A if you want the results in the same variable.
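
A hedged note on the snippet above. As far as I can tell, when A has more than one column, groupcounts treats each column as a grouping vector and returns BG as a cell array with one column vector per column, so BG(B==1) would not select whole rows. Please confirm this against the documentation for your release. A sketch that copes with both a cell array and a plain array (the example matrix is made up):

```matlab
A = [1 2; 3 4; 1 2; 5 6];      % hypothetical example data

[B, BG] = groupcounts(A);      % B(k): number of rows that fall into group k
if iscell(BG)
    BG = [BG{:}];              % assumed layout: one cell per column of A
end
A_reduced = BG(B == 1, :);     % rows that occur exactly once
```

Note that the groups come back sorted, so A_reduced is in sorted order, not in the original row order of A.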

