How to perform row operations on a large MATRIX or CELL efficiently?

Question

Ashish Keshkamat 2017-10-6

0
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/360035-how-to-perform-row-operations-on-a-large-matrix-or-cell-efficiently

回答： Ashish Keshkamat 2017-11-14

I have a large CELL of the order 18000x4 (Or a MATRIX of the similar order) Each CELL element is a 'name' and therefore a string (In case of Matrix it is a 'number') The operation I need to do is explained in the following example (Please find relevant files attached). Example:

CELL INPUT
'1' '3' '4' []
'1' '3' '5' []
'1' '3' '6' []
'1' '4' '4' []
'1' '4' '5' []
'1' '4' '6' []
'2' '3' '4' []
'2' '3' '5' []
'2' '3' '6' []
'2' '4' '4' []
'2' '4' '5' []
'2' '4' '6' []
'5' '4' [] []
'5' '7' [] []
'6' '8' '4' []
'6' '8' '7' []
'9' '10' [] []
CELL OUTPUT
'1' '4' [] []
'1' '3' '5' []
'1' '3' '6' []
'2' '4' [] []
'2' '3' '5' []
'2' '3' '6' []
'4' '5' [] []
'5' '7' [] []
'4' '6' '8' []
'6' '7' '8' []
'10' '9' [] []

There are 2 main operations done to get the output:

1) Delete the duplicates in the rows. (1 4 4) ---> (1 4) and (2 4 4) ---> (2 4) in the above example.

2) After step (1), compare every row and delete the supersets of the row if any. (1 3 4) (1 4 5) (1 4 6) deleted because they are supersets of (1 4). Similarly, (2 3 4) (2 4 5) (2 4 6) are deleted because they are supersets of (2 4)

I have written a MATLAB function called 'minimize' which does this operation. But, when I need to do this for a large CELL of order 8656x4, MATLAB keeps running and there is no solution after 48hours. I need to get this done for even larger orders of the CELL and within a time frame of 1-2 hours.

Could you please provide any method/suggestion to efficiently solve this?

2 个评论
显示无隐藏无

Carl 2017-10-10

编辑：Carl 2017-10-10

在 MATLAB Online 中打开

Hi Ashish. I see from your .mat files that you have analogous matrices of type 'cell' and 'double'. Just to clarify, is it only slow when performing those operations on the 'cell' data? Or do you see the same performance when operating on the 'double' type matrices as well?

Initial suggestions for improving performance would be to take a vectorized, rather than loop-based, approach to copying data. For example, change:

for k = 1:size(temp_name,2)
    minimal_row(i,k) = temp_name(k);
end

to

minimal_row(i,:) = temp_name;

Or,

for j = 1:size(number_matrix,2)
    if number_matrix(i,j) == 0
        break;
    else
        temp_name(j) = number_matrix(i,j) ;
    end
end

to

temp_name = number_matrix(i,:);
temp_name = temp_name(temp_name ~= 0)

This type of syntax can be optimized more efficiently.

Ashish Keshkamat 2017-10-11

Hello Carl, It is relatively better when I solve using 'double' type matrices as compared to cells. But,the difference is not appreciable.

Thank you for your feedback. Let me try the vectorised approach.

请先登录，再进行评论。

请先登录，再回答此问题。

Answer 1

Ashish Keshkamat 2017-11-14

1
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/360035-how-to-perform-row-operations-on-a-large-matrix-or-cell-efficiently#answer_291057

Hello All,

Thank you for your time and effort to help me solve this problem. I have figured it out to solve the 'minimize' more efficiently. This function is part of a bigger program I am working on, and was the bottle-neck. There is a pattern in which the input to 'minimize' is generated. I track the duplicates in while generating the input and only compare those rows and delete the super-sets among them. This reduces the computations from thousands to hundreds and sometimes even to tens.

Carl, Thanks for the suggestion of vectorization. I used it in many instances.

Sharath, Thank you for pointing out the flaws in my Algorithm. It helped me to think in different ways to reduce the number of computations.

With all the optimization done, I am able to get my output using input of the order 8656x4 in 20 seconds.

Best Regards, Ashish

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

Answer 2

Sharath Chandran 2017-10-11

2
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/360035-how-to-perform-row-operations-on-a-large-matrix-or-cell-efficiently#answer_285208

编辑：Sharath Chandran 2017-11-3

Hi Ashish,

I reckon, this has more to do with your Algorithm.

Let us consider a matrix of dimension 'm x n' ('m' rows & 'n' columns). The time complexity of the first operation that you perform is O(m * n). This roughly equals to '34624' number of computations which is reasonable number.

On the other hand the time complexity of the second operation that you are performing is O(m^2 * n), which roughly equals to at least '299705344' number of computations, after plugging in the value 'm=8656' and 'n=4'. This is clearly the bottleneck of your current algorithm and hence it needs to be optimized.

Alternate approach:

1. After performing step 1, store each row in a Hash-Map - O(m) (let us ignore the time-complexity of storing data in Hash-Map for the time being).

2. For every row, generate subsets and check if any subset is present in the Hash-Map. This should not be an overhead as max. no. of subsets for a 4-element row is 2^4-2 = 14. Here we are subtracting '2' as we are not considering the empty set and the set itself - O(2^n).

3. If any subset is already present in the Hash-Map, reject this set (row) since one of its subset already exists.

4. Total time complexity - O(m * 2^n). This roughly equals to at least '138496' number of computations which is a drastic improvement.

I hope this was helpful.

Please comment below in case you have any queries.

_

Regards,

MathWorks Technical Support Team

Note: This is not the exact time complexity analysis as we have not taken into account the complexity of storing data in the Hash-map.

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

Ashish Keshkamat 2017-11-7

Hello Sharath,

Thank you for your feedback. I understand the proposed algorithm reduces the number of computations drastically. But, I'm not aware of the concepts of using Hash-map (Hash-table).

Let me see what can be done.

请先登录，再进行评论。

How to perform row operations on a large MATRIX or CELL efficiently?

2 个评论
显示无隐藏无

采纳的回答

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

更多回答（1 个）

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

另请参阅

类别

标签

Community Treasure Hunt

How to perform row operations on a large MATRIX or CELL efficiently?

2 个评论 显示 无隐藏 无

采纳的回答

0 个评论 显示 -2更早的评论隐藏 -2更早的评论

更多回答（1 个）

1 个评论 显示 -1更早的评论隐藏 -1更早的评论

另请参阅

类别

标签

Community Treasure Hunt

2 个评论
显示无隐藏无

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

1 个评论
显示 -1更早的评论隐藏 -1更早的评论