Reduce memory use when doing element-wise multiplication of repeated (repmat/repelem) matrices? (Final product is more sparse than intermediate computations.)

Question

pedro gaseoso 2022-7-14

1
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/1760250-reduce-memory-use-when-doing-element-wise-multiplication-of-repeated-repmat-repelem-matrices-fin

编辑： Matt J 2022-7-26

ABC.mat

Re-Edit: I realized I could use kronecker products in place of each repmat and repelem call. I did this so that I could decompose ABC_output into the following:

ABC_output2 = kron(kron(A,ones(num_s3)).*kron(ones(num_s1,num_s2),B),ones(num_s4)).*kron(ones(num_s1*num_s2,num_s2*num_s3),C);

But Matlab balks on memory, and it's slow as.

Yes, I am aware that you can swap Hadamard and Kronecker products in the case of matrices having the same dimension, but unfortunately, this isn't the case. If it was, I could simplify the above equation greatly and do all of the Hadamards first and one Kronecker later.

-----

I have three matricies, A, B, and C, whose elements are repeated (repmat/repelem) according to a specific pattern, and then element-wise multiplied together. Using the attached .mat file, the following code will do this:

% these four numbers relate to the discretization of four variables

num_s1 = 11;

num_s2 = 15;

num_s3 = 19;

num_s4 = 26;

% here we calculate two intermediaries (to see memory requirements) and the desired result

AB_repeated = repelem(repelem(A,num_s3,num_s3).*repmat(B,num_s1,num_s2),num_s4,num_s4);

C_repeated = repmat(C,num_s1*num_s2,num_s2*num_s3); % particularly large

ABC_output = AB_repeated.*C_repeated; % smaller memory requirement than intermediates

This is fast and the code is easy to understand if the pattern--which is implicit in the last three lines--is known. For reference: we are creating a transition matrix that tracks the evolution of three state variables (output across columns), and takes the values of the three states as well as a shock variable as inputs (input across rows). Each row represents a composite state of these four variables (and three for the columns). The matrix is structured so that as we go down the rows, s4 loops through its state levels first, holding s1-s3 constant, then when s4 resets we increment s3 and loop through s4 again... then once we loop all the way through s3 we would increment s2, etc. The pattern exists in the column direction as well, without s1. This results in the repelem/repmat usage above, and I get my desired result.

As you can see from running the code, the intermediates are massive relative to the final output. This is because the final output is much more sparse because the sparse-pattern of each component matrix is different. I believe there must be a way to take advantage of this structure and not waste memory in the middle of computation. At the moment, I have to keep s1-s4 small, and I should be able to increase these significantly if I can reduce the intermediate step to require only as much ram as the final output.

Any ideas would be appreciated. This code creates an input to a dynamic programming problem, which may suffer from discretization error because of the current coarseness of this transition matrix stemming from the memory-intensive intermediate computation. I can verify/reject this claim by making this step more efficient. Thank you for your help.

7 个评论
显示 5更早的评论隐藏 5更早的评论

pedro gaseoso 2022-7-18

Thanks for updating this again Bruno, and for your help @Matt J. I realized I could decompose the original lines a different way, which is now:

ABC_out = kron(A,kron(ones(1,num_s3),C)) .* kron(kron(ones(num_s1,num_s2),B),ones(num_s4));

or

, where F is ones(1,s3), D is ones(s1,s2), and E is ones(s4),

which is faster (actually, nearly instant on my 7yo laptop) b/c fewer operations and takes best advantage of the differences in sparse patterns by kron'ing A and C first. Instead of losing ~2.6Gb of memory momentarily, I'm down to ~1.3Gb (for the current resolution of the problem).

I can't take advantage of any mixed products here either, unfortunately. But I doubt I can do any better with my particular approach--it would be nice if there was some way to optimize this combination of Hadamard/Kronecker products further without fully-departing from the simplicity of the above line--but I just exhausted my last planned idea. @Bruno Luong, I don't know if this alternative decomposition simplifies your approach--I think it would remove the need for one of the stimes commands.

pedro gaseoso 2022-7-20

@Catalytic most operations are that, yes. Like vec*ABC or ABC*vec. I also invert a function of this matrix at some point (via gauss, not the actual inverse). And there are four such matrices ABC1...ABC4 that depend on a control variable, and at some point I know an answer to a dynamic programming problem and use that to combine the relevant elements from the four ABCs.

pedro gaseoso 2022-7-20

@Bruno Luong, I get that is what is happening, but the "stimes" function is too opaque for me. I feel like there is an easier way to get at my idea but I keep failing to this:

I have row/col/value tuples for each matrix using something like "find". The row indices are a combo of the s1 and s2 states, just like the row indices of the ABC matrix represent a combination of the s1,s2,s3,s4 states.
For each matrix A,B,C, I should be able to expand the indices from the find step above. All I need to do is find the row/col pairs according to the four-state index that are compatible with the row/col pairs from find.
I then just look to see if the row/col pairs of ABC are implied by all of A, B, C, and keep the intersection. For these cases, I multiply the corresponding matrix values.
sparse(row, col, val)

I'm struggling with an implementation of this, though.

请先登录，再进行评论。

请先登录，再回答此问题。

Answer 1

Matt J 2022-7-18

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/1760250-reduce-memory-use-when-doing-element-wise-multiplication-of-repeated-repmat-repelem-matrices-fin#answer_1009475

在 MATLAB Online 中打开

out=kron(X,Y);

5 个评论
显示 3更早的评论隐藏 3更早的评论

Matt J 2022-7-18

编辑：Matt J 2022-7-18

You haven't told us how the num_s? are related to the sizes of A,B, and C

pedro gaseoso 2022-7-18

Ah, my oversight from trying to simplify the problem for here. Matrix A, B, and C all detail the evolution of a different state variable. Thus rows will involve the current state level and columns will represent the future index, and values in the matrix are probability weights for each state transition.

s2, s3, and s4 are the number of discrete values the discretizations of the three state variables can take. For example, s4 implies that this variable can take on 26 values. Additionally, s1 is the discretization of a fourth shock variable, which has a value that is a random draw each period and thus does not have to be kept track of (its transition matrix would just have the same probability distribution repeated for every row).

So in matrix A, you could verify that the dimension is s1*s2 by s2, and what this implies is that the value of the shock variable and the first state variable are needed to calculate the future state distribution. Similarly, B is s2*s3 by s3 and C is s3*s4 by s4. And this is where the kronecker rub is. Because of the "overlap" of these matrices in the row dimension, we can not simply iterate kron in order to fold all of the states together nicely. What we need to do is expand each matrix to the size s1*s2*s3*s4 by s2*s3*s4, while preserving an (arbitrary) pattern that maps a particular tuple (state1,state2,state3,state4) -> combinedindex1234 (you can see this ordering in the updated ABC.mat file). Then we can element-wise multiply these matrices which map current index values to future index values. My original example is a slightly-optimized version of this process. I am wondering if there is an alternative process that would avoid a spike in memory usage though, because while my approach is very fast, it is momentarily wasteful.

请先登录，再进行评论。

Answer 2

Matt J 2022-7-19

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/1760250-reduce-memory-use-when-doing-element-wise-multiplication-of-repeated-repmat-repelem-matrices-fin#answer_1010350

编辑：Matt J 2022-7-19

在 MATLAB Online 中打开

or

, where F is ones(1,s3), D is ones(s1,s2), and E is ones(s4),

This solution eliminates the need to form the Kronecker product

. It uses the File Exchange files mat2tiles.

https://www.mathworks.com/matlabcentral/fileexchange/35085-mat2tiles-divide-array-into-equal-sized-sub-arrays?s_tid=srchtitle

%%%Fake data
[A,B,C]=deal(rand(4),  rand([1,5]) , rand(3));
n1 = size(A,1);
n2 = size(A,2);
n3 = size(B,2);
n4 = size(C,1);
%%%% proposed alternative
T1=kron(A,kron(ones(1,n3),C));
[b(1),b(2)]=size(B);
P1=permute( reshape(1:n1*b(1)*n4,[n4,b(1),n1]) , [2,1,3]); 
   P1=P1(:);
   iP1=1:max(P1);iP1(P1)=iP1; %inverse sort
P2=permute( reshape(1:n2*b(2)*n4,[n4,b(2),n2]) , [2,1,3]); 
   P2=P2(:);
   iP2=1:max(P2);iP2(P2)=iP2;
T1=mat2tiles(T1(P1,P2),b);
 sz0=size(T1);
T1=cell2mat(T1(:).');
 sz1=size(T1);
T1=reshape(T1,prod(b),[]).*B(:);
T1=mat2tiles( reshape(T1,sz1),b);
T1=cell2mat(reshape(T1,sz0));
T1=T1(iP1,iP2);
%%%% baseline computation
T0 = kron(A,kron(ones(1,n3),C)) .* kron(kron(ones(n1,n2),B),ones(n4)); 
%%%% compare proposed to baselin
[lowError, highError] = bounds(T0-T1,'all')
lowError = 0
highError = 0

7 个评论
显示 5更早的评论隐藏 5更早的评论

pedro gaseoso 2022-7-25

I marked this as an answer; however, these do not reduce memory usage relative to my one-liner. I've decided to give up on the problem.

Matt J 2022-7-25

编辑：Matt J 2022-7-26

在 MATLAB Online 中打开

however, these do not reduce memory usage relative to my one-liner.

For this reason, I unaccepted the answer. However, here is yet another variant, which I think should use less memory (though it is twice as slow).

load('ABC.mat');
n1 = 11;
n2 = 15;
n3 = 19;
n4 = 26;
tic
%%%% proposed alternative
T1=kron(A,kron(ones(1,n3),C));
        b=size(B);
        
        P1=permute( reshape(1:n1*b(1)*n4,[n4,b(1),n1]) , [2,1,3]); 
           P1=P1(:);
           iP1=1:max(P1);iP1(P1)=iP1; %inverse sort
        P2=permute( reshape(1:n2*b(2)*n4,[n4,b(2),n2]) , [2,1,3]); 
           P2=P2(:);
           iP2=1:max(P2);iP2(P2)=iP2;
        
          T1=T1(P1,P2);
          sz=size(T1);
          
           N=b(2);
           BB=repmat(B,sz(1)/b(1),1);
         for i=1:N
          T1(:,i:N:end)= T1(:,i:N:end).*BB(:,i);
         end
         T1=T1(iP1,iP2);
toc
Elapsed time is 1.152428 seconds.
tic 
%%%% baseline computation
T1=kron(A,kron(ones(1,n3),C));
T0 = T1 .* kron(kron(ones(n1,n2),B),ones(n4)); 
toc
Elapsed time is 0.604501 seconds.

请先登录，再进行评论。

Reduce memory use when doing element-wise multiplication of repeated (repmat/repelem) matrices? (Final product is more sparse than intermediate computations.)

7 个评论
显示 5更早的评论隐藏 5更早的评论

回答（2 个）

5 个评论
显示 3更早的评论隐藏 3更早的评论

7 个评论
显示 5更早的评论隐藏 5更早的评论

另请参阅

类别

标签

Community Treasure Hunt

Reduce memory use when doing element-wise multiplication of repeated (repmat/repelem) matrices? (Final product is more sparse than intermediate computations.)

7 个评论 显示 5更早的评论隐藏 5更早的评论

回答（2 个）

5 个评论 显示 3更早的评论隐藏 3更早的评论

7 个评论 显示 5更早的评论隐藏 5更早的评论

另请参阅

类别

标签

Community Treasure Hunt

7 个评论
显示 5更早的评论隐藏 5更早的评论

5 个评论
显示 3更早的评论隐藏 3更早的评论

7 个评论
显示 5更早的评论隐藏 5更早的评论