You could speed this up by preallocating A, provided you know the size of b (the output of some_function) in advance:
A = nan(1000000, 5, 100); % preallocated accumulation array, one 1000000x5 slice per iteration
parfor ii = 1:100 % or more than 100; match the third dimension of A
    A(:, :, ii) = some_function(ii); % b is very tall, say 1000000x5 or taller
end