Thank you all for your comments and time.
With the updated code below, I was able to reduce the run time from 0.25 ms to 0.14 ms per run by using vector indexing.
Note that each approach is repeated 1000 times (numIter) so that the timing of a single run is more accurate.
The plots are used to verify the data.
close all;
clear;
%A=rand(280,280);
A = double(imread('AT3_1m4_01.tif'));
A = A(1:1:280,1:1:280)/256;
B=zeros(35,35,64);
%%%%Approach#1 iteration
numIter=1000;
tic
for iter=1:numIter
for i=1:1:8
for j=1:1:8
indx1=(i-1)*8+j;
B(:,:,indx1)=A(i:8:end,j:8:end);
end
end
end
t1=toc; % read the timer once, so the disp calls are not included in the measurement
disp('Approach#1 time: single iter')
disp(t1/numIter);
figure(1); montage(B);
%%%%%%%%%%%%%%%%%%%%%
%%%%Approach#2 vector indexing
tic
for iter=1:numIter
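rngY=1:280;
rngY_permute=reshape(rngY,8,35);   % column k holds 8*(k-1)+(1:8)
rngY_permute=rngY_permute';        % 35x8: column k holds k:8:280
B1=A(rngY_permute,rngY_permute);   % index matrices are read column-wise, so rows/cols are grouped by offset
B2=reshape(B1,35,8,35,8);          % dims: row, row offset i, col, col offset j
B3=permute(B2,[1,3,4,2]);          % dims: row, col, j, i
B4=reshape(B3,35,35,64);           % third index = (i-1)*8+j, matching indx1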
end
t2=toc; % read the timer once, so the disp calls are not included in the measurement
disp('Approach#2 time: single iter')
disp(t2/numIter);
figure(2); montage(B4);
figure(3); montage(B4-B); title('\Delta');
maxDiff=max(abs(B4(:)-B(:)))  % largest absolute difference between the two approaches; should be 0
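
For anyone who wants to see why the reshape/permute order in Approach #2 reproduces the loop result, here is a small sketch on a toy matrix. The sizes (s = 2, n = 3) and the variable names s, n, idx and maxErr are my own choices for illustration and are not part of the code above. It uses only base functions, so no image file or toolbox is needed.

```matlab
% Toy check of the reshape/permute mapping (assumed sizes, for illustration only)
s = 2;                               % stride (8 in the code above)
n = 3;                               % sub-image size (35 in the code above)
A = reshape(1:(s*n)^2, s*n, s*n);    % 6x6 test matrix with unique entries

% Reference: strided loop, as in Approach #1
B = zeros(n, n, s*s);
for i = 1:s
    for j = 1:s
        B(:, :, (i-1)*s + j) = A(i:s:end, j:s:end);
    end
end

% Vector indexing, as in Approach #2
idx = reshape(1:s*n, s, n)';         % n-by-s; column k lists k, k+s, k+2s, ...
B1 = A(idx, idx);                    % index matrices are read column-wise
B2 = reshape(B1, n, s, n, s);        % dims: row, row offset i, col, col offset j
B3 = permute(B2, [1 3 4 2]);         % dims: row, col, j, i
B4 = reshape(B3, n, n, s*s);         % third index = (i-1)*s + j

maxErr = max(abs(B4(:) - B(:)));     % largest absolute difference
disp(maxErr);
```

Because both approaches only copy elements of A, the difference should be exactly zero, not merely small. Because every entry of the toy A is unique, any mistake in the permute order would show up as a nonzero maxErr.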