Your code works for the padded case. Comparing the results to the default syntax for conv2:
example=conv2(a,b);
the MSE for my test image and filter is on the order of 1E-32, which is down at the level of floating-point rounding. I'd say that's close enough to zero.
You did say you want your code to match the behavior of this syntax:
example=conv2(a,b,'same');
then you could either crop off the excess, or just do convolution over the reduced area to begin with:
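[m,n] = size(a);
[m1,n1] = size(b);
% zero-pad a by (filter size - 1) on every side
mn = [m,n] + 2*([m1,n1]-1);
a0 = zeros(mn);
a0(m1:(end-m1+1),n1:(end-n1+1)) = a;
% rotate the filter by 180 degrees and vectorize it
b1 = rot90(b,2);
b2 = b1(:);
os = floor([m1 n1]/2); % your offset is basically the filter radius
out = zeros(m,n); % output size is the same as input
for ii = 1:m
    for jj = 1:n
        % window of a0 under the filter, shifted by the offset
        x = a0((ii:ii+m1-1)+os(1),(jj:jj+n1-1)+os(2));
        out(ii,jj) = x(:).'*b2; % .' instead of ' so complex data isn't conjugated
    end
end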
Again, for my test image and filter, the error is negligible. The result matches conv2(a,b,'same') for both odd and even filter sizes.
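For the other option (cropping off the excess), here is a rough sketch of how the crop and the comparison could look. The random test data and the two kernel sizes are made up for illustration, and conv2(a,b) just stands in for your own full-size padded result; the offset is the same floor(size/2) used in the loop.

```matlab
% Sketch only: crop a full-size convolution down to the 'same' region.
% a, b and the kernel sizes are arbitrary test data, not from the question.
a = rand(20,30);
[m,n] = size(a);
for ksz = {[3 5], [4 6]}                 % one odd-sized and one even-sized filter
    b = rand(ksz{1});
    os = floor(size(b)/2);               % same offset as in the loop version
    cfull = conv2(a,b);                  % stand-in for your full (padded) output
    cropped = cfull(os(1)+(1:m), os(2)+(1:n));   % keep the central m-by-n block
    ref = conv2(a,b,'same');
    mse = mean((cropped(:)-ref(:)).^2);  % should be negligible
    fprintf('filter %dx%d: MSE = %g\n', size(b,1), size(b,2), mse);
end
```

Swap cfull for the output of your own code to run the same check on it. The loop version avoids computing the border values that the crop throws away, which is the only real difference between the two approaches.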
