I have a few questions regarding strings. I applied run-length encoding to a large string, but the output contains some unwanted symbols. Why is that?
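function y = estring(str)
% Run-length encode str as alternating [symbol count] pairs.
len = numel(str); %65536
i = 0;                     % number of characters consumed so far
count = zeros(1,len);      % count(k) holds the length of the run starting at k
y = [];
while( i<len )
    j = 0;
    count(i+1) = 1;
    % Extend the run while the following characters equal str(i+1).
    while( true )
        j = j + 1;
        if( i+j+1 > len )
            break;
        end
        if( str(i+j+1)==str(i+1) )
            count(i+1) = count(i+1) + 1;
        else
            break;
        end
    end
    if false
        % Disabled branch: copies the input through without counts.
        a = str(i+1);
        length(a);
        y = [y a];
        i = i + 1;
    else
        a = str(i+1);
        b = count(i+1);
        % b is numeric, so it is stored as char(b), not as printable digits.
        y = [y a b];
        i = i + b;
        % count is the whole vector here, so this test is true only when
        % every element equals 1 (the last step of an input with no repeated
        % neighbours); the pair is then appended a second time and echoed.
        if(count==1)
            y=[y a b]
        end
    end
end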
Answers (1)
Walter Roberson
2015-5-12
I already answered this in a previous discussion. Every second character of your returned string is the char() equivalent of a binary count. Remember, if you have ['P' 9] the result is not 'P9'; it is 'P' followed by char(9), which happens to be the tab character. If you want 'P9' as the result when the count is 9, then you need to write the code that way, and you need to decide exactly what should happen when you get more than 9 of the same thing in a row.
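A minimal sketch of the effect described above. The variable names are illustrative, and the sprintf variant is just one possible way to get printable digits, not something taken from the original code.

```matlab
% Concatenating a char with a number converts the number to a character code.
y = ['P' 9];                 % 'P' followed by char(9), the tab character
codes = double(y);           % numeric character codes of y

% One possible alternative (an assumption): write the count as decimal
% digits so that the result is printable.
yPrintable = ['P' sprintf('%d', 9)];
```

Inspecting double(y) rather than displaying y is a quick way to see which "unwanted symbols" are really counts.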
3 Comments
Walter Roberson
2015-5-12
You do not represent counts with whitespace. You represent counts with the character whose binary value is the count. If that count happens to be (for example) 42, then you are going to get char(42), which is '*'. And if the count happens to be 116, then you are going to get char(116), which happens to be 't', and you won't be able to tell that apart from a normal 't' in your output.
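The two cases mentioned here can be checked directly. This is a small sketch; the example run of 116 'a' characters is made up for illustration.

```matlab
star = char(42);             % the character whose code is 42
tee  = char(116);            % the character whose code is 116

% Hypothetical encoding of 116 'a' characters followed by a single 't':
y = ['a' 116 't' 1];

% The count after 'a' and the literal 't' are the same character, so only
% the position in the string tells them apart.
isAmbiguous = (y(2) == y(3));
```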
If you want to output the string without the counts then output every second character... like I already showed you. Or, encode the counts as printable digits and accept that a count of 10 will take more characters to represent than a count of 9, and get smarter about decoding the compressed string.
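Both options can be sketched as follows. This is only an illustration: the variable names are made up, and the decoder assumes the original data never contains digit characters, otherwise the decimal form is ambiguous.

```matlab
encoded = ['P' 11 't' 1 'P' 2];        % alternating symbol / binary count

% Option 1: drop the counts by taking every second character.
symbols = encoded(1:2:end);
counts  = double(encoded(2:2:end));

% Option 2: write each count as printable decimal digits.
printable = '';
for k = 1:numel(symbols)
    printable = [printable symbols(k) sprintf('%d', counts(k))];
end

% Decoding has to read a variable number of digits after each symbol.
% Assumption: the symbols themselves are never digits.
tok = regexp(printable, '(\D)(\d+)', 'tokens');
decoded = '';
for k = 1:numel(tok)
    decoded = [decoded repmat(tok{k}{1}, 1, str2double(tok{k}{2}))];
end
```

Note that the count 11 now takes two characters while the count 2 takes one, which is why the decoder cannot simply step through the string two characters at a time.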
You need to define exactly what string should be output when you run-length encode (for example) 'PPPPPPPPPPPtPP'. A completely valid answer is ['P' char(11) 't' char(1) 'P' char(2)], which is what you are generating now. It is a valid run-length encoding. You just have to be aware that in that particular encoding every second character is a binary count. You also have to be aware that binary counts from 256 to 65535 imply that you are storing two bytes per character (counts larger than that would give an error unless you were careful), whereas a maximum count of 255 would allow you to store only 1 byte per character. (Remember in that other post about compression ratios earlier tonight I spoke of the difference between the number of bytes of storage per location and the number of "used" bits of storage per location? This is a case where it makes a difference, as MATLAB stores 2 bytes per character in memory.)
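As a sketch of that example, the runs can be found with a vectorized comparison instead of nested loops. This is a different technique from the code in the question, and keeping the counts in a separate uint8 array is an assumption made here to illustrate the one-byte case.

```matlab
str = [repmat('P', 1, 11) 't' 'PP'];             % 'PPPPPPPPPPPtPP'

% Mark the first character of every run, then measure the run lengths.
starts  = [true, str(2:end) ~= str(1:end-1)];
symbols = str(starts);
counts  = diff([find(starts), numel(str) + 1]);

% Interleave symbols and counts to get the alternating char encoding.
y = char(reshape([double(symbols); counts], 1, []));

% If no run is longer than 255, the counts fit in one byte each when they
% are stored separately from the symbols.
if max(counts) <= 255
    counts8 = uint8(counts);
end
```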
There are other run-length encoding schemes, some of which only use printable characters. If you want efficiency in run-length encoding, you normally work in binary rather than in printable characters. But even if you restrict yourself to printable characters, you can get higher efficiency than a scheme of a letter followed by a sequence of decimal characters '0' through '9' that represent counts. An important part of that is to define your allowed output characters. See, for example, Base64.
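One way to picture a printable scheme is to pick a fixed alphabet and let a single character stand for each count. The sketch below is an assumed scheme invented for illustration, not Base64 encoding itself: it borrows the 64 characters of the Base64 alphabet, maps counts 1 to 64 onto them, and splits longer runs into several pairs.

```matlab
alphabet = ['A':'Z' 'a':'z' '0':'9' '+/'];       % 64 allowed count characters

str = [repmat('P', 1, 70) 't' 'PP'];
starts  = [true, str(2:end) ~= str(1:end-1)];    % first character of each run
symbols = str(starts);
counts  = diff([find(starts), numel(str) + 1]);

% Encode: one symbol plus one count character per pair; runs longer than
% numel(alphabet) are split across several pairs.
encoded = '';
for k = 1:numel(symbols)
    remaining = counts(k);
    while remaining > 0
        n = min(remaining, numel(alphabet));
        encoded = [encoded symbols(k) alphabet(n)];
        remaining = remaining - n;
    end
end

% Decode: every pair is exactly two characters, so no digit parsing is needed.
decoded = '';
for k = 1:2:numel(encoded)
    n = find(alphabet == encoded(k + 1));
    decoded = [decoded repmat(encoded(k), 1, n)];
end
```

Because each pair has a fixed width, a count of up to 64 costs one character, where the decimal scheme would need two.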