How to perform Arithmetic Coding on a Nested Cell Array

Question

GEEVARGHESE TITUS 2017-2-21

https://ww2.mathworks.cn/matlabcentral/answers/326079-how-to-perform-arthemtic-codding-on-nested-cell-aray

Commented: GEEVARGHESE TITUS 2017-2-25

I have the following cell array C, with the associated data:

2x1 cell
    [29;32]
    [0;72]
2x1 cell
    []
    [29;31;33;64]
6x1 cell
    []
    [0;11;14;15;20;22;45;53]
    [0;13;16;17;34;47]
    [0;18;21;33]
    [0;10;15;16;17]
    [0;10;14;24;31]
18x1 cell
    []
    []
    []
    []
    []
    11x1 int8
    13x1 int8
    [0;10;11;13;15;16;18;21;22;33]
    16x1 int8
    [0;10;11;13;15;20;23;24;26]
    [0;10;11;13;14;16;18;25]
    0
    [0;14;15]
    [0;11;13;14;16;20;21;23]
    [0;11;13;15;21]
    [0;10;11;12;14;17;19;20;23]
    [0;10;11;12;13;15;16;18;20]
    [0;10;11;12;14;15;19;20;25]

How can we apply arithmetic coding to the cell C above? I tried to run AC on each cell, but it ends in an error. How can we retrieve the unique symbols from all the cells, together with their counts, so that we can run the AC function without affecting the cell structure? Also, how can we do the decoding and retrieve the original cell?

Answer 1

Walter Roberson 2017-2-21

https://ww2.mathworks.cn/matlabcentral/answers/326079-how-to-perform-arthemtic-codding-on-nested-cell-aray#answer_255632

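% AC is the asker's own encoder, taking (length, unique symbols, data)
inner_layer = @(inner) AC( length(inner), unique(inner), inner );
% Apply inner_layer to every element of one nested cell
middle_layer = @(middle) cellfun(inner_layer, middle, 'UniformOutput', false);
% Apply middle_layer to every nested cell in C
result = cellfun(middle_layer, C, 'UniformOutput', false);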

Walter Roberson 2017-2-21

The last line of the code iterates over all of the cells that you list, picking out each of them in turn and handing it to middle_layer. So for example it picks out the 2x1 cell, and later the 6x1 cell, and hands each one to middle_layer.

But each thing the last line picks out is itself a cell, and you need to work with the individual elements of that cell. So middle_layer iterates over each member of the current cell and hands the content to inner_layer. So for example it will pick apart the {[]; [29;31;33;64]} and hand inner_layer [] in one call and [29;31;33;64] in another call.

inner_layer receives one column vector (possibly empty) and needs to do the arithmetic encoding. You named that AC and you said the function needed the length and the unique entries, so inner_layer calculates the length and the unique symbols and passes those and the vector itself into whatever AC does.

inner_layer can be the handle to any function that takes a vector as input and does something with the vector. For example,

inner_layer = @(inner) arithenco( sum(bsxfun(@eq, unique(inner), inner(:).') .* repmat((1:length(unique(inner))).', 1, length(inner))), sum(bsxfun(@eq, unique(inner), inner(:).'),2));

... and no, I do not expect you to understand that code. It is obscure code that does in a single anonymous function something that would be much easier with a two or three line true function.
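The same work can be spelled out as plain statements, roughly what the body of such a function might contain. This is only a sketch: it assumes the Communications Toolbox functions arithenco and arithdeco are available and that the vector holds at least two distinct values; the variable names and sample data are illustrative and not from the thread.

```matlab
% Sketch: encode one column vector, then decode it again.
% All variable names and the sample data are illustrative.
inner = int8([0; 10; 14; 10; 31; 0]);        % sample data with repeats

[symbols, ~, idx] = unique(inner(:));        % symbols(idx) reproduces inner
counts = accumarray(idx, 1).';               % occurrences of each symbol
code = arithenco(idx.', counts);             % encode indices, not raw values

% Decoding needs the symbol table, the counts, and the sequence length.
decoded_idx = arithdeco(code, counts, numel(idx));
decoded = symbols(decoded_idx);
decoded = decoded(:);                        % back to a column vector
```

Encoding the indices rather than the raw values sidesteps the requirement for positive integers, at the price of having to keep symbols and counts alongside each code.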

This code assumes that every inner cell is to be encoded separately, without reference to the other cells. The statistics (second parameter to arithenco) are all going to be 1 in the data you show: none of the cells you show have any repetition as would be needed for non-equal statistics. Indeed, your cells are entirely sorted, and since arithenco does not preserve symbol identity, all of your entries of the same length are going to produce the same outputs...
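The point about identical outputs can be checked without running the encoder at all, by looking at what an index-based encoder would be given. A minimal sketch using two of the cells from the question (the variable names are illustrative):

```matlab
% Two different cells from the question, both sorted with no repeats.
a = [0; 10; 14; 24; 31];
b = [0; 10; 15; 16; 17];

[~, ~, idx_a] = unique(a);                   % position of each value in the sorted list
[~, ~, idx_b] = unique(b);
counts_a = accumarray(idx_a, 1).';           % all ones: no repetition
counts_b = accumarray(idx_b, 1).';

% Identical indices and counts mean the encoder sees identical input.
same_input = isequal(idx_a, idx_b) && isequal(counts_a, counts_b);
disp(same_input)
```

Since the two inputs are indistinguishable, the actual values can only be recovered if the list of unique symbols is stored separately for each cell.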

GEEVARGHESE TITUS 2017-2-21

Edited: GEEVARGHESE TITUS 2017-2-21

Thank you very much for such an elaborate explanation. I ran the code and changed AC to arithenco.

inner_layer = @(inner) arithenco(unique(inner), length(inner),  inner) ;
middle_layer = @(middle) cellfun((inner_layer), middle, 'uniform', 0);
result = cellfun(middle_layer, c, 'uniform', 0);

But I am getting the following error:

Error using arithenco
Too many input arguments.
Error in @(inner)arithenco(unique(inner),length(inner),inner)
Error in @(middle)cellfun((inner_layer),middle,'uniform',0)

Is it because of the use of brackets, since the arithenco function takes only two arguments? cellfun was missing from the first line, so I modified it as

inner_layer = @(inner) cellfun(arithenco(unique(inner), length(inner)),  inner) ;
middle_layer = @(middle) cellfun((inner_layer), middle, 'uniform', 0);
result = cellfun(middle_layer, c, 'uniform', 0);

Now the error is

Error using arithenco>errorchk (line 175)
The symbol sequence parameter must be a vector of positive finite integers.
Error in arithenco (line 33)
errorchk(seq, counts);
Error in @(inner)cellfun(arithenco(unique(inner),length(inner)),inner)
Error in @(middle)cellfun((inner_layer),middle,'uniform',0)

Also, based on the data, some values repeat many times within each cell and also across other cells. If there are more repetitions like that, can I use the code as it is or do we need to modify it?

Walter Roberson 2017-2-22

arithenco must be given positive integer values for the symbol sequence (first argument), as the error message says.

In the special case where your floating point values do not exceed +/- 340282346638528859811704183484516925440 and require no more than 23 bits of precision, you can use

sym_idx = 1 + double(typecast( single(symbols), 'uint32' ));

Just make sure that you provide a count vector that is at least max(sym_idx) long. For example if symbols = -pi then sym_idx would work out as 3226013660 and you would need ones(1,3226013660) for your count vector.
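The -pi figure can be reproduced directly, and the mapping can be undone by reversing the two steps. A small sketch (the name recovered is illustrative):

```matlab
symbols = -pi;
sym_idx = 1 + double(typecast(single(symbols), 'uint32'));
fprintf('%.0f\n', sym_idx);                  % prints 3226013660

% The mapping is reversible: undo the +1 and reinterpret the bits as single.
recovered = typecast(uint32(sym_idx - 1), 'single');
```

Note that recovered equals single(-pi), not the double value -pi, because the conversion to single has already discarded the extra precision.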

You can extend this to double precision by taking into account that not all bit patterns in double represent valid numbers: for numbers that are actually representable you do not need a vector of counts longer than 18442240474082181119 entries. Which would unfortunately require more memory than is supported by any publicly known x64 chip, and many many many times more memory than exists in the world at present, but it might be worth arranging to get new processors and memory fabricated to avoid having to transmit the symbol table.

Walter Roberson 2017-2-23

Edited: Walter Roberson 2017-2-24

[unique_values, ~, index] = unique(values)

is such that unique_values(index) = values . That is, it computes the unique entries and tells you what index into that you have to use to get each respective value. The unique entries will be numerically sorted.

In the case of your samples such as

[0;10;11;13;15;16;18;21;22;33]

everything is already unique and sorted and that is going to be the same thing that would be returned in unique_values, and the index vector returned would be 1:10 . Hardly worth computing unless you do have duplicates.
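With duplicates present the outputs become more interesting. A short sketch on made-up data (the values are illustrative, and accumarray is used here as one way to count repetitions; it is not mentioned in the thread):

```matlab
values = [0; 10; 10; 33; 0; 10];
[unique_values, ~, index] = unique(values);
% unique_values is [0; 10; 33] and index is [1; 2; 2; 3; 1; 2]

counts = accumarray(index, 1);               % [2; 3; 1], repetitions per unique value
rebuilt = unique_values(index);              % reproduces values exactly
```

Here index is already a sequence of positive integers and counts gives the matching statistics, which is the form an index-based encoder such as arithenco expects.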

sym_idx = 1 + double(typecast( single(symbols), 'uint32' ));

First, it creates a single precision number from the value. So if the entry was int8(-72), the value would be converted to single(-72).

Single precision numbers are 4 bytes long. typecast( single(symbols), 'uint32' ) extracts the 4 bytes as an unsigned 32 bit integer. It does not do any calculation for this: it just changes the header from saying "this was 1 value of type single, total 4 bytes" to "this is 1 value of type uint32, total 4 bytes". The bits don't change: the rules about what you can do with the bits change.

The result will be a 32 bit unsigned integer, range 0 to 2^32-1. Then you double() that, which converts it into a 64 bit double precision number that happens to be integral. The range is 0.0 to (2.0^32-1).

The +1 shifts the range so that it starts from 1 instead of 0. So the value will be in the range 1.0 to 2.0^32.

The hexadecimal representation of the floating point number -72.0, num2hex(-72), is c052000000000000, which is the bytes 192, 82, and then 6 bytes of 0, if you examine the bytes from the most significant byte to the least significant byte. It happens, though, that your computer (the x86 and x64 architectures) stores numbers in memory in the opposite order, least significant byte first. Reading [192 82 0 0 0 0 0 0] least significant byte first gives [0 0 0 0 0 0 82 192], which is what you saw when you typecast to uint8.

In short, [0 0 0 0 0 0 82 192] is the decimal representation of the bytes in memory that, interpreted a different way, could also be called the double precision number -72.0
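These byte values can be checked in a few lines. The sketch below assumes a little-endian machine, as described above; the variable names are illustrative.

```matlab
hex_text = num2hex(-72);                     % 'c052000000000000'
bytes = typecast(-72, 'uint8');              % [0 0 0 0 0 0 82 192] on little-endian
msb_first = fliplr(bytes);                   % [192 82 0 0 0 0 0 0]
back = typecast(bytes, 'double');            % reinterpret the same bytes: -72
```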

This is a fine theoretical transformation: every double precision number can be uniquely mapped to a single unsigned 64 bit integer. Using this, you do not need to find the unique values, and you do not need to send around any symbol table saying something like "entry #11 is -72.0".

But as a practical transformation, it blows bubbles. You were supposed to get the clue from the 3226013660 being the result for -pi -- there is no way you want to send around a vector of 3226013660 counts just to be able to decode properly.

The practical transform is to unique() and count repetitions of what is actually used and to transmit the symbol table.
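For a nested cell, one way to do this is to build a single symbol table over everything. The following is a sketch only: it assumes two levels of nesting with column vectors inside, as in the question, and it uses a small made-up C; the names symbol_table, lens and pieces are illustrative.

```matlab
% Small nested cell in the same shape as C in the question (values illustrative).
C = { {[29; 32]; [0; 72]}; {[]; [29; 31; 33; 64]} };

% Flatten two levels of nesting into one column of values.
inner_cells = vertcat(C{:});                 % 4x1 cell of column vectors
all_values = double(vertcat(inner_cells{:}));

[symbol_table, ~, idx] = unique(all_values); % shared table for every cell
counts = accumarray(idx, 1).';               % how often each symbol occurs

% The length of each inner vector lets the flat stream be split up again.
lens = cellfun(@numel, inner_cells);
pieces = mat2cell(symbol_table(idx), lens, 1);
```

To rebuild the original nesting you would also need to record how many inner cells each outer cell holds. Empty entries come back as 0x1 rather than 0x0 arrays with this approach.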

For longer streams of data, there turns out to be an even more practical approach. typecast() all your data to uint8, like you were exploring. And then instead of working with sequences of floating point numbers, work with the sequences of bytes. If there is even marginal variation in the bytes, the entire set of bytes 0 to 255 is likely to get used. So you do not need a symbol table of the used bytes: you just assume that they all get used. And you transmit the counts instead, which is something you had to do anyhow. This turns out to be effective at compressing because floating point numbers in an array are seldom completely random: they tend to stay within a small number of orders of magnitude. For example, the starting byte 192 (like above) covers the range from -2 to roughly (-131072 plus 1.5E-11).
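A minimal sketch of the byte-based preparation, on made-up data chosen so that every value starts with byte 192. Only the index sequence and the counts are built here; whether a particular encoder accepts zero counts for unused bytes is not checked.

```matlab
data = [-72; -3.5; -100.25; -2];             % illustrative doubles
bytes = typecast(data, 'uint8');             % 8 bytes per double, as a column
seq = 1 + double(bytes);                     % fixed index: byte value + 1
counts = accumarray(seq, 1, [256 1]).';      % one count per possible byte value

% The receiver can undo the mapping with no symbol table at all.
restored = typecast(uint8(seq - 1), 'double');
```

With such a short input most of the 256 counts are zero; the approach is aimed at longer streams where most byte values do occur.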

You can work out mathematically the trade-off point between sending a vector of individual double precision numbers (once per unique number) and indices into that list, and counts for each -- versus converting to bytes, assuming that all bytes will be used, making (1+byte_value) the fixed index for the byte, and sending the counts for all of the 256 bytes. In the symbol table version, each floating point number requires 8 bytes to transfer. But it can be worth it in fairly skewed situations.

Walter Roberson 2017-2-25

The question has changed enough that I recommend creating a new Question on the topic with more detail on what you are looking for in this phase

GEEVARGHESE TITUS 2017-2-25

OK, I will post it as a new question. Thank you.
