Audio compression using DCT - but I get the same file size after inverse DCT

Mohamad 2018-5-4

Direct link to this question

https://ww2.mathworks.cn/matlabcentral/answers/398982-audio-compression-using-dct-but-i-get-same-size-of-files-after-inverse-dct

Commented: Abid Ali 2020-4-30

my_Audio2.m

Hi, I have a file (1.wav) and I am trying to compress its first two seconds using the discrete cosine transform. I attached the code, but when I use the whos command on the original samples and on the samples reconstructed after the inverse DCT, I get the same size and number of bytes. Is there an explanation for this, and how do I get the compression ratio?

Accepted Answer

Walter Roberson 2018-5-4

Direct link to this answer

https://ww2.mathworks.cn/matlabcentral/answers/398982-audio-compression-using-dct-but-i-get-same-size-of-files-after-inverse-dct#answer_318619

Edited: Walter Roberson 2018-5-4

That is expected. You are writing out the re-expanded data as samples. There will be the same number of samples as before, so the output is going to take the same size (probably).

See also my recent discussion at https://www.mathworks.com/matlabcentral/answers/398289-how-can-i-do-audio-compression-using-huffman-encoding#comment_563731 . For DCT you would not need to write out a dictionary, but you would not write out the coefficients you had zeroed out. You would, however, need to write out the original number of coefficients so when you read the values in, you knew how many zeros to pad with before reconstruction.
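A minimal sketch of that idea follows. It is not the code from my_Audio2.m (which is not shown in this thread). It assumes the zeroed coefficients are the trailing ones, so only the first K need to be stored; the synthetic signal, the sample rate, the value of K, the `single` storage type, and the temporary file name are all assumptions made for illustration. `dct` and `idct` require Signal Processing Toolbox.

```matlab
% Sketch only: a synthetic tone stands in for the first 2 s of 1.wav.
fs = 8000;                          % assumed sample rate
t  = (0:2*fs-1).' / fs;             % two seconds of samples
x  = 0.5*sin(2*pi*440*t);           % stand-in for the audio samples

c = dct(x);                         % DCT of the whole block
N = numel(c);                       % original number of coefficients
K = 4000;                           % assumed: keep the first K, drop the rest

% Write a header (N and K) followed by only the kept coefficients.
fname = [tempname '.bin'];          % hypothetical output file
fid = fopen(fname, 'w');
fwrite(fid, [N K], 'uint32');
fwrite(fid, c(1:K), 'single');
fclose(fid);

% Read back, pad with zeros up to N, and reconstruct.
fid = fopen(fname, 'r');
hdr = fread(fid, 2, 'uint32');
ck  = fread(fid, hdr(2), 'single');
fclose(fid);
cPad = [ck; zeros(hdr(1) - hdr(2), 1)];
y = idct(cPad);                     % same number of samples as x

% The compression ratio compares stored bytes, not workspace variables.
origBytes = N * 2;                  % assumed 16-bit PCM samples in the .wav
info  = dir(fname);
ratio = origBytes / info.bytes;
fprintf('%d samples in, %d samples out, ratio %.2f\n', N, numel(y), ratio);
delete(fname);
```

The reconstructed vector `y` always has N samples, which is why `whos` reports the same size for it as for the original. The saving only shows up in the size of the file that holds the coefficients.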

28 Comments

Walter Roberson 2018-5-5

With regard to the file size: you did not write using ubit1 like I said was needed.

With regard to the "Warning: Data clipped when writing file.":

Once you have quantized the DCT coefficients, suppose you immediately idct() the quantized coefficients, without removing any coefficients and without going through the Huffman encoding, the file, and the Huffman decoding (just straight dct, quantize, idct of the quantized coefficients). It turns out that the range of reconstructed values is then not -1 to +1 and can instead be something like -2.7 to +3.7. This is a pure effect of quantization with dct, and you are going to need to account for it.
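One possible way to account for it, sketched here as an assumption rather than as the method used in this thread, is to check the peak of the reconstruction and scale it back into range before calling audiowrite. The synthetic signal and the quantizer step size are made up for illustration; `dct` and `idct` require Signal Processing Toolbox.

```matlab
% Sketch only: synthetic signal and an arbitrary quantization step.
fs = 8000;                                % assumed sample rate
t  = (0:fs-1).' / fs;
x  = 0.9*sin(2*pi*300*t) .* (t < 0.5);    % tone followed by silence

step = 0.5;                               % assumed quantizer step size
cq   = step * round(dct(x) / step);       % quantize, then dequantize
y    = idct(cq);                          % straight dct -> quantize -> idct

peak = max(abs(y));
fprintf('reconstructed peak = %.3f\n', peak);

% audiowrite clips floating-point data outside [-1, 1], so scale down
% only when the reconstruction actually reaches that range.
if peak >= 1
    ySafe = 0.99 * y / peak;
else
    ySafe = y;
end
```

Rescaling avoids the clipping warning but changes the playback level, and it does nothing about the noise that quantization adds in the near-silent parts.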

My tests show that the idct of the quantized value can be a factor of 10^4 or more higher than the original signal. The parts that seem to do especially poorly are the parts of the signal that have near silence: the reconstructed values can end up fairly large there (I do not know why that might be so.)

When you zero out the extra coefficients, the reconstructed values can range from about -5 to +4.5. And remember that it is the places of near silence that are especially badly reconstructed (in relative terms), so this introduces noticeable noise into the reconstruction.

Walter Roberson 2018-5-6

The greatest source of noise with that many coefficients is that you are doing the idct of the full dsig, which is the result of huffmandeco on the data read in as ubit1. As I described to you before, when you read using ubit1, a full byte is read at the end, leaving you with up to 7 extra 0 bits at the end. When you do the Huffman decoding, those extra 0 bits are likely to turn into one or more extra data samples in dsig. Those extra data samples affect the reconstruction audibly.

You need to figure out some way of ensuring that you extract the same length of signal from the Huffman decoding as you put into the Huffman encoding. I already described one method to you: add a distinct "end of stream" data element, and after decoding, detect that marker and remove everything from there onward. Another way to handle the situation is to write the length as part of the binary file.
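The second approach (writing the length into the file) could look roughly like the sketch below. Random bits stand in for the Huffman-encoded stream, and the `uint32` header layout and temporary file name are assumptions chosen for illustration; only base MATLAB file I/O is used.

```matlab
% Sketch only: random bits stand in for the Huffman-encoded stream.
bitsIn = double(rand(45, 1) > 0.5);   % 45 bits: not a multiple of 8
fname  = [tempname '.bin'];           % hypothetical output file

% Write the bit count first, then the bits themselves.
fid = fopen(fname, 'w');
fwrite(fid, numel(bitsIn), 'uint32');
fwrite(fid, bitsIn, 'ubit1');
fclose(fid);

% Read the count, then read the remaining bits and keep only that many.
fid = fopen(fname, 'r');
nBits   = fread(fid, 1, 'uint32');
allBits = fread(fid, Inf, 'ubit1');
fclose(fid);
bitsOut = allBits(1:nBits);           % drops any padding bits
delete(fname);

fprintf('wrote %d bits, read %d, kept %d\n', ...
    numel(bitsIn), numel(allBits), numel(bitsOut));
```

Only `bitsOut` would then be passed to huffmandeco, so the padding bits never become extra samples in the decoded signal.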

The second greatest source of noise is the zeroing of the low-energy coefficients.

It takes a lot of dictionary entries to counteract the effect of zeroing the low-energy coefficients. There seems to be an RMS limit of about 1.86 when the coefficients are zeroed, whereas with the coefficients not zeroed, you can get down to about 0.38 with 512 coefficients.

I am still testing what you can do with more coefficients. It turns out that the internal routines that validate the dictionary are inefficient, involving operations proportional to the square of the number of entries, so there are practical limits in how far out you can test.