How to get starting and ending limits of each silence interval?

Question

ayesha jabeen 2016-8-23


Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/300537-how-to-get-starting-and-ending-limits-of-each-silence-interval

Edited: Walter Roberson 2017-10-4

I have code that produces a Boolean vector (vad) the same length as the audio, containing '1' for audio and '0' for silence. Now I want to get the start and end limits of each silence interval. For example, if the silent samples run from 2 to 5 and from 7 to 9, then I want to get those limits: 2 to 5 and 7 to 9. I get the logical array vad from the VOICEBOX function activlev: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/doc/voicebox/activlev.html

Can anyone help me?


Answer 1

Walter Roberson 2016-8-23


Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/300537-how-to-get-starting-and-ending-limits-of-each-silence-interval#answer_232522


There is a trick:

begin_positions = strfind(vad, [1 0])
end_positions = strfind(vad, [0 1])

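You will probably need to adjust the boundaries by +1 or -1 for your purposes: with 1 for audio and 0 for silence, strfind(vad, [1 0]) returns the index of the last audio sample before each silent run, so the silence itself starts one sample later, while strfind(vad, [0 1]) returns the index of the last silent sample.

Beware the edge cases: plan ahead what you want to do if the data starts or ends with silence, because a silent run that touches either end of the vector has no [1 0] or [0 1] transition to find.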
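As a minimal sketch of one way to handle those edge cases, the vector can be padded with a 1 at both ends before looking for transitions, so that silence at the very start or end still produces a boundary. This uses diff instead of strfind, which is a substitution and not the method from the answer; the example vad values are made up to match the 2 to 5 and 7 to 9 case in the question.

```matlab
% Example VAD mask: 1 = audio, 0 = silence (silence at samples 2-5 and 7-9)
vad = logical([1 0 0 0 0 1 0 0 0 1]);

% Pad with 1s so that silence touching either end still yields a transition
d = diff([1, double(vad(:).'), 1]);

silence_start = find(d == -1);       % first silent sample of each run
silence_end   = find(d == 1) - 1;    % last silent sample of each run

% One row per silence interval: [start end]
intervals = [silence_start(:), silence_end(:)];
disp(intervals);
```

Each row of intervals holds the first and last sample index of one silent run, with no further +1 or -1 adjustment needed.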

33 comments

Walter Roberson 2016-8-28

Does your detection code look only for exact 0's, or does it look for regions of low intensity? If it looks for exact 0's then you are typically not going to find very many of them because real-world applications are going to have background noise and air noise and breath pops and the like. If you define "noise" as anything non-zero then any algorithm you use that changes the 0's to non-zero is going to introduce "noise" according to your definition.

It would be more typical for these kinds of applications to be looking at embedding information so that when played back, the change would not be perceptible to a human. Which is very different than concentrating on exact 0's. For example, you can hear a mosquito buzzing from about 3 metres away in a quiet room (0 dB) but if you were to put the same mosquito buzz into the opening movement of Beethoven's Fifth, then humans would be unlikely to notice.

You need to define your objectives more precisely. Are you interested in human perception, or are you trying to avoid being detected by machine analysis?

Walter Roberson 2016-8-29

When you are trying to encode changes to be imperceptible, and if you are encoding strictly in periods of silence, then your changes need to be either at a higher frequency than humans can perceive, or at a lower frequency than humans can perceive. Low frequency changes require rewriting the entire audio stream, not just isolated groups of bits, so you need to concentrate on high frequencies.

The upper limit on human perception depends on the individual and declines with age: very few people hear above 20 kHz, though healthy teenagers might reach 18 kHz. I see 12 kHz mentioned as being a pretty typical upper bound for adults. Reproducing 12 kHz requires a 24 kHz sampling rate, a little above the common 22050 Hz audio sampling rate, but below the 44100 Hz sampling rate used for CDs. If your audio sampling rate is 8000 Hz or 9600 Hz then any repeated change you can make runs the risk of being perceived. Roughly from about 20000 Hz sampling rate and up you could probably "get away" with changing single isolated bits, and at some point (I don't know exactly, maybe around 35000 Hz) you could probably "get away" with changing two bits in a row.
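The relationship between sampling rate and the highest representable frequency can be checked with a few lines of arithmetic. This is only a sketch: the 12 kHz cutoff is the "typical adult" figure quoted above, treated here as an assumption, and the list of rates is just a set of common values.

```matlab
% Common audio sampling rates (Hz)
fs = [8000 9600 22050 44100];

% Highest frequency each rate can represent (the Nyquist frequency)
nyquist = fs / 2;

% Assumed audibility cutoff for a typical adult, in Hz
cutoff = 12000;

% True where the rate can represent content above the assumed cutoff,
% i.e. where there is any room for changes beyond typical adult hearing
has_headroom = nyquist > cutoff;

disp([fs; nyquist; has_headroom]);
```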

On the other hand, if you have very short trains of bit changes within a longer fragment, then in theory that corresponds to high frequency changes (think of the Fourier transform of a square wave), so in longer samples you might be able to pack changes closer together than is implied by the straight short-time analysis. You would need to research how long a sound needs to be present for a human to perceive it.

The study of how humans perceive sounds in context leads to a field called "perceptual coding". You do not need silence to encode data in, provided that you code into fragments that humans will not notice. Human ears are not perfect frequency analysis tools, detecting each frequency in isolation. There is inertia, vibrations take time to dampen away, and so on. When you work at the level of encoding strictly into silence then you are restricting yourself to standard Nyquist analysis of what would be perceptible in isolation; if you start packing closer together because "people don't seem to notice" then you are getting into perceptual matters. If you are changing more than one bit in every 24000 or so samples then you are dealing with perceptual differences.

One further aspect you need to consider is that when you write data to an audio file, the audio file encoding might use lossy compression, perhaps based upon perceptual encoding. This would definitely be the case for writing mp3 files for example. Your careful tweaks to the bitstream could get undone if you write to a lossy audio file format.

ayesha jabeen 2016-9-18

Edited: Walter Roberson 2016-9-18

This is the code for LSB hiding

%wavread is no longer available in recent MATLAB releases; audioinfo reports the same metadata
info=audioinfo([handles.pname handles.fname]);
fs=info.SampleRate;
nbits=info.BitsPerSample;
%open the wav file for hiding text
fid1=fopen([handles.pname handles.fname],'r');
%assumes a canonical 44-byte PCM header: store the first 40 bytes
header=fread(fid1,40,'uint8=>char');
%bytes 41 to 44 hold the size of the wav data in bytes
data_size=fread(fid1,1,'uint32');
%read the 16 bit wav data samples, which start at the 45th byte
[dta,count]=fread(fid1,inf,'uint16');
%close the file, only the wav data samples are needed to hide the text
fclose(fid1);
lsb=1;
msg=get(handles.edit1,'string');    %get text message from editbox
[m_msg,n_msg]=size(msg);
if ( (m_msg*n_msg*8+28) > count )
    msgbox('Message too big, select small message','Empty');
else
    msg_double=double(msg);             %convert it to double
    msg_bin=de2bi(msg_double,8);        %then convert message to binary (Communications Toolbox)
    [m,n]=size(msg_bin);                %size of message binary
    msg_bin_re=reshape(msg_bin,m*n,1);  %reshape the message binary into a column vector
    m_bin=de2bi(m_msg,10)';             %number of message rows as 10 bits
    n_bin=de2bi(n_msg,10)';             %number of message columns as 10 bits
    len=length(msg_bin_re);             %length of message binary
    len_bin=de2bi(len,20)';             %length as 20 bits (computed but not embedded below)
    %hide identity in first 8 wav data samples
    identity=[1 0 1 0 1 0 1 0]';
    dta(1:8)=bitset(dta(1:8),lsb,identity(1:8));
    %hide the message dimensions in samples 9 to 28
    dta(9:18)=bitset(dta(9:18),lsb,m_bin(1:10));
    dta(19:28)=bitset(dta(19:28),lsb,n_bin(1:10));
    %hide the message binary starting from the 29th wav data sample
    dta(29:28+len)=bitset(dta(29:28+len),lsb,msg_bin_re);
    %randint has been removed from MATLAB; randi is its replacement
    randname=num2str(randi([1 2000]));
    %open a new wav file in write mode
    fid2=fopen(['new' randname '.wav'],'w');
    %copy the header of the original wav file
    fwrite(fid2,header,'uint8');
    fwrite(fid2,data_size,'uint32');
    %copy the wav data samples with hidden text into the new file
    fwrite(fid2,dta,'uint16');
    fclose(fid2);
    msgbox(['Your text is hidden in  new' randname '.wav file'],'');
end

The silence intervals I have found are listed below.

Walter Roberson 2016-9-18

You cannot accomplish that task by modifying only silence when the sampling frequency is 4410 Hz (I've never seen that one! 4000 yes, but never 4410!) or 16000 Hz.

https://en.wikipedia.org/wiki/Hearing_range#Humans

"The commonly stated range of human hearing is 20 Hz to 20 kHz. Under ideal laboratory conditions, humans can hear sound as low as 12 Hz and as high as 28 kHz, though the threshold increases sharply at 15 kHz in adults, corresponding to the last auditory channel of the cochlea."

Since your target is to be imperceptible in the silence, you should not be creating any frequency difference less than 15 kHz, which requires a 30000 Hz sampling frequency. Above that could perhaps be forgiven given the sharp drop off mentioned above. Each gap of silence would need to be considered separately, and no change can be considered for the gap unless you are changing less than 1 sample in 30000 at that 30000 Hz -- so you cannot make any change unless the gap is at least 30000 samples long (you would need to calculate the correct minimum gap size for higher sampling frequencies.)

Looking at your start and end positions, it is obvious that none of your gaps are even remotely close to 30000 samples: your largest is 10420 samples. A change to even one sample in that 10420 would induce a frequency change in the 5 kHz range -- a range that humans are particularly sensitive to (2 kHz to 5 kHz).

Your task is doomed to failure unless the sound file has longer silences -- or unless you give up on relying on Nyquist theory and instead switch to perceptual coding theory, modifications to the sound that humans will not notice because of the way the auditory cortex processes sounds in context.

Your entire method of reading the samples is dubious. You are assuming that the samples are stored as direct 16 bit unsigned integer values. That will not generally be the case. WAV files can have various forms of compression, and if the sampling depth is less than 16 bits, multiple samples might be present per 16 bit word.
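One way to avoid parsing the header by hand, sketched here under the assumption that the cover file is uncompressed 16-bit PCM, is to let audioread return the stored integer samples with the 'native' option and to write them back with audiowrite. The tiny cover signal, the payload bits, and the temporary file names are all made up for illustration.

```matlab
% Build a tiny 16-bit PCM WAV file to act as the cover (illustrative data)
fs = 16000;
cover = int16([0 0 100 -200 0 0 0 5]');
in_file = [tempname '.wav'];
audiowrite(in_file, cover, fs);

% 'native' returns the samples as stored (int16 here); the header is handled for us
[samples, fs_read] = audioread(in_file, 'native');

% Embed four example bits in the least significant bit of the first four samples
bits = [1 0 1 1]';
samples(1:4) = bitset(samples(1:4), 1, bits);

% int16 input is written as 16-bit PCM, which is lossless
out_file = [tempname '.wav'];
audiowrite(out_file, samples, fs_read);

% Read the file back and extract the embedded bits
check = audioread(out_file, 'native');
recovered = double(bitget(check(1:4), 1));

delete(in_file);
delete(out_file);
```

The round trip only preserves the hidden bits because WAV PCM is lossless; the same steps with a lossy format would not be expected to recover them.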

Walter Roberson 2016-9-18


I looked at the size of your intervals of silence. They varied, so I took the first one as representative; it was 3565 samples long. So I decided to experiment with 3500 samples, each with the bottom bit possibly changed:

def = randi([0 1],3500,1,'uint16');
DEF = double(typecast(def,'int16'))/32768.;

At the normal maximum volume of my laptop (which is not all that loud),

sound(DEF, 16000)

produced no audible sound.

sound(DEF*20, 16000)

was just perceptible, and

sound(DEF*50, 16000)

was clearly perceptible.

But in a context like this, multiplying by a constant is just volume control: for me to say that sound(DEF*50, 16000) was clearly perceptible in my configuration but sound(DEF, 16000) was not, is not a matter of frequency range, just a matter of how loud I had it turned up. The frequencies were audible to humans (it comes out as a burst of noise similar to white noise.) If the source material was, for example, a quiet meadow with bird calls and insects, and the person had the volume turned up, then the burst of induced noise from the hidden bits would be obvious. With lower volume (for example, the source was some talking or some heavy metal music and the person did not want to damage their ears) then the noise might not be perceived -- but that is not because it would be inherently imperceptible, it would just be low volume.

If you want the modifications to be imperceptible because they are beyond the human hearing range, then you need to decide on a cutoff frequency, Fc, such as the 15000 mentioned above, and then you cannot modify more than 1 bit in every Fc*2 samples (for 44100 Hz that would be roughly 2/3 of a second; for 16000 Hz that would be roughly 2 seconds.)
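The arithmetic behind that rule can be sketched as follows. The cutoff of 15000 Hz and the gap lengths of 3565 and 10420 samples are the figures quoted in this thread; everything else is plain calculation.

```matlab
Fc = 15000;                     % chosen audibility cutoff in Hz
min_gap_samples = 2 * Fc;       % at most one modified sample per 2*Fc samples

% Duration of that minimum gap at two sampling rates
fs = [16000 44100];
min_gap_seconds = min_gap_samples ./ fs;

% Gap sizes quoted earlier in the thread (first gap and largest gap)
gap_lengths = [3565 10420];

% Number of single-sample changes each gap could carry under this rule
bits_per_gap = floor(gap_lengths / min_gap_samples);

fprintf('Minimum gap: %d samples (%.2f s at 16000 Hz, %.2f s at 44100 Hz)\n', ...
    min_gap_samples, min_gap_seconds(1), min_gap_seconds(2));
fprintf('Bits per gap: %s\n', mat2str(bits_per_gap));
```

Under these assumptions neither quoted gap is long enough to carry even one modified sample.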

As you do not appear to have silence gaps anywhere near that long, if you want to encode data into the silence intervals, you are going to need to change your audibility goals.

One thing I am not clear on is how the decoding mechanism is going to know where to look for the hidden bits. Any bits you modify are no longer going to be 0, so unless you encode their positions in a known location, you are not going to be able to tell they were originally silence. Not, that is, unless you redefine silence to be samples for which all except possibly the LSB are 0, as that property does not change after embedding data in the LSB.
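A minimal sketch of that redefinition, assuming the samples are held as unsigned 16-bit values in a double array (as fread(...,'uint16') returns them): a sample counts as silent when every bit except the LSB is zero, so the same mask can be recomputed by the decoder after embedding. The sample values and payload below are made up.

```matlab
% Illustrative sample words: non-silent values separated by runs of zeros
samples = [512 0 0 0 0 731 0 0 0 64];

% "Silence" = all bits except the LSB are zero
is_silent_before = bitshift(samples, -1) == 0;

% Embed one example payload bit into the LSB of each silent sample
payload = [1 0 1 1 0 1 0];
stego = samples;
stego(is_silent_before) = bitset(stego(is_silent_before), 1, payload);

% The decoder recomputes the mask from the modified data alone
is_silent_after = bitshift(stego, -1) == 0;
recovered = bitget(stego(is_silent_after), 1);
```

Because embedding touches only the LSB, is_silent_after matches is_silent_before and the decoder finds the payload without being told the positions.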

ayesha jabeen 2016-10-3

Sir, I have hidden data using the LSB method. The data was embedded successfully in the audio, but when the video is reconstructed using this audio and I then try to extract the data, there is no hidden data in the file. I cannot understand why this is happening. Could something be compressing the file, so that the compression removes the data from the audio? How do I solve this problem?

Walter Roberson 2017-7-10

How are you saving the audio data into the video? How are you constructing the video object when you do that?
