How to get starting and ending limits of each silence interval?
I have code that produces a Boolean vector (vad), the same length as the audio, which contains '1' for audio and '0' for silence. Now I want to get the starting and ending limits of each silence interval. For example, if the silence bits run from 2 to 5 and from 7 to 9, then I want to get the limits 2 to 5 and 7 to 9. I get this logical array vad from the voicebox function activlev: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/doc/voicebox/activlev.html
Can anyone help me?
Accepted Answer
Walter Roberson
2016-8-23
There is a trick:
begin_positions = strfind(vad, [1 0])
end_positions = strfind(vad, [0 1])
You will probably need to adjust the boundaries by +1 or -1 for your purposes.
Beware the edge cases: plan ahead what you want to do if the data starts with silence (or starts with non-silence)
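As a minimal sketch of the trick above, here is the example from the question (silence at samples 2 to 5 and 7 to 9). Padding vad with a 1 on each side is one way to make a run that touches the start or the end of the data behave like any other run. The variable names silence_start and silence_end are illustrative, and the sketch relies on strfind accepting numeric vectors.

```matlab
% Example vad from the question: 1 = audio, 0 = silence
vad = [1 0 0 0 0 1 0 0 0 1];
vad = double(vad(:).');                    % force a numeric row vector

% Pad with 1 so that silence touching either end is still bracketed
silence_start = strfind([1 vad], [1 0]);   % first sample of each silent run
silence_end   = strfind([vad 1], [0 1]);   % last sample of each silent run

disp([silence_start; silence_end].');      % one row per interval: [start end]
```

With this padding no further +1 or -1 adjustment is needed: both vectors are 1-based and inclusive.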
33 Comments
ayesha jabeen
2016-8-23
I want to get the starting and ending limits of each silence segment so that I can use these limits to embed data in the silence. Does strfind work only on a single row? I have multiple rows, and in each of them I want to get the start of every zero-to-nonzero segment.
Walter Roberson
2016-8-23
The easiest approach is to loop over the rows. Situations in which you need to extract different amounts of data per row cannot be vectorized (at best you could hide the looping using a utility function such as arrayfun() )
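A rough sketch of such a loop, assuming a hypothetical matrix vad_rows with one mask per row. Because each row can contain a different number of silent runs, the results go into a cell array rather than a matrix. This version uses diff and find instead of strfind; that is a substitution for illustration, not the method given in the answer.

```matlab
% Hypothetical 3-row mask: each row has a different number of silent runs
vad_rows = [1 0 0 1 1 0 1;
            0 0 1 1 1 1 1;
            1 1 1 1 1 1 1];

numrows = size(vad_rows, 1);
intervals = cell(numrows, 1);              % one [start end] table per row

for K = 1 : numrows
    d = diff([1, vad_rows(K, :), 1]);      % -1 where silence begins, +1 just after it ends
    starts = find(d == -1);                % first silent sample of each run
    stops  = find(d == 1) - 1;             % last silent sample of each run
    intervals{K} = [starts(:), stops(:)];  % empty (0-by-2) if the row has no silence
end
```

Here intervals{K} can be passed to whatever embedding routine is applied to row K.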
ayesha jabeen
2016-8-24
Edited: Walter Roberson
2016-8-24
Given below is what I tried in order to get the starting and ending limits:
row=row';
row=num2str(row);
class(row)
begin_positions = strfind(row, '0');
end_positions = strfind(row,'1');
begin_positions(1,1:10)
end_positions(1,1:10)
and answer is
ans =
73 153 233 313 393 473 553 633 713 792
ans =
1 72 80 81 88 96 104 112 120 128
[Screenshot: image not available]
Walter Roberson
2016-8-24
Do not use num2str() and strfind for characters, just use strfind directly. Using strfind on numeric values is an undocumented use.
numrows = size(YourData, 1);
for K = 1 : numrows
    thisrow = YourData(K, :);
    vad = calculate_vad(thisrow);
    % index just before each silent run (0 if the row starts with silence)
    begin_positions = strfind([1 vad], [1 0]) - 1;
    % index of the last sample of each silent run
    end_positions = strfind([vad 1], [0 1]);
    newrow = embed_something(thisrow, begin_positions, end_positions);
    newdata(K, :) = newrow;
end
Walter Roberson
2016-8-27
vad = reshape( calculate_vad(thisrow), 1, []);
You told me you were working on rows, so I assumed that vad would be a row.
ayesha jabeen
2016-8-28
Great, sir. This trick works very well. I now get all the starting and ending positions from the audio, and next I have to embed text in these segments. However, I cannot decide which algorithm is more suitable for the embedding. Please help me select an algorithm, and suggest a trick so that embedding data in the silence intervals does not generate noise in the silence segments: all the silence bits are 0, and if we embed data in the form of 1's then noise is created.
Walter Roberson
2016-8-28
Does your detection code look only for exact 0's, or does it look for regions of low intensity? If it looks for exact 0's then you are typically not going to find very many of them because real-world applications are going to have background noise and air noise and breath pops and the like. If you define "noise" as anything non-zero then any algorithm you use that changes the 0's to non-zero is going to introduce "noise" according to your definition.
It would be more typical for these kinds of applications to be looking at embedding information so that when played back, the change would not be perceptible to a human. Which is very different than concentrating on exact 0's. For example, you can hear a mosquito buzzing from about 3 metres away in a quiet room (0 dB) but if you were to put the same mosquito buzz into the opening movement of Beethoven's Fifth, then humans would be unlikely to notice.
You need to define your objectives more precisely. Are you interested in human perception, or are you trying to avoid being detected by machine analysis?
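To illustrate the difference between exact 0's and regions of low intensity, here is a small sketch that marks samples as silent when a short-time average of the magnitude falls below a threshold. The window length, the threshold, and the synthetic signal are all assumed values chosen for the example, and this is not how activlev decides. movmean requires R2016a or later.

```matlab
% Synthetic signal: loud, then faint background noise, then loud again
x = [0.5*ones(1, 10), 0.001*ones(1, 20), 0.5*ones(1, 10)];

win = 5;        % smoothing window in samples (assumed value)
thr = 0.01;     % magnitude threshold for "silence" (assumed value)

envelope  = movmean(abs(x), win);   % short-time average magnitude
is_silent = envelope < thr;         % low-intensity regions, not exact zeros

% Count how many of the "silent" samples are exactly zero
fprintf('%d silent samples, %d of them exactly 0\n', ...
        nnz(is_silent), nnz(x(is_silent) == 0));
```

A detector of this kind finds a quiet region even though none of its samples is exactly 0.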
ayesha jabeen
2016-8-28
I want to embed in silence, and all the silence bits are zeros. Frequencies that humans cannot perceive can be used for the embedding. I want to add the data in such a way that it remains inaudible and the SNR stays almost the same for both signals, meaning the original and the stego signal. It might change a little; that will be OK.
Walter Roberson
2016-8-29
When you are trying to encode changes so that they are imperceptible, and if you are encoding strictly in periods of silence, then your changes need to be either at a higher frequency than humans can perceive, or at a lower frequency than humans can perceive. Low frequency changes require rewriting the entire audio stream, not just isolated groups of bits, so you need to concentrate on high frequencies.
The upper limit on human perception depends on the individual and declines with age, with fairly few surpassing 20 kHz sensitivity, but healthy teenagers might reach 18 kHz. I see 12 kHz mentioned as being a pretty typical upper bound for adults. Reproducing 12 kHz requires a 24 kHz sampling rate, a little above the common 22050 Hz audio sampling rate, but below the 44100 Hz sampling rate used for CDs. If your audio sampling rate is 8000 Hz or 9600 Hz then any repeated change you can make runs the risk of being perceived. Roughly from about 20000 Hz sampling rate and up you could probably "get away" with changing single isolated bits, and at some point (I don't know exactly, maybe around 35000 Hz) you could probably "get away" with changing two bits in a row.
On the other hand, if you have very short trains of bit changes within a longer fragment, then in theory that corresponds to high frequency changes (think of the fourier transform of a square wave), so in longer samples you might be able to pack changes closer together than is implied by the straight short-time analysis. You would need to research how long a human needs a sound to be present to perceive the sound.
The study of how humans perceive sounds in context leads to a field called "perceptual coding". You do not need silence to encode data in, provided that you code into fragments that humans will not notice. Human ears are not perfect frequency analysis tools, detecting each frequency in isolation. There is inertia, vibrations take time to dampen away, and so on. When you work at the level of encoding strictly into silence then you are restricting yourself to standard Nyquist analysis of what would be perceptible in isolation; if you start packing closer together because "people don't seem to notice" then you are getting into perceptual matters. If you are changing more than one bit in every 24000 or so samples then you are dealing with perceptual differences.
One further aspect you need to consider is that when you write data to an audio file, the audio file encoding might use lossy compression, perhaps based upon perceptual encoding. This would definitely be the case for writing mp3 files for example. Your careful tweaks to the bitstream could get undone if you write to a lossy audio file format.
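One way to check whether a given output format preserves the tweaked bits is to write the samples, read them back, and compare. The sketch below does this for uncompressed 16-bit WAV, where the round trip is expected to be exact. The file name and the sample values are made up for the example. A lossy format would not be expected to pass the same comparison.

```matlab
fs = 44100;
samples = int16([0; 1; 0; 1; 1; 0; 0; 1]);    % pretend these LSBs carry hidden bits

wav_name = [tempname '.wav'];                  % temporary file, hypothetical name
audiowrite(wav_name, samples, fs);             % int16 input is stored as 16-bit PCM
readback = audioread(wav_name, 'native');      % 'native' returns int16, not scaled doubles
delete(wav_name);

lsb_survived = isequal(readback, samples);     % true when every bit came back unchanged
```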
ayesha jabeen
2016-9-18
I have code for hiding data using LSB. Now I want to change this code so that it hides data in the LSBs of the silence intervals. The silence intervals have been found and their positions are known. This is the last step of my project, but I have not been able to complete it successfully. Can you help me?
ayesha jabeen
2016-9-18
Edited: Walter Roberson
2016-9-18
This is the code for LSB hiding
[y,fs,nbits,opts]=wavread([handles.pname handles.fname],[1 2]);
%open a wav file for hiding text
fid1=fopen([handles.pname handles.fname],'r');
%first 40 bytes make wav header,store the header
header=fread(fid1,40,'uint8=>char');
%41st byte to 44th byte,length of wav data samples
data_size=fread(fid1,1,'uint32');
%copy the 16 bit wav data samples starting from 45th byte
[dta,count]=fread(fid1,inf,'uint16');
%close the file, only wav data samples are sufficient to hide the text
fclose(fid1);
fclose(fid1);
lsb=1;
msg=get(handles.edit1,'string'); %get text message from editbox
[ro,co]=size(msg);
if ( (ro*co*8+28) > count )
msgbox('Message too big, select small message','Empty');
else
[m_msg,n_msg]=size(msg);
msg_double=double(msg); %convert it to double
msg_bin=de2bi(msg_double,8); %then convert message to binary
[m,n]=size(msg_bin); %size of message binary
msg_bin_re=reshape(msg_bin,m*n,1); %reshape the message binary in a column vector
m_bin=de2bi(m_msg,10)'; %
n_bin=de2bi(n_msg,10)'; %
len=length(msg_bin_re); %length of message binary
len_bin=de2bi(len,20)'; %convert the length to binary
%hide identity in first 8 wav data samples.
identity=[1 0 1 0 1 0 1 0]';
dta(1:8)=bitset(dta(1:8),lsb,identity(1:8));
%hide binary length of message from 9th to 28 th sample
dta(9:18)=bitset(dta(9:18),lsb,m_bin(1:10));
dta(19:28)=bitset(dta(19:28),lsb,n_bin(1:10));
%hide the message binary starting from 29th position of wave data samples
dta(29:28+len)=bitset(dta(29:28+len),lsb,msg_bin(1:len)');
randname=num2str(randint(1,1,[1 2000]));
%open a new wav file in write mode
fid2=fopen(['new' randname '.wav'],'w');
%copy the header of original wave file
fwrite(fid2,header,'uint8');
fwrite(fid2,data_size,'uint32');
%copy the wav data samples with hidden text in new file
fwrite(fid2,dta,'uint16');
fclose(fid2);
msgbox(['Your text is hidden in new' randname '.wav file'],'');
end
And the silence intervals I have found are written below.
Walter Roberson
2016-9-18
What is the smallest value of fs (sampling frequency) that you are required to support in your code?
ayesha jabeen
2016-9-18
These are the starting and ending position values:
>> begin_positions
begin_positions =
0 30070 71366 158111 193288 198039 224568 234872 260209
>> end_positions
end_positions =
3565 36984 79298 164916 194271 200413 230174 237543 270629
Walter Roberson
2016-9-18
You cannot accomplish that task by modifying only silence when the sampling frequency is 4410 Hz (I've never seen that one! 4000 yes, but never 4410!) or 16000 Hz.
"The commonly stated range of human hearing is 20 Hz to 20 kHz. Under ideal laboratory conditions, humans can hear sound as low as 12 Hz and as high as 28 kHz, though the threshold increases sharply at 15 kHz in adults, corresponding to the last auditory channel of the cochlea."
Since your target is to be imperceptible in the silence, you should not be creating any frequency difference less than 15 kHz, which requires a 30000 Hz sampling frequency. Above that could perhaps be forgiven given the sharp drop off mentioned above. Each gap of silence would need to be considered separately, and no change can be considered for the gap unless you are changing less than 1 sample in 30000 at that 30000 Hz -- so you cannot make any change unless the gap is at least 30000 samples long (you would need to calculate the correct minimum gap size for higher sampling frequencies.)
Looking at your start and end positions, it is obvious that none of your gaps are even remotely close to 30000 samples: your largest is 10420 samples. A change to even one sample in that 10420 would induce a frequency change in the 5 kHz range -- a range that humans are particularly sensitive to (2 kHz to 5 kHz).
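The gap lengths can be checked directly from the posted positions. This is a small sketch using the 15 kHz cutoff from the paragraph above; the variable names Fc, min_gap, and usable are illustrative.

```matlab
begin_positions = [0 30070 71366 158111 193288 198039 224568 234872 260209];
end_positions   = [3565 36984 79298 164916 194271 200413 230174 237543 270629];

Fc = 15000;                        % assumed audibility cutoff in Hz
min_gap = 2 * Fc;                  % samples needed per modified sample

gap_lengths = end_positions - begin_positions;
longest_gap = max(gap_lengths);    % 10420 for these positions
usable = gap_lengths >= min_gap;   % logical mask of gaps that are long enough
```

For these positions usable is false everywhere, which is the point being made.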
Your task is doomed to failure unless the sound file has longer silences -- or unless you give up on relying on Nyquist theory and instead switch to perceptual coding theory, modifications to the sound that humans will not notice because of the way the auditory cortex processes sounds in context.
Your entire method of reading the samples is dubious. You are assuming that the samples are stored as direct 16 bit unsigned integer values. That will not generally be the case. WAV files can have various forms of compression, and if the sampling depth is less than 16 bits, multiple samples might be present per 16 bit word.
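A more robust way to get at the samples is to let audioinfo and audioread parse the header, and to refuse files that are not in the expected format. The sketch below first writes a tiny 16-bit PCM file so that it is self-contained; the file name and sample values are placeholders.

```matlab
% Create a small 16-bit PCM file so the example is self-contained
wav_name = [tempname '.wav'];
audiowrite(wav_name, int16([0; -3; 12; 0]), 8000);

% Check the format instead of assuming it
info = audioinfo(wav_name);
if ~strcmp(info.CompressionMethod, 'Uncompressed') || info.BitsPerSample ~= 16
    error('Expected uncompressed 16-bit samples, got %s at %d bits', ...
          info.CompressionMethod, info.BitsPerSample);
end

dta = audioread(wav_name, 'native');   % int16 samples, header already parsed
delete(wav_name);
```

Note that the samples come back as signed int16, so bit operations may need a typecast to uint16 first.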
ayesha jabeen
2016-9-18
Could you please send me the complete code for embedding data in the silence intervals? I am facing many difficulties and there is no one here to guide me.
Walter Roberson
2016-9-18
Just do it. It's just array indexing!
Your lines
%hide the message binary starting from 29th position of wave data samples
dta(29:28+len)=bitset(dta(29:28+len),lsb,msg_bin(1:len)');
are what has to be changed.
I will give you a hint:
idx = 29:28+len;
dta(idx) = bitset(dta(idx), lsb, msg_bin(1:len)');
Except you want to change the idx locations to reflect the locations where there is silence.
This will embed data in the silence intervals, but it will be completely unable to meet your requirement that the embedded data use frequencies humans cannot perceive and remain inaudible.
Walter Roberson
2016-9-18
I looked at the size of your intervals of silence. They varied, so I took the first one as representative; it was 3565 samples long. So I decided to experiment with 3500 samples, each with the bottom bit possibly changed:
def = randi([0 1],3500,1,'uint16');
DEF = double(typecast(def,'int16'))/32768.;
At the normal maximum volume of my laptop (which is not all that loud),
sound(DEF, 16000)
produced no audible sound.
sound(DEF*20, 16000)
was just perceptible, and
sound(DEF*50, 16000)
was clearly perceptible.
But in a context like this, multiplying by a constant is just volume control: for me to say that sound(DEF*50, 16000) was clearly perceptible in my configuration but sound(DEF, 16000) was not, is not a matter of frequency range, just a matter of how loud I had it turned up. The frequencies were audible to humans (it comes out as a burst of noise similar to white noise.) If the source material was, for example, a quiet meadow with bird calls and insects, and the person had the volume turned up, then the burst of induced noise from the hidden bits would be obvious. With lower volume (for example, the source was some talking or some heavy metal music and the person did not want to damage their ears) then the noise might not be perceived -- but that is not because it would be inherently imperceptible, it would just be low volume.
If you want the modifications to be imperceptible because they are beyond the human hearing range, then you need to decide on a cutoff frequency, Fc, such as the 15000 mentioned above, and then you cannot modify more than 1 bit in every Fc*2 samples (for 44100 Hz that would be roughly 2/3 of a second; for 16000 Hz that would be roughly 2 seconds.)
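The durations quoted in the parentheses follow from dividing Fc*2 samples by the sampling rate. A short sketch of that arithmetic:

```matlab
Fc = 15000;                            % cutoff frequency in Hz, as mentioned above
fs = [44100 16000];                    % sampling rates in Hz
seconds_per_change = (2 * Fc) ./ fs;   % minimum spacing between modified samples, in seconds

% One line of output per sampling rate
fprintf('%d Hz: at most one change per %.2f s\n', [fs; seconds_per_change]);
```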
As you do not appear to have silence gaps anywhere near that long, if you want to encode data into the silence intervals, you are going to need to change your audibility goals.
One thing I am not clear on is how the decoding mechanism is going to know where to look for the hidden bits. Any bits you modify are no longer going to be 0, so unless you encode their positions in a known location, you are not going to be able to tell they were originally silence. Not, that is, unless you redefine silence to be samples for which all except possibly the LSB are 0, as that property does not change after embedding data in the LSB.
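A minimal sketch of that redefinition: treat a sample as silent when every bit above the LSB is zero, so that the decoder can recompute the same mask from the modified samples. The sample values and the payload bits are made up for the example.

```matlab
lsb = 1;
dta  = uint16([0 0 0 0 523 0 0 0]);     % toy samples with one non-silent value
bits = [1 0 1 1 0 1 0];                  % hypothetical payload, one bit per silent sample

% "Silence" = every bit above the LSB is zero
is_silent_before = bitshift(dta, -1) == 0;
idx = find(is_silent_before);

stego = dta;
stego(idx) = bitset(stego(idx), lsb, bits);

% The decoder recomputes the same mask from the modified samples alone
is_silent_after = bitshift(stego, -1) == 0;
recovered = double(bitget(stego(is_silent_after), lsb));
```

Because the mask ignores the LSB, it is identical before and after embedding, and recovered matches bits.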
ayesha jabeen
2016-9-19
I cannot do it; I am totally new to the signal processing field. The one thing I can do is find all the indexes where there is silence and then use these indexes to hide data in the LSBs.
Walter Roberson
2016-9-19
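% begin_positions computed with the "- 1" adjustment is the index just before
% each silent run (0 when the audio starts with silence), so shift it by one
begin_positions = begin_positions + 1;
idx_cell = arrayfun(@(K) begin_positions(K):end_positions(K), 1:length(begin_positions), 'Uniform', 0);
idx = horzcat(idx_cell{:});
if len > length(idx)
error('Message too long for the file, file only has %d silence locations available, need %d for the message', length(idx), len);
end
dta(idx(1:len)) = bitset(dta(idx(1:len)), lsb, msg_bin(1:len)');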
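The same idea can be tried end to end on toy data. In this sketch first_silent and last_silent are hypothetical 1-based, inclusive bounds of each silent run, and dec2bin is used in place of de2bi so that no extra toolbox is needed. The header fields of the original code (identity and length) are left out to keep the example short.

```matlab
lsb = 1;
dta = zeros(40, 1);                       % toy "audio": silent everywhere...
dta([1 2 15 16 17 30]) = 1000;            % ...except a few non-silent samples
first_silent = [3 18 31];                 % hypothetical first sample of each silent run
last_silent  = [14 29 40];                % hypothetical last sample of each silent run

msg = 'Hi';
% 8 bits per character, most significant bit first, as one column
msg_bits = reshape((dec2bin(double(msg), 8) - '0').', [], 1);
len = numel(msg_bits);

% Concatenate every silent index, then use the first len of them
idx_cell = arrayfun(@(K) first_silent(K):last_silent(K), ...
                    1:numel(first_silent), 'UniformOutput', false);
idx = horzcat(idx_cell{:});
if len > numel(idx)
    error('Need %d silent samples, only %d available', len, numel(idx));
end
dta(idx(1:len)) = bitset(dta(idx(1:len)), lsb, msg_bits);

% Extraction reads the same locations back and regroups the bits into characters
out_bits = bitget(dta(idx(1:len)), lsb);
decoded  = char(bin2dec(char(reshape(out_bits, 8, []).' + '0'))).';
```

The non-silent samples are left untouched, and decoded should equal msg.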
Walter Roberson
2016-9-20
In the arrayfun that I showed, K is a "dummy" parameter to the anonymous function -- the name to refer to the input in the body of the code. You can replace
idx_cell = arrayfun(@(K) begin_positions(K):end_positions(K), 1:length(begin_positions), 'Uniform', 0);
with
idx_cell = cell(1, length(begin_positions));
for K = 1 : length(begin_positions)
idx_cell{K} = begin_positions(K) : end_positions(K);
end
Walter Roberson
2016-9-20
You should replace
[row,col,v]=find(vad(:)==0);
indices = [row, col];
with
indices = find(vad(:)==0);
Walter Roberson
2016-9-20
I do not know if it is correct, as you have not given specifications for exactly what the program is to do.
ayesha jabeen
2016-9-20
Edited: Walter Roberson
2017-10-4
The program only tries to hide data in the silence intervals.
ayesha jabeen
2016-9-20
Sir, how can I reconstruct a video after splitting it into audio and video frames?
Walter Roberson
2016-9-20
That is a different topic and should have its own Question. In any case, you need vision.VideoFileWriter from the Computer Vision System Toolbox.
ayesha jabeen
2016-10-3
Sir, I have hidden data using the LSB method. The data was embedded successfully in the audio, but when the video is reconstructed using this audio and I then try to extract the data, there is no hidden data in the file. I cannot understand why this problem is happening. Could something be compressing the file, so that the data is lost from the audio because of this compression? Please tell me how to solve this problem.
Walter Roberson
2017-7-10
How are you saving the audio data into the video? How are you constructing the video object when you do that?