Unable to perform assignment because the size of the left side is 100-by-198 and the size of the right side is 100-by-98. Error in backgroundSpectrograms (line 50) Xbkg(:,:,:,ind) = filterBank * spec;
I am trying to compute the background spectrograms, using the same recordings as in the example at https://www.mathworks.com/help/deeplearning/examples/deep-learning-speech-recognition.html
and I get this warning and error:
 Warning: 
The FFT length is too small to compute the specified number of
bands. Decrease the number of bands or increase the FFT length. 
> In designAuditoryFilterBank (line 104)
  In backgroundSpectrograms (line 20)
Unable to perform assignment because the size of the left side is
100-by-198 and the size of the right side is 100-by-98.
Error in backgroundSpectrograms (line 50)
        Xbkg(:,:,:,ind) = filterBank * spec;
I don't know how to fix it. The background files are the same as in the example, so I don't understand what the error is about.
Please help me fix it. These are my settings:
ads = 1x1 audioDatastore
numBkgClips = 4000 
volumeRange = [1e-4,1]
segmentDuration= 2
hopDuration = 0.010
numBands = 100
frameDuration = 0.025
FFT length = 512 (hard-coded in backgroundSpectrograms)
Please help me choose the values.
If I set the FFT length to 1000, the warning goes away but the error remains.
I have to use these hopDuration, numBands, frameDuration, and segmentDuration values because of my own WAV files.
When I run
adsBkg = subset(ads0,ads0.Labels=="_background_noise_");
numBkgClips = 4000;
volumeRange = [1e-4,1];
XBkg = backgroundSpectrograms(adsBkg,numBkgClips,volumeRange,segmentDuration,frameDuration,hopDuration,numBands);
XBkg = log10(XBkg + epsil);
I get the error shown above.
backgroundSpectogram.m
% backgroundSpectrograms(ads,numBkgClips,volumeRange,segmentDuration,frameDuration,hopDuration,numBands)
% calculates numBkgClips spectrograms of background clips taken from the
% audio files in the |ads| datastore. Approximately the same number of
% clips is taken from each audio file. Before calculating spectrograms, the
% function rescales each audio clip with a factor sampled from a
% log-uniform distribution in the range given by volumeRange.
% segmentDuration is the total duration of the speech clips (in seconds),
% frameDuration the duration of each spectrogram frame, hopDuration the
% time shift between each spectrogram frame, and numBands the number of
% frequency bands.
function Xbkg = backgroundSpectrograms(ads,numBkgClips,volumeRange,segmentDuration,frameDuration,hopDuration,numBands)
disp("Computing background spectrograms...");
fs        = 16e3;
FFTLength = 512;
persistent filterBank
if isempty(filterBank)
    filterBank = designAuditoryFilterBank(fs,'FrequencyScale','bark',...
        'FFTLength',FFTLength,...
        'NumBands',numBands,...
        'FrequencyRange',[50,7000]);
end
logVolumeRange = log10(volumeRange);
numBkgFiles = numel(ads.Files);
numClipsPerFile = histcounts(1:numBkgClips,linspace(1,numBkgClips,numBkgFiles+1));
numHops = segmentDuration/hopDuration - 2;
Xbkg = zeros(numBands,numHops,1,numBkgClips,'single');
ind = 1;
for count = 1:numBkgFiles
    wave = read(ads);
    frameLength = frameDuration*fs;
    hopLength = hopDuration*fs;
    for j = 1:numClipsPerFile(count)
        indStart =  randi(numel(wave)-fs);
        logVolume = logVolumeRange(1) + diff(logVolumeRange)*rand;
        volume = 10^logVolume;
        x = wave(indStart:indStart+fs-1)*volume;
        x = max(min(x,1),-1);
        [~,~,~,spec] = spectrogram(x,hann(frameLength,'periodic'),frameLength - hopLength,FFTLength,'onesided');
        Xbkg(:,:,:,ind) = filterBank * spec;
        if mod(ind,1000)==0
            disp("Processed " + string(ind) + " background clips out of " + string(numBkgClips))
        end
        ind = ind + 1;
    end
end
disp("...done");
end
2 comments
  imtiaz waheed
 2020-2-6
numBkgClips = 4000;
volumeRange = [1e-4,1];
segmentDuration= 2;
hopDuration = 0.010;
numBands = 100;
frameDuration = 0.025;
FFTlength = 1024;
adsBkg = subset(ads,ads.Labels=='_background_noise_');
% ads is your datastore
XBkg = backgroundSpectrograms(adsBkg,numBkgClips);volumeRange;segmentDuration;frameDuration;hopDuration;numBands;FFTlength;
disp('Computing background spectrograms...');
logVolumeRange = log10(volumeRange);
numBkgFiles = numel(ads.Files);
numClipsPerFile = histcounts(1:numBkgClips,linspace(1,numBkgClips,numBkgFiles+1));
numHops = segmentDuration/hopDuration - 2;
Xbkg = zeros(numBands,numHops,1,numBkgClips,'single');
ind = 1;
for count = 1:numBkgFiles
    [wave,info] = read(ads);
    fs = info.SampleRate;
    frameLength = frameDuration*fs;
    hopLength = hopDuration*fs;
    for j = 1:numClipsPerFile(count)
        indStart =  randi(numel(wave)-fs);
        logVolume = logVolumeRange(1) + diff(logVolumeRange)*rand;
        volume = 10^logVolume;
        x = wave(indStart:indStart+fs-1)*volume;
        x = max(min(x,1),-1);
        Xbkg(:,:,:,ind) = melSpectrogram(x,fs, ...
            'WindowLength',frameLength, ...
            'OverlapLength',frameLength - hopLength, ...
            'FFTLength',512, ...
            'NumBands',numBands, ...
            'FrequencyRange',[50,7000]);
        if mod(ind,1000)==0
            disp('Processed ' + string(ind) + ' background clips out of ' + string(numBkgClips))
        end
        ind = ind + 1;
    end
end
disp('...done');
Answers (2)
  jibrahim
    
 2020-1-7
Hi Barb,
There are two problems:
1) Since you asked for 100 bands in the auditory filter bank, the hard-coded FFT length (512) is too small. 1024 should work.
2) The code hard-codes the segment duration to 1 second (by using fs here: x = wave(indStart:indStart+fs-1)*volume;), so each clip yields only 98 frames while Xbkg is preallocated for the 198 frames of a 2-second segment.
I modified and attached the code. This should run now:
numBkgClips = 4000;
volumeRange = [1e-4,1];
segmentDuration= 2;
hopDuration = 0.010;
numBands = 100;
frameDuration = 0.025;
FFTlength = 1024;
adsBkg = subset(ads,ads.Labels=="_background_noise_");
% ads is your datastore
XBkg = backgroundSpectrograms(adsBkg,numBkgClips,volumeRange,segmentDuration,frameDuration,hopDuration,numBands,FFTlength);
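The two sizes in the error message can be checked by hand. The sketch below is only an illustration, not the attached function. It assumes the usual frame-count formula for a spectrogram without padding, floor((N - frameLength)/hopLength) + 1, and the names countFrames, clipSamples1s, and clipSamples2s are made up for this example. It also prints the FFT bin spacing for both FFT lengths, which is presumably what the warning is about, since the lowest of the 100 bands have to fit between 50 Hz and the next few bins.

```matlab
fs              = 16e3;    % sample rate hard-coded in backgroundSpectrograms
segmentDuration = 2;       % seconds
frameDuration   = 0.025;   % seconds
hopDuration     = 0.010;   % seconds

frameLength = round(frameDuration*fs);   % 400 samples
hopLength   = round(hopDuration*fs);     % 160 samples

% Assumed frame-count formula for a signal of n samples (no padding)
countFrames = @(n) floor((n - frameLength)/hopLength) + 1;

% Number of columns preallocated in Xbkg (round guards against
% floating-point error in the division)
numHops = round(segmentDuration/hopDuration) - 2;

clipSamples1s = fs;                          % length of wave(indStart:indStart+fs-1)
clipSamples2s = round(segmentDuration*fs);   % length a 2-second segment needs

frames1s = countFrames(clipSamples1s);   % right side of the failing assignment
frames2s = countFrames(clipSamples2s);   % matches the preallocated width

% One-sided FFT bins and bin spacing for the two FFT lengths
bins512     = 512/2 + 1;
bins1024    = 1024/2 + 1;
spacing512  = fs/512;     % Hz per bin
spacing1024 = fs/1024;    % Hz per bin

fprintf("Preallocated columns: %d\n", numHops);
fprintf("Frames from a 1 s clip: %d, from a 2 s clip: %d\n", frames1s, frames2s);
fprintf("FFT 512: %d bins, %.3f Hz apart; FFT 1024: %d bins, %.3f Hz apart\n", ...
    bins512, spacing512, bins1024, spacing1024);
```

Under these assumptions, a 1-second clip gives 98 frames and a 2-second clip gives 198, which matches the 100-by-98 and 100-by-198 sizes in the error. The natural fix is to extract segmentDuration*fs samples per clip instead of fs samples, and to pick indStart so that a full segment still fits inside the file.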
5 comments
  jibrahim
    
 2020-1-16
Make sure that the argument to the fullyConnectedLayer that precedes the softmaxLayer is equal to the number of classes you are trying to classify. It seems like you have 4 classes, but you are using fullyConnectedLayer(3). If you do have 3 classes, then the categorical validation array you are supplying may have an unused category. You can remove it using removecats:
YValidation = removecats(YValidation);
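As a small illustration of what removecats does here, the sketch below builds a made-up label array whose category list contains one category that no observation uses. The label values and the variable names numBefore, numAfter, and numClasses are hypothetical and only stand in for your own data.

```matlab
% Hypothetical validation labels: "stop" is listed as a category,
% but no observation actually uses it.
YValidation = categorical(["yes","no","up","yes"], ["yes","no","up","stop"]);

numBefore = numel(categories(YValidation));   % counts the unused category too

YValidation = removecats(YValidation);        % drop categories with no observations

numAfter = numel(categories(YValidation));    % only categories that occur

% The fully connected layer before the softmax layer should have this many outputs
numClasses = numAfter;

fprintf("Categories before: %d, after: %d\n", numBefore, numAfter);
```

Checking numel(categories(YValidation)) (and the same for the training labels) against the argument of fullyConnectedLayer is a quick way to see which of the two is off.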
  N/A
 2020-1-22
1 comment
  jibrahim
    
 2020-1-23
Make sure the size of the image going into your network matches the image size you used in training:
 [YPredicted,probs] = classify(trainedNet,spec,'ExecutionEnvironment','cpu');
It looks like the size of spec is not [100 98 1].
I remember you were generating spectrograms based on 2-second segments. Make sure waveBuffer indeed holds 2 seconds. I think the original demo uses one second, so you might have to slightly change those three lines of code:
    x = audioIn();
    waveBuffer(1:end-numel(x)) = waveBuffer(numel(x)+1:end);
    waveBuffer(end-numel(x)+1:end) = x;
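A minimal sketch of a 2-second buffer follows, assuming a 16 kHz sample rate and a segment duration of 2 seconds to match training. The chunk length of 512 samples and the placeholder chunk x are made up for illustration; in the demo, x comes from audioIn().

```matlab
fs              = 16e3;   % sample rate, assumed to match training
segmentDuration = 2;      % seconds, assumed to match training
chunkLength     = 512;    % hypothetical number of samples per audioIn() call

% Buffer sized to hold exactly one segment
waveBuffer = zeros(round(segmentDuration*fs),1);

% Stand-in for x = audioIn(): a chunk of placeholder samples
x = ones(chunkLength,1);

% Same update as in the demo: shift old samples left, append the new chunk
waveBuffer(1:end-numel(x)) = waveBuffer(numel(x)+1:end);
waveBuffer(end-numel(x)+1:end) = x;

fprintf("Buffer holds %d samples (%.1f s)\n", numel(waveBuffer), numel(waveBuffer)/fs);
```

The update lines themselves do not depend on the buffer length, so the part that most likely needs changing is wherever waveBuffer is first allocated.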


