Keyword Spotting in Noise Code Generation with Intel MKL-DNN
This example demonstrates code generation for keyword spotting using a Bidirectional Long Short-Term Memory (BiLSTM) network and mel frequency cepstral coefficient (MFCC) feature extraction. MATLAB® Coder™ with Deep Learning Support enables the generation of a standalone executable (.exe) file. Communication between the MATLAB® (.mlx) file and the generated executable file occurs over asynchronous User Datagram Protocol (UDP). The incoming speech signal is displayed using a timescope. A mask is shown as a blue rectangle surrounding spotted instances of the keyword, YES. For more details on MFCC feature extraction and deep learning network training, visit Keyword Spotting in Noise Using MFCC and LSTM Networks.
Example Requirements
MATLAB® Coder Interface for Deep Learning Support Package
Intel® Xeon® processor with support for Intel Advanced Vector Extensions 2 (Intel AVX2)
Intel Math Kernel Library for Deep Neural Networks (MKL-DNN)
Environment variables for Intel MKL-DNN
For supported versions of libraries and for information about setting up environment variables, see Prerequisites for Deep Learning with MATLAB Coder (MATLAB Coder).
Pretrained Network Keyword Spotting Using MATLAB and Streaming Audio from Microphone
The sample rate of the pretrained network is 16 kHz. Set the window length to 512 samples, with an overlap length of 384 samples, and a hop length defined as the difference between the window and overlap lengths. Define the rate at which the mask is estimated. A mask is generated once for every numHopsPerUpdate audio frames.
fs = 16e3;
windowLength = 512;
overlapLength = 384;
hopLength = windowLength - overlapLength;
numHopsPerUpdate = 16;
maskLength = hopLength*numHopsPerUpdate;
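As a quick check of the framing arithmetic above, the following sketch derives the hop length, the mask length, and the amount of audio that each mask covers. The names hopDurationMs and maskDurationMs are introduced here for illustration only and are not used elsewhere in the example.

```matlab
fs = 16e3;
windowLength = 512;
overlapLength = 384;
hopLength = windowLength - overlapLength;      % 128 samples
numHopsPerUpdate = 16;
maskLength = hopLength*numHopsPerUpdate;       % 2048 samples

% Illustrative names, not part of the original example.
hopDurationMs = 1000*hopLength/fs;             % 8 ms of audio per hop
maskDurationMs = 1000*maskLength/fs;           % 128 ms of audio per mask update

fprintf('hop = %d samples (%g ms), mask = %d samples (%g ms)\n', ...
    hopLength,hopDurationMs,maskLength,maskDurationMs);
```

With these settings, each mask update covers 128 ms of audio, which is the interval that the profiling output later in this example reports against.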
Create an audioFeatureExtractor object to perform MFCC feature extraction.
afe = audioFeatureExtractor('SampleRate',fs, ...
    'Window',hann(windowLength,'periodic'), ...
    'OverlapLength',overlapLength, ...
    'mfcc',true, ...
    'mfccDelta',true, ...
    'mfccDeltaDelta',true);
Download and load the pretrained network, as well as the mean (M) and the standard deviation (S) vectors used for feature standardization.
downloadFolder = matlab.internal.examples.downloadSupportFile("audio/examples","kwslstm.zip");
dataFolder = './';
netFolder = fullfile(dataFolder,"KeywordSpotting");
unzip(downloadFolder,netFolder)
load(fullfile(netFolder,'KWSNet.mat'),"KWSNet","M","S");
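Feature standardization subtracts a per-feature mean and divides by a per-feature standard deviation. The sketch below illustrates the operation on synthetic data. The names X, Mdemo, Sdemo, and Z are hypothetical, and the feature count of 39 is an assumption (13 MFCC coefficients plus their delta and delta-delta terms), not a value stated above. The real M and S vectors come from KWSNet.mat.

```matlab
rng(0);                                  % reproducible synthetic data
numFeatures = 39;                        % assumed: 13 MFCC + 13 delta + 13 delta-delta
X = 5 + 2*randn(1000,numFeatures);       % stand-in for training features (rows = frames)

Mdemo = mean(X,1);                       % 1-by-numFeatures mean vector
Sdemo = std(X,0,1);                      % 1-by-numFeatures standard deviation vector
Z = (X - Mdemo)./Sdemo;                  % implicit expansion applies the vectors to every row

% After standardization, each column has approximately zero mean and unit variance.
fprintf('max |mean| = %.2g, max |std - 1| = %.2g\n', ...
    max(abs(mean(Z,1))),max(abs(std(Z,0,1) - 1)));
```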
Call generateMATLABFunction on the audioFeatureExtractor object to create the feature extraction function. You will use this function in the processing loop.
generateMATLABFunction(afe,'generateKeywordFeatures','IsStreaming',true);
Define an Audio Device Reader that can read audio from your microphone. Set the frame length equal to the hop length. This enables you to compute a new set of features for every new audio frame from the microphone.
frameLength = hopLength;
adr = audioDeviceReader('SampleRate',fs, ...
    'SamplesPerFrame',frameLength);
Create a Time Scope to visualize the speech signals and estimated mask.
scope = timescope('SampleRate',fs, ...
    'TimeSpanSource','property', ...
    'TimeSpan',5, ...
    'TimeSpanOverrunAction','Scroll', ...
    'BufferLength',fs*5*2, ...
    'ShowLegend',true, ...
    'ChannelNames',{'Speech','Keyword Mask'}, ...
    'YLimits',[-1.2 1.2], ...
    'Title','Keyword Spotting');
Initialize a buffer for the audio data, a buffer for the computed features, and a buffer to plot the input audio and the output speech mask.
dataBuff = dsp.AsyncBuffer(windowLength);
featureBuff = dsp.AsyncBuffer(numHopsPerUpdate);
plotBuff = dsp.AsyncBuffer(numHopsPerUpdate*windowLength);
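The audio buffer is read with an overlap, so that each analysis window combines one new hop of samples with the tail of the previous window. The following minimal sketch, which assumes DSP System Toolbox is installed, writes constant-valued hops into a dsp.AsyncBuffer and reads overlapping windows back. The variable buff and the constant test signal are hypothetical.

```matlab
windowLength = 512;
overlapLength = 384;
hopLength = windowLength - overlapLength;

buff = dsp.AsyncBuffer(windowLength);   % hypothetical stand-in for dataBuff
for k = 1:3
    write(buff,k*ones(hopLength,1));                 % one new hop per iteration
    frame = read(buff,windowLength,overlapLength);   % new hop plus previously read samples
    fprintf('iteration %d: frame is %d-by-%d, unread = %d\n', ...
        k,size(frame,1),size(frame,2),buff.NumUnreadSamples);
end
```

Because the frame length written on each iteration equals the hop length, every read consumes exactly the samples that were just written.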
Perform keyword spotting on speech received from your microphone. To run the loop indefinitely, set timeLimit to Inf. To stop the simulation, close the scope.
timeLimit = 20;
show(scope);
tic
while toc < timeLimit && isVisible(scope)
    data = adr();
    write(dataBuff,data);
    write(plotBuff,data);
    frame = read(dataBuff,windowLength,overlapLength);
    features = generateKeywordFeatures(frame,fs);
    write(featureBuff,features.');
    if featureBuff.NumUnreadSamples == numHopsPerUpdate
        featureMatrix = read(featureBuff);
        featureMatrix(~isfinite(featureMatrix)) = 0;
        featureMatrix = (featureMatrix - M)./S;
        [scores, state] = predict(KWSNet,featureMatrix);
        KWSNet.State = state;
        [~,v] = max(scores,[],2);
        v = double(v) - 1;
        v = mode(v);
        predictedMask = repmat(v,numHopsPerUpdate*hopLength,1);
        data = read(plotBuff);
        scope([data,predictedMask]);
        drawnow limitrate;
    end
end
release(adr)
hide(scope)
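Inside the loop, the per-hop network scores are reduced to a single decision per update by taking the most likely class for each hop and then a majority vote across the hops. The sketch below reproduces that reduction on synthetic scores. The score values and the class order (column 1 for background, column 2 for keyword) are assumptions made for illustration.

```matlab
numHopsPerUpdate = 16;
hopLength = 128;

% Synthetic scores: 5 hops favor background, 11 hops favor the keyword.
% Assumed column order: column 1 = background, column 2 = keyword.
scores = [0.9*ones(5,1)  0.1*ones(5,1); ...
          0.2*ones(11,1) 0.8*ones(11,1)];

[~,v] = max(scores,[],2);      % per-hop class index (1 or 2)
v = double(v) - 1;             % map to 0 (background) or 1 (keyword)
v = mode(v);                   % majority vote across the 16 hops
predictedMask = repmat(v,numHopsPerUpdate*hopLength,1);

fprintf('vote = %d, mask length = %d\n',v,numel(predictedMask));
```

Here the vote is 1 because 11 of the 16 hops favor the keyword, and the mask is expanded to 2048 samples so that it lines up with the plotted audio.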
The helperKeywordSpotting supporting function encapsulates the audio capture, feature extraction, and network prediction steps demonstrated previously. To make feature extraction compatible with code generation, feature extraction is handled by the generated generateKeywordFeatures function. To make the network compatible with code generation, the supporting function uses the coder.loadDeepLearningNetwork (MATLAB Coder) function to load the network.
The supporting function uses a dsp.UDPSender System object to send the input data along with the output mask predicted by the network to MATLAB. The MATLAB script uses the dsp.UDPReceiver System object to receive the input data along with the output mask predicted by the network running in the supporting function.
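The internals of helperKeywordSpotting are not reproduced here. The following is only a loopback sketch of the sender and receiver pairing described above, assuming DSP System Toolbox is installed. The port number, the shortened demoLength, and the packet layout are hypothetical choices for this illustration.

```matlab
port = 25000;                 % hypothetical port, chosen to avoid the one used by the example
demoLength = 256;             % hypothetical shortened mask length for the demo

sender = dsp.UDPSender('RemoteIPPort',port);    % sends to the local host by default
receiver = dsp.UDPReceiver('LocalIPPort',port, ...
    'MessageDataType','single', ...
    'MaximumMessageLength',2*demoLength, ...
    'ReceiveBufferSize',65536);
setup(receiver);              % bind the socket before any data is sent

% Assumed layout: mask samples followed by speech samples.
packet = single([ones(demoLength,1); zeros(demoLength,1)]);
sender(packet);

received = [];
for attempt = 1:50            % UDP is asynchronous, so poll briefly for the packet
    received = receiver();
    if ~isempty(received)
        break
    end
    pause(0.01);
end
fprintf('received %d samples\n',numel(received));

release(sender);
release(receiver);
```

Polling in a loop reflects the asynchronous nature of UDP: a call to the receiver returns an empty array when no packet has arrived yet.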
Generate Executable on Desktop
Create a code generation configuration object to generate an executable. Specify the target language as C++.
cfg = coder.config('exe');
cfg.TargetLang = 'C++';
Create a configuration object for deep learning code generation with the MKL-DNN library. Attach the deep learning configuration object to the code generation configuration object.
dlcfg = coder.DeepLearningConfig('mkldnn');
cfg.DeepLearningConfig = dlcfg;
Generate the C++ main file required to produce the standalone executable.
cfg.GenerateExampleMain = 'GenerateCodeAndCompile';
Generate helperKeywordSpotting, a supporting function that encapsulates the audio capture, feature extraction, and network prediction processes. You get a warning in the code generation logs that you can disregard because helperKeywordSpotting has an infinite loop that continuously looks for an audio frame from MATLAB.
codegen helperKeywordSpotting -config cfg -report
Warning: Function 'helperKeywordSpotting' does not terminate because of an infinite loop.
Warning in ==> helperKeywordSpotting Line: 70 Column: 1
Code generation successful (with warnings): View report
Prepare Dependencies and Run the Generated Executable
In this section, you generate all the required dependency files and put them into a single folder. During the build process, MATLAB Coder generates buildInfo.mat, a file that contains the compilation and run-time dependency information for the standalone executable.
Set the project name to helperKeywordSpotting.
projName = 'helperKeywordSpotting';
packageName = [projName,'Package'];
if ispc
    exeName = [projName,'.exe'];
else
    exeName = projName;
end
Load buildInfo.mat and use packNGo (MATLAB Coder) to produce a .zip package.
load(['codegen',filesep,'exe',filesep,projName,filesep,'buildInfo.mat']);
packNGo(buildInfo,'fileName',[packageName,'.zip'],'minimalHeaders',false);
Unzip the package and place the executable file in the unzipped directory.
unzip([packageName,'.zip'],packageName);
copyfile(exeName, packageName,'f');
To invoke a standalone executable that depends on the MKL-DNN Dynamic Link Library, add the path to the MKL-DNN library location to the front of the environment variable PATH.
setenv('PATH',[getenv('INTEL_MKLDNN'),filesep,'lib',pathsep,getenv('PATH')]);
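Before launching the executable, it can help to confirm that the environment is set up as expected. The following sketch checks whether INTEL_MKLDNN is defined and whether its lib folder appears on PATH. The variable names mkldnnRoot, libDir, and onPath are hypothetical, and the check is a convenience that is not part of the original workflow.

```matlab
mkldnnRoot = getenv('INTEL_MKLDNN');    % empty if the variable is not defined
if isempty(mkldnnRoot)
    warning('INTEL_MKLDNN is not set. The executable may fail to locate the MKL-DNN library.');
else
    libDir = [mkldnnRoot,filesep,'lib'];                 % same folder that is added to PATH
    pathEntries = strsplit(getenv('PATH'),pathsep);      % split PATH into individual folders
    onPath = any(strcmp(pathEntries,libDir));
    fprintf('%s on PATH: %d\n',libDir,onPath);
end
```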
Run the generated executable.
if ispc
    system(['start cmd /k "title ',packageName,' && cd ',packageName,' && ',exeName]);
else
    cd(packageName);
    system(['./',exeName,' &']);
    cd ..;
end
Perform Keyword Spotting Using Deployed Code
Create a dsp.UDPReceiver System object to receive speech data and the predicted speech mask from the standalone executable. Each UDP packet received from the executable consists of maskLength mask samples and speech samples. The maximum message length for the dsp.UDPReceiver object is 65507 bytes. Calculate the buffer size to accommodate the maximum number of UDP packets.
sizeOfFloatInBytes = 4;
speechDataLength = maskLength;
numElementsPerUDPPacket = maskLength + speechDataLength;
maxUDPMessageLength = floor(65507/sizeOfFloatInBytes);
samplesPerPacket = 1 + numElementsPerUDPPacket;
numPackets = floor(maxUDPMessageLength/samplesPerPacket);
bufferSize = numPackets*samplesPerPacket*sizeOfFloatInBytes;
UDPReceive = dsp.UDPReceiver('LocalIPPort',20000, ...
    'MessageDataType','single', ...
    'MaximumMessageLength',samplesPerPacket, ...
    'ReceiveBufferSize',bufferSize);
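To make the buffer-size calculation concrete, the following sketch evaluates each intermediate quantity, assuming maskLength is 2048 (a hop length of 128 samples times 16 hops per update, as defined earlier in this example).

```matlab
maskLength = 2048;                                            % 128 samples * 16 hops
sizeOfFloatInBytes = 4;
speechDataLength = maskLength;

numElementsPerUDPPacket = maskLength + speechDataLength;      % 4096
maxUDPMessageLength = floor(65507/sizeOfFloatInBytes);        % 16376 single-precision values
samplesPerPacket = 1 + numElementsPerUDPPacket;               % 4097
numPackets = floor(maxUDPMessageLength/samplesPerPacket);     % 3
bufferSize = numPackets*samplesPerPacket*sizeOfFloatInBytes;  % 49164 bytes

fprintf('%d packets of %d samples -> %d bytes\n', ...
    numPackets,samplesPerPacket,bufferSize);
```

Under these assumptions the receive buffer holds three packets, or 49164 bytes.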
To run the keyword spotting indefinitely, set timelimit to Inf. To stop the simulation, close the scope.
tic;
timelimit = 20;
show(scope);
while toc < timelimit && isVisible(scope)
    data = UDPReceive();
    if ~isempty(data)
        plotMask = data(1:maskLength);
        plotAudio = data(maskLength+1 : maskLength+speechDataLength);
        scope([plotAudio,plotMask]);
    end
    drawnow limitrate;
end
hide(scope);
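The receive loop splits each packet into a mask segment and a speech segment by index. The sketch below applies the same indexing to a synthetic packet. The packet contents are made up, and the single trailing element is included only because the receiver is sized for one extra sample per packet; its purpose is not stated in this example.

```matlab
maskLength = 2048;
speechDataLength = maskLength;

% Synthetic packet: mask samples, then speech samples, then one trailing element.
rng(0);
data = single([ones(maskLength,1); 0.1*randn(speechDataLength,1); 0]);

plotMask = data(1:maskLength);                                 % first maskLength samples
plotAudio = data(maskLength+1 : maskLength+speechDataLength);  % next speechDataLength samples

% Column 1 maps to the 'Speech' channel, column 2 to 'Keyword Mask'.
frameToPlot = [plotAudio,plotMask];
disp(size(frameToPlot))
```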
Release the System objects and terminate the standalone executable.
release(UDPReceive);
release(scope);
if ispc
    system(['taskkill /F /FI "WindowTitle eq ',projName,'* " /T']);
else
    system(['killall ',exeName]);
end
SUCCESS: The process with PID 17424 (child process of PID 1568) has been terminated.
Evaluate Execution Time Using Alternative MEX Function Workflow
A similar workflow involves using a MEX file instead of the standalone executable. Perform MEX profiling to measure the computation time for the workflow.
Create a code generation configuration object to generate the MEX function. Specify the target language as C++.
cfg = coder.config('mex');
cfg.TargetLang = 'C++';
Create a configuration object for deep learning code generation with the MKL-DNN library. Attach the deep learning configuration object to the code generation configuration object.
dlcfg = coder.DeepLearningConfig('mkldnn');
cfg.DeepLearningConfig = dlcfg;
Call codegen to generate the MEX function for profileKeywordSpotting.
inputAudioFrame = ones(hopLength,1,'single');
codegen profileKeywordSpotting -config cfg -args {inputAudioFrame} -report
Code generation successful: View report
Measure the execution time of the MATLAB code.
x = pinknoise(hopLength,1,'single');
numPredictCalls = 100;
totalNumCalls = numPredictCalls*numHopsPerUpdate;
exeTimeStart = tic;
for call = 1:totalNumCalls
    [outputMask,inputData,plotFlag] = profileKeywordSpotting(x);
end
exeTime = toc(exeTimeStart);
fprintf('MATLAB execution time per %d ms of audio = %0.4f ms\n',int32(1000*numHopsPerUpdate*hopLength/fs),(exeTime/numPredictCalls)*1000);
MATLAB execution time per 128 ms of audio = 12.7889 ms
Measure the execution time of the MEX function.
exeTimeMexStart = tic;
for call = 1:totalNumCalls
    [outputMask,inputData,plotFlag] = profileKeywordSpotting_mex(x);
end
exeTimeMex = toc(exeTimeMexStart);
fprintf('MEX execution time per %d ms of audio = %0.4f ms\n',int32(1000*numHopsPerUpdate*hopLength/fs),(exeTimeMex/numPredictCalls)*1000);
MEX execution time per 128 ms of audio = 4.1605 ms
Compare the total execution time of the MATLAB code with that of the MEX function. This performance test is done on a machine using an NVIDIA Titan Xp® (compute capability 6.1) GPU with 12.8 GB memory and an Intel Xeon W-2133 CPU running at 3.60 GHz.
PerformanceGain = exeTime/exeTimeMex
PerformanceGain = 3.0739
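The reported timings can also be read as a fraction of real time: each mask update covers 128 ms of audio, so the processing time per update divided by 128 ms indicates how much headroom remains. The sketch below uses the timings printed above. The variable names are hypothetical, and the numbers apply only to the test machine described in this section.

```matlab
audioMs = 128;           % audio covered by one mask update
matlabMs = 12.7889;      % reported MATLAB execution time per update
mexMs = 4.1605;          % reported MEX execution time per update

matlabLoad = 100*matlabMs/audioMs;   % percentage of real time used by MATLAB code
mexLoad = 100*mexMs/audioMs;         % percentage of real time used by the MEX function
speedup = matlabMs/mexMs;            % ratio of MATLAB time to MEX time

fprintf('MATLAB uses %.1f%% of real time, MEX uses %.1f%%\n',matlabLoad,mexLoad);
fprintf('speedup = %.4f\n',speedup);
```

Both paths run well within real time on that machine, and the speedup computed from the rounded timings agrees with the reported PerformanceGain.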