openl3Preprocess
Description
features = openl3Preprocess(audioIn,fs,Name,Value) specifies options using one or more Name,Value arguments. For example, features = openl3Preprocess(audioIn,fs,'OverlapPercentage',75) applies a 75% overlap between consecutive frames used to generate the spectrograms.
Examples
Download OpenL3 Network
Download and unzip the Audio Toolbox™ model for OpenL3.
Type openl3 at the Command Window. If the Audio Toolbox model for OpenL3 is not installed, the function provides a link to the location of the network weights. To download the model, click the link. Unzip the file to a location on the MATLAB path.
Alternatively, execute these commands to download and unzip the OpenL3 model to your temporary directory.
downloadFolder = fullfile(tempdir,'OpenL3Download');
loc = websave(downloadFolder,'https://ssd.mathworks.com/supportfiles/audio/openl3.zip');
OpenL3Location = tempdir;
unzip(loc,OpenL3Location)
addpath(fullfile(OpenL3Location,'openl3'))
Check that the installation is successful by typing openl3 at the Command Window. If the network is installed, then the function returns a DAGNetwork (Deep Learning Toolbox) object.
openl3
ans = 
  DAGNetwork with properties:

         Layers: [30×1 nnet.cnn.layer.Layer]
    Connections: [29×2 table]
     InputNames: {'in'}
    OutputNames: {'out'}
Extract OpenL3 Embeddings from Audio Signal
Use openl3Preprocess to extract embeddings from an audio signal.
Read in an audio signal.
[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');
To extract spectrograms from the audio, call the openl3Preprocess function with the audio and sample rate. Use 50% overlap and set the spectrum type to linear. The openl3Preprocess function returns an array of 30 spectrograms produced using an FFT length of 512.
features = openl3Preprocess(audioIn,fs,'OverlapPercentage',50,'SpectrumType','linear');
[posFFTbinsOvLap50,numHopsOvLap50,~,numSpectOvLap50] = size(features)
posFFTbinsOvLap50 = 257
numHopsOvLap50 = 197
numSpectOvLap50 = 30
Call openl3Preprocess again, this time using the default overlap of 90%. The openl3Preprocess function now returns an array of 146 spectrograms.
features = openl3Preprocess(audioIn,fs,'SpectrumType','linear');
[posFFTbinsOvLap90,numHopsOvLap90,~,numSpectOvLap90] = size(features)
posFFTbinsOvLap90 = 257
numHopsOvLap90 = 197
numSpectOvLap90 = 146
Visualize one of the spectrograms at random.
randSpect = randi(numSpectOvLap90);
viewRandSpect = features(:,:,:,randSpect);
N = size(viewRandSpect,2);
binsToHz = (0:N-1)*fs/N;
nyquistBin = round(N/2);
semilogx(binsToHz(1:nyquistBin),mag2db(abs(viewRandSpect(1:nyquistBin))))
xlabel('Frequency (Hz)')
ylabel('Power (dB)');
title([num2str(randSpect),'th Spectrogram'])
axis tight
grid on
Create an OpenL3 network (this requires Deep Learning Toolbox) using the same 'SpectrumType'.
net = openl3('SpectrumType','linear');
Extract and visualize the audio embeddings.
embeddings = predict(net,features);
surf(embeddings,'EdgeColor','none')
view([90,-90])
axis([1 numSpectOvLap90 1 numSpectOvLap90])
xlabel('Embedding Length')
ylabel('Spectrum Number')
title('OpenL3 Feature Embeddings')
axis tight
Input Arguments
audioIn - Input signal
column vector | matrix
Input signal, specified as a column vector or matrix. If you specify a matrix, openl3Preprocess treats the columns of the matrix as individual audio channels.
Data Types: single | double
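The following is a minimal sketch of passing a multichannel signal, assuming Audio Toolbox is installed. The synthetic two-tone stereo signal and its variable names are illustrative only and are not part of the original example.

```matlab
% Build a 5-second synthetic stereo signal (illustrative data, not from the example file).
fs = 44100;
t = (0:5*fs-1)'/fs;
audioIn = [sin(2*pi*440*t), sin(2*pi*880*t)];   % each column is one audio channel

% Default settings: 'mel128' spectrum type and 90% overlap.
features = openl3Preprocess(audioIn,fs);

% With 'mel128', the first three dimensions are 128-by-199-by-1.
% The fourth dimension, K, depends on signal length, channel count, and overlap.
size(features)
```

The first three dimensions are fixed by the spectrum type, so only the last dimension changes when you add channels or lengthen the signal.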
fs - Sample rate (Hz)
positive scalar
Sample rate of the input signal in Hz, specified as a positive scalar.
Data Types: single | double
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: openl3Preprocess(audioIn,fs,'SpectrumType','mel256')
OverlapPercentage - Percentage overlap between consecutive spectrograms
90 (default) | scalar in the range [0,100)
Percentage overlap between consecutive spectrograms, specified as a scalar in the range [0,100).
Data Types: single | double
SpectrumType - Spectrum type
'mel128' (default) | 'mel256' | 'linear'
Spectrum type generated from audio and used as input to the neural network, specified as one of these:
- 'mel128': Generates mel spectrograms using 128 mel bands.
- 'mel256': Generates mel spectrograms using 256 mel bands.
- 'linear': Generates positive one-sided spectrograms using an FFT length of 512.
Data Types: char | string
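To compare the three spectrum types side by side, a short loop such as the following may help. It is a sketch that assumes Audio Toolbox is installed. The 2-second noise signal and the loop variable names are illustrative, not taken from the original example.

```matlab
% Illustrative input: 2 seconds of white noise at 48 kHz (not from the example file).
fs = 48000;
audioIn = randn(2*fs,1);

% Call openl3Preprocess once per spectrum type and report the output size.
spectrumTypes = ["mel128","mel256","linear"];
featureSizes = zeros(numel(spectrumTypes),4);
for k = 1:numel(spectrumTypes)
    features = openl3Preprocess(audioIn,fs,'SpectrumType',spectrumTypes(k));
    featureSizes(k,:) = size(features,1:4);
    fprintf('%s: %s\n',spectrumTypes(k),mat2str(featureSizes(k,:)));
end
```

Whichever spectrum type you choose here should also be passed to openl3 when you create the network, so that the network input matches the spectrogram dimensions.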
Output Arguments
features - Spectrograms that can be fed to OpenL3 pretrained network
N-by-M-by-1-by-K array
Spectrograms generated from audioIn, returned as an N-by-M-by-1-by-K array.
When you specify 'SpectrumType' as one of these:
- 'mel128': The dimensions are 128-by-199-by-1-by-K, where 128 is the number of mel bands and 199 is the number of time hops.
- 'mel256': The dimensions are 256-by-199-by-1-by-K, where 256 is the number of mel bands and 199 is the number of time hops.
- 'linear': The dimensions are 257-by-197-by-1-by-K, where 257 is the positive one-sided FFT length and 197 is the number of time hops.
K represents the number of spectrograms and depends on the length of audioIn, the number of channels in audioIn, and OverlapPercentage.
Data Types: single
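The dependence of K on signal length and OverlapPercentage can be approximated with simple arithmetic. The sketch below is a hypothetical estimate, not a toolbox function. It assumes that the audio is resampled to 48 kHz and cut into 1-second windows, as in the OpenL3 reference design, and it covers only a mono signal. The sample count is an illustrative value.

```matlab
% Hypothetical estimate of K for a mono signal; not part of Audio Toolbox.
% Assumption: audio is resampled to 48 kHz and split into 1-second windows.
internalFs = 48000;
windowLength = internalFs;            % 1-second window, in samples

fs = 44100;
numSamples = 685755;                  % about 15.55 s of mono audio at 44.1 kHz (illustrative)
resampledLength = round(numSamples*internalFs/fs);

overlaps = [50 90];
Kest = zeros(size(overlaps));
for k = 1:numel(overlaps)
    % Hop between consecutive windows, in samples.
    hopLength = round(windowLength*(1 - overlaps(k)/100));
    % Number of full windows that fit in the resampled signal.
    Kest(k) = floor((resampledLength - windowLength)/hopLength) + 1;
    fprintf('Overlap %d%%: estimated K = %d\n',overlaps(k),Kest(k));
end
```

Under these assumptions the estimate is 30 spectrograms at 50% overlap and 146 at 90% overlap. Treat the result as a sanity check only, and use size(features,4) for the actual count.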
References
[1] Cramer, Jason, et al. "Look, Listen, and Learn More: Design Choices for Deep Audio Embeddings." In ICASSP 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2019, pp. 3852-56. doi:10.1109/ICASSP.2019.8682475.
Extended Capabilities
C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
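As a rough illustration of GPU use, the sketch below moves the audio to the GPU before preprocessing. It assumes that Audio Toolbox and Parallel Computing Toolbox are installed. The synthetic signal is illustrative, and the canUseGPU guard is a convenience added here so that the code falls back to the CPU when no supported GPU is available.

```matlab
% Illustrative input: 3 seconds of white noise at 48 kHz.
fs = 48000;
audioIn = randn(3*fs,1,'single');

% Move the signal to the GPU only when a supported device is available.
if canUseGPU
    audioIn = gpuArray(audioIn);
end

features = openl3Preprocess(audioIn,fs);

% gather returns the data to host memory; it leaves CPU arrays unchanged.
features = gather(features);
size(features)
```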
Version History
Introduced in R2021a
See Also
openl3 | vggish | vggishEmbeddings | openl3Embeddings | classifySound | audioFeatureExtractor