Extract OpenL3 embeddings
Audio Toolbox / Deep Learning
The OpenL3 Embeddings block uses OpenL3 to extract feature embeddings from audio signals. The block combines the necessary audio preprocessing with OpenL3 network inference and returns feature embeddings that are a compact representation of the audio data. This block requires Deep Learning Toolbox™.
Port_1 — Sound data
Sound data, specified as a one-channel signal (column vector). If Sample rate of input signal (Hz) is 48e3, there are no restrictions on the input frame length. If Sample rate of input signal (Hz) is different from 48e3, then the input frame length must be a multiple of the decimation factor of the resampling operation that the block performs. If the input frame length does not satisfy this condition, the block throws an error message with information on the decimation factor.
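The frame-length condition above can be checked before running a model. The following Python sketch is illustrative only: it assumes the block resamples to 48 kHz with the exact rational ratio 48000/Fs reduced to lowest terms, so that the decimation factor is the denominator of that ratio. The block may choose its resampling factors differently, and the factor reported in the block's error message is authoritative. The helper names `decimation_factor` and `is_valid_frame_length` are hypothetical.

```python
from fractions import Fraction

# Rate at which the block places no restriction on frame length (from the text above).
TARGET_RATE_HZ = 48_000


def decimation_factor(sample_rate_hz: int) -> int:
    """Decimation factor of an exact rational resampler to 48 kHz.

    Assumption: the ratio 48000 / sample_rate_hz is reduced to lowest terms
    as interpolation / decimation. The real block may use other factors.
    """
    ratio = Fraction(TARGET_RATE_HZ, sample_rate_hz)
    return ratio.denominator


def is_valid_frame_length(frame_length: int, sample_rate_hz: int) -> bool:
    """Return True if frame_length is a multiple of the assumed decimation factor."""
    return frame_length % decimation_factor(sample_rate_hz) == 0


if __name__ == "__main__":
    for fs in (48_000, 44_100, 16_000, 96_000):
        print(fs, decimation_factor(fs))
```

Under this assumption, a 44.1 kHz input would need a frame length that is a multiple of 147 (for example, 1470 samples), whereas a 16 kHz input would have no restriction because the ratio reduces to 3/1.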
Port_1 — Embedding
Output embedding, returned as a row vector whose length is specified by the Embedding length parameter.
Sample rate of input signal (Hz) — Sample rate of input signal in Hz
48e3 (default) | positive scalar
Sample rate of the input signal in Hz, specified as a positive scalar.
Overlap percentage (%) — Overlap percentage between consecutive spectrograms
90 (default) | [0 100)
Specify the overlap percentage between consecutive spectrograms as a scalar in the range [0 100).
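To see how the overlap percentage translates into a hop between consecutive spectrograms, consider the following Python sketch. It assumes that each spectrogram covers a 1-second analysis window at 48 kHz, which matches the published OpenL3 design but is not stated on this page. The constant names and the `hop_samples` helper are hypothetical.

```python
SAMPLE_RATE_HZ = 48_000   # rate the audio is resampled to before analysis
WINDOW_SECONDS = 1.0      # assumption: each spectrogram spans 1 s of audio


def hop_samples(overlap_percent: float) -> int:
    """Samples between the starts of consecutive analysis windows."""
    if not 0 <= overlap_percent < 100:
        raise ValueError("overlap_percent must be in the range [0, 100)")
    window = round(SAMPLE_RATE_HZ * WINDOW_SECONDS)
    # Multiply before dividing to avoid floating-point error in (1 - p/100).
    hop = round(window * (100 - overlap_percent) / 100)
    return max(1, hop)  # never return a zero hop for overlaps very close to 100


if __name__ == "__main__":
    for p in (0, 50, 90):
        print(p, hop_samples(p))
```

With the default of 90, this sketch gives a hop of 4800 samples, that is, a new spectrogram every 0.1 s of audio. Higher overlap yields more embeddings per second at a higher computational cost.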
Spectrum type — Type of spectrum
Mel (128 bands) (default) | Mel (256 bands) | Linear
Type of spectrum generated from audio and used as input to the neural network, specified as Mel (128 bands), Mel (256 bands), or Linear.
Mel (128 bands): The neural network accepts mel spectrograms generated from the input audio with 128 mel bands.
Mel (256 bands): The neural network accepts mel spectrograms generated from the input audio with 256 mel bands.
Linear: The neural network accepts positive one-sided spectrograms generated from the input audio with an FFT length of 257.
Content type — Type of audio content
Environmental sounds (default) | Musical sounds
Type of audio content the neural network was trained on, specified as Environmental sounds or Musical sounds. Set this parameter to Environmental sounds to use a neural network pretrained on environmental audio data, and set it to Musical sounds to use a network pretrained on musical audio data.
Embedding length — Output embedding length
512 (default) | 6144
Length of output embedding, specified as 512 or 6144.
Cramer, Jason, et al. "Look, Listen, and Learn More: Design Choices for Deep Audio Embeddings." In ICASSP 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2019, pp. 3852-56. doi:10.1109/ICASSP.2019.8682475.
C/C++ Code Generation
Generate C and C++ code using Simulink® Coder™.
Usage notes and limitations:
To generate generic C code that does not depend on third-party libraries, in the Configuration Parameters > Code Generation general category, set the Language parameter to C.
To generate C++ code, in the Configuration Parameters > Code Generation general category, set the Language parameter to C++. To specify the target library for code generation, in the Code Generation > Interface category, set the Target Library parameter. Setting this parameter to None generates generic C++ code that does not depend on third-party libraries.
For ERT-based targets, the Support: variable-size signals parameter in the Code Generation > Interface pane must be enabled.
For a list of networks and layers supported for code generation, see Networks and Layers Supported for Code Generation (MATLAB Coder).
Introduced in R2022b