Extract audio features
Extract and Normalize Audio Features
Read in an audio signal.
[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
Create an audioFeatureExtractor object to extract the centroid of the Bark spectrum, the kurtosis of the Bark spectrum, and the pitch of an audio signal.
aFE = audioFeatureExtractor("SampleRate",fs, ...
    "SpectralDescriptorInput","barkSpectrum", ...
    "spectralCentroid",true, ...
    "spectralKurtosis",true, ...
    "pitch",true)
aFE = 
  audioFeatureExtractor with properties:

   Properties
                     Window: [1024x1 double]
              OverlapLength: 512
                 SampleRate: 44100
                  FFTLength: []
    SpectralDescriptorInput: 'barkSpectrum'
        FeatureVectorLength: 3

   Enabled Features
     spectralCentroid, spectralKurtosis, pitch

   Disabled Features
     linearSpectrum, melSpectrum, barkSpectrum, erbSpectrum, mfcc, mfccDelta
     mfccDeltaDelta, gtcc, gtccDelta, gtccDeltaDelta, spectralCrest, spectralDecrease
     spectralEntropy, spectralFlatness, spectralFlux, spectralRolloffPoint, spectralSkewness, spectralSlope
     spectralSpread, harmonicRatio, zerocrossrate, shortTimeEnergy

   To extract a feature, set the corresponding property to true.
   For example, obj.mfcc = true, adds mfcc to the list of enabled features.
Call extract to extract the features from the audio signal. Normalize the features by their mean and standard deviation.
features = extract(aFE,audioIn);
features = (features - mean(features,1))./std(features,[],1);
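The normalization step is a per-feature z-score. As a minimal sketch that needs only base MATLAB, the following applies the same kind of expression to a small made-up matrix (the values in `X` are illustrative and are not taken from the audio file) and checks that each column ends up with approximately zero mean and unit standard deviation. It also compares the result with `normalize`, whose default method is the z-score.

```matlab
% Hypothetical 4-by-3 feature matrix: rows are hops, columns are features.
X = [1 10 100; 2 20 300; 3 30 200; 4 40 400];

% Subtract the column means and divide by the column standard
% deviations (default N-1 normalization).
Z = (X - mean(X,1))./std(X,[],1);

% Each column should now have approximately zero mean and unit
% standard deviation.
colMean = mean(Z,1);
colStd = std(Z,[],1);

% normalize defaults to the z-score method, so the two results
% should agree to within rounding error.
maxDiff = max(abs(Z - normalize(X,1)),[],"all");
```

If a feature is constant across all hops, its standard deviation is zero and the division yields NaN, so such columns may need separate handling.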
Plot the normalized features over time.
idx = info(aFE);
duration = size(audioIn,1)/fs;

subplot(2,1,1)
t = linspace(0,duration,size(audioIn,1));
plot(t,audioIn)

subplot(2,1,2)
t = linspace(0,duration,size(features,1));
plot(t,features(:,idx.spectralCentroid), ...
    t,features(:,idx.spectralKurtosis), ...
    t,features(:,idx.pitch));
legend("Spectral Centroid","Spectral Kurtosis","Pitch")
xlabel("Time (s)")
audioIn — Input audio
column vector | matrix
Input audio, specified as a column vector or matrix of independent channels (columns).
features — Extracted audio features
vector | matrix | 3-D array
Extracted audio features, returned as an L-by-M-by-N array, where:
L – Number of feature vectors (hops)
M – Number of features extracted per analysis window
N – Number of channels
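To make the output dimensions concrete, the sketch below extracts two features from a two-channel noise signal and inspects the size of the result. The signal, the choice of features, and the hop-count formula are assumptions for illustration. In particular, the formula presumes that only complete analysis windows are processed, with no padding of a final partial window, so it is worth comparing `expectedL` against `L` in your release.

```matlab
fs = 44100;
x = randn(fs,2);                 % hypothetical 1-second, 2-channel input

aFE = audioFeatureExtractor("SampleRate",fs, ...
    "spectralCentroid",true, ...
    "pitch",true);

features = extract(aFE,x);
[L,M,N] = size(features);        % hops, features per window, channels

% Assumed hop count when only full analysis windows are used.
winLength = numel(aFE.Window);
hopLength = winLength - aFE.OverlapLength;
expectedL = floor((size(x,1) - winLength)/hopLength) + 1;
```

Here `M` should match `aFE.FeatureVectorLength`, and `N` should match the number of columns in `x`.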
Introduced in R2019b