Audio to Mel Spectrogram

Hello, I am working on a sound classification problem. My task is to create mel spectrograms with three different window lengths (93 ms, 46 ms, and 23 ms), which is achieved by setting n_fft to 2048, 1024, and 512 respectively. I am getting (128, 216), but I don't understand the 3 in (128, 216, 3). Here 128 is the number of frequency bins and 216 is the number of frames. Can someone help me understand the right side of the attached image, the DL part?
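The link between n_fft and the window length is just n_fft divided by the sample rate. Here is a minimal sketch of that arithmetic, assuming the audio is at 22050 Hz (the default rate librosa.load resamples to; the post does not state the rate explicitly). The helper name window_ms is made up for illustration.

```python
# Assumption: sr = 22050 Hz, the default resampling rate of librosa.load.
SR = 22050

def window_ms(n_fft, sr=SR):
    """Duration of an n_fft-sample analysis window, in milliseconds."""
    return 1000.0 * n_fft / sr

# Window duration for each FFT size mentioned in the question
window_lengths = {n_fft: round(window_ms(n_fft), 1) for n_fft in (2048, 1024, 512)}
for n_fft, ms in window_lengths.items():
    print(f"n_fft={n_fft}: {ms} ms")
```

Under that assumption the three sizes come out at roughly 93 ms, 46 ms, and 23 ms, which matches the figures in the question.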

2 Comments

You have 3 time windows, so you are computing 3 spectrograms, each one an array of size 128 x 216.
At the end, your 3 spectrograms are stored in a 3D array of size 128 x 216 x 3.
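The stacking described above can be sketched with placeholder arrays, so it runs without an audio file. The zero-filled arrays stand in for the three mel spectrograms, and the channels-first transpose at the end is only an assumption about what some deep learning frameworks expect, not something stated in the thread.

```python
import numpy as np

# Placeholder arrays standing in for the three mel spectrograms
# (128 mel bins x 216 frames each); real data would come from librosa.
specs = [np.zeros((128, 216)) for _ in range(3)]

# Stack along a new last axis: one "channel" per window length
stacked = np.stack(specs, axis=-1)
print(stacked.shape)

# Each channel is still a full 128 x 216 spectrogram
second_channel = stacked[..., 1]
print(second_channel.shape)

# Assumption: a framework that wants channels first would need (3, 128, 216)
channels_first = np.transpose(stacked, (2, 0, 1))
print(channels_first.shape)
```

Seen this way, the third dimension plays the same role as the RGB channels of a colour image, which is presumably why the result can be fed to an image-style network.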
Thanks for your feedback.
Is my code doing this correctly? Is this what the image describes?
import librosa
import numpy as np

# Load the audio file (replace the path with your own audio file)
y, sr = librosa.load(r'G:\A NEW RESEARCH DATASET\1Fire\2_Fire.wav')

# FFT sizes, one per window length
n_ffts = [2048, 1024, 512]

# List to hold one mel spectrogram per FFT size
spectrograms = []

# Generate a mel spectrogram for each n_fft value.
# hop_length is the same for all three, so they share the same number of frames.
for n_fft in n_ffts:
    mel_spec = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft, hop_length=512, n_mels=128)
    spectrograms.append(mel_spec)

# Stack the spectrograms along a new third dimension
tensor = np.stack(spectrograms, axis=-1)
print(tensor.shape)  # (128, time_steps, 3): n_mels=128; time_steps depends on the length of the audio file
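The three spectrograms can be stacked because the frame count depends on hop_length, not on n_fft. With librosa's default centered framing and an even n_fft, the number of frames is 1 + len(y) // hop_length. The sketch below checks this against the 216 frames reported in the question. The 5-second clip length at 22050 Hz is an assumption inferred from that frame count, not something the post states, and n_frames is a hypothetical helper.

```python
def n_frames(n_samples, hop_length=512):
    """Frame count for centered framing (librosa's default, center=True)."""
    return 1 + n_samples // hop_length

# Assumption: a 5-second clip at 22050 Hz (not stated in the post)
n_samples = 5 * 22050
frames = n_frames(n_samples)
print(frames)  # same value for n_fft = 2048, 1024, and 512
```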

Answers (0)
