Speech to text model for a non-English language.

Hello
I want to develop a tool that transcribes non-English audio using that language's own script. From what I have gathered so far, I need to create labelled data and train a deep learning model. Beyond that, I am unsure how to proceed. All the online examples and discussions I have found rely on an existing English dataset or a model trained on English. I would like to develop something for a regional dialect that uses a different alphabet.
Looking for some detailed help and guidance here.
Thank you.

Accepted Answer

Umar 2024-9-22

Hi @Murtaza Mohammadi,

The first step in developing your transcription tool is to gather a dataset of audio recordings in the target language. This dataset should include:

Audio Files: Recordings of spoken language in various contexts (e.g., conversations, speeches).

Transcriptions: Text files that contain the corresponding transcriptions of the audio files in the target alphabet.

You may need to create this dataset manually or find existing resources. Ensure that the audio quality is high and that the recordings cover a diverse range of speakers and dialects.

Once you have your audio files, you need to label them. This involves creating a mapping between each audio file and its corresponding text. You can use a simple CSV format for this purpose:

audio_file,transcription
audio1.wav,"transcription in target alphabet"
audio2.wav,"another transcription"
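As a rough illustration of how such a manifest might be used, the sketch below writes a tiny two-row file, reads it back as UTF-8, and collects the distinct characters in the transcriptions as a vocabulary. The file name, the two placeholder transcriptions (built from code points so the script does not depend on the editor's encoding), and the variable names vocab and numClasses are all assumptions made for this example, not part of the answer above.

```matlab
% Hypothetical two-row manifest, written to a temp folder so the sketch runs on its own.
manifestFile = fullfile(tempdir, 'labels_example.csv');
sample1 = string(char([1587 1604 1575 1605]));   % placeholder text in a non-Latin script
sample2 = string(char([1583 1606 1740 1575]));   % second placeholder transcription
T = table(["audio1.wav"; "audio2.wav"], [sample1; sample2], ...
    'VariableNames', {'audio_file', 'transcription'});
writetable(T, manifestFile, 'Encoding', 'UTF-8');

% Read the manifest back; UTF-8 keeps the non-Latin characters intact.
T = readtable(manifestFile, 'Encoding', 'UTF-8', 'TextType', 'string', ...
    'Delimiter', ',', 'ReadVariableNames', true);

% Collect every distinct character that occurs in the transcriptions.
allText = char(join(T.transcription, ""));
vocab = unique(allText);          % sorted row vector of distinct characters
numClasses = numel(vocab);        % one output class per character

% Map the first transcription to integer indices into the vocabulary.
[~, idx] = ismember(char(T.transcription(1)), vocab);

delete(manifestFile);             % remove the temporary file
```

Deriving the vocabulary from the transcriptions themselves means the output alphabet follows the data, whatever script it is written in.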

Before you train a model, you must preprocess the audio data. This typically involves:

Resampling: Ensure all audio files are at the same sample rate.

Feature Extraction: Convert audio signals into a format suitable for model training, such as Mel-frequency cepstral coefficients (MFCCs).

Here’s a MATLAB code snippet to extract MFCC features from an audio file:

[audioIn, fs] = audioread('audio1.wav'); % Read audio file
audioIn = resample(audioIn, 16000, fs); % Resample to 16 kHz
coeffs = mfcc(audioIn, 16000); % Extract MFCC features

For more information on these functions, please refer to

https://www.mathworks.com/help/matlab/import_export/read-and-get-information-about-audio-files.html

https://www.mathworks.com/help/signal/ref/resample.html?searchHighlight=resample&s_tid=srchtitle_support_results_1_resample

https://www.mathworks.com/help/audio/ref/mfcc.html?searchHighlight=mfcc&s_tid=srchtitle_support_results_1_mfcc
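The preprocessing steps above can be sketched end to end on a synthetic signal, so the example runs without an audio file. It assumes a stereo recording at 44.1 kHz, mixes it down to mono, resamples to 16 kHz, and transposes the MFCC matrix so that each column is one time step, which is the features-by-time layout that sequence networks in MATLAB expect. The test tones and the variable names are placeholders. The number of coefficients per frame depends on the mfcc options, so the sketch reads it from the output instead of hard-coding it.

```matlab
% Synthetic one-second stereo signal standing in for a real recording.
fs = 44100;
t = (0:fs-1)' / fs;
audioIn = [sin(2*pi*220*t), sin(2*pi*440*t)];

% Mix down to mono so every file yields a single feature sequence.
audioIn = mean(audioIn, 2);

% Resample to a common rate (16 kHz, as in the snippet above).
targetFs = 16000;
audioIn = resample(audioIn, targetFs, fs);

% mfcc returns one row per analysis frame and one column per coefficient.
coeffs = mfcc(audioIn, targetFs);

% Transpose to features-by-time for use with sequenceInputLayer.
features = coeffs.';
numFeatures = size(features, 1);   % use this as the input size of the network
```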

For transcription tasks, I recommend recurrent neural networks (RNNs) or convolutional neural networks (CNNs), both of which are commonly used. You can also consider Long Short-Term Memory (LSTM) networks, a type of RNN that is effective for sequence prediction problems. Here’s a simple example of defining an LSTM network in MATLAB:

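layers = [
  sequenceInputLayer(13) % Input size must match the number of MFCC coefficients per frame
  lstmLayer(100, 'OutputMode', 'sequence') % LSTM layer with 100 hidden units, one output per time step
  fullyConnectedLayer(numClasses) % numClasses (number of output symbols) must be defined beforehand
  softmaxLayer
  classificationLayer];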

For more information on lstmLayer, please refer to

https://www.mathworks.com/help/deeplearning/ref/nnet.cnn.layer.lstmlayer.html?searchHighlight=lstmLayer&s_tid=srchtitle_support_results_1_lstmLayer
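The layer array above uses numClasses without defining it. A minimal sketch of one way to fill that in is shown here, assuming one output class per character of the target alphabet. The four-character vocab is a placeholder, and numFeatures is assumed to equal the number of MFCC coefficients produced per frame.

```matlab
% Placeholder vocabulary; in practice, use the characters found in your transcriptions.
vocab = 'abc ';
numClasses = numel(vocab);    % one class per output character
numFeatures = 13;             % assumed number of MFCC coefficients per frame

layers = [
    sequenceInputLayer(numFeatures)              % one feature vector per time step
    lstmLayer(100, 'OutputMode', 'sequence')     % emits a prediction at every time step
    fullyConnectedLayer(numClasses)              % scores for each character class
    softmaxLayer
    classificationLayer];
```

Note that with 'OutputMode' set to 'sequence', a network like this predicts one label per frame, so the training targets need one label per frame as well.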

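Once your model is defined, you can train it using the labeled data. Use the trainNetwork function in MATLAB:

options = trainingOptions('adam', ...
  'MaxEpochs', 100, ...
  'MiniBatchSize', 32, ...
  'Verbose', 0, ...
  'Plots', 'training-progress');
% trainNetwork matches the classificationLayer used above; trainnet would
% instead need a loss function argument and no classificationLayer
net = trainNetwork(trainingData, layers, options);

trainNetwork is not recommended in recent MATLAB releases. For more information on its replacement, trainnet, please refer to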

https://www.mathworks.com/help/deeplearning/ref/trainnet.html
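To make the expected data layout concrete, here is a small sketch that trains the same kind of network on random data for two epochs. It is only meant to show the shapes involved: each sequence is a numFeatures-by-numFrames matrix, and each label sequence is a 1-by-numFrames categorical row. The sizes, class names, and random data are all placeholders, and the sketch uses trainNetwork because the layer array ends in a classificationLayer.

```matlab
rng(0);                                   % reproducible random placeholder data
numFeatures = 13;
numClasses = 4;
numObservations = 8;
classNames = string(1:numClasses);        % placeholder class names

XTrain = cell(numObservations, 1);
YTrain = cell(numObservations, 1);
for i = 1:numObservations
    numFrames = randi([20 40]);                          % sequences may differ in length
    XTrain{i} = randn(numFeatures, numFrames);           % features-by-time matrix
    YTrain{i} = categorical(randi(numClasses, 1, numFrames), ...
        1:numClasses, classNames);                       % one label per frame
end

layers = [
    sequenceInputLayer(numFeatures)
    lstmLayer(100, 'OutputMode', 'sequence')
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];

options = trainingOptions('adam', ...
    'MaxEpochs', 2, ...
    'MiniBatchSize', 4, ...
    'ExecutionEnvironment', 'cpu', ...
    'Verbose', 0);

net = trainNetwork(XTrain, YTrain, layers, options);

% Predict one label per frame for the first sequence.
YPred = classify(net, XTrain{1});
```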

After training, evaluate your model's performance on a separate test dataset. Calculate metrics such as accuracy, precision, and recall to assess how well your model transcribes audio; for transcription output, word error rate (WER) and character error rate (CER) are also commonly reported.

Once you are satisfied with the model's performance, you can deploy it as a standalone application or integrate it into a larger system. Consider using MATLAB's App Designer to create a user-friendly interface for your transcription tool. For more information on App Designer, please refer to

https://www.mathworks.com/help/matlab/ref/appdesigner.html?searchHighlight=App%20designer&s_tid=srchtitle_support_results_1_App%20designer
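Returning to the evaluation step: for transcribed text, one commonly reported measure is the character error rate (CER), which is the edit distance between the predicted and reference transcriptions divided by the reference length. This metric is a suggestion added here, not something specified in the answer above. The sketch computes it with a plain dynamic-programming Levenshtein distance so that it needs no extra toolbox. The two strings are placeholders, and the same loop works on any script because MATLAB compares characters by code unit.

```matlab
ref = 'kitten';     % placeholder reference transcription
hyp = 'sitting';    % placeholder model output

m = numel(ref);
n = numel(hyp);

% D(i+1, j+1) holds the edit distance between ref(1:i) and hyp(1:j).
D = zeros(m + 1, n + 1);
D(:, 1) = (0:m)';   % deleting i characters
D(1, :) = 0:n;      % inserting j characters

for i = 1:m
    for j = 1:n
        substitutionCost = double(ref(i) ~= hyp(j));
        D(i+1, j+1) = min([ ...
            D(i, j+1) + 1, ...                 % deletion
            D(i+1, j) + 1, ...                 % insertion
            D(i, j) + substitutionCost]);      % match or substitution
    end
end

editDist = D(end, end);
cer = editDist / m;   % character error rate relative to the reference length
```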

I hope this helps you get started with your project. Please let me know if you have any further questions.

  2 Comments
Murtaza Mohammadi 2024-9-23
Thanks for your detailed response. I will get going on this and keep you posted.
Umar 2024-9-24
Hi @Murtaza Mohammadi,
Thank you for your prompt acknowledgment. I appreciate your commitment to moving forward with this matter. Please feel free to reach out if you have any questions or require further assistance as you proceed. I look forward to hearing from you soon.
