Speech to text model for a non-English language.

Question

Murtaza Mohammadi 2024-9-22

Direct link to this question

https://ww2.mathworks.cn/matlabcentral/answers/2154620-speech-to-text-model-for-a-non-english-language

Commented: Umar 2024-9-24

Hello

I want to develop a tool that can transcribe non-English audio using the letters of that language itself. What I have gathered so far is that I need to create some labelled data and train a deep learning model. Beyond that, I am unsure how to proceed. All online examples and discussions pertain to an existing English dataset or a model trained on the English language. I would like to develop something for a regional dialect which uses a different alphabet system.

Looking for some detailed help and guidance here.

Thank you.

Answer 1

Umar 2024-9-22

Direct link to this answer

https://ww2.mathworks.cn/matlabcentral/answers/2154620-speech-to-text-model-for-a-non-english-language#answer_1520740

Hi @Murtaza Mohammadi,

The first step in developing your transcription tool is to gather a dataset of audio recordings in the target language. This dataset should include:

Audio Files: Recordings of spoken language in various contexts (e.g., conversations, speeches).

Transcriptions: Text files that contain the corresponding transcriptions of the audio files in the target alphabet.

You may need to create this dataset manually or find existing resources. Ensure that the audio quality is high and that the recordings cover a diverse range of speakers and dialects.

Once you have your audio files, you need to label them. This involves creating a mapping between each audio file and its corresponding text. You can use a simple CSV format for this purpose:

audio_file,transcription
audio1.wav,"transcription in target alphabet"
audio2.wav,"another transcription"
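Once the manifest is loaded, the set of output symbols can be derived from the transcriptions themselves. The following is a minimal sketch, not a definitive implementation: the table is built in memory as a stand-in for the CSV (in practice it would come from reading the file, for instance with readtable), the Greek strings are made-up sample data, and it assumes every letter of the alphabet is a single character. Scripts that rely on combining marks would need a different way of splitting text into symbols.

```matlab
% Hypothetical stand-in for the manifest described above (sample data only).
manifest = table(["audio1.wav"; "audio2.wav"], ["γεια σου"; "σου"], ...
    'VariableNames', {'audio_file', 'transcription'});

% Collect every distinct character that occurs in any transcription.
allText = char(join(manifest.transcription, ""));
vocab = unique(allText);      % sorted row vector of characters
numClasses = numel(vocab);    % one class per distinct character

% Encode the first transcription as integer class indices into vocab.
firstText = char(manifest.transcription(1));
[~, idx] = ismember(firstText, vocab);

% Decoding the indices gives the original text back.
decoded = vocab(idx);
```

Here numClasses is the value the output layer of a character-level model would need, and idx is the kind of integer target sequence such a model is trained on.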

Before you train a model, you must preprocess the audio data. This typically involves:

Resampling: Ensure all audio files are at the same sample rate.

Feature Extraction: Convert audio signals into a format suitable for model training, such as Mel-frequency cepstral coefficients (MFCCs).

Here’s a MATLAB code snippet to extract MFCC features from an audio file:

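[audioIn, fs] = audioread('audio1.wav'); % Read audio file; fs is its sample rate
audioIn = mean(audioIn, 2); % Mix down to mono if the file has several channels
audioIn = resample(audioIn, 16000, fs); % Resample to 16 kHz
coeffs = mfcc(audioIn, 16000); % Extract MFCC features, one row per analysis frame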

For more information on these functions, please refer to

https://www.mathworks.com/help/matlab/import_export/read-and-get-information-about-audio-files.html

https://www.mathworks.com/help/signal/ref/resample.html?searchHighlight=resample&s_tid=srchtitle_support_results_1_resample

https://www.mathworks.com/help/audio/ref/mfcc.html?searchHighlight=mfcc&s_tid=srchtitle_support_results_1_mfcc
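One detail worth checking before training is the layout of the feature matrix. The sketch below uses a synthetic one-second tone in place of a real recording (an assumption, so that it runs without any audio file) and reads the feature count from the mfcc output instead of hard-coding it. As far as I know, trainNetwork takes each sequence as a features-by-time-steps matrix, so the sketch transposes the result; please confirm the expected layout in the documentation of the training function and release you use.

```matlab
% Synthetic stand-in for a recording: one second of a 440 Hz tone at 44.1 kHz.
fsOriginal = 44100;
t = (0:fsOriginal-1)' / fsOriginal;
audioIn = sin(2*pi*440*t);

% Same preprocessing as above: resample to 16 kHz, then extract MFCCs.
audio16k = resample(audioIn, 16000, fsOriginal);
coeffs = mfcc(audio16k, 16000);     % frames-by-coefficients

% Transpose to features-by-time-steps (assumed layout, see note above).
features = coeffs';
numFeatures = size(features, 1);    % value to pass to sequenceInputLayer
numFrames = size(features, 2);      % sequence length of this recording
```

Reading numFeatures from the data avoids a size mismatch if the mfcc options, and therefore the number of coefficients per frame, change later.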

For transcription tasks, recurrent neural networks (RNNs) and convolutional neural networks (CNNs) are commonly used. You can also consider Long Short-Term Memory (LSTM) networks, a type of RNN that is effective for sequence prediction problems. Note that with 'OutputMode' set to 'sequence', the network below predicts one class per MFCC frame, so the training labels must be aligned to frames. Here’s a simple example of defining an LSTM network in MATLAB:

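numFeatures = 13; % Must equal size(coeffs, 2), the number of MFCC columns
numClasses = 30; % Placeholder: number of distinct output symbols in your alphabet

layers = [
  sequenceInputLayer(numFeatures) % Input layer for MFCC features
  lstmLayer(100, 'OutputMode', 'sequence') % LSTM layer with 100 hidden units
  fullyConnectedLayer(numClasses) % One output per class
  softmaxLayer
  classificationLayer];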

For more information on lstmLayer, please refer to

https://www.mathworks.com/help/deeplearning/ref/nnet.cnn.layer.lstmlayer.html?searchHighlight=lstmLayer&s_tid=srchtitle_support_results_1_lstmLayer

Once your model is defined, you can train it using the labeled data. Use the trainNetwork function in MATLAB:

options = trainingOptions('adam', ...
  'MaxEpochs', 100, ...
  'MiniBatchSize', 32, ...
  'Verbose', 0, ...
  'Plots', 'training-progress');

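net = trainNetwork(trainingData, layers, options); % trainingData: e.g. a datastore of sequences and per-frame labels

The layer array above ends in classificationLayer, so it is trained with trainNetwork. The newer trainnet function (R2023b and later) instead takes the loss as an argument, for example trainnet(trainingData, layers, "crossentropy", options), and expects the layer array without classificationLayer. For more information on trainnet, please refer to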

https://www.mathworks.com/help/deeplearning/ref/trainnet.html
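Before calling the training function, it helps to hold out part of the labelled recordings so that they can be used for testing later. This is a minimal sketch of a random 80/20 split; numFiles and the split ratio are hypothetical values chosen for illustration. If the same speaker appears in many recordings, splitting by speaker instead of by file is probably the safer choice.

```matlab
rng(0);                          % Fixed seed so the split is repeatable
numFiles = 10;                   % Hypothetical number of labelled recordings
testFraction = 0.2;              % Hypothetical share held out for testing

order = randperm(numFiles);      % Random permutation of the file indices
numTest = round(testFraction * numFiles);

testIdx = order(1:numTest);      % Rows of the manifest reserved for testing
trainIdx = order(numTest+1:end); % Rows of the manifest used for training
```

The two index vectors can then be used to select rows of the manifest table, so that no recording ends up in both sets.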

After training, evaluate your model's performance using a separate test dataset. Calculate metrics such as accuracy, precision, and recall to assess how well your model transcribes audio.

Once you are satisfied with the model's performance, you can deploy it as a standalone application or integrate it into a larger system. Consider using MATLAB's App Designer to create a user-friendly interface for your transcription tool. For more information on App Designer, please refer to

https://www.mathworks.com/help/matlab/ref/appdesigner.html?searchHighlight=App%20designer&s_tid=srchtitle_support_results_1_App%20designer
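Coming back to the evaluation step: for transcription output, a character error rate is a common complement to the metrics mentioned above. It is the edit distance between the reference text and the predicted text, divided by the length of the reference. The sketch below computes it with a plain dynamic-programming Levenshtein distance; the two strings are made-up sample data, and it assumes the reference is not empty.

```matlab
ref = 'γεια σου';   % Reference transcription (sample data)
hyp = 'γεια σο';    % Model output with the last character missing (sample data)

n = numel(ref);
m = numel(hyp);

% D(i+1, j+1) holds the edit distance between ref(1:i) and hyp(1:j).
D = zeros(n + 1, m + 1);
D(:, 1) = (0:n)';   % Deleting i characters
D(1, :) = 0:m;      % Inserting j characters

for i = 1:n
    for j = 1:m
        cost = double(ref(i) ~= hyp(j));          % 0 if characters match
        D(i+1, j+1) = min([D(i, j+1) + 1, ...     % deletion
                           D(i+1, j) + 1, ...     % insertion
                           D(i, j) + cost]);      % substitution or match
    end
end

editDist = D(n + 1, m + 1);
cer = editDist / n;  % Character error rate for this pair
```

Averaging this value over the held-out recordings gives a single number that can be tracked while the model is being improved.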

I hope this helps you get started with your project. Please let me know if you have any further questions.

2 Comments

Murtaza Mohammadi 2024-9-23

Hi @Umar

Thanks for your detailed response. I will get going on this and keep you posted.

Umar 2024-9-24

Hi @Murtaza Mohammadi,

Thank you for your prompt acknowledgment. I appreciate your commitment to moving forward with this matter. Please feel free to reach out if you have any questions or require further assistance as you proceed. I look forward to hearing from you soon.
