Hi Aviral,
I understand that you want to build your own speech-to-text model from scratch in MATLAB. If you don't have a model ready to implement in MATLAB, you may want to refer to some of the existing state-of-the-art speech recognition models and convert their code to MATLAB. The following is a good resource for learning about different speech recognition models: https://paperswithcode.com/task/speech-recognition
Additionally, if you are interested, the default model that the MATLAB function "speech2text" uses is described in the following paper: Baevski, Alexei, Henry Zhou, Abdelrahman Mohamed, and Michael Auli. “Wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations,” 2020. https://doi.org/10.48550/ARXIV.2006.11477.
Hope this answers your question.
Regards,
Vinayak Luha