identifyLanguage

Identify languages in speech signals

Since R2024b


Syntax

language = identifyLanguage(audioIn,fs)

language = identifyLanguage(audioIn,fs,LanguageIDFormat=format)

[language,score] = identifyLanguage(___)

[language,score,results] = identifyLanguage(___)

identifyLanguage(___)

Description

language = identifyLanguage(audioIn,fs) returns the language identified in the given speech signal.

This function requires Deep Learning Toolbox™.


language = identifyLanguage(audioIn,fs,LanguageIDFormat=format) specifies the format of the returned language identification.


[language,score] = identifyLanguage(___) also returns the confidence score associated with the language identification.


[language,score,results] = identifyLanguage(___) also returns a table containing all languages the pretrained network can identify, their respective scores, and the ISO language codes.


identifyLanguage(___) with no output arguments plots a bar graph of the top 5 highest-scoring languages.


Examples


Download `identifyLanguage` Functionality


Call identifyLanguage at the command line. If the required model files are not installed, the function throws an error and provides a link to download them. Click the link, and unzip the file to a location on the MATLAB path.

Alternatively, execute the following commands to download and unzip the identifyLanguage model files to your temporary directory.

% Download the zipped model files to the temporary directory.
downloadFolder = fullfile(tempdir,"identifyLanguageDownload");
loc = websave(downloadFolder,"https://ssd.mathworks.com/supportfiles/audio/lang-id-voxlingua107-ecapa-weights.zip");
% Unzip the files and add the extracted folder to the MATLAB path.
modelsLocation = tempdir;
unzip(loc,modelsLocation)
addpath(fullfile(modelsLocation,"lang-id-voxlingua107-ecapa-weights"))

Identify Languages from Speech Signals


Read in an audio signal containing English speech and use identifyLanguage to identify the language spoken.

[x,fs] = audioread("CleanSpeech-16-mono-3secs.ogg");
lang = identifyLanguage(x,fs)

lang = 
"english"

Read in another signal containing a phrase in Polish and identify the language.

[x,fs] = audioread("polish.wav");
lang = identifyLanguage(x,fs)

lang = 
"polish"

Call identifyLanguage with no output arguments to plot the top 5 detected languages and their scores.

identifyLanguage(x,fs)

Figure: bar graph titled "Languages Detected (Top 5)" with y-axis label "Network Score".

Get ISO 639 Code of Identified Language


Read in an audio signal containing English speech and use identifyLanguage with LanguageIDFormat set to "ISO-639" to get the ISO code of the identified language.

[x,fs] = audioread("CleanSpeech-16-mono-3secs.ogg");
lang = identifyLanguage(x,fs,LanguageIDFormat="ISO-639")

lang = 
"en"

Get Confidence Scores of Language Identifications


Read in an audio signal containing English speech and use identifyLanguage to identify the language spoken and get the confidence score of the identification. The score is close to 1, which indicates high confidence in this prediction.

[x,fs] = audioread("CleanSpeech-16-mono-3secs.ogg");
[lang,score] = identifyLanguage(x,fs)

lang = 
"english"

score = single

0.9998

Read in another signal containing English. Use identifyLanguage to get the language identification, the confidence score, and a table with the results for all supported languages. The language is identified correctly, but the confidence is lower, likely due to the limited vocabulary and sparsity of speech in the signal.

[x,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
[lang,score,results] = identifyLanguage(x,fs)

lang = 
"english"

score = single

0.3289

results=107×3 table
    LanguageName    LanguageCode      Score  
    ____________    ____________    _________

    "english"           "en"          0.32889
    "albanian"          "sq"          0.14848
    "swedish"           "sv"          0.12257
    "latin"             "la"          0.11784
    "maltese"           "mt"          0.06679
    "arabic"            "ar"         0.047516
    "yiddish"           "yi"         0.047516
    "bosnian"           "bs"          0.02298
    "croatian"          "hr"         0.014139
    "slovenian"         "sl"         0.010902
    "welsh"             "cy"         0.010865
    "korean"            "ko"        0.0078524
    "hebrew"            "he"        0.0072787
    "afrikaans"         "af"        0.0066797
    "tagalog"           "tl"         0.005378
    "lao"               "lo"        0.0050288
      ⋮

Input Arguments


`audioIn` — Speech signal
column vector

Speech signal, specified as a column vector. The minimum duration of the speech signal is 0.5 seconds.

Data Types: single | double

`fs` — Sample rate (Hz)
scalar

Sample rate in Hz, specified as a scalar. The identifyLanguage function requires a sample rate of at least 4000 Hz.

Data Types: single | double
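The limits on audioIn and fs can be checked before calling identifyLanguage. The following is a minimal sketch, assuming a two-channel recording that you want to reduce to a mono column vector. The variable names and the synthetic test signal are illustrative, and averaging the channels is one common choice rather than something the function prescribes.

```matlab
% Synthetic two-channel signal standing in for a real recording (assumption).
fs = 16000;                          % sample rate in Hz
t = (0:fs-1).'/fs;                   % 1 second of sample times as a column
audioIn = [sin(2*pi*220*t), sin(2*pi*330*t)];

% Average the channels to obtain a single column vector.
mono = mean(audioIn,2);

% Check the documented limits: at least 0.5 s of audio and fs >= 4000 Hz.
durationSec = size(mono,1)/fs;
isValid = iscolumn(mono) && durationSec >= 0.5 && fs >= 4000;

% When isValid is true, mono and fs can be passed to identifyLanguage.
```

A synthetic tone is used here only to exercise the checks; it is not speech, so passing it to identifyLanguage would not produce a meaningful identification.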

`format` — Format of language identification
`"english-name"` (default) | `"ISO-639"`

Format of language identification returned by identifyLanguage, specified as "english-name" or "ISO-639".

"english-name" — identifyLanguage returns the language as a string containing the common English-language name for the language, such as "spanish" or "japanese".
"ISO-639" — identifyLanguage returns the language as a string containing the two-letter ISO 639-1 code for the language. If the language does not have an ISO 639-1 code, then the function returns the three-letter ISO 639-2 code.

Data Types: char | string

Output Arguments


`language` — Language identified
string scalar

Language identified in the speech signal, returned as a string. The format of the returned language identification is specified by format.

`score` — Score for identified language
scalar

Score for the identified language, returned as a scalar of data type single. This score can be interpreted as confidence in the language identification.
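Because the score can be read as a confidence, one possible use is to accept an identification only when the score exceeds a cutoff. The following is a sketch under stated assumptions: the threshold value and the "uncertain" label are hypothetical and not part of identifyLanguage, and the scores are copied from the example outputs on this page rather than computed.

```matlab
% Language and score pairs copied from the example outputs on this page.
langs  = ["english","english"];
scores = single([0.9998, 0.3289]);

minScore = 0.5;                     % hypothetical threshold; tune for your data

% Keep identifications that meet the threshold and flag the rest.
accepted = scores >= minScore;
labels = langs;
labels(~accepted) = "uncertain";    % hypothetical placeholder label
disp(labels)
```

A suitable threshold depends on the recordings and on the cost of a wrong identification, so treat 0.5 as a starting point only.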

`results` — All language identification results
table

All language identification results from the speech input, returned as a table containing all scores and corresponding languages and language codes. The table contains the variables LanguageName, LanguageCode, and Score. The rows are sorted in descending order by the Score variable.
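Since results is an ordinary MATLAB table, standard table indexing applies. The sketch below builds a small stand-in table from the first rows of the example output on this page (the real table has 107 rows) and shows a few queries. The variable names top3, svScore, and margin are illustrative.

```matlab
% Stand-in for the results table, using the first rows of the example output.
LanguageName = ["english";"albanian";"swedish";"latin";"maltese"];
LanguageCode = ["en";"sq";"sv";"la";"mt"];
Score = single([0.32889;0.14848;0.12257;0.11784;0.06679]);
results = table(LanguageName,LanguageCode,Score);

% Rows are sorted by descending Score, so the first rows are the top candidates.
top3 = results(1:3,:);

% Look up the score of one language by its code.
svScore = results.Score(results.LanguageCode == "sv");

% Gap between the best and second-best candidates; a small gap suggests
% an ambiguous identification.
margin = results.Score(1) - results.Score(2);
```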

Algorithms

The identifyLanguage function uses an ECAPA-TDNN [1] model to identify languages. This neural network uses pretrained weights from the lang-id-voxlingua107-ecapa model provided by SpeechBrain [2].

References

[1] Desplanques, Brecht, Jenthe Thienpondt, and Kris Demuynck. “ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification.” In Interspeech 2020, 3830–34. ISCA, 2020. https://doi.org/10.21437/Interspeech.2020-2650.

[2] Ravanelli, Mirco, et al. “SpeechBrain: A General-Purpose Speech Toolkit.” arXiv, June 8, 2021. http://arxiv.org/abs/2106.04624.

Extended Capabilities

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

Version History

Introduced in R2024b
