UT Austin Researchers Convert Brain Signals to Words and Phrases Using Wavelets and Deep Learning
Challenge: Create a speech-driven brain-computer interface to enable ALS patients to communicate by imagining the act of speaking specific phrases
Solution: Use wavelet scalograms of MEG signals to train deep neural networks
Results:
- Classification accuracy of 96% achieved
- Wavelets and deep learning networks quickly combined
- Training times accelerated by a factor of 10
For patients with advanced amyotrophic lateral sclerosis (ALS), communication becomes increasingly difficult as the disease progresses. In many cases, ALS (also known as Lou Gehrig’s disease) leads to locked-in syndrome, in which a patient is completely paralyzed but remains cognitively intact. Eye-tracking devices and, more recently, electroencephalography (EEG)-based brain-computer interfaces (BCIs) enable ALS patients to communicate by spelling phrases letter by letter, but it can take several minutes to communicate even a simple message.
Magnetoencephalography (MEG) is a noninvasive technique that detects magnetic activity produced by electrical signals occurring naturally in the brain. Researchers at the University of Texas at Austin have developed a technology that uses wavelets and deep neural networks to decode MEG signals and detect entire phrases as the patient imagines speaking them. MATLAB® enabled the researchers to combine wavelet-based signal processing approaches with a variety of machine learning and deep learning techniques.
“We need to be able to try an approach, visualize the results, and then retrace our steps or try something new if it’s not working well,” says Debadatta Dash, doctoral student in the UT Austin Speech Disorders and Technology Lab. “In another programming language, those iterations can be time-consuming, but with MATLAB we can use extensive signal processing libraries along with toolboxes to rapidly evaluate new ideas and immediately see how well they work.”
The goal of the project was to classify brain signals corresponding to imagined or spoken phrases. The UT Austin team, including Dr. Paul Ferrari, a neuroscientist and research director of the MEG lab at the Dell Children’s Medical Center, wanted to capture the brain signals with MEG because it has greater spatial resolution than EEG and greater temporal resolution than functional magnetic resonance imaging (fMRI). To improve the overall MEG signal quality, they needed to remove noise while preserving the overall signal characteristics. In addition to preprocessing and denoising hundreds of signals from more than 1000 test trials, the team needed to analyze and visualize the signals.
Because the researchers were working with a new kind of data, they required a tool that would enable them to rapidly evaluate a variety of approaches, including deep learning.
The UT Austin researchers used MATLAB to derive whole phrases from MEG signals as a first step toward developing a brain-computer interface that would enable ALS patients to communicate.
With Wavelet Toolbox™, they denoised the MEG signals and decomposed them into specific neural oscillation bands (high gamma, gamma, beta, alpha, theta, and delta brain waves) by using wavelet multiresolution analysis techniques.
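As a rough illustration of what a wavelet multiresolution analysis does, the sketch below splits a signal into dyadic frequency bands using plain Python. It is not the team's Wavelet Toolbox code. The source does not name the wavelet or the sampling rate, so the Haar wavelet, the 1000 Hz sampling rate, the synthetic test signal, and the helper names (`haar_mra`, `band_edges`) are all assumptions chosen for brevity. Haar filters separate bands only coarsely, and the dyadic band edges line up only roughly with the conventional brain-wave bands.

```python
import math

SQRT_HALF = 1.0 / math.sqrt(2.0)

def haar_analysis(x):
    """One orthonormal Haar step: return (approximation, detail) coefficients."""
    approx = [(x[i] + x[i + 1]) * SQRT_HALF for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * SQRT_HALF for i in range(0, len(x), 2)]
    return approx, detail

def haar_synthesis(approx, detail):
    """Invert one Haar step, doubling the length."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * SQRT_HALF)
        out.append((a - d) * SQRT_HALF)
    return out

def haar_mra(signal, levels):
    """Return levels + 1 full-length components (details 1..levels, then the
    final approximation). The components sum back to the input signal."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_analysis(approx)
        details.append(detail)
    components = []
    for j, detail in enumerate(details):
        # Reconstruct from this level's detail coefficients alone.
        rec = haar_synthesis([0.0] * len(detail), detail)
        for _ in range(j):
            rec = haar_synthesis(rec, [0.0] * len(rec))
        components.append(rec)
    rec = approx
    for _ in range(levels):
        rec = haar_synthesis(rec, [0.0] * len(rec))
    components.append(rec)
    return components

def band_edges(fs, levels):
    """Nominal (low, high) band in Hz for each component returned by haar_mra."""
    edges = [(fs / 2 ** (j + 1), fs / 2 ** j) for j in range(1, levels + 1)]
    edges.append((0.0, fs / 2 ** (levels + 1)))
    return edges

fs = 1000.0      # assumed sampling rate in Hz
levels = 7
n = 256
# Synthetic stand-in for one MEG channel: a 10 Hz and an 80 Hz oscillation.
signal = [math.sin(2 * math.pi * 10 * i / fs)
          + 0.5 * math.sin(2 * math.pi * 80 * i / fs) for i in range(n)]

components = haar_mra(signal, levels)
reconstructed = [sum(c[i] for c in components) for i in range(n)]
max_error = max(abs(a - b) for a, b in zip(signal, reconstructed))

for (low, high), comp in zip(band_edges(fs, levels), components):
    energy = sum(v * v for v in comp)
    print(f"{low:7.2f}-{high:7.2f} Hz  energy {energy:8.3f}")
print("max reconstruction error:", max_error)
```

Because the components add back up to the original signal, a band can be dropped or attenuated before summing, which is the basic idea behind wavelet denoising.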
The researchers then extracted features from the denoised and decomposed signals. They used Statistics and Machine Learning Toolbox™ to calculate a variety of statistical features, including mean, median, standard deviation, quartiles, and root mean square. They used the extracted features to train a support vector machine (SVM) classifier and a shallow artificial neural network (ANN) classifier, obtaining an accuracy baseline by classifying neural signals corresponding to five phrases.
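The statistical features listed above can be sketched with Python's standard library. This is an illustrative stand-in for the Statistics and Machine Learning Toolbox calls, not the researchers' code. The function name `extract_features`, the choice of first and third quartiles, and the sample values are assumptions.

```python
import math
import statistics

def extract_features(samples):
    """Summary statistics for one band-limited signal segment."""
    # statistics.quantiles uses the "exclusive" method by default; other
    # tools may interpolate quartiles differently.
    q1, _, q3 = statistics.quantiles(samples, n=4)
    return {
        "mean": statistics.fmean(samples),
        "median": statistics.median(samples),
        "std": statistics.stdev(samples),   # sample standard deviation
        "q1": q1,
        "q3": q3,
        "rms": math.sqrt(statistics.fmean([v * v for v in samples])),
    }

# Hypothetical values standing in for one decomposed MEG band.
segment = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
features = extract_features(segment)
for name, value in features.items():
    print(f"{name:>6}: {value:.4f}")
```

In a pipeline like the one described, one such feature set per band and per sensor would be concatenated into a single vector and passed to the SVM or shallow ANN classifier.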
To obtain and represent the rich MEG signal features in the time-frequency domain, they used wavelet scalograms of MEG signals as input to a convolutional neural network. (A wavelet scalogram captures how spectral components in a signal evolve as a function of time.) The team customized three pretrained deep convolutional neural networks (AlexNet, ResNet, and Inception-ResNet) for decoding speech from MEG signals. All yielded high overall accuracy for multiple subjects. To speed up training, the team conducted the training on a seven-GPU parallel computing server using Parallel Computing Toolbox™.
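To make the idea of a scalogram concrete, the sketch below computes the magnitude of a continuous wavelet transform with a complex Morlet wavelet by direct convolution in plain Python. It is a minimal illustration, not the Wavelet Toolbox implementation. The source does not state which wavelet or parameters were used, so the Morlet wavelet, the six-cycle width, the 200 Hz sampling rate, the test signal, and the function name `morlet_scalogram` are assumptions.

```python
import cmath
import math

def morlet_scalogram(signal, fs, freqs, n_cycles=6.0):
    """Return one row of |CWT| values per frequency in freqs (Hz).

    Each row has the same length as the signal, so the result is a
    frequency-by-time grid that can be rendered as an image.
    """
    n = len(signal)
    rows = []
    for f in freqs:
        sigma_t = n_cycles / (2.0 * math.pi * f)   # Gaussian width in seconds
        half = int(math.ceil(3.0 * sigma_t * fs))  # truncate at 3 sigma
        offsets = range(-half, half + 1)
        kernel = [cmath.exp(2j * math.pi * f * k / fs)
                  * math.exp(-((k / fs) ** 2) / (2.0 * sigma_t ** 2))
                  for k in offsets]
        norm = sum(abs(w) for w in kernel)
        row = []
        for i in range(n):
            acc = 0j
            for k, w in zip(offsets, kernel):
                idx = i + k
                if 0 <= idx < n:          # zero padding at the edges
                    acc += signal[idx] * w.conjugate()
            row.append(abs(acc) / norm)
        rows.append(row)
    return rows

fs = 200.0   # assumed sampling rate in Hz
n = 400      # two seconds of data
# Test signal: 10 Hz during the first second, 40 Hz during the second.
signal = [math.sin(2 * math.pi * (10.0 if i < n // 2 else 40.0) * i / fs)
          for i in range(n)]

freqs = [10.0, 40.0]
row_10hz, row_40hz = morlet_scalogram(signal, fs, freqs)

print("t = 0.5 s:", round(row_10hz[100], 3), round(row_40hz[100], 3))
print("t = 1.5 s:", round(row_10hz[300], 3), round(row_40hz[300], 3))
```

The 10 Hz row dominates in the first second and the 40 Hz row in the second, which is the time-varying spectral content a scalogram is meant to show. With a denser frequency grid, the rows form the image that would be resized to match a pretrained network's input layer.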
The UT Austin researchers have published their findings and are now working on the next steps in their research: extending the number of classified phrases from five to hundreds, decoding speech at the phoneme level, and converting MEG signals directly to synthesized speech.
- Classification accuracy of 96% achieved. “The SVM and ANN approaches we tried yielded a classification accuracy of about 80%, but when we combined wavelets and deep learning, we saw that increase to more than 96%,” says Dr. Jun Wang, associate professor of Communication Sciences & Disorders and Neurology and director of the Speech Disorders and Technology Lab at UT Austin.
- Wavelet techniques and deep learning networks quickly combined. “With MATLAB, it took us just minutes to implement scalograms for deep learning networks,” says Dash. “Of course, training and interpreting results takes additional time, but I completed the implementation of AlexNet, for example, in a matter of minutes—significantly less time than I would have needed with another programming language.”
- Training times accelerated by a factor of 10. “To switch from training on a single worker to training across multiple GPUs, we only had to change one line of MATLAB code,” says Dash. “With Parallel Computing Toolbox and a server with seven GPUs, that small change enabled us to train networks about 10 times faster.”