25 April 2019

Helping the paralyzed

Two neural networks have transformed brain signals into human speech

But the synthesized speech is still not easy to make out

Evgenia Shcherbina, "The Attic"

Neuroscientists from the University of California, San Francisco have developed a system that converts brain signals into speech sounds. To do this, they used two neural networks and a speech synthesizer. The authors believe that, with further improvement, such a system could help paralyzed patients speak.

Various kinds of paralysis can rob patients of the ability to speak. One of the most famous such patients was Stephen Hawking, who suffered from amyotrophic lateral sclerosis. The scientist used an expensive speech synthesizer built especially for him.

Assistive interfaces of this kind are often based on movements of the patient's head or eyes: with these movements, patients control a cursor and select letters on a screen. However, such communication is much slower than natural speech. Experts are therefore working in another direction, creating brain-computer interfaces that could read brain signals directly and convert them into words.

Just such an interface, built from two recurrent neural networks and a speech synthesizer, was created by the American researchers. A two-stage system is needed because the cerebral cortex does not deal with words directly: it only coordinates the movements of the speech organs that pronounce them, which is why the scientists resorted to two-stage decoding.

The article by Anumanchipalli et al., "Speech synthesis from neural decoding of spoken sentences," is published in the journal Nature.

To obtain the initial data, the scientists worked with five patients who were being treated for epilepsy. During therapy the patients read several hundred sentences aloud while the scientists recorded electrical signals from the surface of their cerebral cortex using electrocorticography.


A set of electrodes used to record signals from the speech centers of the patients' brains, which were subsequently converted into speech.

Then the two neural networks came into play. The first transformed the electrical signals into kinematic ones, that is, into signals describing how the tongue, lips and larynx should move. The second transformed the kinematic signals into acoustic characteristics: pitch, frequency, prosody (including pauses between words) and other parameters. From these acoustic characteristics the speech synthesizer then generated an audio signal.
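As a rough illustration of this two-stage idea, the decoding chain can be sketched as two recurrent networks applied one after the other. The sketch below is not the authors' code: the choice of PyTorch, the layer sizes and the feature dimensions are all illustrative assumptions.

```python
# Minimal sketch of the two-stage decoding idea (not the authors' implementation):
# ECoG features -> articulatory kinematics -> acoustic features -> synthesizer.
# All dimensions below are placeholder assumptions.
import torch
import torch.nn as nn

class StageDecoder(nn.Module):
    """Recurrent network mapping one feature sequence to another."""
    def __init__(self, in_dim, out_dim, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, out_dim)

    def forward(self, x):               # x: (batch, time, in_dim)
        h, _ = self.rnn(x)
        return self.head(h)             # (batch, time, out_dim)

# Stage 1: cortical signals -> movements of tongue, lips and larynx.
ecog_to_kinematics = StageDecoder(in_dim=256, out_dim=33)
# Stage 2: movements -> acoustic parameters (pitch and other spectral features).
kinematics_to_acoustics = StageDecoder(in_dim=33, out_dim=32)

ecog = torch.randn(1, 200, 256)         # fake recording: 200 time steps of ECoG features
kinematics = ecog_to_kinematics(ecog)
acoustics = kinematics_to_acoustics(kinematics)
print(acoustics.shape)                  # (1, 200, 32) -> would be fed to a speech synthesizer
```

Chaining the two stages mirrors the article's logic: the cortex encodes movements rather than sounds, so the movements serve as the intermediate representation between brain signals and audio.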

The scientists played the words and whole sentences synthesized in this way to volunteers recruited through the Amazon Mechanical Turk website. The listeners had to write down what they heard by choosing from a list of 25 or 50 suggested words. A total of 1,755 people took part in the test, completing 16 different tasks. They correctly identified 43% of the words when choosing from 25 options and 21% when choosing from 50. As with ordinary live speech, longer words were easier to understand.
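To put these percentages in context, a simple back-of-the-envelope comparison with random guessing (our own illustration, not a figure from the paper) shows how far above chance the listeners performed:

```python
# Reported listener accuracy vs. guessing at random from the word list
# (the accuracy figures are from the article; the comparison is illustrative).
for n_options, accuracy in [(25, 0.43), (50, 0.21)]:
    chance = 1 / n_options
    print(f"{n_options} options: {accuracy:.0%} correct, "
          f"{chance:.0%} expected by chance ({accuracy / chance:.1f}x better)")
```

In both conditions the listeners did roughly ten times better than chance, even though the absolute accuracy is still modest.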

The scientists further tested their decoder on one participant who not only pronounced words aloud but also silently mimed the corresponding movements without saying anything. In the silent case the system still managed to decode the words, although less accurately than with speech spoken out loud.

Although speech decoding with the new system is still far from perfect, the scientists believe they have made progress toward devices that decode speech directly from the brain in real time. Such devices would allow paralyzed patients to communicate at a natural pace and to convey intonation and other elements of speech that are lost when typing on a screen.

Portal "Eternal youth" http://vechnayamolodost.ru

