AI-powered brain implant breaks speed record for turning thoughts into text

We speak at a rate of about 160 words per minute. That speed is incredibly difficult to achieve for speech brain implants.

For decades, speech implants have used small electrode arrays inserted into the brain to measure neural activity, with the goal of transforming thoughts into text or sound. They are invaluable to people who lose the ability to speak due to paralysis, illness or other injuries. But they are also incredibly slow, reducing the number of words per minute almost tenfold. Like a slow-loading web page or audio file, the delay can become frustrating for everyday conversations.

A team led by Drs. Krishna Shenoy and Jaimie Henderson at Stanford University is closing that speed gap.

Published on the preprint server bioRxiv, their study helped a 67-year-old woman regain her ability to communicate with the outside world using brain implants at record speed. Known as “T12”, the woman gradually lost her speech to amyotrophic lateral sclerosis (ALS), or Lou Gehrig’s disease, which progressively robs the brain of its ability to control the body’s muscles. T12 could still vocalize sounds when she tried to speak – but the words came out unintelligible.

With her implant, T12’s attempts at speech are now decoded in real time as text on a screen and spoken aloud by a computerized voice, including phrases like “it’s just tough” or “I like them coming.” The words came fast and furious at 62 per minute, over three times the speed of previous records.

It’s not just a need for speed. The study also used the largest vocabulary yet for implant-based speech decoding – approximately 125,000 words – in a first demonstration at that scale.

To be clear, while it was a “major breakthrough” and reached “impressive new performance benchmarks” according to experts, the study has yet to be peer-reviewed and the results are limited to one participant.

That said, the underlying technology is not limited to ALS. The leap in speech recognition stems from a marriage between recurrent neural networks (RNNs), a machine learning algorithm previously effective at decoding neural signals, and language models. With further testing, the setup could pave the way for people with severe paralysis, stroke or locked-in syndrome to casually chat with loved ones using only their thoughts.

We are beginning to “approach the speed of natural conversation,” the authors said.

Loss for words

The team is no stranger to giving people back the ability to speak.

As part of BrainGate, a pioneering global collaboration to restore communication using brain implants, the team envisioned – and then realized – the possibility of restoring communication using neural signals from the brain.

In 2021, they constructed a brain-computer interface (BCI) that helped a person with spinal cord injury and paralysis write with his mind. With 96 microelectrodes inserted into the motor areas of the patient’s brain, the team was able to decode brain signals for different letters as he imagined the movements to write each character, achieving a kind of “mindtexting” with over 94 percent accuracy.

The problem? The speed was about 90 characters per minute at most. While it was a big improvement from the previous setup, it was still painfully slow for everyday use.

So why not tap directly into the speech centers of the brain?

Regardless of language, decoding speech is a nightmare. Small and often subconscious movements of the tongue and surrounding muscles can trigger widely different clusters of sounds – also known as phonemes. Trying to connect the brain activity of every twitch of a facial muscle or flicker of the tongue to a sound is a Herculean task.

Hacking speech

The new study, part of the BrainGate2 Neural Interface System trial, used a clever solution.

The team first placed four electrode microarrays into the outer layer of T12’s brain. Two were inserted into areas that control the facial muscles surrounding the mouth. The other two tapped directly into the brain’s “language center”, called Broca’s area.

In theory, the placement was a genius two-in-one: it captured both what the person wanted to say, and the actual performance of speech through muscle movements.

But it was also a risky proposition: we don’t yet know whether speech is limited to just a small brain area that controls the muscles around the mouth and face, or whether language is encoded on a more global scale inside the brain.

Enter RNNs. The algorithm, a type of deep learning, has previously translated neural signals from the motor areas of the brain into text. In a first test, the team found that it easily distinguished different types of facial movements for speech – such as frowning, puckering the lips or flicking the tongue – based on neural signals alone with over 92 percent accuracy.

The RNN was then trained to suggest phonemes in real time—for example, “he,” “ah,” and “tze.” Phonemes help distinguish one word from another; essentially, they are the basic units of speech.
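In broad strokes, a decoder like this maps binned neural activity to a probability distribution over phonemes at each time step. Below is a minimal sketch with randomly initialized weights standing in for a trained network; all dimensions, names, and numbers are illustrative assumptions, not details from the study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 256 neural features per 20 ms time bin,
# 40 phoneme classes (roughly the phoneme inventory of English).
N_FEATURES, N_HIDDEN, N_PHONEMES = 256, 128, 40

# Random weights stand in for a trained decoder.
W_in = rng.normal(0, 0.1, (N_HIDDEN, N_FEATURES))
W_rec = rng.normal(0, 0.1, (N_HIDDEN, N_HIDDEN))
W_out = rng.normal(0, 0.1, (N_PHONEMES, N_HIDDEN))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decode_phoneme_probs(neural_bins):
    """Run a simple recurrent pass over binned neural activity and
    emit a phoneme probability distribution for each time bin."""
    h = np.zeros(N_HIDDEN)
    probs = []
    for x in neural_bins:
        h = np.tanh(W_in @ x + W_rec @ h)   # recurrent state update
        probs.append(softmax(W_out @ h))    # per-bin phoneme distribution
    return np.array(probs)

# Simulate one second of activity (50 bins of 20 ms).
activity = rng.normal(0, 1, (50, N_FEATURES))
probs = decode_phoneme_probs(activity)
print(probs.shape)  # (50, 40): one phoneme distribution per time bin
```

A real decoder would be far larger, trained on the participant’s own neural recordings, and paired with downstream language models to turn per-bin phoneme guesses into words.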

The training took work: each day, T12 attempted to speak between 260 and 480 sentences at her own pace to teach the algorithm the particular neural activity underlying her speech patterns. In total, the RNN was trained on almost 11,000 sentences.

With a decoder for her neural activity in hand, the team interfaced the RNN with two language models. One had a particularly large vocabulary of 125,000 words. The second was a smaller library of 50 words used for simple sentences in everyday life.

After five days of attempted speech, both language models were able to decode T12’s words. The system still made errors: around 10 percent for the small library and almost 24 percent for the larger one. But when asked to repeat sentence prompts on a screen, the system easily translated her neural activity into sentences three times faster than previous models.
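Error rates like these are conventionally measured as word error rate: the number of word-level substitutions, insertions, and deletions needed to turn the decoded sentence into the intended one, divided by the intended sentence’s length. A minimal implementation of that standard metric:

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: word-level edit distance over the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("it's just tough", "its just tough"))  # one error in three words
```

By this measure, a 24 percent error rate means roughly one word in four comes out wrong, which is why the authors flag it as too high for daily use.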

The implant worked regardless of whether she attempted to speak or simply uttered the sentences silently (she preferred the latter, as it required less energy).

By analyzing T12’s neural signals, the team found that certain areas of the brain retained neural signal patterns to code for vowels and other phonemes. In other words, even after years of speech paralysis, the brain still maintains a “detailed articulatory code”—that is, a dictionary of phonemes embedded in neural signals—that can be decoded using brain implants.

Say what’s on your mind

The study builds on many others that used a brain implant to restore speech, often decades after severe injuries or slowly spreading paralysis from neurodegenerative disorders. The hardware is familiar: the Blackrock microelectrode array, consisting of 64 channels for listening to the brain’s electrical signals.

What is different is how it works; that is, how the software transforms noisy neural chatter into coherent meanings or intentions. Previous models were largely based on decoding data directly obtained from neural recordings from the brain.

Here, the team made use of a new resource: language models, AI algorithms similar to the autocomplete feature now widely available in Gmail or text messages. The tag team is particularly promising with the rise of GPT-3 and other large language models. Excellent at generating speech patterns from simple prompts, the technology—combined with the patient’s own neural signals—could potentially “autocomplete” their thoughts without the need for hours of training.

The prospect, while enticing, comes with a side of caution. GPT-3 and similar AI models can generate persuasive speech on their own based on previous training data. For a person with paralysis who is unable to speak, we need guardrails as the AI generates what the person is trying to say.

The authors agree that their work is currently a proof of concept. Although promising, it is “not yet a complete, clinically viable system” for decoding speech. For one, they said, we need to train the decoder in less time and make it more flexible, so it adapts to constantly changing brain activity. For another, the error rate of roughly 24 percent is far too high for daily use—although increasing the number of implant channels may boost accuracy.

But for now, it moves us closer to the ultimate goal of “restoring rapid communication to people with paralysis who can no longer speak,” the authors said.

Image credit: Miguel Á. Padriñán from Pixabay
