A study by Peripheral has found another blind spot in Google’s AI Overviews: they fail to correctly spell words backwards. In more than half of cases, the Overviews gave an incorrect answer when asked to reverse the spelling of a word.
In the study, we analysed 500 of the most common English terms, filtering for those of 5 letters or more. Using variations of the query “How to spell the word X backwards”, we found that 52% of the time the Overview output was confidently incorrect. Although the result usually contained the letters of the original word, they were often in the wrong order.

This issue affected longer words more than shorter ones: only a quarter of words with 7 letters or more were correctly spelled backwards.

Beyond the number of letters, terms with more syllables were also disproportionately affected. More than two-thirds of words with 2 syllables or fewer were correctly reversed, but just 10% of terms with 3 syllables or more were, showing that AI Overviews struggled with the make-up of longer words.

The study also gave an insight into how Large Language Models (LLMs) produce outputs. The same query often produced inconsistent results: a search for the same term would return different incorrect spellings, showing that LLMs generate responses non-deterministically rather than following a fixed procedure.

Summary of AI Overviews
AI Overviews are generated summaries that provide an automated response to a Google Search query. They draw information from linked sources and can answer both objective and subjective queries. After launching in the US in May 2024, Google expanded them globally, and they are now available in over 200 territories around the world.
Over time we have also seen a greater proportion of organic queries return an AI Overview response: up to 30%, according to a study by the SEO platform Authoritas.
Prevailing issues with AI Overviews
The recent proliferation of LLMs is part of an arms race between big tech firms to be the first to successfully integrate AI into their software. The sudden success of ChatGPT after its launch in 2022 was the catalyst for other companies, such as Microsoft and Google, to integrate AI into their own products, especially within search.
However, the pace of this race led to accuracy issues. Shortly after launch, Google’s AI Overviews suggested putting glue on pizza to keep the cheese from sliding off, and claimed that geologists recommend people eat one rock per day.
In a statement, Google’s Head of Search Liz Reid pointed out that these kinds of results came from a small number of rarely searched queries, and that they had updated the model to avoid these issues.
More recent exploits followed when searchers realised they could get the AI Overview to explain the meaning of sayings that don’t exist. Most likely because the Overviews had been trained to be as helpful as possible in answering queries, they would confidently explain the origin of made-up phrases such as “You Can’t Lick a Badger Twice”.
Although these anomalies have been patched, issues with AI Overviews still persist.
Tokenisation is one factor
The likely reason for misspellings and similar incorrect responses relates to how AI models process information. LLMs are next-word predictors: they use training data to estimate the most likely next word in a sequence. To do this, they process text in chunks, or tokens, rather than individual letters, a concept called tokenisation.
This means they see words as one or more tokens rather than a sequence of letters, so when asked to reverse a word they effectively try to manipulate the tokens, which leads to errors.
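As a rough sketch of the difference, assuming an illustrative token split of ["straw", "berry"] (real subword vocabularies vary by model), letter-level reversal and token-level reversal produce very different strings:

```python
word = "strawberry"

# What the query asks for: reversing the word letter by letter.
letters_reversed = word[::-1]
print(letters_reversed)  # yrrebwarts

# What a token-level view resembles. This split is an illustrative
# assumption, not the actual vocabulary of any specific model.
tokens = ["straw", "berry"]
tokens_reversed = "".join(reversed(tokens))
print(tokens_reversed)  # berrystraw
```

Neither reversal above matches the garbled outputs seen in the study exactly, but it shows why a model reasoning over tokens can end up with the right letters in the wrong order.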
But why are they still wrong?
However, this doesn’t explain the full picture. If tokenisation were the only cause, wrongly spelt words would simply have their tokens out of order but be otherwise intact. Yet some of the misspellings are plain nonsensical.

Another factor behind these mistakes could be a lack of training data. A task like spelling words backwards may rarely appear in the data, forcing the LLM to draw on less relevant parts of its training and produce the wrong response.
We’ve seen this concept before: AI models struggled to state the number of Rs in the word “strawberry”, and this too came down to models not understanding spelling, focusing on the tokens and chunks in the word rather than its letters.
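The contrast is stark because, at the letter level, the task is trivial for a deterministic program:

```python
word = "strawberry"

# Counting characters is a one-liner when you operate on letters...
r_count = word.lower().count("r")
print(r_count)  # 3

# ...but a model that represents the word as opaque token IDs
# (e.g. something like ["straw", "berry"]) never directly "sees"
# the individual letters it is being asked about.
```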

Although developers can patch specific bugs, this is still new software, and there will always be edge cases that the training data hasn’t prepared the models for.
Future of AI Overviews
Recently, Google has vowed to take its AI search experience further. It released AI Mode, an AI search tool similar to other LLM chat interfaces, where users can run queries alongside traditional search. It uses multiple queries to give a complete answer, while giving the user the opportunity for follow-up questions and discussion.
As a result, AI Mode may fix some current AI errors, since results will be better sourced. However, it could also bring its own set of problems and hallucinations.
Image data from Peripheral. Screenshots taken from searches on Google.com.