ChatGPT also has its own language quirks. Discover the words most used by OpenAI’s chatbot.
As of the last count in December 2024, ChatGPT had over 300 million weekly users and nearly 3 billion visits. Across all these conversations, the language model shows distinctive quirks in its writing style. In particular, the chatbot tends to overuse certain words compared with everyday language.
What are the most used words by ChatGPT?
ChatGPT frequently uses connectors such as “therefore”, “however”, “furthermore” or “henceforth”, which give its responses a rigorous structure but also weigh down the prose.
It also has a predilection for sophisticated vocabulary, regularly working in complex terms such as “exacerbating” or “interoperability”. This may be because the model is trained on millions of texts collected from the internet, including scientific texts that use specialized vocabulary.
ChatGPT also favors highly structured responses, especially when faced with complex questions. It often adopts an essay-like structure, with separate sections that qualify its arguments, which can keep it from taking a clear position.
But which words does ChatGPT use the most? Jordan Gibbs, writing on Medium, tested this by giving the chatbot code that let it write freely across 500 topics. He ended up with a file containing every word used by the AI and how often each one appeared. Here are the words ChatGPT generated most often:
- the
- of
- and
- a/an
- to
- in
- that
- with
- is
- as
So far, nothing particularly surprising, but Jordan Gibbs did notice a few oddities further down the rankings. For example, the word “she” appears in 17th place in the ChatGPT data, whereas it ranks only 139th in the web data.
Even more striking, and certainly more interesting, ChatGPT uses certain very specific terms far more often than humans do. Among the words ChatGPT overuses most:
- Reimagined: according to the Medium analysis, this is the term ChatGPT overuses most. It reportedly appeared 1,033 times more often than in human-written text.
- Bioluminescent: used 650 times more by ChatGPT than by humans,
- Verdant: used 600 times more,
- Graphene: used 400 times more,
- Bustling: used 380 times more,
- Cannot: used 380 times more,
- Delve: used 370 times more,
- Twinkled: used 360 times more,
- Tirelessly: used 350 times more,
- Intertwine: used 350 times more.
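To make the idea of "times more used" concrete, here is a minimal Python sketch of how such a comparison could be computed: count each word's share of a text, then divide its share in the chatbot text by its share in the human text. This is not Jordan Gibbs's actual code. The two sample texts and the function names `relative_frequencies` and `overuse_ratio` are assumptions made up for this illustration.

```python
import re
from collections import Counter


def relative_frequencies(text: str) -> dict[str, float]:
    """Return each word's share of all words in `text` (case-insensitive)."""
    # Lowercase the text and treat runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def overuse_ratio(word: str, model_freq: dict[str, float],
                  human_freq: dict[str, float]) -> float:
    """How many times more frequent `word` is in the model text than in the human text."""
    return model_freq[word] / human_freq[word]


# Hypothetical toy corpora; a real comparison would need far larger samples.
model_text = "we delve into the verdant valley and delve into the bustling town"
human_text = ("we walk into the green valley and we delve into the busy town "
              "we walk home and sleep and eat and rest and read")

model_freq = relative_frequencies(model_text)
human_freq = relative_frequencies(human_text)

ratio = overuse_ratio("delve", model_freq, human_freq)
print(f"'delve' is {ratio:.1f}x more frequent in the model text")
```

In this sketch, a word that never appears in the human sample would raise a `KeyError`. A real analysis would have to decide how to handle such words, for example by ignoring them or by applying some form of smoothing.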
What are the most used phrases by ChatGPT?
ChatGPT also overuses certain phrases compared with human writing. Among the AI’s favorite phrases are:
- Dive into the details…
- It is important to note…
- As we have seen…
- It is crucial to understand…
- In a world that is evolving at a frantic pace…
Why do certain words and phrases keep coming up?
Language models like GPT-4 are built from a massive body of text that allows them to statistically predict the next word in a sentence. But in its raw state, a large language model (LLM) is difficult to use and requires human intervention to refine it. This step is called Reinforcement Learning from Human Feedback (RLHF) and involves having humans test and rate the language model’s responses.
This intervention is crucial and requires hundreds of thousands of hours of work. To reduce costs, companies like OpenAI often outsource this work to regions where labor is less expensive, particularly in certain African countries.
The workers who trained the system therefore provided example inputs and outputs written in their own variety of English, which ultimately produced an AI system that writes much like the English used in parts of Africa, The Guardian notes.