Language may be the most complex behavior created by humans. Teaching machines how to understand the intricate rules (and numerous exceptions to these rules) of human language has always been one of the biggest challenges in the artificial intelligence (AI) field. Building machines that can convincingly mimic human conversations has been even more difficult.
After decades of sluggish progress, recent innovations in machine learning have pushed the field forward. Advanced algorithms can now generate (mostly) coherent text in English and other languages, with astonishing – and sometimes concerning – results.
A language is far more than a dictionary and grammar guide. Words can be combined in infinite, subtle and often ambiguous ways. This means that teaching a computer a list of linguistic rules isn’t enough.
ChatGPT Brings AI Writing Software to the Masses
In 2019, OpenAI released GPT-2, an AI text generator, also known as a large language model (LLM), trained on vast swaths of text scraped from the internet – individual words, entire articles and book excerpts. Almost like a growing child, it soaks in text and learns to predict patterns of words and phrases.
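To make that pattern-prediction idea concrete, here is a minimal sketch using the openly released GPT-2 model via the Hugging Face transformers library; the prompt is an illustrative assumption, and the script simply asks the model for its single most likely next word.

```python
# Minimal next-word prediction sketch with GPT-2 (assumes the
# transformers and torch packages are installed).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# An illustrative prompt; the model scores every token in its
# vocabulary as a possible continuation.
inputs = tokenizer("The quick brown fox jumps over the", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# The highest-scoring token is the model's best guess for what comes next.
next_token_id = int(logits[0, -1].argmax())
print(tokenizer.decode([next_token_id]))
```

Repeated billions of times over internet-scale text, this next-word guessing game is the entire training objective behind the fluent paragraphs described below.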
Training a model like this costs tens of millions of dollars and requires hundreds of parallel processors. The latest model in the series, GPT-4, launched in March 2023 and was trained on about 13 trillion “tokens” of information – roughly 10 times the training data of its predecessor, GPT-3.
The AI writing program can create paragraphs of text on a wide range of subjects that are hard to distinguish from stories written by (ahem) people.
GPT-4 is a much-improved product as a result of this larger dataset and subtle improvements in the machine learning algorithms underpinning the software. For example, GPT-4 scored in the top 10% of test takers on a simulated bar exam, where the previous GPT-3.5 scored in the bottom 10%. This is a tremendous leap in accuracy and real-world usability.
OpenAI describes the difference between the two AI writing versions: “The difference comes out when the complexity of the task reaches a sufficient threshold—GPT-4 is more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5.” The new version also scores 40% higher on internal evaluations designed to measure hallucinations, or instances of false confidence.
Investors Bet Big on Generative AI Writing
OpenAI offers researchers and startups commercial access to its AI writing models through an application programming interface (API).
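As an illustration of what that access looks like in practice, here is a minimal sketch using OpenAI’s official Python library (the 1.x openai package); the model name, prompt and environment setup are assumptions for demonstration, not OpenAI recommendations.

```python
# Minimal API call sketch (assumes the openai package is installed and
# the OPENAI_API_KEY environment variable holds a valid key).
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY automatically

response = client.chat.completions.create(
    model="gpt-4",  # illustrative choice; any available chat model works
    messages=[
        {"role": "user", "content": "Draft a two-sentence product blurb."}
    ],
)
print(response.choices[0].message.content)
```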
In 2023, on the heels of tremendous media buzz around OpenAI’s ChatGPT, Microsoft made another big bet on OpenAI by investing $10 billion. OpenAI models are deployed in Microsoft’s Azure public cloud service and power category-defining AI products like GitHub Copilot, DALL·E 2 and ChatGPT, according to a Microsoft blog post.
Most current AI writing models are focused on English text, including Gopher from DeepMind; GPT-NeoX-20B from EleutherAI, a grassroots collective of AI researchers; and Jurassic-1 from Israel’s AI21 Labs.
Huawei also announced a GPT-like Chinese language model called PanGu-alpha, and the South Korean search company Naver introduced a Korean language model called HyperCLOVA.
“Any of these models are capable of producing text that seems perfectly realistic, though they will generally be more believable at certain tasks than others,” says Micah Musser of Georgetown University’s Center for Security and Emerging Technology (CSET).
Companies have also begun rolling out multilingual LLMs, like BLOOM from Hugging Face. BLOOM uses 176 billion parameters to understand queries and generate responses in 46 natural languages. However, the current version cannot yet compete with GPT-4 and its much larger training dataset, and BLOOM still lacks support for the vast majority of the world’s languages, which together represent over a billion native speakers.
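For a sense of how an open multilingual model can be tried out, here is a minimal sketch that loads a small BLOOM checkpoint through the Hugging Face transformers library; the bigscience/bloom-560m checkpoint and the Spanish prompt are illustrative stand-ins, since the full 176-billion-parameter model needs far more memory than a typical workstation offers.

```python
# Multilingual text generation sketch with a small BLOOM checkpoint
# (assumes the transformers and torch packages are installed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # small stand-in for the 176B model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# BLOOM was trained on 46 natural languages, so prompts need not be English.
inputs = tokenizer("La inteligencia artificial es", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```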
Generative AI Leaves Much of the World Behind
Despite the development of LLMs using other languages, many of the world’s internet users are locked out of the AI revolution. Brookings recently wrote: “As it stands now, the majority of the world’s speakers are being left behind if they are not part of one of the world’s dominant languages, such as English, French, German, Spanish, Chinese, or Russian.”
Original research from Microsoft supports this conclusion: its researchers found that 88% of the world’s languages, spoken by over 1.2 billion people (20% of the global population), are not covered by existing LLMs.
With over 7,000 languages spoken across the world, it’s unclear when the AI writing industry at large will start building LLMs that serve speakers of other languages, or whether that will remain the purview of niche companies like Hugging Face. Users across the world are clamoring to join the AI revolution and become potential customers – but the most prominent companies have so far failed to create AI tools in their native languages.
This problem isn’t limited to foreign languages, either. There are stark differences even within the English-speaking world, which encompasses more than 150 dialects.
“Large language models (LLMs) that train AI tools, like generative AI, rely on binary internet data that serve to increase the gap between standard and non-standard speakers, widening the digital language divide,” said researchers at Brookings.
Despite these shortcomings and limited language support, businesses are still embracing generative AI as a tool to increase productivity and generate commercial content.
How Companies Use AI Writing Tools
AI writing software is already used by companies like Facebook, Google and Microsoft in a variety of ways, including language translation, intelligent email assistants, improving search results, and generating marketing copy and computer code.
Readily available cloud computing resources make it easier for businesses big and small, across different industries, to put AI to use. A Deloitte study found that 70% of companies get their AI capabilities through cloud-based software, while 65% create AI applications using cloud services.
In fact, many new products and services used by IT departments offer AI-powered automation to make managing data systems easier. And more AI-powered and AI-facilitating products and services hit the market every year.
Evidence all around points to businesses ramping up their experimentation with, and implementation of, AI in their operations, with many companies going all-in on AI for content, visuals and other marketing materials.
This newfound reliance on AI writing tools comes with huge upsides—and at least one significant risk.
How AI’s False Confidence Places Businesses at Risk
Despite the vast sums of money being invested into AI writing products and the historic excitement surrounding this new technology, these tools present several risks for companies that rely on the unproven technology.
OpenAI admits the limitations of GPT-4. The company recently noted that “It can sometimes make simple reasoning errors which do not seem to comport with competence across so many domains.” In addition, “GPT-4 can also be confidently wrong in its predictions, not taking care to double-check work when it’s likely to make a mistake.”
This type of AI misinformation, caused by false confidence, means that using LLMs like GPT-4 for commercial purposes is still a major risk (albeit one that shrinks with each iteration).
Companies rely on brand loyalty and perceived trustworthiness to remain competitive, especially in the business-to-business (B2B) world. Failure to properly vet claims and edit AI misinformation can cause embarrassing oversights and damage to the company brand.
Jeff Mains is the founder of Champion Leadership Group, a consulting firm that helps SaaS and other professional services firms accelerate growth. He argues that using current AI models for commercial purposes is a major risk, particularly for companies that lean on AI for decision-making.
“Imagine basing a strategic decision on data that AI pulled together with flawed logic,” Mains said in an exclusive interview. “I think the real risk here is trusting AI too much—without human oversight, you risk embedding errors deep into your operations.”
We also spoke to Edward Tian, CEO of the popular AI writing detector GPTZero. He agrees and warns business leaders that AI is not perfect and is capable of confidently generating incorrect information and passing it off as truth.
Tian says false confidence is a major risk associated with AI—and cautions businesses to avoid using 100% AI-generated text without human oversight. Companies that inadvertently publish inaccurate AI-generated content can face backlash and a loss in trust that’s incredibly difficult to regain.
Generative AI “is absolutely capable of generating inaccurate information or using unreliable sources,” Tian says. “If businesses use these inaccurate or poor outputs, including them in outward-facing media like blog posts or social media posts, that could reflect negatively on the authority of the company.”
Next-Generation AI Writing Tools Are Eliminating False Confidence
Despite the very real false confidence risks associated with generative AI, some experts believe that the latest LLMs have solved this problem for good.
Baidu CEO Robin Li says that the false confidence problem has been solved. “The most significant change we’re seeing over the past 18 to 20 months is the accuracy of those answers from the large language models,” he told the Harvard Business Review Future of Business Conference in October 2024.
“I think over the past 18 months, that problem has pretty much been solved—meaning when you talk to a chatbot, a frontier model-based chatbot, you can basically trust the answer.”
Only time will tell if the next generation of LLMs has permanently eliminated the risk of false confidence. Until then, companies should carefully edit and fact-check any content produced by AI to maintain credibility and protect a brand’s hard-won reputation.
“Mitigating these risks means setting up guardrails. Every AI-generated output needs human review, fact-checking, and contextual judgment,” says Mains. “The technology is impressive, but it’s not infallible—and that’s where human expertise still plays a critical role.”
Editor’s note: Learn how Nutanix software helps align the best AI solutions and use cases to meet business needs now and going forward.
This is an updated version of the article originally published February 1, 2023.
Julian Smith is a contributing writer. He is the executive editor of Atellan Media and the author of Aloha Rodeo and Smokejumper, both published by HarperCollins. He writes about green tech, sustainability, adventure, culture and history.
Marcus Taylor contributed to this story.
© 2024 Nutanix, Inc. All rights reserved.