It’s 2025. Can you be sure that these words were written by a human being and not a machine? That question would have seemed far-fetched as recently as three years ago. But we are now in an era of advanced generative artificial intelligence (GenAI), in which tools such as ChatGPT and AlphaCode can, from a few simple prompts, instantly produce lengthy passages of nearly flawless English (or many other languages) as well as software code.
The dramatic advances achieved by this new generation of GenAI tools are made possible by large language models, or LLMs. An LLM, as its name suggests, is a model of language built by training on enormous volumes of written text. With that working, flexible model of language, the LLM can respond to prompts with syntactically correct text that accurately fulfills the request.
LLMs matter because they are the source of a striking array of innovative AI applications. The LLM’s ability to understand and generate, i.e., write and “speak,” coherent and accurate language has very broad appeal across a wide range of use cases, from customer service to software development to the practice of medicine.
LLMs don’t replace people. In fact, humans are very much needed to provide the LLM with expertise and mission parameters through prompt engineering (the text instructions you enter into programs such as ChatGPT). And LLMs are still prone to error, so humans also need to review, and revise when necessary, any output the machines provide.
However, we are only in the very early stages of GenAI and LLM deployments. We have had only a glimpse of the potential this remarkable technology has for transforming the world.
One thing to appreciate about LLMs is that they are not actually new. It may seem as if they burst onto the scene out of thin air, but LLMs are simply the latest iteration of natural language processing (NLP) and natural language understanding (NLU) technologies that have been around, in some form or another, for decades. The reason they seem new is that the current generation of tools is so much more powerful than what came before. The basics, though, are the same.
It's also useful to understand, from the perspectives of functionality and infrastructure, that LLMs are “foundational models.” They are not purpose-built for a given use case. For example, the same LLM that helps a doctor write a chart entry about an inflamed appendix could also enable a travel writer to describe Tuscany in the winter—at the same time!
How does this work? To say it’s complicated would be an understatement, and more than a few experts in the field will admit that much of an LLM’s capabilities remain a “black box.” In simple terms, however, an LLM comes to life by training on an immense dataset, often hundreds of billions of words drawn from web pages, books, and other documents. The text is first “tokenized,” rendering words and word fragments into numerical units the model can process during training.
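To make the idea of tokenization concrete, here is a deliberately simplified sketch in Python. Real LLMs use subword tokenizers (such as byte-pair encoding) with vocabularies of tens of thousands of tokens rather than a handful of whole words, but the basic move is the same: text goes in, numbers come out.

```python
# Toy illustration of tokenization: mapping text to numerical IDs.
# Real LLMs tokenize subword units (e.g., byte-pair encoding), not whole words,
# but the principle -- text in, numbers out -- is the same.

corpus = "the model reads the text and the text becomes numbers"

# Build a tiny vocabulary from the words seen in the corpus.
vocab = {word: idx for idx, word in enumerate(sorted(set(corpus.split())))}

def tokenize(text: str) -> list[int]:
    """Convert a string into a list of token IDs using the toy vocabulary."""
    return [vocab[word] for word in text.split()]

print(vocab)
print(tokenize("the text becomes numbers"))
# The training pipeline then consumes these integer sequences, not raw text.
```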
Then, using deep learning by way of “transformers,” a neural network architecture that analyzes the relationships between words in a sequence, the LLM gains the ability to parse, and then generate, language. At its core, the deep learning process for LLMs is about “attention,” or determining how much linguistic weight each word in a sequence should receive.
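The NumPy sketch below shows the core calculation, scaled dot-product attention, in its simplest form. The vectors and dimensions are toy values chosen for illustration; a production transformer adds learned projection matrices, multiple attention heads, and many stacked layers.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: each token's output is a weighted
    average of the value vectors, with weights derived from query-key
    similarity. This is the 'attention' at the heart of a transformer."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # similarity of every token pair
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over each row
    return weights @ V                                 # blend values by relevance

# Toy example: 4 tokens, each represented by an 8-dimensional vector.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
output = scaled_dot_product_attention(tokens, tokens, tokens)
print(output.shape)   # (4, 8): one context-aware vector per token
```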
For example, if you prompt an LLM with “I want chicken,” a correctly trained model will understand this to mean that you want to eat chicken. The word “want” gets the transformer’s attention. Alternatively, if the prompt is “I want a chicken,” the transformer should pay attention to the “a” and understand that you want to possess a chicken. In AI terms, the software is using inference to predict the next token in the sequence.
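A toy sketch can make that inference step concrete. The probability table below is invented purely for illustration; a real LLM computes such probabilities from billions of learned parameters, but the decision it makes at each step, picking a likely next token given the context, is the same.

```python
# Toy illustration of next-token prediction. The probabilities below are
# invented for the example; a real LLM computes them from its trained weights.

next_token_probs = {
    ("I", "want", "chicken"):      {"for": 0.55, "tonight": 0.30, "coop": 0.01},
    ("I", "want", "a", "chicken"): {"coop": 0.40, "to": 0.25, "for": 0.10},
}

def predict_next(context: tuple[str, ...]) -> str:
    """Greedy decoding: return the single most probable next token."""
    probs = next_token_probs[context]
    return max(probs, key=probs.get)

print(predict_next(("I", "want", "chicken")))       # "for"  -> continues toward eating
print(predict_next(("I", "want", "a", "chicken")))  # "coop" -> continues toward owning
```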
This can work because of years spent building earlier language models and lexicons that established essential meanings and verbal contexts. The LLM simply takes it all to a new level of ability.
As you might imagine, training an LLM is a data- and compute-intensive task. For a sense of scale, consider that some LLMs train on the Common Crawl dataset, an archive containing over 250 billion web pages that grows by another three to five billion pages, or roughly 350 terabytes of data, each month. In total, the dataset runs to multiple petabytes.
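A quick back-of-envelope calculation, using the figures above, gives a feel for what that growth rate implies:

```python
# Back-of-envelope arithmetic using the Common Crawl figures cited above.
pages_per_month = 4e9            # midpoint of "three to five billion"
terabytes_per_month = 350

bytes_per_page_kb = terabytes_per_month * 1e9 / pages_per_month  # TB -> KB per page
petabytes_per_year = terabytes_per_month * 12 / 1000

print(f"~{bytes_per_page_kb:.0f} KB per page on average")    # ~88 KB
print(f"~{petabytes_per_year:.1f} PB of new data per year")  # ~4.2 PB
```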
Ingesting and tokenizing that much data is itself a mammoth workload. The training runs that follow, which can take many months to complete, consume vast amounts of compute, memory, and storage capacity.
While the computing power required to train an LLM is significant, many factors affect its efficiency. For example, the selection and configuration of the LLM’s machine learning algorithms can have an impact on how much time and how many computing cycles will be needed to complete the training. Similarly, preparation of the dataset to remove duplicate or low-quality data can affect the length and intensity of the training process. Choice of hardware and choices about parallelism, to name two of many issues, can also change the computing requirements for LLM training.
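As a small illustration of the data-preparation point, here is a minimal Python sketch of exact-duplicate removal by hashing. Production pipelines typically go further, with near-duplicate detection and quality filtering, but even this simple step shrinks the training workload.

```python
import hashlib

def deduplicate(documents):
    """Drop exact duplicates by hashing normalized text. Production
    pipelines typically go further (near-duplicate detection, quality
    filters), but the goal is the same: less redundant training data."""
    seen = set()
    unique = []
    for doc in documents:
        digest = hashlib.sha256(doc.strip().lower().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

docs = ["The cat sat.", "the cat sat.", "A different page entirely."]
print(deduplicate(docs))   # the near-identical second page is dropped
```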
Training is not the end of an LLM’s demands for compute, memory, and storage. Serving the model means keeping a multi-petabyte storage array online and connected to multiple servers, each with a large amount of memory. The more applications and users, the more infrastructure the LLM will need.
A cloud computing platform may be the best fit for an LLM’s enormous scale and processing demands. A public cloud platform like Amazon Web Services (AWS) offers the kind of endless scalability, storage, clustering flexibility, and computational power that LLMs need.
LLM infrastructure managers have choices when it comes to hardware and architecture. Like many AI workloads, LLMs work well on servers equipped with graphics processing units (GPUs), which are designed to handle accelerated computing tasks. In some cases, it may be better to deploy an LLM on tensor processing units (TPUs), or their equivalent. TPUs are designed specifically for AI workloads, so they may be faster and more efficient than GPUs for LLM training.
A private cloud is another choice for LLMs. While this approach has some of the drawbacks of an on-premises LLM instance, e.g., needing to procure and stand up the full complement of equipment, the cloud architecture enables flexibility as the workload evolves over time.
The cloud offers a number of benefits for organizations that want to create LLMs. These relate to cost, accessibility, and flexibility.
The cloud lowers the barrier to entry for using an LLM. An on-premises deployment translates into a significant capital expense (CapEx), followed by a relatively long period of standing all that equipment up, a process that requires specialized expertise. The cloud avoids both the up-front spend and the lengthy build-out, enabling a pay-as-you-go approach with no CapEx or data center operations expenses. The cloud also enables resource efficiency for LLMs: it is easy to add compute or storage as needed during training, and then trim down the LLM’s infrastructure footprint as it transitions to runtime.
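As a rough illustration of that elasticity, the sketch below uses the AWS SDK for Python (boto3) to resize a hypothetical EC2 Auto Scaling group named llm-training-workers. The group name, region, and capacities are assumptions for the example, and valid AWS credentials are required.

```python
import boto3

# Hypothetical Auto Scaling group name and region; substitute your own.
ASG_NAME = "llm-training-workers"

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

def resize_cluster(desired_capacity: int) -> None:
    """Grow the fleet for a training run, or shrink it back for serving."""
    autoscaling.set_desired_capacity(
        AutoScalingGroupName=ASG_NAME,
        DesiredCapacity=desired_capacity,
        HonorCooldown=False,
    )

resize_cluster(32)  # scale up ahead of a training run
# ... training happens ...
resize_cluster(4)   # trim the footprint once the model moves to inference
```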
Working with one of the major cloud platforms also makes multi-region support possible. Performance and latency needs, e.g., a guaranteed service level of one-second response time from the LLM, might require it to be deployed in more than one place. A multi-region deployment may also be necessary to address the needs of multiple business units, each of which could require its own LLM instance. At the same time, the cloud makes it possible for stakeholders to collaborate on the LLM in real time, regardless of location.
Deploying an LLM on a public cloud platform provides a further benefit of access to the latest advances without the need for a cumbersome or time-consuming upgrade. For example, if you determine that your LLM would be better served by TPUs versus GPUs, the cloud lets you make that change quickly and with relatively little effort. In contrast, acquiring and deploying TPU servers on-premises would be a major undertaking. You can also experiment with different stacks and clustering configurations, among many possible variables, in the cloud.
With those benefits in mind, it’s worth pointing out that implementing LLMs in the cloud comes with its share of challenges. It’s wise to think through cost and resource issues, along with data security and performance factors.
Training and running an LLM in the cloud gets you out of CapEx, but the costs of running a sizeable cloud instance can add up over time. It’s a good practice to scope out your computing, memory, and storage requirements carefully in advance. Choice of technology stack can make a big difference in this context, as some platforms enable greater efficiency than others. On a related front, it’s smart to map out how you will balance performance needs with budget constraints; you may not need the same service levels in all use cases.
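A simple back-of-envelope model, like the Python sketch below, can help with that scoping exercise. The hourly rates, node counts, and durations are placeholders, not quotes from any provider; substitute current pricing for your chosen platform.

```python
# Rough cost-scoping sketch. The hourly rates below are placeholders, not
# quotes from any provider; plug in current pricing for your platform.

gpu_hourly_rate = 30.0        # assumed cost of one multi-GPU training node per hour
training_nodes = 16
training_days = 30

inference_hourly_rate = 5.0   # assumed cost of one serving node per hour
inference_nodes = 4

training_cost = gpu_hourly_rate * training_nodes * 24 * training_days
monthly_serving_cost = inference_hourly_rate * inference_nodes * 24 * 30

print(f"One training run: ~${training_cost:,.0f}")
print(f"Ongoing serving:  ~${monthly_serving_cost:,.0f} per month")
```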
LLMs can also create cybersecurity risk exposure, along with compliance problems. This can occur if the model uses sensitive or private data in its training process. For example, if the LLM ingests medical records and trains on them, private patient data could surface in generated text. For this reason, it’s a best practice to check the LLM training dataset and remove data that shouldn’t be there. An alternative is to mask data, such as through anonymization, which shields private information from the LLM.
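Here is a minimal Python sketch of the masking idea, replacing a few obvious identifiers with placeholder tokens before text reaches the training pipeline. The patterns shown are illustrative only; real anonymization for something like medical records involves named-entity detection, domain-specific rules, and auditing.

```python
import re

# Minimal masking sketch: strip a few obvious identifiers before training.
# Real anonymization pipelines go much further (named-entity detection,
# record-specific rules, audits), but the idea is the same.
PATTERNS = {
    r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+": "[EMAIL]",     # email addresses
    r"\b\d{3}-\d{2}-\d{4}\b": "[SSN]",              # US SSN format
    r"\b(?:\d{3}[-.\s]?){2}\d{4}\b": "[PHONE]",     # simple phone numbers
}

def mask_pii(text: str) -> str:
    """Replace matching identifiers with placeholder tokens."""
    for pattern, token in PATTERNS.items():
        text = re.sub(pattern, token, text)
    return text

record = "Patient Jane Roe, SSN 123-45-6789, reachable at jane.roe@example.com."
print(mask_pii(record))
# -> "Patient Jane Roe, SSN [SSN], reachable at [EMAIL]."
```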
The new generation of LLMs is exciting, but it represents simply the next step in a process that started years ago and will continue to evolve over time. Capabilities will improve. Training will become more efficient. At the same time, issues such as energy consumption are becoming a factor in decisions about whether and how to use LLMs.
One new approach to LLM training that’s gaining traction is federated learning. In this machine learning process, multiple entities train a shared model without sharing their data, which helps with privacy and security, while each entity still benefits from the language capabilities learned by the others.
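The NumPy sketch below illustrates the federated idea in miniature, loosely following the federated-averaging pattern: each participant computes an update on data that never leaves its control, and only the model parameters are shared and averaged. The “local update” here is a stand-in for real training, and the datasets are synthetic.

```python
import numpy as np

def local_update(weights, private_data, lr=0.1):
    """Stand-in for a local training step on data that never leaves the
    participant. Here it is just a nudge toward the local data mean."""
    return weights + lr * (private_data.mean(axis=0) - weights)

def federated_average(all_local_weights):
    """FedAvg-style aggregation: only parameters are shared and averaged;
    no participant ever sees another's raw data."""
    return np.mean(all_local_weights, axis=0)

rng = np.random.default_rng(1)
global_weights = np.zeros(4)
hospital_a = rng.normal(1.0, 0.1, size=(100, 4))   # private to participant A
hospital_b = rng.normal(2.0, 0.1, size=(100, 4))   # private to participant B

for _ in range(5):  # a few federated rounds
    updates = [local_update(global_weights, d) for d in (hospital_a, hospital_b)]
    global_weights = federated_average(updates)

print(global_weights)   # moves toward a blend of both private datasets
```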
Hybrid cloud is another emerging approach, one that addresses some of the performance and control issues that come with cloud-based LLMs. There is more flexibility when the LLM spans on-premises infrastructure and the cloud: users can place the parts of the LLM that need high performance on premises while enjoying the economic benefits of the cloud for everything else.
LLMs can be so big and compute-intensive that their levels of energy consumption can be concerning to organizations that stress sustainability. Sustainable IT solutions are now available for AI workloads. By modernizing an LLM’s supporting infrastructure and configuring the system for efficiency, it is possible to reduce an LLM’s energy consumption and environmental impact.
LLMs have advanced to the point where they can understand, express, and summarize human language with remarkable accuracy. Use cases run the gamut from customer service to the practice of medicine, among many others. LLMs are built by training on enormous text datasets, using transformer neural networks to predict language and infer meaning from patterns of words. The workload is well suited to the cloud, given the need for scalability and flexibility. Getting an LLM to work efficiently and cost effectively in the cloud, however, involves choosing the most suitable cloud stack and balancing resources with performance requirements.