# What is a Language Model (LM)?

A language model (LM) is a probabilistic model that assigns probabilities to sequences of tokens (words, characters, or subwords). The key function of a language model is to provide a probability distribution over a sequence of tokens, which allows the model to predict how likely a sequence of words is to appear together.

Here’s how it works in more detail:

**How do language models generate text?**

Consider a sequence of tokens {X1, X2, X3, …, XL} drawn from a vocabulary V. The probability of a word coming after "The monsoon rains have" can be determined by factoring the sequence probability with the chain rule of probability:

P(X1, X2, …, XL) = P(X1) · P(X2 | X1) · P(X3 | X1, X2) · … · P(XL | X1, …, XL−1)
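The chain rule can be sketched in code. Below is a minimal Python sketch in which the conditional probabilities are made-up toy values, not from any real model:

```python
# Toy conditional probabilities, keyed by the prefix they condition on.
# All values are invented for illustration.
toy_probs = {
    ("The",): 0.20,                              # P(X1 = "The")
    ("The", "monsoon"): 0.05,                    # P(X2 | X1)
    ("The", "monsoon", "rains"): 0.40,           # P(X3 | X1, X2)
    ("The", "monsoon", "rains", "have"): 0.30,   # P(X4 | X1, X2, X3)
}

def sequence_probability(tokens):
    """Chain rule: P(X1..XL) = product of P(Xi | X1..X(i-1))."""
    prob = 1.0
    for i in range(len(tokens)):
        prob *= toy_probs[tuple(tokens[: i + 1])]
    return prob

p = sequence_probability(["The", "monsoon", "rains", "have"])
print(p)  # ≈ 0.0012  (0.2 * 0.05 * 0.4 * 0.3)
```

A real language model replaces the lookup table with a learned function (an n-gram table or a neural network) that returns these conditionals.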

## Example:

Let’s take the partial sentence “The monsoon rains have” and calculate the probability of the next word, say `X5`, coming after this phrase using the chain rule. Assume that:

`X1 = "The"`

`X2 = "monsoon"`

`X3 = "rains"`

`X4 = "have"`

Now, the probability of the sequence `"The monsoon rains have X5"` is given by:

P(X1, X2, X3, X4, X5) = P(X1) · P(X2 | X1) · P(X3 | X1, X2) · P(X4 | X1, X2, X3) · P(X5 | X1, X2, X3, X4)

In practical terms, this is how we compute the probability of a word `X5` coming after the sequence "The monsoon rains have":

- **P(X1)**: The probability of the first word being “The”.
- **P(X2 | X1)**: The probability of the second word being “monsoon” given that the first word is “The”.
- **P(X3 | X1, X2)**: The probability of the third word being “rains” given that the first two words are “The monsoon”.
- **P(X4 | X1, X2, X3)**: The probability of the fourth word being “have” given that the previous words are “The monsoon rains”.
- **P(X5 | X1, X2, X3, X4)**: The probability of the next word `X5` given the preceding sequence "The monsoon rains have".
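The factors above multiply together to give the full sequence probability. In the sketch below, every numeric value is an invented, illustrative probability:

```python
# Multiplying the five chain-rule factors for "The monsoon rains have X5".
# All values are invented for illustration, not from a trained model.
p_x1 = 0.20        # P(X1 = "The")
p_x2_given = 0.05  # P(X2 = "monsoon" | "The")
p_x3_given = 0.40  # P(X3 = "rains" | "The monsoon")
p_x4_given = 0.30  # P(X4 = "have" | "The monsoon rains")
p_x5_given = 0.60  # P(X5 = "started" | "The monsoon rains have")

p_sequence = p_x1 * p_x2_given * p_x3_given * p_x4_given * p_x5_given
print(f"{p_sequence:.6f}")  # 0.000720
```

Note how the product shrinks quickly; this is why implementations usually sum log-probabilities instead of multiplying raw probabilities.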

## Illustration with Next Word Prediction:

Now, let’s say we want to predict the next word after “The monsoon rains have”. We would use the language model to compute P(X5 | X1, X2, X3, X4), the conditional probability of each candidate word given this context.

The model could suggest different options for `X5`, such as:

- “started”
- “subsided”
- “caused”

Each of these options would have an associated probability, and the word with the highest probability would be selected as the next word.

For example, the model might assign “started” a probability of 0.6, with the remaining probability mass split among candidates such as “subsided” and “caused”. Since “started” has the highest probability (0.6), the model would predict “started” as the next word.
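Picking the highest-probability candidate (greedy decoding) can be sketched as follows. The 0.6 for "started" comes from the text; the other two probabilities are illustrative assumptions:

```python
# Toy next-word distribution for the context "The monsoon rains have".
# Only the 0.6 for "started" is given in the text; the rest are
# assumed illustrative values that make the distribution sum to 1.
next_word_probs = {"started": 0.6, "subsided": 0.25, "caused": 0.15}

# Greedy decoding: select the candidate with the highest probability.
prediction = max(next_word_probs, key=next_word_probs.get)
print(prediction)  # started
```

Greedy selection is only one decoding strategy; models can also sample from the distribution to produce more varied text.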

**References**

- A Survey of Deep Learning: From Activations to Transformers. https://arxiv.org/abs/2302.00722
- An introduction to the course content, logistics, policies and background — IIT Delhi classes on LLMs by Tanmoy Chakraborty. https://lcs2-iitd.github.io/ELL881-AIL821-2401/static_files/presentations/1.pdf
- The Evolution of NLP: Past, Present, and Future by Team Pepper. https://www.peppercontent.io/blog/tracing-the-evolution-of-nlp/