Tuesday, August 01, 2023

The Next Word

If the artificial intelligence craze feels strangely familiar, it should, because we’ve been dealing with versions of it for a long time now under different names. We’ve been introduced to products that supposedly get smarter the more we use them.

Our entire world has evolved into a giant software testing environment and we are the guinea pigs.

Meanwhile, there are unique aspects to what is now called AI, and they’re worth knowing about. One of the best explainers to date is in Ars Technica. Here is my summary in the form of excerpts:

  • To understand how language models work, you first need to understand how they represent words. Humans represent English words with a sequence of letters, like C-A-T for "cat." Language models use a long list of numbers called a "word vector." The full vector for cat is 300 numbers long.

  • Each word vector represents a point in an imaginary “word space,” and words with more similar meanings are placed closer together. For example, the words closest to cat in vector space include dog, kitten, and pet.

  • Word vectors are a useful building block for language models because they encode subtle but important information about the relationships between words. If a language model learns something about a cat (for example, it sometimes goes to the vet), the same thing is likely to be true of a kitten or a dog.

  • (But) words often have multiple meanings. And meaning depends on context. To transform word vectors into word predictions, large language models (LLMs) pass them through a stack of layers known as transformers. Each layer adds information that helps clarify a word’s meaning and better predict which word might come next.

  • Researchers don’t understand exactly how LLMs keep track of this information, but logically speaking, the model must be doing it by modifying the hidden state vectors as they get passed from one layer to the next. It helps that in modern LLMs, these vectors are extremely large. The most powerful version of GPT-3, for example, has 96 layers and uses word vectors with 12,288 dimensions—that is, each word is represented by a list of 12,288 numbers!

  • A key innovation of LLMs is that they don’t need explicitly labeled data. Instead, they learn by trying to predict the next word in ordinary passages of text. Almost any written material—from Wikipedia pages to news articles to computer code—is suitable for training these models.

  • You might find it surprising that the training process works as well as it does. ChatGPT can perform all sorts of complex tasks—composing essays, drawing analogies, and even writing computer code. So how does such a simple learning mechanism produce such a powerful model?

  • One reason is scale. It’s hard to overstate the sheer number of examples that a model like GPT-3 sees. GPT-3 was trained on a corpus of approximately 500 billion words. For comparison, a typical human child encounters roughly 100 million words by age 10.

  • At the moment, we don’t have any real insight into how LLMs accomplish feats like this. Some people argue that such examples demonstrate that the models are starting to truly understand the meanings of the words in their training set. Others insist that language models are “stochastic parrots” that merely repeat increasingly complex word sequences without truly understanding them.

  • Traditionally, a major challenge for building language models was figuring out the most useful way of representing different words—especially because the meanings of many words depend heavily on context. The next-word prediction approach allows researchers to sidestep this thorny theoretical puzzle by turning it into an empirical problem. It turns out that if we provide enough data and computing power, language models end up learning a lot about how human language works simply by figuring out how to best predict the next word. The downside is that we wind up with systems whose inner workings we don’t fully understand.
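The “word space” idea in the excerpts above is easy to see in miniature. Here is a toy sketch with made-up 4-dimensional vectors (real models use hundreds or thousands of dimensions, and the values below are invented purely for illustration) showing how cosine similarity puts “cat” closer to “kitten” than to “car”:

```python
# Toy 4-dimensional "word vectors" -- the numbers are invented for
# illustration; real models learn vectors with thousands of dimensions.
vectors = {
    "cat":    [0.90, 0.80, 0.10, 0.20],
    "kitten": [0.85, 0.90, 0.15, 0.10],
    "dog":    [0.80, 0.60, 0.20, 0.30],
    "car":    [0.10, 0.00, 0.90, 0.80],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means 'close'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

# Words with similar meanings sit closer together in vector space.
print(cosine_similarity(vectors["cat"], vectors["kitten"]))  # high (~0.99)
print(cosine_similarity(vectors["cat"], vectors["car"]))     # low  (~0.23)
```

The takeaway is just the geometry: “closer in meaning” becomes “smaller angle between vectors,” which is what lets a model carry what it knows about cats over to kittens.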
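The “no labeled data” point is worth pausing on: the text itself supplies the answers, because the next word is always right there. A bigram counter is the crudest possible version of this self-supervised setup (nothing like GPT’s actual architecture, just the same training signal):

```python
from collections import Counter, defaultdict

# A tiny stand-in for a training corpus -- any ordinary text works,
# which is the point: the next word is its own training label.
corpus = "the cat sat on the mat and the cat saw the dog".split()

# "Training": for each word, count which words were observed to follow it.
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def predict_next(word):
    """Predict the most frequently observed next word."""
    return next_word_counts[word].most_common(1)[0][0]

print(predict_next("the"))  # -> "cat" (seen after "the" twice in the corpus)
```

An LLM replaces the counting table with billions of learned parameters and conditions on the whole preceding passage rather than one word, but the objective is the same: get better at guessing what comes next.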

I recommend that you bookmark the entire Ars Technica article.
