Tiny Language Model

John Mayo-Smith
Apr 21, 2023 · 4 min read


What if there was a tiny language model that helped explain large language models?

Try it yourself at TinyLanguageModel.com

Tiny Language Model (TLM) is a working language model based on a small neural network that runs in your browser. It can learn and generate responses from a six-word, customizable vocabulary. While very limited, it offers insights into vastly more complex language models like ChatGPT.

TLM trains on a “corpus” compiled from the six-word vocabulary. This training text is an ultra-scaled-down version of the millions of documents that ChatGPT trains on. TLM learns and generates responses based on the patterns it detects in the corpus.
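To make that concrete, here is a minimal sketch in TypeScript of what a six-word vocabulary and its tiny corpus might look like, and how the text becomes context/next-word pairs for training. The word list, corpus lines, and pair format are illustrative assumptions, not TLM’s actual code.

```typescript
// Illustrative six-word vocabulary (the words on TinyLanguageModel.com may differ).
const vocabulary = ["blue", "red", "cat", "dog", "sleeps", "jumps"];

// The training text is just short sequences drawn from that vocabulary.
const corpus = [
  "blue cat sleeps",
  "red dog jumps",
  "blue dog sleeps",
];

// Turn the corpus into (context word, next word) pairs: the model sees a word
// and learns to predict the word that follows it.
const pairs = corpus.flatMap((line) => {
  const words = line.split(" ");
  return words.slice(0, -1).map((w, i) => ({ context: w, next: words[i + 1] }));
});

console.log(pairs); // e.g. { context: "blue", next: "cat" }, { context: "cat", next: "sleeps" }, ...
```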

As the Tiny Language Model trains, its weights change color, giving a visual representation of how the model learns and generates responses. This shows the inner workings of the model in real time. By watching the weight colors change, we can spot potential areas for improvement in the model’s training and performance.
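One simple way such a display could work is to map each weight’s value onto a color scale. The function below is a hedged sketch of that idea, not the site’s actual visualization code.

```typescript
// Map a weight in roughly [-1, 1] to a color: strong positive weights shade
// toward red, strong negative weights toward blue. Illustrative only.
function weightToColor(w: number): string {
  const clamped = Math.max(-1, Math.min(1, w));
  const red = Math.round(((clamped + 1) / 2) * 255);
  const blue = 255 - red;
  return `rgb(${red}, 0, ${blue})`;
}

console.log(weightToColor(0.8));  // mostly red
console.log(weightToColor(-0.8)); // mostly blue
```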

Bottom Line

“A word is characterized by the company it keeps.” — John Firth, 1957

TLM makes connections between words (also called “tokens”). It does this with weights (also called “parameters”). TLM learns that some words go together, and some don’t (anyone who’s played “Mad Libs” knows what that’s like).
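Here is a minimal sketch of that idea in TypeScript: a table of weights that gets nudged upward each time two words appear next to each other in the corpus. TLM’s real network is a small neural net with a proper learning rule, so treat this as an illustration of words “keeping company,” not its actual update.

```typescript
type Weights = Record<string, Record<string, number>>;

// Strengthen the weight between each word and the word that follows it.
function train(corpus: string[], passes = 10, lr = 0.1): Weights {
  const weights: Weights = {};
  for (let p = 0; p < passes; p++) {
    for (const line of corpus) {
      const words = line.split(" ");
      for (let i = 0; i < words.length - 1; i++) {
        const [a, b] = [words[i], words[i + 1]];
        weights[a] ??= {};
        weights[a][b] = (weights[a][b] ?? 0) + lr;
      }
    }
  }
  return weights;
}

const w = train(["blue cat sleeps", "red dog jumps"]);
console.log(w["cat"]); // roughly { sleeps: 1 }: "cat" keeps company with "sleeps"
```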

There is a debate about whether weights represent knowledge and understanding; most people would say the twelve weights in TLM do not. But what happens when twelve weights are scaled up to billions of weights?

ChatGPT has hundreds of billions of weights. To help visualize hundreds of billions of weights, click here.

“Billions of weights” is too complicated for humans to understand. But someday, with help from A.I., maybe we will.

Things to try

  1. LLMs group words together (for example “New York,” “Fast Food,” and “Machine Learning”). To see how this works in Tiny Language Model, try making some bigrams (two-word groupings) or trigrams (three-word groupings) by connecting vocabulary words with dots like this: “fast.food” “jumped.over.the”
  2. Refresh your browser to clear the network and try training the model on the same training text. Do the model weights always look the same?
  3. Try adding more training passes by clicking the ‘Train Tiny Language Model’ button. How do the weights change?
  4. Try editing the training text, then retraining the model. For example change “blue cat sleeps” to “blue cat jumps.” How do the weights change? How easy is it for the model to “unlearn things?”
  5. If you’re interested in the math, open the dev tools console in your browser, where you can see all the parameter values (a sketch of what that might look like follows this list).
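A hedged sketch of inspecting the parameters from the console, assuming the model’s weights sit in a plain array of twelve numbers; the real variable names and values on TinyLanguageModel.com may differ.

```typescript
// Twelve illustrative parameter values; the real ones come from training.
const weights: number[] = [
  0.42, -0.13, 0.88, 0.05, -0.61, 0.27,
  0.19, -0.34, 0.73, -0.08, 0.56, -0.22,
];

// console.table prints a readable grid in the browser dev tools console.
console.table(weights.map((value, index) => ({ index, value })));
```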

Limitations

Tiny Language Model has many limitations. The biggest is that it has only twelve parameters and is technically only able to handle a four-word vocabulary. To make the model more interesting, the vocabulary was stretched to six words by making an assumption (a hack) about word order (adjective → noun → verb). This assumption is loosely analogous to the concepts of word embeddings and transformers.
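One plausible reading of that hack, sketched below: with a fixed adjective → noun → verb sentence shape, each position only has to choose between two words, so six words fit in far fewer inputs than a full six-way encoding. The word lists are illustrative, and this is an interpretation of the trick rather than TLM’s exact encoding.

```typescript
// Illustrative word lists; a fixed sentence shape makes each slot a
// two-way choice rather than a six-way one.
const adjectives = ["blue", "red"];
const nouns = ["cat", "dog"];
const verbs = ["sleeps", "jumps"];

// Encode an adjective-noun-verb sentence as three 0/1 choices.
function encode(sentence: string): number[] {
  const [adj, noun, verb] = sentence.split(" ");
  return [adjectives.indexOf(adj), nouns.indexOf(noun), verbs.indexOf(verb)];
}

console.log(encode("blue cat sleeps")); // [0, 0, 0]
console.log(encode("red dog jumps"));   // [1, 1, 1]
```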

Sources

“What Is ChatGPT Doing … and Why Does It Work?” — Stephen Wolfram

CS480/680 Lecture 19: Attention and Transformer Networks — Pascal Poupart

Large Language Models: Scaling Laws and Emergent Properties

GPT Technical Report

Word Embedding

Tiny Language Model on Github

ChatGPT helped code Tiny Language Model and edit this post.
