Word Games

John Mayo-Smith
Mar 18, 2024


How ChatGPT is sort of like Minecraft but with text instead of texture maps.

At Left: Objects in Minecraft move, rotate and scale depending on where you go. At Right: What comes after the word ‘Four’ depends on ‘where you go.’

It’s no coincidence that the same GPUs that accelerate video games also power large language models. Nvidia sits at the forefront of the AI revolution because both workloads boil down to the same math: games use matrices and vectors to move, scale, and rotate 3D objects and textures, and LLMs use matrices and vectors to transform the numerical representations of words. How objects move, scale, and rotate in a three-dimensional game like Minecraft is similar to how words and sentences ‘move’ in a high-dimensional ‘game’ like ChatGPT.

Transforming Objects vs. Transforming Words

  • In Games: Objects in a game are transformed using matrices to move, rotate, and scale. Each object’s position and appearance in the game world are determined by these transformations.
  • In LLMs: Words and sentences are represented as vectors (lists of numbers) in a high-dimensional space. Transformations, similar to matrices in 3D graphics, are applied to these vectors to change their meanings, contexts, or grammatical forms. These transformations help understand the relationships between words and how they fit together in sentences.
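The parallel above is easy to see in code. Here is a toy sketch: the first half rotates a 3D game point with a rotation matrix; the second half applies a matrix to a small ‘word vector’ — the same matrix-times-vector operation, just with values learned from data instead of derived from geometry. (The 4-dimensional vector and random weight matrix are illustrative stand-ins, not anything from a real model.)

```python
import numpy as np

# Game side: rotate a 3D point 90 degrees around the z-axis.
theta = np.pi / 2
rotate_z = np.array([
    [np.cos(theta), -np.sin(theta), 0],
    [np.sin(theta),  np.cos(theta), 0],
    [0,              0,             1],
])
point = np.array([1.0, 0.0, 0.0])
rotated = rotate_z @ point  # lands at roughly [0, 1, 0]

# LLM side: a toy 4-dimensional "word vector" pushed through a weight
# matrix. Real models do this with thousands of dimensions and weights
# learned during training; the random matrix here is just a stand-in.
word_vec = np.array([0.2, -0.5, 0.1, 0.9])
weights = np.random.default_rng(0).normal(size=(4, 4))
transformed = weights @ word_vec

print(np.round(rotated, 3))   # the rotated game point
print(transformed.shape)      # the transformed word vector, still 4-d
```

In both halves the heavy lifting is a single matrix multiply — which is exactly the operation GPUs are built to do in bulk.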

Setting the Camera View vs. Understanding Context

  • In Games: The view transformation matrix changes how the 3D world is viewed, adjusting everything based on the camera’s position and orientation.
  • In LLMs: Context is crucial for understanding and generating language. ChatGPT adjusts the “view” of the text based on the surrounding words and sentences, which helps determine the most appropriate meanings and responses. This adjustment is like setting the camera view, but for understanding language context.
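A tiny sketch of that ‘view adjustment’ for language, loosely inspired by attention (this is a simplified illustration with made-up 3-d embeddings, not ChatGPT’s actual mechanism): an ambiguous word like “bank” gets blended toward whatever context surrounds it, so its vector ends up in a different place depending on ‘where you go.’

```python
import numpy as np

# Made-up 3-d embeddings. "bank" is ambiguous on its own; "river" and
# "money" pull it toward different meanings.
embeddings = {
    "bank":  np.array([0.9, 0.1, 0.0]),
    "river": np.array([0.0, 0.2, 0.9]),
    "money": np.array([0.8, 0.0, 0.1]),
}

def contextualize(word, context_words):
    """Blend a word's vector toward its context, attention-style:
    dot-product similarity -> softmax weights -> weighted average."""
    q = embeddings[word]
    keys = np.stack([embeddings[w] for w in context_words])
    scores = keys @ q                                 # similarity to context
    attn = np.exp(scores) / np.exp(scores).sum()      # softmax weights
    return attn @ keys                                # context-adjusted vector

river_bank = contextualize("bank", ["bank", "river"])
money_bank = contextualize("bank", ["bank", "money"])
print(river_bank, money_bank)  # two different "views" of the same word
</imports```

Same word, two different vectors — the surrounding words act like the camera, changing what the model ‘sees.’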

Flattening 3D to 2D vs. Generating Responses

  • In Games: The projection transformation flattens a 3D scene into a 2D view that can be displayed on a screen, making a complex 3D world understandable in a 2D format.
  • In LLMs: ChatGPT takes complex ideas and thoughts (which can be thought of as “3D”) and “flattens” them into understandable, linear text (the “2D” output). This involves selecting the right words and structuring sentences in a way that conveys the intended message clearly.
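Both kinds of ‘flattening’ can be written as a projection matrix. Below, an orthographic projection drops a 3D point onto the 2D screen plane, and a toy ‘output layer’ projects a hidden state onto vocabulary scores, with the top score choosing the next word of the linear text stream. (The four-word vocabulary, hidden state, and identity weight matrix are illustrative assumptions, not real model values.)

```python
import numpy as np

# Graphics side: an orthographic projection keeps x and y and drops z,
# flattening a 3-d point onto the 2-d screen plane.
project = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])
point_3d = np.array([2.0, 3.0, 7.0])
point_2d = project @ point_3d  # depth discarded

# Language side: a final linear layer "projects" a hidden state onto
# one score per vocabulary word; the highest score wins.
vocab = ["four", "score", "and", "seven"]
hidden = np.array([0.3, -0.1, 0.8, 0.2])
output_weights = np.eye(4)          # stand-in for learned output weights
logits = output_weights @ hidden
next_word = vocab[int(np.argmax(logits))]

print(point_2d)    # the flattened point
print(next_word)   # the word the projection picked
```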

Adding Texture vs. Adding Nuance

  • In Games: Textures add detail and realism to 3D models, making them appear more lifelike.
  • In LLMs: Nuance and context add “texture” to language. By understanding the subtle differences in word choice and sentence structure, ChatGPT can generate text that feels more natural and tailored to the specific query or context.

Making It All Work Together

  • In Games: The rendering pipeline combines transformations, camera settings, and textures to create a cohesive and immersive 3D environment.
  • In LLMs: ChatGPT’s internal processing combines understanding of context, language rules, and nuances to generate coherent and relevant responses. Just like the rendering process in games, ChatGPT uses a series of steps (encoding the input, processing it through the layers of the model, and decoding it back into text) to generate responses that address the user’s query.
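The encode → process → decode shape of that pipeline can be sketched in a few lines. This is purely illustrative: real models use learned tokenizers, attention layers, and billions of parameters, where this demo encodes characters as code points and ‘processes’ them with identity-matrix layers so the round trip is visible.

```python
import numpy as np

def encode(text):
    """One 1-d 'embedding' per character: its Unicode code point."""
    return np.array([[float(ord(c))] for c in text])

def process(vectors, n_layers=2):
    """Pass vectors through a stack of 'layers' -- here just identity
    matrix multiplies, standing in for a real model's transformations."""
    layer = np.array([[1.0]])
    for _ in range(n_layers):
        vectors = vectors @ layer
    return vectors

def decode(vectors):
    """Turn vectors back into text."""
    return "".join(chr(int(round(v[0]))) for v in vectors)

out = decode(process(encode("hi")))
print(out)  # the input survives the round trip through the pipeline
```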

In both cases, complex processes are at work behind the scenes, whether it’s rendering a lifelike game environment or generating a text response. These processes involve transforming basic components (whether they’re 3D models or words) in sophisticated ways to create an output that’s engaging and meaningful.

(This article was created with help from ChatGPT-transformed matrices and vectors.)
