AI Tokens Explained: The Currency of Large Language Models
The Building Blocks of AI Language
When you type a message to ChatGPT, Claude, or Gemini, your text does not reach the model as words or characters. It gets broken down into tokens — small pieces that the model can process. Understanding tokens is essential if you want to predict costs, optimize prompts, or understand why models sometimes behave strangely at certain character boundaries.
Tokenization is one of those behind-the-scenes mechanisms that has an outsized impact on how AI works, what it costs, and what it can do.
How Tokenization Works
Modern language models use subword tokenization, most commonly a method called Byte Pair Encoding (BPE). The basic idea is elegant: start with individual characters, then iteratively merge the most frequent pairs of adjacent tokens until you reach a desired vocabulary size.
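The merge loop can be sketched in a few lines of Python. This is a toy illustration of the BPE training idea on a tiny word list, not a real tokenizer (production tokenizers operate on bytes and huge corpora):

```python
from collections import Counter

def bpe_train(words, num_merges):
    """Learn BPE merge rules from a list of words (toy sketch)."""
    # Represent each word as a tuple of symbols, starting from characters.
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each pair of adjacent symbols occurs in the corpus.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the best pair with a single merged symbol.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_corpus[tuple(merged)] += freq
        corpus = new_corpus
    return merges

print(bpe_train(["low", "lower", "lowest", "low"], num_merges=3))
```

After a few merges on this corpus, frequent fragments like "lo" and "low" become single symbols, which is exactly how common words end up as single tokens in a real vocabulary.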
The result is a vocabulary of roughly 50,000 to 100,000 tokens that includes:
- Common words as single tokens: “the”, “and”, “hello”
- Common word parts: “ing”, “tion”, “un”
- Individual characters for rare combinations
- Special tokens for numbers, punctuation, and whitespace
For example, the word “tokenization” might be split into: ["token", "ization"] — two tokens. The word “the” is a single token. But a rare word like “pneumonoultramicroscopicsilicovolcanoconiosis” would be split into many small tokens.
Why Languages Are Not Equal
Tokenization was originally optimized for English text. This has consequences:
English is the most efficiently tokenized language. One token covers approximately 4 characters or 0.75 words.
German is less efficient because of compound words and longer average word length. One token covers approximately 3 characters or 0.6 words. The word “Donaudampfschifffahrtsgesellschaftskapitän” might take 8-10 tokens, while its English equivalent “Danube steamship company captain” takes about 5.
Chinese, Japanese, and Korean can be even less efficient in some tokenizers, with a single character sometimes requiring multiple tokens.
This means the same content in German costs roughly 25% more in tokens than the English equivalent (0.75 vs. 0.6 words per token) — a significant factor for multilingual applications.
Tokens and Context Windows
Every AI model has a context window — the maximum number of tokens it can process at once (input + output combined). This is the model’s “working memory.”
| Model | Context Window |
|---|---|
| GPT-4o | 128,000 tokens |
| Claude Sonnet 4 | 200,000 tokens |
| Gemini 2.0 Flash | 1,000,000 tokens |
| Gemini 2.0 Pro | 2,000,000 tokens |
To put this in perspective:
- 128K tokens ≈ 96,000 words ≈ a 384-page book
- 1M tokens ≈ 750,000 words ≈ the entire Lord of the Rings trilogy plus The Hobbit
- 2M tokens ≈ 1.5 million words ≈ roughly one and a half times the complete Harry Potter series
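These conversions are simple arithmetic using the English rules of thumb from this article (0.75 words per token, 250 words per A4 page). A quick sketch to reproduce them — estimates only, not exact tokenizer counts:

```python
def tokens_to_words(tokens, words_per_token=0.75):
    """Rough English conversion: ~0.75 words per token."""
    return tokens * words_per_token

def tokens_to_pages(tokens, words_per_token=0.75, words_per_page=250):
    """Rough conversion to A4 pages at ~250 words per page."""
    return tokens_to_words(tokens, words_per_token) / words_per_page

print(tokens_to_words(128_000))  # → 96000.0 words
print(tokens_to_pages(128_000))  # → 384.0 pages
```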
These large context windows enable new use cases like processing entire codebases, analyzing lengthy legal documents, or maintaining extremely long conversations.
The Cost Implications
Since AI APIs charge per token, understanding token counts directly translates to understanding costs. Here are some common reference points:
| Content | Approximate Tokens |
|---|---|
| A tweet (280 characters) | ~70 |
| An email (200 words) | ~267 |
| An A4 page (250 words) | ~333 |
| A blog post (1,000 words) | ~1,333 |
| A short book (50,000 words) | ~66,667 |
At GPT-4o’s pricing ($2.50/M input), processing a full book as input costs about $0.17. At Gemini 2.0 Flash ($0.10/M input), the same book costs $0.007 — less than a penny.
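The cost figures above follow directly from the word-to-token rule of thumb and the per-million-token price. A minimal sketch of that calculation (prices are the ones quoted in the text and will change over time):

```python
def input_cost_usd(words, tokens_per_word=1.33, price_per_million=2.50):
    """Estimate API input cost from a word count (rule of thumb, not exact)."""
    tokens = words * tokens_per_word
    return tokens * price_per_million / 1_000_000

# A 50,000-word book at GPT-4o input pricing ($2.50/M tokens):
book = input_cost_usd(50_000, price_per_million=2.50)
print(f"${book:.2f}")  # → $0.17

# The same book at Gemini 2.0 Flash pricing ($0.10/M tokens) comes to
# well under a penny.
print(input_cost_usd(50_000, price_per_million=0.10))
```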
Practical Token Estimation
You do not need an exact tokenizer to estimate costs. For quick calculations:
- Count words in your text
- Multiply by 1.33 for English (or 1.67 for German) to get approximate tokens
- Multiply by price per token to get cost
For example: a 500-word system prompt in English ≈ 665 tokens. If sent with every API call, and you make 1,000 calls per day, that is 665,000 tokens per day just for the system prompt.
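The same back-of-envelope estimate works as a daily budget check. A sketch using the example's assumed numbers (500-word prompt, 1,000 calls per day, GPT-4o input pricing):

```python
# Daily token budget for a fixed system prompt, English rule of thumb.
prompt_words = 500
tokens_per_call = round(prompt_words * 1.33)   # ≈ 665 tokens
calls_per_day = 1_000
daily_tokens = tokens_per_call * calls_per_day
daily_cost = daily_tokens * 2.50 / 1_000_000   # at $2.50/M input tokens
print(daily_tokens, f"${daily_cost:.2f}")  # → 665000 $1.66
```

At $2.50 per million input tokens, that recurring system prompt alone costs about $1.66 per day — a good argument for trimming prompts that are sent with every call.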
Try It Yourself
Want to see how many tokens your text uses? Paste it into our Text to Token Estimator for an instant count. Need to convert between tokens, words, and characters? The Token-Word Converter handles bidirectional conversions. And to visualize what your token count means in real-world terms, try the Token-Page Converter — see your tokens as A4 pages, books, or tweets.
Fun Fact: The tokenizer used by GPT-4 treats the space before a word as part of the token. So “ hello” (with a leading space) is a single token, but “hello” at the start of text is a different token. This is why AI models occasionally produce unexpected spacing — they are operating in token-space, not character-space.