AI Fundamentals

Token

Last updated: February 16, 2026

A token is the fundamental unit of text that a large language model reads and generates. Rather than processing text character by character or word by word, LLMs break input into tokens -- subword pieces that balance vocabulary size with the ability to represent any text.

How It Works

Before an LLM processes your input, a tokenizer splits the text into tokens using a learned vocabulary. Common words like "the" or "hello" typically become single tokens, while less common or longer words are split into multiple subword tokens. For example, "deployment" might be two tokens ("deploy" + "ment"), and a rare technical term could be split into several pieces.
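The splitting described above can be sketched with a toy greedy longest-match tokenizer. The vocabulary here is invented for illustration; real tokenizers such as BPE learn their vocabulary (and merge rules) from training data, so actual splits will differ by model.

```python
def tokenize(text, vocab):
    """Split text into tokens by greedily matching the longest vocab entry."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible match first, shrinking to one character.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # Character not in the vocabulary: emit it as its own token.
            tokens.append(text[i])
            i += 1
    return tokens

# Toy vocabulary, chosen to reproduce the "deploy" + "ment" example.
vocab = {"deploy", "ment", "the", "hello"}
print(tokenize("deployment", vocab))  # ['deploy', 'ment']
```

With this toy vocabulary, a common whole word like "hello" comes out as one token, while "deployment" falls back to two subword pieces, mirroring the behavior described above.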

Different models use different tokenizers, so the same text may produce a different number of tokens depending on the model. As a rough rule of thumb for English text, one token corresponds to approximately four characters or three-quarters of a word.
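The four-characters-per-token rule of thumb can be turned into a quick estimator. This is only a rough heuristic for English text; when an accurate count matters, use the target model's own tokenizer library instead.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters/token rule for English."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("Hello, how are you today?"))  # 25 chars -> 6
```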

Why It Matters

Tokens are the currency of LLM usage. They determine three critical aspects of any AI deployment:

  • Cost: Most model providers charge per token for both input and output, so token count directly affects your bill.
  • Context limits: The context window is measured in tokens, so understanding tokenization helps you estimate how much information fits in a single request.
  • Latency: Generating more output tokens takes more time, since LLMs produce tokens sequentially.
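The cost point above reduces to simple arithmetic once you know a provider's rates. A minimal sketch, where the per-million-token prices are placeholders, not any provider's actual rates:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 price_in_per_m: float = 3.00,
                 price_out_per_m: float = 15.00) -> float:
    """Dollar cost of one request, given per-million-token prices.

    The default prices are illustrative placeholders only.
    """
    return (input_tokens * price_in_per_m
            + output_tokens * price_out_per_m) / 1_000_000

# e.g. a 2,000-token prompt with a 500-token reply:
print(request_cost(2_000, 500))  # 0.0135
```

Note that output tokens are often priced several times higher than input tokens, which is why capping output length matters for cost as well as latency.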

In Practice

When building AI-powered applications, tracking token usage helps optimize both performance and cost. Efficient prompt design minimizes unnecessary input tokens, while setting appropriate maximum output lengths prevents runaway generation. Most deployment platforms and API dashboards provide token counts per request, making it straightforward to monitor consumption and budget accordingly.
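One practical application of the ideas above is a pre-flight check: estimate whether a prompt plus its reserved output budget fits the model's context window before sending the request. The 128,000-token window and the chars-per-token heuristic below are assumptions for illustration; substitute your model's actual limit and tokenizer.

```python
CONTEXT_WINDOW = 128_000  # assumed limit; check your model's documentation

def fits_in_context(prompt: str, max_output_tokens: int) -> bool:
    """Return True if the prompt plus output budget fits the context window."""
    estimated_input = len(prompt) // 4  # rough heuristic for English text
    return estimated_input + max_output_tokens <= CONTEXT_WINDOW

print(fits_in_context("Summarize the following report...", 1_000))  # True
```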