Token
Definition
The basic unit of text that language models process, typically representing a word, subword, or character — also the standard billing unit for commercial AI API usage.
Tokens are the atoms of language model processing. A token might be a whole word ("hello"), a subword piece ("un" + "believe" + "able"), a single character, or a special marker such as an end-of-sequence symbol. Most English text averages about 1.3 tokens per word, so 1,000 tokens is roughly 750 words. The tokenizer converts text to tokens (encoding) and tokens back to text (decoding).

Tokens also serve as the billing unit for AI APIs: OpenAI, Anthropic, and Google all price their services per input and output token. Understanding tokenization therefore matters for cost optimization, for managing context window limits, and for debugging unexpected model behavior.

Different models use different tokenizers, so the same text may require different numbers of tokens across models. Tokenization efficiency also varies significantly for non-English languages: text in languages underrepresented in a tokenizer's training data can require substantially more tokens, and therefore cost more, than equivalent English text.
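The encode/decode round trip can be sketched with a toy greedy longest-match subword tokenizer. This is an illustration only: the TOY_VOCAB and the greedy matching strategy are assumptions made for this sketch, while production tokenizers (e.g. BPE-based ones) learn their vocabularies and merge rules from large corpora.

```python
# Toy subword tokenizer: greedy longest-match against a tiny fixed
# vocabulary. Real tokenizers (BPE, WordPiece, etc.) are trained on data.
TOY_VOCAB = {"un": 0, "believ": 1, "able": 2, "hello": 3}
ID_TO_PIECE = {v: k for k, v in TOY_VOCAB.items()}

def encode(text: str) -> list[int]:
    """Convert text to token ids by greedily taking the longest match."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary piece that matches at position i.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids

def decode(ids: list[int]) -> str:
    """Convert token ids back to text by concatenating the pieces."""
    return "".join(ID_TO_PIECE[i] for i in ids)

# Note the middle piece is the surface string "believ" (final "e"
# dropped), even though the morphemes are un + believe + able.
print(encode("unbelievable"))          # → [0, 1, 2]
print(decode(encode("unbelievable")))  # → "unbelievable"
```

The round trip shows why token counts, not word counts, govern both context window usage and billing: "unbelievable" is one word but three tokens here.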
Related Terms
Large Language Model
A neural network with billions of parameters trained on massive text datasets, capable of understanding and generating natural language.
Tokenizer
A component that splits text into smaller units called tokens (words, subwords, or characters) that the model can map to numeric ids and process.
Context Window
The maximum number of tokens (input plus output) that a language model can process in a single interaction.
Inference Cost
The computational expense of running a trained AI model to generate predictions or outputs, typically measured and billed per token or per request.