Test-Time Compute
Allocating additional computation during inference (not training) to improve output quality. Techniques include chain-of-thought, self-consistency, and iterative refinement.
Why It Matters
Test-time compute offers a way to improve AI output quality without retraining: the same model, with the same weights, often produces better answers when given more inference-time computation to "think."
Example
Running chain-of-thought reasoning plus self-consistency (10 samples, majority vote) at inference time, improving math accuracy from 70% to 90% at the cost of 10x more compute.
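The sampling-plus-majority-vote step above can be sketched in a few lines. This is a minimal illustration, not a production implementation: `sample_fn` stands in for a hypothetical model call that runs one chain-of-thought sample and returns only the final answer, and the stubbed answers below are invented for the demo.

```python
from collections import Counter

def self_consistency(sample_fn, question, n=10):
    """Sample n independent reasoning paths and majority-vote the answer.

    sample_fn(question) is assumed to run one chain-of-thought sample
    (e.g., one call to an LLM at nonzero temperature) and return just
    the final answer string. Returns (winning answer, agreement rate).
    """
    answers = [sample_fn(question) for _ in range(n)]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / n

# Stub model for illustration: 7 of 10 samples reach "12", 3 go astray.
fake_answers = iter(["12", "12", "11", "12", "12",
                     "13", "12", "12", "11", "12"])
best, agreement = self_consistency(lambda q: next(fake_answers),
                                   "What is 4 * 3?", n=10)
print(best, agreement)  # → 12 0.7
```

The cost scaling is direct: each extra sample is one more full inference pass, so 10 samples means roughly 10x the compute of a single answer.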
Think of it like...
Like giving a student more exam time — the same knowledge, but more time to think, check work, and reconsider, leading to better answers.
Related Terms
Chain-of-Thought
A prompting technique where the model is encouraged to show its step-by-step reasoning process before arriving at a final answer. This improves accuracy on complex reasoning tasks.
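In its simplest zero-shot form, chain-of-thought is just a change to the prompt. The sketch below shows the idea; the question is an invented example, and the cue phrase is one common choice rather than the only one.

```python
# Minimal zero-shot chain-of-thought prompt construction.
# The question is a made-up example; any chat or completion API
# would receive this prompt string in place of the bare question.
question = "A train travels 60 km in 45 minutes. What is its speed in km/h?"

prompt = (
    f"Q: {question}\n"
    "A: Let's think step by step."
)
print(prompt)
```

The appended cue encourages the model to emit intermediate reasoning steps before the final answer, which is what tends to lift accuracy on multi-step problems.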
Self-Consistency
A decoding strategy where the model generates multiple reasoning paths for the same question and selects the answer that appears most frequently across paths. It improves accuracy on reasoning tasks.
Inference
The process of using a trained model to make predictions on new, previously unseen data. Inference is what happens when an AI model is deployed and actively serving results to users.
Reasoning
An AI model's ability to think logically, make inferences, draw conclusions, and solve problems that require multi-step thought. Reasoning goes beyond pattern matching to genuine logical analysis.
Compute
The computational resources (processing power, memory, time) required to train or run AI models. Compute is measured in FLOPs (floating-point operations) and is a primary constraint and cost in AI development.