Sampling Strategy
The method used to select the next token during text generation. Different strategies (greedy, top-k, top-p/nucleus, temperature-based) produce different tradeoffs between quality and diversity.
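To make the strategies concrete, here is a minimal pure-Python sketch over a toy logit vector. All numbers and function names are illustrative, not from any particular library:

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, scaled by temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(logits):
    """Greedy: always pick the highest-logit token."""
    return max(range(len(logits)), key=lambda i: logits[i])

def top_k_sample(logits, k=2, temperature=1.0):
    """Top-k: sample from only the k most likely tokens."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax([logits[i] for i in top], temperature)
    return random.choices(top, weights=probs)[0]

def top_p_sample(logits, p=0.9, temperature=1.0):
    """Top-p (nucleus): sample from the smallest set of tokens
    whose cumulative probability reaches p."""
    probs = softmax(logits, temperature)
    order = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    return random.choices(kept, weights=[probs[i] for i in kept])[0]

# Toy next-token logits for four candidate tokens
logits = [2.0, 1.5, 0.3, -1.0]
print(greedy(logits))               # → 0 (always the same token)
print(top_k_sample(logits, k=2))    # 0 or 1, chosen at random
```

Greedy is deterministic; the sampling variants trade that determinism for diversity, with `k`, `p`, and `temperature` as the knobs.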
Why It Matters
The sampling strategy determines the personality of your AI output — conservative and precise versus creative and surprising. Matching strategy to use case is critical.
Example
Using greedy decoding (always pick the most likely token) for factual Q&A, but top-p sampling with temperature 0.8 for creative writing.
Think of it like...
Like choosing how adventurous to be at a restaurant — always ordering the most popular dish (greedy) versus trying something new from the specials (sampling).
Related Terms
Temperature
A parameter that controls the randomness or creativity of an LLM's output. Lower temperatures (closer to 0) make outputs more deterministic and focused; higher temperatures increase randomness and creativity.
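Temperature works by dividing the logits before the softmax, so p_i ∝ exp(logit_i / T). A small sketch with made-up logits shows the effect:

```python
import math

def softmax_t(logits, temperature):
    """Softmax with temperature scaling: p_i ∝ exp(logit_i / T)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
print(softmax_t(logits, 0.5))  # low T: distribution sharpens toward the top token
print(softmax_t(logits, 2.0))  # high T: distribution flattens, more randomness
```

At low temperature the top token dominates (near-greedy behavior); at high temperature probability mass spreads out, which is what makes outputs more varied.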
Greedy Decoding
A simple text generation strategy where the model always selects the most probable next token at each step. It is fast but can produce repetitive or suboptimal outputs.
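A toy sketch of the greedy loop, using a hypothetical next-token table (the table and tokens are invented for illustration), shows both the mechanics and the repetition failure mode:

```python
def greedy_decode(next_logits, start, steps):
    """Greedy decoding: at each step, append the argmax next token."""
    seq = [start]
    for _ in range(steps):
        logits = next_logits[seq[-1]]
        seq.append(max(logits, key=logits.get))
    return seq

# Toy next-token table (token -> {candidate: logit}); note how the
# greedy choice at "very" locks the model into repeating itself.
table = {
    "The": {"cat": 2.1, "dog": 1.9},
    "cat": {"is": 1.2, "sat": 1.1},
    "is": {"very": 2.0, "happy": 1.5},
    "very": {"very": 1.8, "happy": 1.6},
    "happy": {".": 3.0},
}
print(" ".join(greedy_decode(table, "The", 5)))
# → "The cat is very very very"
```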
Beam Search
A search algorithm used in text generation that explores multiple possible output sequences simultaneously, keeping the top-scoring candidates at each step. It often finds higher-quality outputs than greedy decoding, at the cost of extra computation.
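A compact sketch over a hypothetical toy model (the probability table is invented) shows why keeping several candidates helps: the locally best first token leads to a worse overall sequence, so beam search beats greedy here:

```python
import math

def beam_search(next_probs, start, beam_width, steps):
    """Keep the top `beam_width` partial sequences by cumulative log-probability."""
    beams = [([start], 0.0)]  # (sequence, cumulative log-prob)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for tok, p in next_probs.get(seq[-1], {}).items():
                candidates.append((seq + [tok], score + math.log(p)))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]  # prune to the best beam_width sequences
    return beams[0][0]

# Greedy would pick "a" first (0.6 > 0.4), ending with probability
# 0.6 * 0.3 = 0.18; the "b" branch reaches 0.4 * 0.9 = 0.36.
probs = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.3, "y": 0.3},
    "b": {"z": 0.9},
}
print(beam_search(probs, "<s>", beam_width=2, steps=2))
# → ['<s>', 'b', 'z']
```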