Top-k Sampling
A text generation method in which the model considers only the k most likely next tokens at each step, ignoring all others. The probabilities of those k candidates are renormalized, and the next token is sampled from this restricted pool.
Why It Matters
Top-k sampling prevents the model from selecting wildly improbable tokens while still allowing creative variation within the top candidates.
Example
With top-k = 50, the model keeps only the 50 most likely next tokens at each generation step and samples among them, no matter how much or how little total probability those 50 tokens actually carry.
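The mechanics can be shown in a minimal sketch. This assumes raw logit scores as a plain Python list (the function name `top_k_sample` and the toy vocabulary are illustrative, not from any particular library):

```python
import math
import random

def top_k_sample(logits, k, rng=random):
    """Sample a token index from the k highest-scoring logits.

    `logits` is a hypothetical list of raw scores, one per vocabulary token.
    """
    # Keep only the indices of the k most likely candidates.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the surviving logits (subtract the max for numerical stability).
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    # Draw one index in proportion to its renormalized probability.
    return rng.choices(top, weights=weights, k=1)[0]

# Toy vocabulary of 5 tokens; with k=2 only indices 1 and 3 can ever be drawn.
logits = [0.1, 2.0, -1.0, 3.0, 0.5]
samples = {top_k_sample(logits, k=2) for _ in range(200)}
print(samples)  # always a subset of {1, 3}
```

Note that tokens outside the top k are not merely unlikely; they have exactly zero chance of being selected, however many samples are drawn.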
Think of it like...
Like a hiring manager who only interviews the top 50 applicants — it is a simple cutoff that ensures quality while still allowing choice.
Related Terms
Temperature
A parameter that controls the randomness or creativity of an LLM's output. Lower temperatures (closer to 0) make outputs more deterministic and focused; higher temperatures increase randomness and creativity.
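Temperature works by dividing the logits before the softmax. A minimal sketch, assuming plain-float logits (the function name and example values are illustrative):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, scaled by temperature.

    Temperatures below 1 sharpen the distribution toward the top token;
    temperatures above 1 flatten it toward uniform.
    """
    scaled = [score / temperature for score in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.5)  # more peaked: top token dominates
hot = softmax_with_temperature(logits, 2.0)   # flatter: more randomness
```

Temperature and top-k are often combined: temperature reshapes the distribution, while top-k truncates it.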
Greedy Decoding
A simple text generation strategy where the model always selects the most probable next token at each step. It is fast but can produce repetitive or suboptimal outputs.
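Greedy decoding is the k = 1 limiting case with no sampling at all. A minimal sketch over hypothetical per-step logits:

```python
def greedy_decode(step_logits):
    """Pick the argmax token at every step -- fully deterministic."""
    return [max(range(len(logits)), key=logits.__getitem__)
            for logits in step_logits]

# Three hypothetical generation steps; greedy always takes the single
# most likely token, so the same input always yields the same output.
steps = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.9], [0.0, 0.0, 3.0]]
print(greedy_decode(steps))  # [1, 0, 2]
```

Because every step commits to the locally best token, greedy decoding can miss sequences whose early tokens are individually less probable but jointly better.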