Artificial Intelligence

Throughput

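The number of requests or predictions a model can process in a given time period, usually expressed as requests per second or tokens per second. High throughput means the system completes more work per unit of time, so it can serve many users simultaneously.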
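
As a rough illustration of the definition, throughput can be computed as completed work divided by elapsed time. The function name `throughput` and the sample figures are assumptions made for this sketch, not part of any particular serving framework.

```python
def throughput(completed_requests: int, elapsed_seconds: float) -> float:
    """Return requests completed per second over a measurement window."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return completed_requests / elapsed_seconds


# Hypothetical measurement: 60,000 requests completed in a 60-second window.
print(throughput(60_000, 60.0))  # 1000.0
```

Measuring over a longer window smooths out short bursts, so the result is closer to the rate the system can sustain.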

Why It Matters

Throughput determines how many users your AI application can support. It also directly affects infrastructure costs and scalability: the more requests each server can handle, the fewer servers you need for the same load.

Example

A model-serving system processing 1,000 requests per second, or an LLM generating 100 tokens per second per user across 50 concurrent sessions, for an aggregate of 5,000 tokens per second.
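
The LLM figure can be turned into a system-wide number with simple arithmetic. The sketch below assumes every session sustains its per-user rate at the same time, which real systems may not achieve under load; the function name `aggregate_tokens_per_second` is hypothetical.

```python
def aggregate_tokens_per_second(per_session_rate: float, sessions: int) -> float:
    """Total tokens per second, assuming all sessions run at the same rate."""
    if per_session_rate < 0 or sessions < 0:
        raise ValueError("rate and session count must be non-negative")
    return per_session_rate * sessions


# 100 tokens/s per user across 50 concurrent sessions.
print(aggregate_tokens_per_second(100.0, 50))  # 5000.0
```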

Think of it like...

A highway's capacity: what matters is not how fast one car goes (latency) but how many cars can pass through per hour (throughput).

Related Terms