Artificial Intelligence

Latency

The time delay between sending a request to an AI model and receiving the response. In ML systems, latency includes data preprocessing, model inference, and network transmission time.
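The three stages above can be timed separately to see where a request spends its time. The sketch below is a minimal illustration using Python's standard `time.perf_counter`; the stage functions (`preprocess`, `infer`, `transmit`) are hypothetical stand-ins, not any real framework's API.

```python
import time

def measure_latency(preprocess, infer, transmit, request):
    """Time each stage of a request. Stage functions are hypothetical stand-ins."""
    timings = {}
    t0 = time.perf_counter()
    batch = preprocess(request)                              # data preprocessing
    timings["preprocess_ms"] = (time.perf_counter() - t0) * 1000
    t1 = time.perf_counter()
    output = infer(batch)                                    # model inference
    timings["inference_ms"] = (time.perf_counter() - t1) * 1000
    t2 = time.perf_counter()
    response = transmit(output)                              # network transmission
    timings["network_ms"] = (time.perf_counter() - t2) * 1000
    timings["total_ms"] = (time.perf_counter() - t0) * 1000  # end-to-end latency
    return response, timings

# Stand-in stages that sleep to simulate work
response, t = measure_latency(
    preprocess=lambda r: (time.sleep(0.01), r)[1],
    infer=lambda b: (time.sleep(0.05), b.upper())[1],
    transmit=lambda o: (time.sleep(0.005), o)[1],
    request="hello",
)
print(t)
```

Breaking total latency into per-stage timings like this is what lets teams decide whether to optimize the model itself or the surrounding pipeline.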

Why It Matters

Latency determines user experience: a chatbot with 10-second response times feels broken, while one that responds in 200 ms feels instant. It is a critical production metric.

Example

An LLM API call takes 800 ms to return the first token (time-to-first-token, or TTFT) and 3 seconds to generate the complete response.
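For streaming responses, TTFT and total latency are measured separately, since the first number dominates perceived responsiveness. A minimal sketch, assuming a hypothetical token generator in place of a real streaming API:

```python
import time

def stream_tokens(tokens, first_delay=0.05, per_token_delay=0.01):
    """Hypothetical streaming API: pauses before the first token, then yields steadily."""
    time.sleep(first_delay)
    for tok in tokens:
        yield tok
        time.sleep(per_token_delay)

def measure_streaming_latency(stream):
    """Return the full text, time-to-first-token, and total generation time (seconds)."""
    start = time.perf_counter()
    ttft = None
    pieces = []
    for tok in stream:
        if ttft is None:
            ttft = time.perf_counter() - start   # time-to-first-token
        pieces.append(tok)
    total = time.perf_counter() - start          # total response latency
    return "".join(pieces), ttft, total

text, ttft, total = measure_streaming_latency(stream_tokens(["Hel", "lo", "!"]))
print(f"TTFT: {ttft * 1000:.0f} ms, total: {total * 1000:.0f} ms")
```

TTFT is always at most the total latency; a service can have a fast TTFT yet a slow total time if it generates many tokens.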

Think of it like...

Like the wait time at a restaurant — from when you place your order to when food arrives. Some dishes (complex queries) naturally take longer than others.

Related Terms