Observability
The ability to understand the internal state and behavior of an AI system through its external outputs, including logging, tracing, and monitoring of LLM calls and agent actions.
Why It Matters
Observability is essential for debugging, optimizing, and maintaining AI systems in production. You cannot fix what you cannot see.
Example
Logging every LLM call with its prompt, response, latency, token count, and cost — then tracing the full chain when an agent takes an unexpected action.
Think of it like...
Like a doctor monitoring vital signs — blood pressure, heart rate, and temperature give insight into what is happening inside without surgery.
Related Terms
MLOps
Machine Learning Operations — the set of practices that combine ML, DevOps, and data engineering to deploy and maintain ML models in production reliably and efficiently.
Model Monitoring
The practice of continuously tracking an ML model's performance, predictions, and input data in production to detect degradation, drift, or anomalies after deployment.
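One simple form of the drift detection described above is comparing a production feature's distribution against its training baseline. The sketch below flags drift when the recent mean moves more than a threshold number of baseline standard deviations; the window sizes and threshold are illustrative assumptions, not recommended values.

```python
import statistics

def mean_drift(baseline: list[float], recent: list[float],
               threshold: float = 0.25) -> bool:
    """Return True if the recent window's mean has shifted by more than
    `threshold` baseline standard deviations."""
    base_mean = statistics.mean(baseline)
    base_std = statistics.stdev(baseline)
    shift = abs(statistics.mean(recent) - base_mean) / (base_std or 1.0)
    return shift > threshold

# Training-time baseline for one feature, then two production windows.
baseline = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05]
stable = [1.0, 0.98, 1.02]
drifted = [2.0, 2.1, 1.9]

print(mean_drift(baseline, stable))   # False: distribution unchanged
print(mean_drift(baseline, drifted))  # True: mean has shifted sharply
```

Real monitoring systems use more robust statistics (e.g. population stability index or KS tests) and track predictions and labels as well as inputs, but the shape is the same: compare production against a baseline and alert on divergence.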
Model Serving
The infrastructure and process of deploying trained ML models to production where they can receive requests and return predictions in real time. It includes scaling, load balancing, and version management.