LLM Observability
LLM Observability is a platform for unified monitoring of the performance, cost, and reliability of applications built on large language models (LLMs). It collects request volume, response performance, token usage, cost, and error status for LLM APIs in real time, and breaks them down by model, agent, and provider. Combined with WhaTap APM, server, and Kubernetes monitoring, it enables end-to-end tracing of LLM calls from application transactions down to GPU infrastructure.
Why LLM Monitoring?
Detecting LLM anomalies hidden behind HTTP 200 responses
LLM inference engines can return HTTP 200 even when the model hallucinates or produces an abnormal response. Traditional server monitoring, which relies on status codes, cannot detect this, so incidents are noticed late and the window for responding is missed. LLM Observability tracks anomalies in response time, token patterns, and error rates in real time, surfacing model issues that HTTP status codes alone do not reveal.
Without cost visibility, you cannot control spending
LLM APIs are billed per token on every call. Per-call cost varies significantly with the model, prompt length, and response size, and unexpected costs can arise as traffic grows. Controlling cost requires real-time visibility into which model and which request incurred how much. Because tokens can be billed even for failed requests, error costs must be tracked separately to quantify wasted spend.
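As a rough illustration of the arithmetic behind per-call cost and wasted spend, the sketch below prices each call from its token counts and sums the cost of failed calls separately. The model names, the per-million-token prices, and the `Call` record are all hypothetical; they are not taken from any provider's price list or from the product's data model.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per one million tokens (assumption for illustration).
PRICES = {
    "model-a": {"input": 3.00, "output": 15.00},
    "model-b": {"input": 0.50, "output": 1.50},
}


@dataclass
class Call:
    """Hypothetical record of one LLM API call."""
    model: str
    input_tokens: int
    output_tokens: int
    failed: bool = False


def call_cost(call: Call) -> float:
    """Cost of one call: tokens times the per-million-token price."""
    price = PRICES[call.model]
    return (
        call.input_tokens * price["input"]
        + call.output_tokens * price["output"]
    ) / 1_000_000


def summarize(calls):
    """Total spend, plus the part spent on calls that failed."""
    total = sum(call_cost(c) for c in calls)
    wasted = sum(call_cost(c) for c in calls if c.failed)
    return {"total": total, "wasted": wasted}


calls = [
    Call("model-a", 1000, 500),
    Call("model-a", 2000, 0, failed=True),  # failed, but input tokens still priced
    Call("model-b", 4000, 1000),
]
summary = summarize(calls)
print(summary)
```

Keeping `wasted` as its own figure, rather than folding it into the total, is what makes the cost of errors visible.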
Users feel slow responses first
LLM responses can be seconds slower than traditional API responses. In streaming environments, if the first token arrives late or token generation speed drops, users perceive that "the response has stopped." When most users are fine but some experience slowness, averages hide the problem. Meaningful improvement requires tracking which model slows down, at what time, and in what pattern, using time series analysis and cross-model comparison.
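The following sketch shows one common way to compute the two streaming metrics implied above, time to first token (TTFT) and time per output token (TPOT), and why a percentile exposes slowness that an average hides. The function names, the timestamps, and the nearest-rank percentile method are assumptions made for illustration; the product may define these metrics differently.

```python
import math
import statistics


def ttft(request_start, token_times):
    """Time to first token: delay between the request and the first streamed token."""
    return token_times[0] - request_start


def tpot(token_times):
    """Time per output token: average gap between tokens after the first one."""
    if len(token_times) < 2:
        return 0.0
    return (token_times[-1] - token_times[0]) / (len(token_times) - 1)


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical token arrival times (seconds) for one streamed response.
token_times = [0.8, 0.9, 1.0, 1.1]
first_token_delay = ttft(0.0, token_times)
per_token_time = tpot(token_times)

# Hypothetical latencies: 18 fast requests and 2 slow ones.
latencies = [1.0] * 18 + [9.0] * 2
mean_latency = statistics.mean(latencies)
p50_latency = percentile(latencies, 50)
p95_latency = percentile(latencies, 95)
print(first_token_delay, per_token_time, mean_latency, p50_latency, p95_latency)
```

In this made-up data the mean and median look healthy, while the 95th percentile lands on one of the slow requests, which is the case the paragraph above describes.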
Preserving call context for prompt reproduction
LLM calls can generate different responses even with the same prompt. Unless "which prompt, which model, which parameters were used" is preserved, an issue cannot be reproduced after it occurs. LLM Observability collects and preserves system messages, input prompts, model responses, and tool calls in their original form for every LLM call. When an issue occurs, the exact call context at that point can be restored, enabling immediate verification and reproduction.
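To make the idea of a preserved call context concrete, here is a minimal sketch of a record that captures the items listed above and survives a round trip through JSON. The `CallContext` class and its field names are hypothetical and do not describe the product's actual storage format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CallContext:
    """Hypothetical snapshot of everything needed to replay one LLM call."""
    model: str
    system_message: str
    prompt: str
    parameters: dict
    response: str
    tool_calls: list = field(default_factory=list)


def save(ctx: CallContext) -> str:
    """Serialize the context without altering the original text."""
    return json.dumps(asdict(ctx), ensure_ascii=False, sort_keys=True)


def load(raw: str) -> CallContext:
    """Restore the context exactly as it was saved."""
    return CallContext(**json.loads(raw))


original = CallContext(
    model="model-a",  # hypothetical model name
    system_message="You are a support assistant.",
    prompt="Summarize ticket 1234.",
    parameters={"temperature": 0.2, "max_tokens": 256},
    response="The customer reports a login failure.",
    tool_calls=[{"name": "lookup_ticket", "arguments": {"id": 1234}}],
)
restored = load(save(original))
print(restored == original)
```

Storing the sampling parameters alongside the prompt matters because the same prompt replayed with different parameters is not a faithful reproduction.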
Multi-model environments require comparative analysis
Using multiple models and providers simultaneously in a single application is common. Decisions such as selecting the best model for a workload, or replacing a model with poor cost-efficiency, need data-driven evidence from comparing performance, cost, and error rates.
Scattered monitoring data prevents root cause analysis
When operating AI applications, monitoring data ends up fragmented: logs in log platforms, metrics in infrastructure monitoring, costs in provider consoles, and traces in APM tools. When issues occur, manually correlating data across multiple tools slows root cause identification. LLM Observability integrates performance, cost, errors, prompt logs, transaction traces, and GPU infrastructure in a single platform, enabling drill-down to root causes without context switching.
LLM Observability Key Features
Real-time LLM Dashboard
Monitor the real-time status of LLM APIs on a single screen. Provides widgets for key metrics including request volume, response performance, TTFT (time to first token), TPOT (time per output token), tokens, cost, and errors.
Token Trend Analysis and Model Comparison
Analyze token usage patterns and LLM response performance as time series. Compare performance distribution across models, agents, and providers to identify the most stable model.
Cost Analysis and Optimization
Provides token-level cost tracking, change rates compared to previous periods, and cross-model cost comparison. Supports cost optimization decisions with performance-vs-cost bubble charts.
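A change rate compared to a previous period is typically a simple percentage difference. The sketch below assumes that definition and uses hypothetical cost figures; how the product handles an empty previous period is not stated here, so returning `None` in that case is an assumption.

```python
def change_rate(current, previous):
    """Percentage change from the previous period, or None if there is no baseline."""
    if previous == 0:
        return None  # assumption: no meaningful rate without a baseline
    return (current - previous) / previous * 100


# Hypothetical weekly cost in USD for two models.
this_week = {"model-a": 120.0, "model-b": 45.0}
last_week = {"model-a": 100.0, "model-b": 50.0}

rates = {m: change_rate(this_week[m], last_week[m]) for m in this_week}
print(rates)
```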
Cache Efficiency Monitoring
Track prompt caching hit rates and cost savings by time period. Quantitatively verify whether caching strategies contribute to actual cost reduction.
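One way to quantify whether caching pays off is to measure the share of input tokens served from cache and price the difference. The sketch below assumes a token-based hit rate and a discounted price for cached tokens; the record fields and both prices are hypothetical, and a request-based hit rate would be an equally plausible definition.

```python
def cache_stats(records, input_price, cached_price):
    """Return (hit_rate, savings_usd) for a list of call records.

    Prices are in USD per one million tokens. hit_rate is the fraction of
    input tokens that were read from the prompt cache.
    """
    total_input = sum(r["input_tokens"] for r in records)
    cached = sum(r["cached_tokens"] for r in records)
    hit_rate = cached / total_input if total_input else 0.0
    savings = cached * (input_price - cached_price) / 1_000_000
    return hit_rate, savings


# Hypothetical records for one time period.
records = [
    {"input_tokens": 1000, "cached_tokens": 800},
    {"input_tokens": 1000, "cached_tokens": 0},
]
hit_rate, savings = cache_stats(records, input_price=3.00, cached_price=0.30)
print(hit_rate, savings)
```

Computing this per time bucket, rather than once over all traffic, shows whether a change to the caching strategy actually moved the savings.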
Prompt Log Analysis
Collects and preserves system messages, input prompts, model responses, and tool calls for each individual LLM call. After detecting an anomaly on a dashboard, drill down to the individual call to identify the root cause.
LLM Transaction Tracing and GPU Correlation Analysis
Traces LLM calls as part of application transactions. In the transaction profile, view prompt, token, and cost details of LLM steps, along with GPU correlation analysis.
Next Steps
- Getting Started — Create a project and select an agent type
- Supported Environment — Check Python/Java supported environments
- LLM Dashboard — Start real-time monitoring