📈 [BETA] Prometheus metrics
info
🚨 Prometheus metrics will be out of Beta on September 15, 2024 - as part of this release it will be on LiteLLM Enterprise starting at $250/mo
LiteLLM Exposes a /metrics
endpoint for Prometheus to Poll
Quick Start
If you're using the LiteLLM CLI with litellm --config proxy_config.yaml
then you need to pip install prometheus_client==0.20.0
. This is already pre-installed on the litellm Docker image
Add this to your proxy config.yaml
model_list:
- model_name: gpt-3.5-turbo
litellm_params:
model: gpt-3.5-turbo
litellm_settings:
success_callback: ["prometheus"]
failure_callback: ["prometheus"]
Start the proxy
litellm --config config.yaml --debug
Test Request
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
]
}'
View Metrics on /metrics
, Visit http://localhost:4000/metrics
http://localhost:4000/metrics
# <proxy_base_url>/metrics
📈 Metrics Tracked
Proxy Requests / Spend Metrics
Metric Name | Description |
---|---|
litellm_requests_metric | Number of requests made, per "user", "key", "model", "team", "end-user" |
litellm_spend_metric | Total Spend, per "user", "key", "model", "team", "end-user" |
litellm_total_tokens | input + output tokens per "user", "key", "model", "team", "end-user" |
litellm_llm_api_failed_requests_metric | Number of failed LLM API requests per "user", "key", "model", "team", "end-user" |
Request Latency Metrics
Metric Name | Description |
---|---|
litellm_request_total_latency_metric | Total latency (seconds) for a request to LiteLLM Proxy Server - tracked for labels litellm_call_id , model |
litellm_llm_api_latency_metric | latency (seconds) for just the LLM API call - tracked for labels litellm_call_id , model |
LLM API / Provider Metrics
Metric Name | Description |
---|---|
litellm_deployment_state | The state of the deployment: 0 = healthy, 1 = partial outage, 2 = complete outage. |
litellm_remaining_requests_metric | Track x-ratelimit-remaining-requests returned from LLM API Deployment |
litellm_remaining_tokens | Track x-ratelimit-remaining-tokens return from LLM API Deployment |
litellm_deployment_success_responses | Total number of successful LLM API calls for deployment |
litellm_deployment_failure_responses | Total number of failed LLM API calls for deployment |
litellm_deployment_total_requests | Total number of LLM API calls for deployment - success + failure |
litellm_deployment_latency_per_output_token | Latency per output token for deployment |
litellm_deployment_successful_fallbacks | Number of successful fallback requests from primary model -> fallback model |
litellm_deployment_failed_fallbacks | Number of failed fallback requests from primary model -> fallback model |
Budget Metrics
Metric Name | Description |
---|---|
litellm_remaining_team_budget_metric | Remaining Budget for Team (A team created on LiteLLM) |
litellm_remaining_api_key_budget_metric | Remaining Budget for API Key (A key Created on LiteLLM) |
Monitor System Health
To monitor the health of litellm adjacent services (redis / postgres), do:
model_list:
- model_name: gpt-3.5-turbo
litellm_params:
model: gpt-3.5-turbo
litellm_settings:
service_callback: ["prometheus_system"]
Metric Name | Description |
---|---|
litellm_redis_latency | histogram latency for redis calls |
litellm_redis_fails | Number of failed redis calls |
litellm_self_latency | Histogram latency for successful litellm api call |
🔥 Community Maintained Grafana Dashboards
Link to Grafana Dashboards made by LiteLLM community
https://github.com/BerriAI/litellm/tree/main/cookbook/litellm_proxy_server/grafana_dashboard