Metering Token Usage
Introduction
Envoy AI Gateway exposes Prometheus metrics that follow the OpenTelemetry GenAI semantic conventions, including token usage per request. By adding the caller's identity as a metric label and collecting the metric through the platform monitoring stack, you get a unified view of token consumption per department, namespace, and model. The same data feeds chargeback through Alauda Cost Management.
The pipeline is: the gateway emits token metrics, identity is attached as a label, a PodMonitor collects the metric into the platform, and a MonitorDashboard presents it. No raw PromQL is required for day-to-day viewing.
Use Cases
- Show each department its own token consumption by model, isolated per project.
- Track which models drive the most token usage across the platform.
- Provide the usage data that Alauda Cost Management prices into a chargeback report.
Prerequisites
- An AIGatewayRoute with llmRequestCosts configured. See Configuring Token Quotas. Without llmRequestCosts the gateway still emits gen_ai_client_token_usage_token, but every per-request token count is zero.
- Caller identity propagated as request headers. See Authenticating Consumers.
- Platform monitoring is enabled on the cluster. Confirm by checking the Prometheus operator CRDs:
- Sanity-check that the metric is being emitted before wiring monitoring. Send one request through the gateway, then read the ExtProc sidecar's admin port on a data-plane proxy pod:
  If no gen_ai_* sample appears, none of the scraping below will work; first fix the route / ExtProc wiring.
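The sanity check has two failure modes: no gen_ai_* samples at all, or samples whose token sums are all zero (the missing llmRequestCosts case). The minimal Python sketch below tells them apart from the raw /metrics text you captured from the sidecar. The function name and messages are illustrative, and the regular expression assumes that no label value contains a closing brace.

```python
import re

# One sample line in the Prometheus text format: name{labels} value [timestamp].
# Assumption for this sketch: no label value contains a "}" character.
SAMPLE_RE = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def check_token_metrics(metrics_text: str) -> str:
    """Classify a /metrics scrape from the ExtProc sidecar (illustrative helper)."""
    sums = []
    found_gen_ai = False
    for line in metrics_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if not match or not match.group(1).startswith("gen_ai_"):
            continue
        found_gen_ai = True
        if match.group(1) == "gen_ai_client_token_usage_token_sum":
            sums.append(float(match.group(3)))
    if not found_gen_ai:
        return "no gen_ai_* samples: fix the route / ExtProc wiring first"
    if sums and all(value == 0 for value in sums):
        return "token sums are all zero: check llmRequestCosts on the AIGatewayRoute"
    return "ok"
```

Feed it the text returned by the sidecar's metrics endpoint, for example after saving it to a file.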
Create the Gateway and AIGatewayRoute in a dedicated namespace (for example maas-system), not in the Envoy Gateway control-plane namespace envoy-gateway-system. A gateway placed in the control-plane namespace may not have the AI Gateway request-processing filter and SecurityPolicy applied to its listener, which silently breaks routing and policy enforcement. See Envoy AI Gateway.
Steps
Add an identity label to token metrics
By default the token metric gen_ai_client_token_usage_token carries the OpenTelemetry GenAI standard labels only (model, provider, operation, token type). Enrich it with caller-identity dimensions — the billed namespace and the caller's department — by mapping identity headers to metric labels in the Envoy AI Gateway controller.
The controller reads the mapping from the CLI flag --metricsRequestHeaderAttributes=<header>:<label>[,<header>:<label>...] on the ai-gateway-controller Deployment. If the controller was installed via Helm, the chart renders this flag from a values key (for example controller.metricsRequestHeaderAttributes); set that key and run helm upgrade --reuse-values with your release name and chart reference. If you manage the Deployment directly, patch its container args.
The flag is a single comma-joined <header>:<label> map, not a repeatable list: passing it twice makes the controller keep only the last copy. The platform also ships default pairs (for example x-user-name:user and x-access-meta:client_id) that the data-plane sidecar already emits — so you must rewrite the one flag with the union of the existing pairs plus your new one. Omitting any pair removes that metric label from every proxy on its next restart.
First read the pairs the data plane actually emits today (the proxy sidecar is the source of truth — the controller flag may already have been narrowed):
Then rewrite the controller's single flag with that full set plus x-user-group:department. Substitute the left side below with whatever the command above printed — do not drop any pair:
- x-user-namespace → user_namespace: the namespace or tenant a request is billed to. This is the per-namespace key that Chargeback with Cost Management groups on, so keep it whenever Cost Management is in use.
- x-user-group → department: a low-cardinality identity dimension for dashboards, set by the SecurityPolicy from the IdP group claim. It is populated only when that claim is single-valued.
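Because the controller keeps only the last copy of the flag, the safe edit is read, merge, then write a single comma-joined value. Below is a minimal Python sketch of that merge. It assumes the flag appears in the container args in the --metricsRequestHeaderAttributes=<value> form, and the helper names are hypothetical.

```python
FLAG = "--metricsRequestHeaderAttributes="


def current_pairs(container_args: list) -> str:
    """Return the value the controller actually uses: the last copy of the flag."""
    values = [arg[len(FLAG):] for arg in container_args if arg.startswith(FLAG)]
    return values[-1] if values else ""


def merge_header_attributes(existing: str, additions: dict) -> str:
    """Union existing <header>:<label> pairs with new ones and emit ONE flag."""
    pairs = {}  # dicts keep insertion order, so existing pairs stay first
    for item in (p.strip() for p in existing.split(",")):
        if not item:
            continue
        header, sep, label = item.partition(":")
        if not sep or not header or not label:
            raise ValueError(f"malformed pair: {item!r}")
        pairs[header] = label
    pairs.update(additions)  # a new mapping for an existing header overrides it
    return FLAG + ",".join(f"{h}:{l}" for h, l in pairs.items())
```

For example, merging {"x-user-group": "department"} into "x-user-name:user,x-access-meta:client_id" yields one flag carrying all three pairs. That single flag replaces every existing copy in the args or Helm value.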
The --metricsRequestHeaderAttributes mapping is baked into the ExtProc sidecar args when a proxy pod is created. Patching the controller and waiting for its rollout is not sufficient — the proxy pods must be recreated (the rollout restart above) before the new label appears. If the verification below returns empty output, the proxy pods were not recreated.
user_namespace (the billed namespace) is the reliable default grouping — it is always present once x-user-namespace is mapped. department (x-user-group) is a useful low-cardinality dimension only when the IdP emits a single-valued group claim (the standard OIDC groups claim is an array and is not supported by claimToHeaders, so department stays empty otherwise — see Authenticating Consumers). Avoid a per-user label (x-user-id): it produces high-cardinality series (one per user × model × token type), so add it only when per-user reporting is required and a retention window keeps the series count bounded.
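The cardinality warning comes down to multiplication. For every distinct label combination, a histogram emits a _sum, a _count, and one _bucket series per bucket bound plus the implicit +Inf bucket, so each added label multiplies the total. Below is a rough upper-bound calculator in Python. The cardinalities and bucket count in the test are made-up numbers, not values read from the gateway.

```python
from math import prod


def histogram_series(label_cardinalities: dict, bucket_bounds: int) -> int:
    """Upper bound on active series for one histogram metric.

    Every label combination yields _sum, _count, and bucket_bounds + 1 _bucket
    series (the extra one is the implicit le="+Inf" bucket). Real counts are
    lower when not every combination actually occurs.
    """
    combinations = prod(label_cardinalities.values())
    return combinations * (2 + bucket_bounds + 1)
```

With 20 namespaces, 5 models, 2 token types, and 14 bucket bounds, this gives 3,400 series. Adding a user label with 2,000 users raises that to 6,800,000.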
After the rollout, send a fresh request and confirm the new labels are on the sample. user_namespace is always present; department appears only when a scalar group claim is configured:
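To check a captured sample line without reading it by eye, the helper below parses its label set and reports which required labels are missing or empty. Prometheus treats an empty label value as an absent label. This is a sketch: the label-name pattern covers the underscore-normalized names served at the sidecar's /metrics endpoint, not quoted UTF-8 names.

```python
import re

# label="value" pairs; values may contain backslash-escaped characters.
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def labels_of(sample_line: str) -> dict:
    """Extract the {label: value} map from one exposition-format sample line."""
    start, end = sample_line.find("{"), sample_line.rfind("}")
    if start == -1 or end < start:
        return {}
    return dict(LABEL_RE.findall(sample_line[start + 1:end]))


def missing_labels(sample_line: str, required=("user_namespace",)) -> list:
    """Names from `required` that are absent or empty on the sample."""
    labels = labels_of(sample_line)
    return [name for name in required if not labels.get(name)]
```

Pass ("user_namespace", "department") as `required` only when a scalar group claim is configured.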
Collect the metric into the platform
The metric is emitted by the AI Gateway external processor (ExtProc), which runs as a sidecar on each data-plane proxy pod (declared as a Kubernetes native sidecar / initContainer) and exposes a Prometheus metrics endpoint on container port 1064 (named aigw-metrics). It is scraped directly from the proxy pods with a PodMonitor. For the platform workflow, see metrics management.
On the Alauda platform a PodMonitor for this sidecar is often pre-installed. Check first — a second PodMonitor on the same endpoint creates an overlapping scrape pool that scrapes every proxy pod twice, producing duplicate series under a different job label and doubling every increase()/sum in the dashboards and in the chargeback query (neither filters by job):
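A duplicate PodMonitor doubles the numbers because the same sidecar counter is stored twice, once per scrape pool, and the two series differ only in their job label. Any sum that does not group or filter on job adds them together. Below is a plain-Python model of that aggregation, not Prometheus code; the job names in the test are hypothetical.

```python
from collections import defaultdict


def sum_by(series, label, keep_job=None):
    """Mimic `sum by (label)` over (labels, value) pairs.

    If keep_job is set, only series from that job are counted, which is the
    effect that removing the duplicate PodMonitor has at the source.
    """
    totals = defaultdict(float)
    for labels, value in series:
        if keep_job is not None and labels.get("job") != keep_job:
            continue
        totals[labels.get(label, "")] += value
    return dict(totals)
```

The fix belongs at the source (a single PodMonitor), because the dashboard and chargeback queries do not filter by job.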
Only if no PodMonitor already targets aigw-metrics, discover the label your Prometheus operator uses to select PodMonitor objects (so the resource below is actually picked up) and create one:
Then apply the PodMonitor with that label in its own metadata.labels (not the selector — these are two different things):
- metadata.labels: how Prometheus discovers the PodMonitor. Without the right label, the resource exists but is invisible to the scrape pipeline.
- spec.selector: matches every Envoy Gateway data-plane proxy pod in the cluster. To restrict it to one Gateway, replace it with gateway.envoyproxy.io/owning-gateway-name: <gateway-name> and gateway.envoyproxy.io/owning-gateway-namespace: <gateway-namespace>.
- port: aigw-metrics: the named port on the ExtProc sidecar that serves /metrics on container port 1064.
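Generating the manifest programmatically makes it harder to confuse the discovery label with the selector. Below is a minimal Python sketch that emits a monitoring.coreos.com/v1 PodMonitor as JSON, which kubectl apply -f - accepts. It rests on several assumptions:
- The discovery label "prometheus": "kube-prometheus" is a placeholder for whatever your Prometheus operator selects on.
- The object name aigw-extproc is hypothetical.
- The cluster-wide selector uses the app.kubernetes.io/managed-by: envoy-gateway and app.kubernetes.io/component: proxy labels that Envoy Gateway normally sets on proxy pods. Confirm these on your own pods.
- namespaceSelector.any is set because the proxy pods may live in a different namespace from the PodMonitor.

```python
import json


def pod_monitor(name, namespace, discovery_labels, gateway=None):
    """Build a PodMonitor for the ExtProc sidecar's aigw-metrics port.

    discovery_labels -> metadata.labels (how Prometheus finds this object).
    gateway          -> optional (name, namespace) to scrape only one Gateway.
    """
    if gateway is None:
        # Assumed labels on every Envoy Gateway data-plane proxy pod.
        match_labels = {
            "app.kubernetes.io/managed-by": "envoy-gateway",
            "app.kubernetes.io/component": "proxy",
        }
    else:
        gw_name, gw_namespace = gateway
        match_labels = {
            "gateway.envoyproxy.io/owning-gateway-name": gw_name,
            "gateway.envoyproxy.io/owning-gateway-namespace": gw_namespace,
        }
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PodMonitor",
        "metadata": {"name": name, "namespace": namespace, "labels": dict(discovery_labels)},
        "spec": {
            "selector": {"matchLabels": match_labels},
            "namespaceSelector": {"any": True},  # proxy pods may be in another namespace
            "podMetricsEndpoints": [{"port": "aigw-metrics", "path": "/metrics"}],
        },
    }


# "prometheus: kube-prometheus" is a placeholder; use the label your operator selects on.
print(json.dumps(pod_monitor("aigw-extproc", "maas-system", {"prometheus": "kube-prometheus"}), indent=2))
```

Pipe the printed JSON to kubectl apply -f - to create the object.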
Confirm Prometheus is scraping the sidecar — and that exactly one scrape pool covers it (two pools means a duplicate PodMonitor is double-counting). Port-forward Prometheus and group the active targets by pool:
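The pool check can also be scripted against the Prometheus HTTP API. The sketch below reads /api/v1/targets?state=active and counts, per scrapePool, the active targets whose scrape URL uses port 1064. It assumes Prometheus is port-forwarded to localhost:9090 (adjust base_url if not) and that no other scrape target in the cluster uses port 1064.

```python
import json
from collections import Counter
from urllib.request import urlopen


def fetch_active_targets(base_url="http://localhost:9090"):
    """GET /api/v1/targets?state=active from a port-forwarded Prometheus."""
    with urlopen(f"{base_url}/api/v1/targets?state=active") as resp:
        return json.load(resp)


def aigw_pools(targets_json, port=1064):
    """Count active targets on the ExtProc metrics port, keyed by scrape pool."""
    pools = Counter()
    for target in targets_json["data"]["activeTargets"]:
        if f":{port}/" in target.get("scrapeUrl", ""):
            pools[target["scrapePool"]] += 1
    return pools
```

Exactly one key in aigw_pools(fetch_active_targets()) means a single scrape pool. Two keys mean a duplicate PodMonitor is double-counting.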
Build a unified usage dashboard
Create a MonitorDashboard to present token usage. Use variables for namespace, model, and department so consumers can filter, and rely on User View so each project sees only its own data. For the platform workflow, see monitoring dashboards.
At the ExtProc /metrics endpoint the metric is exposed in Prometheus text format with the OpenTelemetry name normalized to underscores (gen_ai_client_token_usage_token_sum, as shown in the prerequisite check above). The platform scrape pipeline, however, stores the series under its original OpenTelemetry UTF-8 name with the dots preserved. In Prometheus the series is therefore gen_ai.client.token.usage_token_sum (with the histogram _sum/_count/_bucket variants), and it must be selected with the quoted {__name__="..."} form — a bare underscore identifier matches nothing and returns an empty vector. The intrinsic GenAI labels are likewise dotted (gen_ai.request.model, gen_ai.token.type) and need the quoted UTF-8 label syntax; header-mapped labels such as department and user_namespace stay plain identifiers:
Group by user_namespace for the headline panels: it is always present once x-user-namespace is mapped. The department label appears only when x-user-group is populated, which requires a single-valued group claim from the IdP — the standard OIDC groups claim is an array and is not supported, so sum by (department) collapses to a single empty-labelled series otherwise. Group by department only after confirming the label exists (grep 'department=' returns output); otherwise expose a scalar claim in the IdP connector.
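Because the stored names are dotted, it is easy to write a query that silently matches nothing. Below is a small Python helper that builds the headline query using the quoted UTF-8 forms described above. It assumes a Prometheus version with UTF-8 name support, and the function names are illustrative. The token type value "input" follows the GenAI semantic conventions, but check it against your own series.

```python
import re

# Legacy (unquoted) PromQL label names; anything else, such as a dotted
# OpenTelemetry name, must be written as a quoted string.
LEGACY_LABEL = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
METRIC = "gen_ai.client.token.usage_token_sum"


def label_ref(name: str) -> str:
    return name if LEGACY_LABEL.match(name) else f'"{name}"'


def token_usage_query(group_by, window="1h", token_type=None):
    """sum by (...) (increase(<token sum series>[window])) with UTF-8 quoting."""
    matchers = [f'__name__="{METRIC}"']
    if token_type:
        matchers.append(f'"gen_ai.token.type"="{token_type}"')
    by_clause = ", ".join(label_ref(label) for label in group_by)
    selector = "{" + ", ".join(matchers) + "}"
    return f"sum by ({by_clause}) (increase({selector}[{window}]))"


print(token_usage_query(["user_namespace", "gen_ai.request.model"]))
```

Keep user_namespace as the first grouping label, and add department only once the label is confirmed to exist.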
Confirm the metric is queryable by listing all gen_ai* series names on the Prometheus UI's Status → TSDB page, or with:
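The same listing is available from the HTTP API endpoint /api/v1/label/__name__/values, which returns every metric name Prometheus holds. The sketch below filters that list down to gen_ai* names, again assuming a port-forward to localhost:9090.

```python
import json
from urllib.request import urlopen


def fetch_metric_names(base_url="http://localhost:9090"):
    """All metric names known to Prometheus (values of the __name__ label)."""
    with urlopen(f"{base_url}/api/v1/label/__name__/values") as resp:
        return json.load(resp)["data"]


def gen_ai_names(names):
    """Keep GenAI series names; dotted UTF-8 names are returned as stored."""
    return sorted(name for name in names if name.startswith("gen_ai"))
```

An empty result from gen_ai_names(fetch_metric_names()) means the scrape is not reaching Prometheus. Otherwise, use the exact names it returns in your queries.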
Verification
Send a few authenticated requests that resolve to different namespaces, then confirm the metric carries the identity labels by port-forwarding the ExtProc sidecar's metrics port on any proxy pod:
Expect at least one sample per billed namespace, for example:
If you configured a scalar group claim, the same samples also carry a department= label (check with grep 'department='). Open the dashboard and confirm token usage appears, filterable by namespace, model, and department.
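To turn the verification into a pass/fail check, compare the billed namespaces you sent requests for with the user_namespace values found on the token-sum samples. Below is a minimal Python sketch that works on the /metrics text captured from the sidecar. The namespace names in the test are hypothetical.

```python
import re

NAMESPACE_RE = re.compile(r'user_namespace="([^"]*)"')
SUM_METRIC = "gen_ai_client_token_usage_token_sum"


def namespaces_seen(metrics_text: str) -> set:
    """Non-empty user_namespace values on token-sum samples."""
    seen = set()
    for line in metrics_text.splitlines():
        if not line.startswith(SUM_METRIC + "{"):
            continue  # only labelled _sum samples; skips comments and other series
        match = NAMESPACE_RE.search(line)
        if match and match.group(1):
            seen.add(match.group(1))
    return seen


def missing_namespaces(metrics_text: str, expected) -> list:
    """Expected namespaces with no sample yet; an empty list means verification passed."""
    return sorted(set(expected) - namespaces_seen(metrics_text))
```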
Next Steps
With token usage collected, configure Charging Back Token Usage to price the metric into per-namespace bills with Alauda Cost Management.