As businesses increasingly integrate AI capabilities into their products, accurately tracking usage becomes critical for cost management and billing. When using Azure OpenAI with a single API key for multiple users, you need a robust system to attribute token consumption (prompt_tokens, completion_tokens, and total_tokens) to individual users. In this blog post, we’ll explore how to combine OpenMeter, an open-source usage metering solution, with Azure API Management’s (APIM) azure-openai-emit-token-metric policy to create a seamless billing system that tracks token usage by internal user IDs without requiring custom modifications.
Azure OpenAI charges based on token usage, which includes prompt_tokens (input tokens), completion_tokens (output tokens), and total_tokens (sum of both). When a single API key is shared across multiple users, attributing usage to specific users is challenging. A billing system that tracks these metrics per user is essential for:

- Accurate per-user billing and chargeback
- Cost management and budgeting
- Enforcing per-user quotas and usage limits
By combining OpenMeter and APIM, we can achieve precise, scalable, and real-time token usage tracking.
The proposed solution leverages two powerful tools:

- OpenMeter: an open-source usage metering solution that ingests events and aggregates them in real time by subject (here, the internal user ID)
- Azure API Management (APIM): the gateway in front of Azure OpenAI, whose azure-openai-emit-token-metric policy emits token counts to Application Insights
By integrating these tools, we can:

- Attribute prompt_tokens, completion_tokens, and total_tokens to internal user IDs
- Aggregate usage in real time for billing and reporting
- Visualize and query usage per user in Application Insights
Azure API Management acts as the gateway for Azure OpenAI requests. The azure-openai-emit-token-metric policy extracts token usage (prompt_tokens, completion_tokens, total_tokens) from API responses and sends it to Application Insights with a custom dimension for the internal user ID.
In APIM, configure an inbound policy to extract the internal user ID from a header (e.g., x-user-id) and send token metrics to Application Insights under a specific namespace (e.g., openai). Additional dimensions like request status and API identifier can be included for deeper analysis. This setup ensures that every API response is parsed, and token metrics are sent to Application Insights for visualization and querying.
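A minimal policy sketch is shown below. The namespace, the x-user-id header, and the dimension names are illustrative; adapt them to your APIM instance and metric naming conventions.

```xml
<policies>
    <inbound>
        <base />
        <!-- Emit prompt, completion, and total token counts to Application Insights -->
        <azure-openai-emit-token-metric namespace="openai">
            <!-- Attribute usage to the internal user ID passed by the caller -->
            <dimension name="User" value="@(context.Request.Headers.GetValueOrDefault("x-user-id", "unknown"))" />
            <!-- Extra dimensions for deeper analysis -->
            <dimension name="API ID" value="@(context.Api.Id)" />
        </azure-openai-emit-token-metric>
    </inbound>
</policies>
```

Each dimension becomes a filterable field on the emitted metrics, so the same query can slice usage by user, by API, or both.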
OpenMeter is designed to handle AI usage metering, supporting prompt_tokens, completion_tokens, and total_tokens natively. It uses a scalable stream-processing architecture to aggregate usage data by user.
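For a self-hosted OpenMeter deployment, a meter is declared in the configuration file. The sketch below assumes OpenMeter's YAML meter configuration; the slug and event type are illustrative and must match the events you send.

```yaml
# Illustrative meter definition: sum total_tokens per subject (user ID)
meters:
  - slug: openai_total_tokens      # used later when querying usage
    eventType: token_usage         # must match the "type" field of ingested events
    aggregation: SUM
    valueProperty: $.total_tokens  # JSONPath into the event's data payload
    groupBy:
      model: $.model               # allows per-model breakdowns
```

Analogous meters can be defined for prompt_tokens and completion_tokens if you bill input and output tokens at different rates.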
Using OpenMeter’s SDK, send token usage data with the internal user ID as the subject. Include all three token metrics and the model used (e.g., gpt-3.5-turbo) in the event data. OpenMeter’s CloudEvents format ensures idempotency, preventing duplicate counting.
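A sketch of building such an event in Python follows. The event type `token_usage` and the ingest endpoint are assumptions that must match your OpenMeter setup; the payload structure follows the CloudEvents spec, where a unique `id` lets OpenMeter deduplicate retried sends.

```python
import uuid
from datetime import datetime, timezone

def build_token_usage_event(user_id, model, prompt_tokens, completion_tokens):
    """Build a CloudEvents payload attributing token usage to an internal user ID."""
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),   # unique ID makes ingestion idempotent
        "source": "azure-apim",
        "type": "token_usage",     # must match the meter's eventType in OpenMeter
        "subject": user_id,        # internal user ID used for aggregation
        "time": datetime.now(timezone.utc).isoformat(),
        "data": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
            "model": model,
        },
    }

# Sending the event (requires a running OpenMeter instance and an API key):
# import requests
# requests.post(
#     "https://<your-openmeter-host>/api/v1/events",
#     json=build_token_usage_event("user-42", "gpt-3.5-turbo", 120, 80),
#     headers={"Authorization": "Bearer <API_KEY>",
#              "Content-Type": "application/cloudevents+json"},
# )
```

Keeping the payload builder separate from the HTTP call makes it easy to unit-test the attribution logic without a live OpenMeter instance.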
With OpenMeter, retrieve aggregated usage data for billing or analytics by querying for a specific user ID over a time window (e.g., hourly). This provides token usage summaries suitable for billing or reporting.
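As a sketch, the query URL for OpenMeter's meter-query endpoint can be built like this. The base URL and meter slug are placeholders; `windowSize` controls the aggregation window.

```python
from urllib.parse import urlencode

def usage_query_url(base_url, meter_slug, user_id, window_size="HOUR"):
    """Build an OpenMeter meter-query URL aggregating usage for one user."""
    params = {"subject": user_id, "windowSize": window_size}
    return f"{base_url}/api/v1/meters/{meter_slug}/query?{urlencode(params)}"

# Example (hypothetical host and meter slug):
# usage_query_url("https://openmeter.example.com", "openai_total_tokens", "user-42")
```

The response contains per-window usage totals for that subject, ready to feed into an invoicing job or a reporting dashboard.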
In Application Insights, use Azure Monitor to visualize token usage. Select the openai namespace and view metrics like Prompt Tokens, Completion Tokens, and Total Tokens. Filter by the User dimension to analyze usage for specific users.
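Beyond the portal, the same data can be queried with KQL in the Logs blade. This is a sketch: the metric names and the User dimension must match what the policy emits.

```kusto
customMetrics
| where name in ("Prompt Tokens", "Completion Tokens", "Total Tokens")
| extend userId = tostring(customDimensions["User"])
| summarize tokens = sum(value) by userId, name, bin(timestamp, 1h)
| order by timestamp desc
```

This yields hourly per-user token totals that can be pinned to a dashboard or exported to Power BI.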
Application Insights enables building dashboards in Azure Monitor or Power BI to visualize token usage trends per user, API, or status code. OpenMeter’s API supports generating usage reports or integrating with billing platforms like Stripe for monetization.
By combining OpenMeter and Azure APIM’s azure-openai-emit-token-metric policy, you can build a robust, scalable billing system for tracking Azure OpenAI token usage per user. OpenMeter handles aggregation and billing, while Application Insights provides powerful visualization and querying capabilities. This approach ensures accurate attribution of prompt_tokens, completion_tokens, and total_tokens to internal user IDs without requiring custom development, making it ideal for businesses looking to manage AI costs effectively.
For more details, check out the OpenMeter documentation and the Azure API Management azure-openai-emit-token-metric policy reference.
If you’d like help getting started with tracking Azure OpenAI usage and taking control of your limits, reach out!