As businesses increasingly integrate AI capabilities into their products, accurately tracking usage becomes critical for cost management and billing. When using Azure OpenAI with a single API key for multiple users, you need a robust system to attribute token consumption (prompt_tokens, completion_tokens, and total_tokens) to individual users. In this blog post, we’ll explore how to combine OpenMeter, an open-source usage metering solution, with Azure API Management’s (APIM) azure-openai-emit-token-metric policy to create a seamless billing system that tracks token usage by internal user IDs without requiring custom modifications.
Azure OpenAI charges based on token usage, which includes prompt_tokens (input tokens), completion_tokens (output tokens), and total_tokens (sum of both). When a single API key is shared across multiple users, attributing usage to specific users is challenging. A billing system that tracks these metrics per user is essential for:

- Accurate per-user billing and chargeback
- Cost management and budgeting
- Enforcing per-user quotas and usage limits
By combining OpenMeter and APIM, we can achieve precise, scalable, and real-time token usage tracking.
The proposed solution leverages two powerful tools:

- OpenMeter: an open-source usage metering solution that ingests events and aggregates them in real time by subject (here, the internal user ID)
- Azure API Management (APIM): the gateway in front of Azure OpenAI, whose azure-openai-emit-token-metric policy emits token counts to Application Insights
By integrating these tools, we can:

- Attribute prompt_tokens, completion_tokens, and total_tokens to internal user IDs
- Aggregate usage in real time for billing and reporting
- Visualize and query usage per user in Application Insights
Azure API Management acts as the gateway for Azure OpenAI requests. The azure-openai-emit-token-metric policy extracts token usage (prompt_tokens, completion_tokens, total_tokens) from API responses and sends it to Application Insights with a custom dimension for the internal user ID.
In APIM, configure an inbound policy to extract the internal user ID from a header (e.g., x-user-id) and send token metrics to Application Insights under a specific namespace (e.g., openai). Additional dimensions like request status and API identifier can be included for deeper analysis. This setup ensures that every API response is parsed, and token metrics are sent to Application Insights for visualization and querying.
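A minimal policy sketch is shown below. The namespace, the x-user-id header, and the dimension names are illustrative; adapt them to your APIM instance and metric naming conventions.

```xml
<policies>
    <inbound>
        <base />
        <!-- Emit prompt, completion, and total token counts to Application Insights -->
        <azure-openai-emit-token-metric namespace="openai">
            <!-- Attribute usage to the internal user ID passed by the caller -->
            <dimension name="User" value="@(context.Request.Headers.GetValueOrDefault("x-user-id", "unknown"))" />
            <!-- Extra dimensions for deeper analysis -->
            <dimension name="API ID" value="@(context.Api.Id)" />
        </azure-openai-emit-token-metric>
    </inbound>
</policies>
```

Each dimension becomes a filterable field on the emitted metrics, so the same query can slice usage by user, by API, or both.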
OpenMeter is designed to handle AI usage metering, supporting prompt_tokens, completion_tokens, and total_tokens natively. It uses a scalable stream-processing architecture to aggregate usage data by user.
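For a self-hosted OpenMeter deployment, a meter is declared in the configuration file. The sketch below assumes OpenMeter's YAML meter configuration; the slug and event type are illustrative and must match the events you send.

```yaml
# Illustrative meter definition: sum total_tokens per subject (user ID)
meters:
  - slug: openai_total_tokens      # used later when querying usage
    eventType: token_usage         # must match the "type" field of ingested events
    aggregation: SUM
    valueProperty: $.total_tokens  # JSONPath into the event's data payload
    groupBy:
      model: $.model               # allows per-model breakdowns
```

Analogous meters can be defined for prompt_tokens and completion_tokens if you bill input and output tokens at different rates.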
Using OpenMeter’s SDK, send token usage data with the internal user ID as the subject. Include all three token metrics and the model used (e.g., gpt-3.5-turbo) in the event data. OpenMeter’s CloudEvents format ensures idempotency, preventing duplicate counting.
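A sketch of building such an event in Python follows. The event type `token_usage` and the ingest endpoint are assumptions that must match your OpenMeter setup; the payload structure follows the CloudEvents spec, where a unique `id` lets OpenMeter deduplicate retried sends.

```python
import uuid
from datetime import datetime, timezone

def build_token_usage_event(user_id, model, prompt_tokens, completion_tokens):
    """Build a CloudEvents payload attributing token usage to an internal user ID."""
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),   # unique ID makes ingestion idempotent
        "source": "azure-apim",
        "type": "token_usage",     # must match the meter's eventType in OpenMeter
        "subject": user_id,        # internal user ID used for aggregation
        "time": datetime.now(timezone.utc).isoformat(),
        "data": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
            "model": model,
        },
    }

# Sending the event (requires a running OpenMeter instance and an API key):
# import requests
# requests.post(
#     "https://<your-openmeter-host>/api/v1/events",
#     json=build_token_usage_event("user-42", "gpt-3.5-turbo", 120, 80),
#     headers={"Authorization": "Bearer <API_KEY>",
#              "Content-Type": "application/cloudevents+json"},
# )
```

Keeping the payload builder separate from the HTTP call makes it easy to unit-test the attribution logic without a live OpenMeter instance.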
With OpenMeter, retrieve aggregated usage data for billing or analytics by querying for a specific user ID over a time window (e.g., hourly). This provides token usage summaries suitable for billing or reporting.
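As a sketch, the query URL for OpenMeter's meter-query endpoint can be built like this. The base URL and meter slug are placeholders; `windowSize` controls the aggregation window.

```python
from urllib.parse import urlencode

def usage_query_url(base_url, meter_slug, user_id, window_size="HOUR"):
    """Build an OpenMeter meter-query URL aggregating usage for one user."""
    params = {"subject": user_id, "windowSize": window_size}
    return f"{base_url}/api/v1/meters/{meter_slug}/query?{urlencode(params)}"

# Example (hypothetical host and meter slug):
# usage_query_url("https://openmeter.example.com", "openai_total_tokens", "user-42")
```

The response contains per-window usage totals for that subject, ready to feed into an invoicing job or a reporting dashboard.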
In Application Insights, use Azure Monitor to visualize token usage. Select the openai namespace and view metrics like Prompt Tokens, Completion Tokens, and Total Tokens. Filter by the User dimension to analyze usage for specific users.
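Beyond the portal, the same data can be queried with KQL in the Logs blade. This is a sketch: the metric names and the User dimension must match what the policy emits.

```kusto
customMetrics
| where name in ("Prompt Tokens", "Completion Tokens", "Total Tokens")
| extend userId = tostring(customDimensions["User"])
| summarize tokens = sum(value) by userId, name, bin(timestamp, 1h)
| order by timestamp desc
```

This yields hourly per-user token totals that can be pinned to a dashboard or exported to Power BI.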
Application Insights enables building dashboards in Azure Monitor or Power BI to visualize token usage trends per user, API, or status code. OpenMeter’s API supports generating usage reports or integrating with billing platforms like Stripe for monetization.
By combining OpenMeter and Azure APIM’s azure-openai-emit-token-metric policy, you can build a robust, scalable billing system for tracking Azure OpenAI token usage per user. OpenMeter handles aggregation and billing, while Application Insights provides powerful visualization and querying capabilities. This approach ensures accurate attribution of prompt_tokens, completion_tokens, and total_tokens to internal user IDs without requiring custom development, making it ideal for businesses looking to manage AI costs effectively.
For more details, check out the OpenMeter documentation and the Azure API Management azure-openai-emit-token-metric policy reference.
If you’d like help getting started with tracking Azure OpenAI usage and taking control of your limits, reach out!