The best price for scaling your Azure OpenAI applications

Volume Discount Calculator

Monthly consumption: $25,000
Discount: 25%
Potential savings*: $2,500 (20%)
Model                   Input price* (per 1M tokens)   Output price* (per 1M tokens)
GPT-4o                  $4.75                          $14.25
GPT-4o-mini             $4.75                          $14.25
GPT-4-Turbo             $9.50                          $28.50
GPT-3.5-Turbo           $0.47                          $1.43
Llama 3.1 70b           $4.75                          $14.25
Llama 3.1 8b            $4.75                          $14.25
Llama 3.2 1B            $4.75                          $14.25
Llama 3.2 3B            $4.75                          $14.25
Llama 3.2 11B Vision    $4.75                          $14.25
Mixtral 8x7B            $4.75                          $14.25
Gemma 2 9b              $4.75                          $14.25
*Potential savings include additional average savings of 20% from CogCache serving cached responses.
Actual savings may vary by use case.
**Effective price per 1M tokens
Volume discount tiers by monthly spend (the same tiers apply to every model listed below)

From       To          Discount
$0         $2,500      25%
$2,500     $5,000      27.5%
$5,000     $12,500     30%
$12,500    $25,000     32.5%
$25,000    Unlimited   40%

Models: GPT-4o, GPT-4o-mini, GPT-4-Turbo, GPT-3.5-Turbo, Llama 3.1 70b, Llama 3.1 8b, Llama 3.2 1B, Llama 3.2 3B, Llama 3.2 11B Vision, Llama 3 70B, Llama 3 8B, Mixtral 8x7B, Gemma 2 9b

FAQ

What is a token?

You can think of tokens as pieces of words used for natural language processing. For English text, 1 token is approximately 4 characters or 0.75 words.
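
As a rough illustration of the rule of thumb above, the sketch below estimates a token count from plain English text in two ways: by characters (about 4 per token) and by words (about 0.75 words per token). The function name `estimateTokens` is hypothetical and is not part of any CogCache or Azure OpenAI API. Real tokenizers will give different counts, so treat the result as a ballpark figure only.

```javascript
// Rough token estimate based on the approximations quoted above:
// 1 token is about 4 characters, or about 0.75 words, of English text.
// estimateTokens is a hypothetical helper for illustration only.
function estimateTokens(text) {
  const charCount = text.length;
  const wordCount = text.trim().split(/\s+/).filter(Boolean).length;
  return {
    byChars: Math.ceil(charCount / 4),    // characters / 4
    byWords: Math.ceil(wordCount / 0.75), // words / 0.75
  };
}

const sample = "You can think of tokens as pieces of words.";
console.log(estimateTokens(sample));
```

The two estimates usually differ slightly; taking the larger one is a conservative choice when budgeting.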

What are the different models offered?

We're currently offering GPT-4o, GPT-4o-mini, GPT-4-Turbo, GPT-3.5-Turbo, Llama 3.1 70b, Llama 3.1 8b, Llama 3.2 1B, Llama 3.2 3B, Llama 3.2 11B Vision, Mixtral 8x7B, Gemma 7b, and Gemma 2 9b.

How is pricing structured?

Pricing is tiered based on monthly spend, with discounts increasing as spend increases.
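
The tier lookup can be sketched as a small table search, using the spend tiers published on this page. The function name `discountForSpend` is hypothetical. The page does not say which tier a boundary value such as exactly $2,500 falls into, so the sketch assumes the lower bound is inclusive and the upper bound is exclusive.

```javascript
// Spend tiers as listed on this page (monthly spend in USD, discount in percent).
const TIERS = [
  { from: 0, to: 2500, discount: 25 },
  { from: 2500, to: 5000, discount: 27.5 },
  { from: 5000, to: 12500, discount: 30 },
  { from: 12500, to: 25000, discount: 32.5 },
  { from: 25000, to: Infinity, discount: 40 },
];

// Hypothetical helper: returns the discount percentage for a monthly spend.
// Assumption: a boundary value belongs to the higher tier (from <= spend < to).
function discountForSpend(monthlySpend) {
  if (!Number.isFinite(monthlySpend) || monthlySpend < 0) {
    throw new RangeError("monthlySpend must be a non-negative finite number");
  }
  const tier = TIERS.find((t) => monthlySpend >= t.from && monthlySpend < t.to);
  return tier.discount;
}

console.log(discountForSpend(1000));  // 25
console.log(discountForSpend(30000)); // 40
```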

What are the base token prices for each model?

Base prices start with a built-in 25% discount off market prices, and the discount increases as your monthly spend increases.
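
To make the arithmetic concrete, here is a minimal sketch of how a percentage discount turns a market price into an effective price, and how per-1M-token prices translate into the cost of a given workload. Both function names are hypothetical, and the market price in the example is an illustrative value, not a quoted rate.

```javascript
// Hypothetical helper: apply a percentage discount to a market price.
function effectivePrice(marketPrice, discountPercent) {
  return marketPrice * (1 - discountPercent / 100);
}

// Hypothetical helper: cost of a workload given prices per 1M tokens.
function workloadCost(inputTokens, outputTokens, inputPricePerM, outputPricePerM) {
  return (inputTokens / 1e6) * inputPricePerM + (outputTokens / 1e6) * outputPricePerM;
}

// Illustrative only: a $10.00 market price with the 25% starting discount.
console.log(effectivePrice(10, 25)); // 7.5

// 1M input tokens and 1M output tokens at the GPT-4o prices listed on this page.
console.log(workloadCost(1e6, 1e6, 4.75, 14.25)); // 19
```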

What discounts are available?

Discounts range from 25% to 40%, depending on the monthly spend and the specific model used.

What does "Potential savings" mean?

Potential savings combine the listed price discount with an additional average 20% savings from CogCache serving cached responses. Actual savings derived from cognitive caching can be lower or higher, depending on the use case.
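
The page does not spell out how the two figures are combined. The calculator script on this page appears to add the two percentages, so the sketch below shows that additive reading first. A compounded reading, where the cache savings apply to the already discounted bill, is included as an alternative assumption for comparison. Both function names are hypothetical.

```javascript
// Average savings attributed to cached responses, in percent (from this page).
const AVERAGE_CACHE_SAVINGS = 20;

// Additive reading (what the page's calculator script appears to do):
// potential savings = list discount + cache savings.
function potentialSavingsAdditive(listDiscount, cacheSavings = AVERAGE_CACHE_SAVINGS) {
  return listDiscount + cacheSavings;
}

// Alternative assumption: cache savings apply to the discounted bill,
// so the two reductions compound instead of adding.
function potentialSavingsCompounded(listDiscount, cacheSavings = AVERAGE_CACHE_SAVINGS) {
  const remainingShare = (1 - listDiscount / 100) * (1 - cacheSavings / 100);
  return (1 - remainingShare) * 100;
}

console.log(potentialSavingsAdditive(25)); // 45
console.log(potentialSavingsCompounded(25).toFixed(1));
```

The compounded figure is always somewhat lower than the additive one, which is one reason actual savings can differ from the headline number.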

Is there a maximum discount?

The maximum listed price discount is 40% at the highest spend tier.

Are there any spending limits?

If your monthly spend is over $50,000, contact our sales team.

Is a credit card needed to use CogCache?

No, a credit card is not required to use CogCache. You can start using CogCache immediately with a $20 credit.

What happens when your credits run out?

Credits are refilled automatically, based on the limits you set.

What are private endpoints?

Private Endpoints create a secure, dedicated network interface that connects your AI application directly to CogCache through Azure's private backbone. This allows you to access CogCache's capabilities through your private virtual networks without exposing traffic to the public internet, ensuring enhanced security and compliance for sensitive AI workloads.

Query Volume Tracking

We monitor the total number of queries per day to our AI models.

Cache Yield

Our system identifies and caches repetitive queries, addressing them from the cache instead of the LLM.

LLM Query Management

This reduction in direct LLM calls minimizes the computational load, leading to lower operational costs.

Aggregate Savings

Over time, the cumulative savings from reduced LLM queries add up, driving down overall costs and improving efficiency.
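
The four steps above (query volume, cache yield, fewer LLM calls, aggregate savings) can be sketched as simple arithmetic. Everything in this example is an assumption for illustration: the function name `cacheSavings`, the input values, and the simplification that a response served from the cache costs nothing.

```javascript
// Hypothetical model of cache-driven savings.
// Assumption: a cached response has zero cost; only LLM calls are billed.
function cacheSavings({ queriesPerDay, cacheHitRate, costPerLlmQuery, days = 30 }) {
  const cachedPerDay = queriesPerDay * cacheHitRate; // served from the cache
  const llmPerDay = queriesPerDay - cachedPerDay;    // still sent to the LLM
  const dailyCostWithoutCache = queriesPerDay * costPerLlmQuery;
  const dailyCostWithCache = llmPerDay * costPerLlmQuery;
  const dailySavings = dailyCostWithoutCache - dailyCostWithCache;
  return {
    cachedPerDay,
    llmPerDay,
    dailySavings,
    aggregateSavings: dailySavings * days, // cumulative savings over the period
  };
}

// Illustrative inputs only: 1,000 queries/day, 20% cache yield, $0.01 per LLM query.
console.log(cacheSavings({ queriesPerDay: 1000, cacheHitRate: 0.2, costPerLlmQuery: 0.01 }));
```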

Get started with $20 of free tokens
// Pricing calculator script: updates the displayed prices and discounts
// whenever the monthly spend slider moves.
const priceRange = document.getElementById('priceRange');
const priceValue = document.querySelector('.pricing-range-value');
const inputVals = document.querySelectorAll('.input-value');
const outputVals = document.querySelectorAll('.output-value');
const discountVals = document.querySelectorAll('.flag-discounts');
const potVals = document.querySelectorAll('.potential-value');

// Discounts and potSaving are percentages; discounts holds one entry per spend tier.
const models = [
  { name: 'GPT-4o', inputBase: 5, outputBase: 15, discounts: [5, 7.5, 10, 12.5, 15], potSaving: 20 },
  { name: 'GPT-4-Turbo', inputBase: 10, outputBase: 30, discounts: [5, 7.5, 10, 12.5, 15], potSaving: 20 },
  { name: 'GPT-3.5-Turbo', inputBase: 0.5, outputBase: 1.5, discounts: [5, 7.5, 10, 15, 20], potSaving: 20 }
];

function updateValues() {
  const value = parseInt(priceRange.value, 10);
  priceValue.textContent = `$${value}`;

  models.forEach((model, index) => {
    // Pick the discount for the tier that contains the selected spend.
    let discount;
    if (value < 5000) discount = model.discounts[0];
    else if (value < 10000) discount = model.discounts[1];
    else if (value < 25000) discount = model.discounts[2];
    else if (value < 50000) discount = model.discounts[3];
    else discount = model.discounts[4];

    // The potential cache saving is the same in every tier.
    const savings = model.potSaving;

    const inputPrice = model.inputBase - (model.inputBase * discount) / 100;
    const outputPrice = model.outputBase - (model.outputBase * discount) / 100;

    inputVals[index].textContent = `$${inputPrice.toFixed(2)}`;
    outputVals[index].textContent = `$${outputPrice.toFixed(2)}`;
    discountVals[index].textContent = `${discount}%`;
    potVals[index].textContent = `${savings + discount}%`;
  });
}

priceRange.addEventListener('input', updateValues);
updateValues();

A few more details