The best price for scaling your Azure OpenAI applications
Volume Discount Calculator
FAQ
You can think of tokens as pieces of words used for natural language processing. For English text, 1 token is approximately 4 characters or 0.75 words.
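As a rough illustration of the rule of thumb above, the following sketch estimates a token count from raw text. The function name `estimateTokens` is an assumption made for this example, and the ratios only hold approximately for English text; a real tokenizer will give different counts.

```javascript
// Rough token estimate using the rule of thumb:
// ~4 characters per token, or ~0.75 words per token (English text only).
// `estimateTokens` is an illustrative helper, not part of any real API.
function estimateTokens(text) {
  // Character-based estimate: one token per 4 characters.
  const byChars = Math.ceil(text.length / 4);
  // Word-based estimate: one token per 0.75 words.
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  const byWords = Math.ceil(words / 0.75);
  return { byChars, byWords };
}

const sample = "The quick brown fox jumps over the lazy dog.";
console.log(estimateTokens(sample)); // { byChars: 11, byWords: 12 }
```

The two estimates rarely agree exactly; treating them as a range is usually enough for budgeting.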
We're currently offering GPT-4o, GPT-4-Turbo, and GPT-3.5-Turbo.
Pricing is tiered based on monthly spend, with discounts increasing as spend increases.
Listed prices start with a built-in 5% discount off market prices, and the discount grows as monthly spend increases.
Discounts range from 5% to 20%, depending on the monthly spend and the specific model used.
Potential savings combine the listed price discount with an additional average 20% savings from CogCache serving cached responses. Actual savings derived from cognitive caching can be lower or higher, depending on the use case.
The maximum listed price discount is 20%, available for GPT-3.5-Turbo at the highest spend tier. GPT-4o and GPT-4-Turbo top out at 15%.
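The tiering described above can be sketched as a small DOM-free function. The spend thresholds, base prices, and discount ladders are copied from the calculator script on this page; the names `TIER_LIMITS`, `MODELS`, `CACHE_SAVING`, and `quote` are assumptions made for this example, and the 20% cache saving is the stated average, not a guarantee.

```javascript
// Monthly spend thresholds (USD) and per-model discount ladders, copied from
// the calculator script on this page. All identifiers here are illustrative.
const TIER_LIMITS = [5000, 10000, 25000, 50000];
const MODELS = {
  'GPT-4o': { inputBase: 5, outputBase: 15, discounts: [5, 7.5, 10, 12.5, 15] },
  'GPT-4-Turbo': { inputBase: 10, outputBase: 30, discounts: [5, 7.5, 10, 12.5, 15] },
  'GPT-3.5-Turbo': { inputBase: 0.5, outputBase: 1.5, discounts: [5, 7.5, 10, 15, 20] },
};
const CACHE_SAVING = 20; // average saving from cached responses, in percent

function quote(modelName, monthlySpend) {
  const model = MODELS[modelName];
  if (!model) throw new Error(`Unknown model: ${modelName}`);
  // Index of the first threshold the spend is below; top tier if none match.
  let tier = TIER_LIMITS.findIndex((limit) => monthlySpend < limit);
  if (tier === -1) tier = TIER_LIMITS.length;
  const discount = model.discounts[tier];
  return {
    discount,
    inputPrice: model.inputBase - (model.inputBase * discount) / 100,
    outputPrice: model.outputBase - (model.outputBase * discount) / 100,
    // Potential savings add the cache saving on top of the listed discount.
    potentialSavings: discount + CACHE_SAVING,
  };
}

console.log(quote('GPT-4o', 12000));
```

For example, a $12,000 monthly spend puts GPT-4o in the third tier, so the discount is 10%, the base prices of 5 and 15 become 4.50 and 13.50, and potential savings come to 30%.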
If your monthly spend is over $50,000, contact our sales team.
No credit card is required to use CogCache. You can start immediately with a $20 credit.
Credits autofill based on the limits you set.
Query Volume Tracking
We monitor the total number of queries per day to our AI models.
Cache Yield
Our system identifies repetitive queries, caches their responses, and answers them from the cache instead of calling the LLM.
LLM Query Management
Serving repeated queries from the cache reduces direct LLM calls, which lowers computational load and operational costs.
Aggregate Savings
Over time, the cumulative savings from reduced LLM queries add up, driving down overall costs and improving efficiency.
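The four steps above can be illustrated with a minimal sketch of a cache placed in front of an LLM call. This is not CogCache's implementation, which this page does not describe: the class `CachedLLM`, its exact-match key normalization, and the synchronous `callLLM` stand-in are all assumptions made for this example. A real client would be asynchronous and would likely match queries more loosely than exact text.

```javascript
// Hypothetical exact-match cache in front of an LLM call.
// Illustrative only: names and matching strategy are assumptions.
class CachedLLM {
  constructor(callLLM) {
    this.callLLM = callLLM; // stand-in function: (prompt) => response string
    this.cache = new Map();
    this.totalQueries = 0;  // query volume tracking
    this.cacheHits = 0;     // cache yield
  }

  ask(prompt) {
    this.totalQueries += 1;
    // Normalize so trivially different spellings share one cache entry.
    const key = prompt.trim().toLowerCase();
    if (this.cache.has(key)) {
      this.cacheHits += 1;
      return this.cache.get(key);
    }
    // Only cache misses reach the LLM.
    const answer = this.callLLM(prompt);
    this.cache.set(key, answer);
    return answer;
  }

  stats() {
    const llmCalls = this.totalQueries - this.cacheHits;
    const hitRate = this.totalQueries === 0 ? 0 : this.cacheHits / this.totalQueries;
    return { totalQueries: this.totalQueries, cacheHits: this.cacheHits, llmCalls, hitRate };
  }
}

const llm = new CachedLLM((prompt) => `answer to: ${prompt}`);
llm.ask('What is a token?');
llm.ask('what is a token? ');      // served from the cache
llm.ask('How is pricing tiered?');
console.log(llm.stats());
```

In this run, three queries produce two LLM calls and one cache hit. The hit rate is the figure that drives aggregate savings over time.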
Get started with $20 of free tokens
const priceRange = document.getElementById('priceRange');
const priceValue = document.querySelector('.pricing-range-value');
const inputVals = document.querySelectorAll('.input-value');
const outputVals = document.querySelectorAll('.output-value');
const discountVals = document.querySelectorAll('.flag-discounts');
const potVals = document.querySelectorAll('.potential-value');
const models = [
  { name: 'GPT-4o', inputBase: 5, outputBase: 15, discounts: [5, 7.5, 10, 12.5, 15], potSaving: 20 },
  { name: 'GPT-4-Turbo', inputBase: 10, outputBase: 30, discounts: [5, 7.5, 10, 12.5, 15], potSaving: 20 },
  { name: 'GPT-3.5-Turbo', inputBase: 0.5, outputBase: 1.5, discounts: [5, 7.5, 10, 15, 20], potSaving: 20 }
];
function updateValues() {
  // Current slider position, read as the monthly spend in USD.
  const value = parseInt(priceRange.value, 10);
  priceValue.textContent = `$${value}`;
  models.forEach((model, index) => {
    // Pick the discount tier from the monthly spend.
    let discount;
    if (value < 5000) discount = model.discounts[0];
    else if (value < 10000) discount = model.discounts[1];
    else if (value < 25000) discount = model.discounts[2];
    else if (value < 50000) discount = model.discounts[3];
    else discount = model.discounts[4];
    // The cache saving is the same at every spend tier.
    const savings = model.potSaving;
    // Apply the percentage discount to the base prices.
    const inputPrice = model.inputBase - (model.inputBase * discount) / 100;
    const outputPrice = model.outputBase - (model.outputBase * discount) / 100;
    inputVals[index].textContent = `$${inputPrice.toFixed(2)}`;
    outputVals[index].textContent = `$${outputPrice.toFixed(2)}`;
    discountVals[index].textContent = `${discount}%`;
    // Potential savings add the cache saving on top of the listed discount.
    potVals[index].textContent = `${savings + discount}%`;
  });
}
priceRange.addEventListener('input', updateValues);
updateValues();