Prompt Caching Strategy for Maximum Cache Hits
Design for consistent context preambles to maximize Anthropic prompt caching
Output
Prompt Caching Strategy
How Anthropic Prompt Caching Works
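Claude caches the prefix of your prompt, up to and including a block marked with cache_control. When a later request repeats exactly the same prefix, those input tokens are read from the cache at 10% of the base input price (90% less) and the response starts faster. Prefixes shorter than the model's minimum cacheable length (1024 tokens on many Claude models, higher on some) are processed without caching, even when marked.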
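| Operation | Price / behavior |
|---|---|
| Cache write | 1.25x base input price (paid whenever the prefix is cached, including re-caching after expiry) |
| Cache read | 0.1x base input price (90% savings) |
| TTL | 5 minutes, refreshed each time the cached prefix is read |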
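To see when caching pays off, the multipliers above can be turned into a quick break-even calculation. This is a minimal sketch: it assumes every request repeats the same prefix within the TTL, counts only the prefix tokens (the dynamic suffix and output tokens cost the same either way), and expresses cost in base-input-price units rather than dollars. The function name prefixCost is illustrative.

```typescript
// Relative input-price multipliers for the 5-minute cache (base input price = 1.0).
const CACHE_WRITE_MULTIPLIER = 1.25;
const CACHE_READ_MULTIPLIER = 0.1;

// Cost, in base-input-token units, of sending the same prefixTokens-long prefix
// in `requests` consecutive requests that all land within the cache TTL.
function prefixCost(prefixTokens: number, requests: number, cached: boolean): number {
  if (requests <= 0) return 0;
  if (!cached) return prefixTokens * requests;
  // The first request writes the cache; every later request reads it.
  const write = prefixTokens * CACHE_WRITE_MULTIPLIER;
  const reads = prefixTokens * CACHE_READ_MULTIPLIER * (requests - 1);
  return write + reads;
}

// Example: a 16k-token stable prefix reused across 1, 2 and 10 requests.
for (const n of [1, 2, 10]) {
  console.log(`${n} request(s): uncached=${prefixCost(16000, n, false)}, cached=${prefixCost(16000, n, true)}`);
}
```

With these multipliers a single reuse already covers the write premium: 1.25 + 0.1 = 1.35 base-price units for two requests, versus 2.0 uncached.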
Key Principle: Stable Prefix
┌──────────────────────────────────────────────────────────┐
│ CACHED (stable across requests)                          │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ System prompt (CLAUDE.md core)                   ~8k │ │
│ │ Project structure                                ~2k │ │
│ │ Tool definitions                                 ~4k │ │
│ │ Output instructions (kontask format)             ~2k │ │
│ └──────────────────────────────────────────────────────┘ │
├──────────────────────────────────────────────────────────┤
│ DYNAMIC (changes per request)                            │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Session history / conversation                   var │ │
│ │ Current request context                          var │ │
│ │ User prompt                                      var │ │
│ └──────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────┘
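Because the cache matches the exact prefix, nothing request-specific may appear before the last cached block. One way to enforce that is to tag each context segment as stable or dynamic and reject any layout where a stable segment follows a dynamic one. This is a sketch only: the Segment type and cacheablePrefixTokens helper are hypothetical, not part of any SDK, and the token counts are the rough estimates from the diagram.

```typescript
// A piece of prompt context, tagged by whether it is identical across requests.
interface Segment {
  name: string;
  stable: boolean;
  approxTokens: number;
}

// Sums the tokens in the cacheable prefix. Throws if a stable segment comes
// after a dynamic one, because it would then fall outside the shared prefix.
function cacheablePrefixTokens(segments: Segment[]): number {
  let seenDynamic = false;
  let total = 0;
  for (const seg of segments) {
    if (!seg.stable) {
      seenDynamic = true;
    } else if (seenDynamic) {
      throw new Error(`stable segment "${seg.name}" follows dynamic content`);
    } else {
      total += seg.approxTokens;
    }
  }
  return total;
}

// Token estimates from the diagram; dynamic sizes vary per request.
const layout: Segment[] = [
  { name: "System prompt (CLAUDE.md core)", stable: true, approxTokens: 8000 },
  { name: "Project structure", stable: true, approxTokens: 2000 },
  { name: "Tool definitions", stable: true, approxTokens: 4000 },
  { name: "Output instructions (kontask format)", stable: true, approxTokens: 2000 },
  { name: "Session history / conversation", stable: false, approxTokens: 0 },
  { name: "Current request context", stable: false, approxTokens: 0 },
  { name: "User prompt", stable: false, approxTokens: 0 },
];

console.log(cacheablePrefixTokens(layout)); // 16000
```

On the wire the Messages API places tool definitions ahead of the system prompt, so the real order differs from the diagram; what matters is that all four stable blocks precede the dynamic ones.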
Implementation Strategy
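| Step | Guidance |
|---|---|
| 1. Freeze CLAUDE.md | Change it rarely. Any edit alters the cached prefix, so every request that shares it misses the cache until the new prefix is written. |
| 2. Version context | Tag the stable context with a content hash or version so prefix changes show up in logs. Caching matches the exact prefix, so keep timestamps, request IDs, and other per-request values out of it. |
| 3. Order matters | Put stable content FIRST, dynamic content LAST. The API builds the prefix in the order tools, then system, then messages. |
| 4. Batch similar | Group requests by scope (vibetools vs product) so they share scope-specific context. |
| 5. Keep sessions warm | Each cache hit resets the 5-minute TTL, so a request at least every 5 minutes keeps the entry alive. |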
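Steps 2 and 5 can be made concrete with a small helper: fingerprint the stable prefix so that a change (and therefore a guaranteed miss) is visible, and track the last use time to judge whether the 5-minute entry is probably still warm. This is a sketch: PrefixTracker is a hypothetical name, FNV-1a is just one convenient stable hash, and the warm check is only a prediction, since the API reports actual hits through cache_read_input_tokens.

```typescript
// Default cache lifetime; each cache hit refreshes it.
const CACHE_TTL_MS = 5 * 60 * 1000;

// 32-bit FNV-1a over UTF-16 code units: cheap and deterministic, fine for a version tag.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    // hash *= 16777619 (the FNV prime) modulo 2^32, written as shifts to stay exact.
    hash += (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24);
    hash >>>= 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Tracks the stable prefix's version and last use to predict cache misses.
class PrefixTracker {
  private version: string | null = null;
  private lastUsedAt = 0;

  // Record a request sending stablePrefix at nowMs. Returns the likely reason
  // for a cache miss, or null when a cache hit is expected.
  record(stablePrefix: string, nowMs: number): string | null {
    const v = fnv1a(stablePrefix);
    let reason: string | null = null;
    if (this.version === null) {
      reason = "first request (cache write)";
    } else if (v !== this.version) {
      reason = `prefix changed ${this.version} -> ${v}`;
    } else if (nowMs - this.lastUsedAt > CACHE_TTL_MS) {
      reason = "TTL likely expired";
    }
    // A hit refreshes the TTL and a miss writes a new entry, so both reset the clock.
    this.version = v;
    this.lastUsedAt = nowMs;
    return reason;
  }
}

const tracker = new PrefixTracker();
console.log(tracker.record("CLAUDE.md core v1", 0));
console.log(tracker.record("CLAUDE.md core v1", 60000)); // null: same prefix, within TTL
```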
Proposed Context Structure
// Layer 1: Universal (cached across ALL requests)
[CACHE_CONTROL: ephemeral]
- Base system prompt (persona, safety)
- Tool definitions
- Output format requirements

// Layer 2: Scope-specific (cached within scope)
[CACHE_CONTROL: ephemeral]
- IF vibetools: konui/konsole docs
- IF product: listings/CMS docs
- Relevant CLAUDE.md sections

// Layer 3: Session (not cached, changes each turn)
[NO CACHE]
- Conversation history
- Current working context
- User's prompt
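In the Messages API, a [CACHE_CONTROL: ephemeral] marker corresponds to a cache_control field of the form { type: "ephemeral" } on the last content block of a layer; the cache covers everything up to and including that block, and a request may carry at most four such breakpoints. A sketch of the three layers as a plain request body (no SDK call), assuming hypothetical names (buildRequest, SCOPE_DOCS), placeholder text, and a caller-supplied model ID:

```typescript
type Scope = "vibetools" | "product";

interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface Message {
  role: "user" | "assistant";
  content: string;
}

// Placeholder content; in practice these would be loaded from CLAUDE.md and the docs.
const UNIVERSAL_SYSTEM = "Base system prompt (persona, safety) + output format requirements";
const SCOPE_DOCS: Record<Scope, string> = {
  vibetools: "konui/konsole docs + relevant CLAUDE.md sections",
  product: "listings/CMS docs + relevant CLAUDE.md sections",
};

function buildRequest(model: string, scope: Scope, tools: object[], history: Message[], userPrompt: string) {
  const system: TextBlock[] = [
    // Layer 1 breakpoint: caches tools (rendered before system) plus the universal text.
    { type: "text", text: UNIVERSAL_SYSTEM, cache_control: { type: "ephemeral" } },
    // Layer 2 breakpoint: caches the scope-specific docs on top of Layer 1.
    { type: "text", text: SCOPE_DOCS[scope], cache_control: { type: "ephemeral" } },
  ];
  // Layer 3: session content sits after the last breakpoint, so it never changes the cached prefix.
  const messages: Message[] = [...history, { role: "user", content: userPrompt }];
  return { model, max_tokens: 1024, tools, system, messages };
}

const req = buildRequest("<model-id>", "vibetools", [], [], "How does konsole stream output?");
console.log(JSON.stringify(req.system.map((b) => b.text)));
```

With two breakpoints, a scope switch can still hit the Layer 1 entry and only rewrite Layer 2, which is what the "scope switch" row of the savings table relies on.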
Quick Win: Quick Turn Caching
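Quick Turn is stateless, so every request can send an identical system prompt. That makes it a natural caching candidate, provided the fixed prompt clears the minimum cacheable length:
// Every Quick Turn request uses the SAME system prompt
const QT_SYSTEM = `You are a fast Q&A assistant.
Answer briefly and directly.
No tools, no file access, just knowledge.`;

// Caveat: at a few dozen tokens this is far below the minimum cacheable
// length (1024+ tokens), so on its own it is processed uncached and saves nothing.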
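Since a marker on a too-short prefix is silently ignored, it helps to flag the problem before sending. A sketch of such a guard, assuming a rough 4-characters-per-token estimate (a heuristic, not the real tokenizer; the API's token-counting endpoint or the response usage fields give exact numbers) and a 1024-token minimum that should be checked against the model actually in use. MIN_CACHEABLE_TOKENS, approxTokens and quickTurnSystem are hypothetical names.

```typescript
// Assumption: verify the documented minimum for your model.
const MIN_CACHEABLE_TOKENS = 1024;

const QT_SYSTEM = `You are a fast Q&A assistant.
Answer briefly and directly.
No tools, no file access, just knowledge.`;

interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Very rough token estimate (~4 characters per token for English text).
function approxTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Builds the Quick Turn system block and reports whether caching can plausibly apply.
function quickTurnSystem(stablePrefix: string): { blocks: SystemBlock[]; likelyCached: boolean } {
  const likelyCached = approxTokens(stablePrefix) >= MIN_CACHEABLE_TOKENS;
  // Marking a too-short prefix is harmless (it is simply processed uncached), so always mark it.
  const blocks: SystemBlock[] = [{ type: "text", text: stablePrefix, cache_control: { type: "ephemeral" } }];
  return { blocks, likelyCached };
}

const qt = quickTurnSystem(QT_SYSTEM);
if (!qt.likelyCached) {
  console.warn(`QT prefix is ~${approxTokens(QT_SYSTEM)} tokens: below the minimum, expect no cache reads`);
}
```

For Quick Turn caching to pay off, the fixed prefix would need enough additional stable content to clear the minimum; otherwise its savings should be counted as zero.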
Expected Savings
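| Scenario | Input cost before | Input cost after (est.) | Savings (est.) |
|---|---|---|---|
| Quick Turn (stateless) | 100% | 10% | 90% |
| Full turn (same scope) | 100% | 40% | 60% |
| Full turn (scope switch) | 100% | 70% | 30% |
Estimates cover input tokens only and assume the stable prefix is a cache hit; output tokens are billed at the normal rate. 90% is an upper bound that requires essentially all input to be cached, and the Quick Turn row applies only if its fixed prefix meets the minimum cacheable length.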
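The savings column can be read as a function of f, the fraction of each request's input tokens served from the cache. In steady state (all stable blocks hit, ignoring the occasional 1.25x rewrite), uncached tokens cost 1.0 and cached tokens 0.1 of the base input price:

```latex
\[
\text{relative input cost} = (1 - f) + 0.1\,f = 1 - 0.9\,f,
\qquad
\text{savings} = 0.9\,f
\]
\[
\text{savings} = 60\% \Rightarrow f = \tfrac{0.6}{0.9} \approx 0.67,
\qquad
\text{savings} = 30\% \Rightarrow f \approx 0.33,
\qquad
\text{savings} = 90\% \Rightarrow f = 1
\]
```

So the 60% and 30% rows imply roughly two thirds and one third of input tokens being cached, while the 90% Quick Turn figure would require the user's question itself to be cached, which cannot happen because it changes every request.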
Implementation Path
- Quick Turn: Add a fixed system prompt whose system block carries cache_control: { type: "ephemeral" } (a field on the content block, not an HTTP header)
- Konsole: Layer context with stable prefix first
- Monitor: Track cache_read_input_tokens in StatusLine data
- Optimize: A/B test context orderings for best cache hit rate
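For the Monitor step, each Messages API response reports cache activity in its usage object: input_tokens (uncached input after the last breakpoint), cache_creation_input_tokens, and cache_read_input_tokens. A minimal aggregator sketch, assuming those three fields are passed in from wherever the StatusLine data is collected; CacheStats and its methods are hypothetical names, and the hit rate is token-weighted rather than per request.

```typescript
// Subset of the Messages API usage object relevant to caching.
interface Usage {
  input_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
}

class CacheStats {
  private uncached = 0;
  private written = 0;
  private read = 0;

  add(u: Usage): void {
    this.uncached += u.input_tokens;
    this.written += u.cache_creation_input_tokens ?? 0;
    this.read += u.cache_read_input_tokens ?? 0;
  }

  // Share of all input tokens that were served from cache.
  hitRate(): number {
    const total = this.uncached + this.written + this.read;
    return total === 0 ? 0 : this.read / total;
  }

  // Input cost relative to sending everything uncached (1.0 = no savings),
  // using the 1.25x write and 0.1x read multipliers.
  relativeInputCost(): number {
    const total = this.uncached + this.written + this.read;
    if (total === 0) return 1;
    return (this.uncached + 1.25 * this.written + 0.1 * this.read) / total;
  }
}

const stats = new CacheStats();
stats.add({ input_tokens: 100, cache_creation_input_tokens: 900, cache_read_input_tokens: 0 });
stats.add({ input_tokens: 100, cache_creation_input_tokens: 0, cache_read_input_tokens: 900 });
console.log(stats.hitRate(), stats.relativeInputCost());
```

Comparing relativeInputCost across context orderings gives a direct metric for the Optimize step.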
Details
Type: General
Status: Completed
Scope: vibetools
Tags: performance, caching, architecture
Created: 5 Jan 2026, 2:16 pm
Updated: 5 Jan 2026, 2:16 pm
Created By: claude
Raw Data
{
"id": "927ccf49-29bb-43ee-a2ad-aba03c7a9af3",
"type": "general",
"status": "completed",
"title": "Prompt Caching Strategy for Maximum Cache Hits",
"description": "Design for consistent context preambles to maximize Anthropic prompt caching",
"context": {
"requestedAt": "2026-01-05T04:02:00Z",
"requestId": "16086a6a-5b7b-4094-b286-bcbe7fd5eee4",
"choices": [
{
"label": "Implement QT caching",
"value": "Add prompt caching to Quick Turn - fixed system prompt with cache_control header"
},
{
"label": "Add to backlog",
"value": "Add prompt caching strategy to the VIBE.md backlog"
},
{
"label": "Monitor current usage",
"value": "Check current cache hit rates from StatusLine data"
}
]
},
"createdBy": "claude",
"createdAt": "2026-01-05T04:16:13.935Z",
"updatedAt": "2026-01-05T04:16:14.141Z",
"requestId": "16086a6a-5b7b-4094-b286-bcbe7fd5eee4",
"scope": "vibetools",
"tags": [
"performance",
"caching",
"architecture"
],
"targetUser": "claude"
}