Completed on 5 Jan 2026, 2:16 pm

Output

Prompt Caching Strategy

How Anthropic Prompt Caching Works

Claude caches the prefix of your prompt. When a subsequent request shares that exact prefix, the cached tokens cost 90% less to read and the response starts faster.

Cache write   1.25x base price (one-time)
Cache read    0.1x base price (90% savings!)
TTL           5 minutes (resets on each use)
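
A rough worked example (illustrative, assuming the ~16k-token stable prefix sketched below; dynamic tokens are billed normally either way):

First request (cache write):  16,000 × 1.25 = 20,000 token-equivalents
Each reuse (cache read):      16,000 × 0.10 =  1,600 token-equivalents
Two requests with caching:    20,000 + 1,600 = 21,600 vs 32,000 uncached

So the 1.25x write premium is already recovered on the first cache hit inside the 5-minute window.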

Key Principle: Stable Prefix

┌────────────────────────────────────────────────────┐
│  CACHED (stable across requests)                   │
│  ┌──────────────────────────────────────────────┐  │
│  │ System prompt (CLAUDE.md core)           ~8k │  │
│  │ Project structure                        ~2k │  │
│  │ Tool definitions                         ~4k │  │
│  │ Output instructions (kontask format)     ~2k │  │
│  └──────────────────────────────────────────────┘  │
├────────────────────────────────────────────────────┤
│  DYNAMIC (changes per request)                     │
│  ┌──────────────────────────────────────────────┐  │
│  │ Session history / conversation           var │  │
│  │ Current request context                  var │  │
│  │ User prompt                              var │  │
│  └──────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────┘

Implementation Strategy

1. Freeze CLAUDE.md: Don't modify it frequently; every change invalidates the cache for all users.
2. Version context: Embed a hash/version in the context so identical content yields an identical prefix (same version = cache hit); see the sketch after this list.
3. Order matters: Put stable content FIRST, dynamic content LAST.
4. Batch similar: Group requests by scope (vibetools vs product) so they share cached context.
5. Keep sessions warm: Activity within 5 minutes extends the cache TTL.
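
A minimal sketch of point 2 (Node/TypeScript; buildScopeContext is a hypothetical helper, not existing code). Deriving the version tag from the content itself means an unchanged scope always reproduces the exact same prefix bytes, while any doc edit visibly bumps the version:

import { createHash } from "node:crypto";

// Hypothetical helper: prepends a content-derived version header to the scope docs.
// Identical docs always produce an identical header, so the cached prefix stays
// byte-for-byte stable; any doc change changes the version and forces a cache write.
function buildScopeContext(scope: string, docs: string): string {
  const version = createHash("sha256").update(docs).digest("hex").slice(0, 12);
  return `CONTEXT_VERSION: ${scope}@${version}\n\n${docs}`;
}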

Proposed Context Structure

// Layer 1: Universal (cached across ALL requests)
[CACHE_CONTROL: ephemeral]
- Base system prompt (persona, safety)
- Tool definitions
- Output format requirements

// Layer 2: Scope-specific (cached within scope)
[CACHE_CONTROL: ephemeral]
- IF vibetools: konui/konsole docs
- IF product: listings/CMS docs
- Relevant CLAUDE.md sections

// Layer 3: Session (not cached, changes each turn)
[NO CACHE]
- Conversation history
- Current working context
- User's prompt
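
A sketch of how these layers could map onto an Anthropic Messages API call (TypeScript with the official SDK; the model id, BASE_SYSTEM_PROMPT, and runTurn are placeholders, and buildScopeContext is the hypothetical helper from the earlier sketch). Each cache_control: { type: "ephemeral" } marker ends a cacheable prefix segment, and the API allows up to four such breakpoints per request:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const BASE_SYSTEM_PROMPT = "..."; // Layer 1 text: persona, safety rules, output format

async function runTurn(
  scopeContext: string,                       // Layer 2, e.g. output of buildScopeContext()
  history: Anthropic.Messages.MessageParam[], // Layer 3: prior turns
  userPrompt: string,                         // Layer 3: the new request
) {
  return client.messages.create({
    model: "claude-3-5-sonnet-latest", // placeholder model id
    max_tokens: 1024,
    // Tool definitions, when present, also belong in the stable prefix
    // (the tools param accepts cache_control on its last entry).
    system: [
      // Layer 1: universal prefix, identical for every request.
      { type: "text", text: BASE_SYSTEM_PROMPT, cache_control: { type: "ephemeral" } },
      // Layer 2: scope-specific context, identical within a scope.
      { type: "text", text: scopeContext, cache_control: { type: "ephemeral" } },
    ],
    // Layer 3: never cached; history and the new prompt change every turn.
    messages: [...history, { role: "user", content: userPrompt }],
  });
}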

Quick Win: Quick Turn Caching

Quick Turn is stateless - perfect for caching:

// Every Quick Turn request uses the SAME system prompt
const QT_SYSTEM = `You are a fast Q&A assistant.
Answer briefly and directly.
No tools, no file access, just knowledge.`;

// Marked with cache_control, this prefix is billed at 0.1x on every QT call.
// Caveat: Anthropic only caches prefixes above a minimum length (roughly 1k
// tokens on most models), so this ~50-token prompt needs to be bundled with
// more stable context before it actually qualifies.

Expected Savings

Scenario                    Before    After    Savings
Quick Turn (stateless)      100%      10%      90%
Full turn (same scope)      100%      40%      60%
Full turn (scope switch)    100%      70%      30%
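
These percentages follow from the blended rate below (illustrative estimates, assuming the 1.25x write premium is amortized over many hits within the TTL):

effective input cost = (1 - c) × 1.0 + c × 0.1 = 1 - 0.9c,  where c = cached share of input tokens

c = 1.00  (Quick Turn, fully cached prefix)         → ~10%
c ≈ 0.67  (full turn, same scope)                   → ~40%
c ≈ 0.33  (scope switch, only the universal layer)  → ~70%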

Implementation Path

  1. Quick Turn: Add a fixed system prompt with a cache_control marker
  2. Konsole: Layer the context with the stable prefix first
  3. Monitor: Track cache_read_input_tokens in StatusLine data (see the sketch after this list)
  4. Optimize: A/B test context orderings for the best cache hit rate
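
A sketch of step 3 (cache_creation_input_tokens and cache_read_input_tokens are standard fields on the Messages API usage object; the StatusLine hook is hypothetical):

import Anthropic from "@anthropic-ai/sdk";

// The usage block on every Messages API response reports how much of the prompt
// was written to or served from the cache. This helper turns it into a hit rate
// that a StatusLine widget could display.
function cacheHitShare(usage: Anthropic.Messages.Usage): number {
  const read = usage.cache_read_input_tokens ?? 0;
  const written = usage.cache_creation_input_tokens ?? 0;
  const total = usage.input_tokens + read + written; // input_tokens excludes cached tokens
  return total > 0 ? read / total : 0;
}

// e.g. after `const response = await runTurn(...)`:
//   statusLine.update({ cacheHitShare: cacheHitShare(response.usage) })  // hypothetical hook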

Quick Actions

- Implement QT caching: add prompt caching to Quick Turn (fixed system prompt with a cache_control marker)
- Add to backlog: add the prompt caching strategy to the VIBE.md backlog
- Monitor current usage: check current cache hit rates from StatusLine data

Details

Type General
Status Completed
Scope vibetools
Tags performance, caching, architecture
Created 5 Jan 2026, 2:16 pm
Updated 5 Jan 2026, 2:16 pm
Created By claude
