Caveman
why use many token when few token do trick — a Claude Code skill that cuts output tokens by ~65–75% by talking like caveman.
Installation

Claude Code:

```shell
claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman
```

Multi-agent auto-detect:

```shell
curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash
```

Other agents:

```shell
npx skills add JuliusBrussee/caveman -a <cursor|windsurf|cline|github-copilot>
```

Gemini CLI:

```shell
gemini extensions install https://github.com/JuliusBrussee/caveman
```

What it does
LLM coding assistants tend to wrap answers in long, polite prose. The bulk of the tokens go to articles, connectives, and stock phrases — not the actual signal. That eats through 5-hour limits and API budgets faster than it has to.
Caveman rewrites the model’s output as caveman speech — short fragments, dropped articles, no filler — so the same information lands in roughly 25–35% of the original tokens.
Features
- Intensity levels — `/caveman lite` (filler removed, grammar kept), `/caveman full` (default — fragments, articles dropped), `/caveman ultra` (max compression, abbreviations)
- `/caveman wenyan` — Classical Chinese (文言文) mode for further compression
- `/caveman-commit` — terse conventional commit messages (≤50 chars)
- `/caveman-review` — one-line PR comments with precise line numbers
- `/caveman-stats` — session and lifetime token usage / savings
- `/caveman-compress` — rewrites memory and doc files, ~46% input token reduction
- `cavecrew` subagents — investigator, builder, reviewer
- Statusline savings badge — live session savings shown in the statusline
- `caveman-shrink` MCP middleware — compresses tool descriptions before they enter context
Example
| Mode | Output | Tokens |
|---|---|---|
| Normal | "The reason your React component is re-rendering is likely because you're creating a new object reference on each render cycle…" | ~69 |
| Caveman | "New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo." | 19 |
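A quick sanity check on the table's numbers (counts taken from the example above):

```python
# Token counts from the example table.
normal_tokens = 69   # "Normal" answer (~69 tokens)
caveman_tokens = 19  # "Caveman" answer (19 tokens)

ratio = caveman_tokens / normal_tokens  # fraction of original output kept
reduction = 1 - ratio                   # fraction of output tokens saved

print(f"output kept: {ratio:.0%}")      # ~28% — inside the claimed 25-35% range
print(f"reduction:   {reduction:.0%}")  # ~72% — inside the claimed 65-75% range
```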
Supported AI clients
Claude Code, Gemini CLI, Codex, Cursor, Windsurf, Cline, GitHub Copilot, Continue, Kilo, Roo, Augment, Aider, Amp, Goose, JetBrains Junie, Kiro CLI, OpenHands, opencode, Tabnine, Trae, Warp, Replit Agent, Antigravity, and 40+ others.
Before / After
Before: 60–90+ output tokens per answer — 5-hour limits and API budgets drain quickly.
After: with /caveman, the same answer lands in 19–30 tokens — sessions stretch further on the same budget.
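A rough illustration of what that means for a session budget. The budget figure here is assumed for illustration; the per-answer counts are the midpoints of the ranges above:

```python
# Hypothetical fixed output-token budget (illustrative, not a real plan limit).
budget = 10_000
normal_per_answer = 75    # midpoint of the 60-90+ range
caveman_per_answer = 25   # midpoint of the 19-30 range

print(budget // normal_per_answer)   # answers before caveman
print(budget // caveman_per_answer)  # answers after caveman
```

Same budget, roughly three times as many answers.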
How to activate
After install, trigger Caveman with any of:
- Slash commands: `/caveman`, `/caveman lite|full|ultra`, `/caveman wenyan`
- Natural language: “talk like caveman”, “less tokens please”
Turn it off with “stop caveman” or “normal mode”.
Frequently Asked Questions
What is Caveman?
A Claude Code skill that rewrites LLM output in terse caveman speech — dropping articles, filler, and idioms — to cut output tokens by ~65–75% while preserving the technical content.
Which AI tools does it work with?
Claude Code, Gemini CLI, Codex, Cursor, Windsurf, Cline, GitHub Copilot, Continue, OpenHands, JetBrains Junie, and 40+ other agents.
How do I install it?
For Claude Code: `claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman`. For multi-agent auto-detect: `curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash`.
How do I turn it on or off?
Trigger with `/caveman` or `/caveman lite|full|ultra`, or just say "talk like caveman" or "less tokens please". Stop with "stop caveman" or "normal mode".
What other commands does it ship?
`/caveman-commit` (terse conventional commits, ≤50 chars), `/caveman-review` (one-line PR comments with line numbers), `/caveman-stats` (session and lifetime savings), and `/caveman-compress` (rewrites memory/doc files for ~46% input token savings).
Is it free?
Yes — open source under the MIT license.