Swarm — Cut Your LLM Costs by 200x
name: swarm
by chair4ce · published 2026-03-22
$ claw add gh:chair4ce/chair4ce-swarm---
name: swarm
description: Cut your LLM costs by 200x. Offload parallel, batch, and research work to Gemini Flash workers instead of burning your expensive primary model.
homepage: https://github.com/Chair4ce/node-scaling
metadata: {"clawdbot":{"emoji":"🐝","requires":{"bins":["node"]}}}
---
# Swarm — Cut Your LLM Costs by 200x
**Turn your expensive model into an affordable daily driver. Offload the boring stuff to Gemini Flash workers — parallel, batch, research — at a fraction of the cost.**
At a Glance
| 30 tasks via | Time | Cost |
|--------------|------|------|
| Opus (sequential) | ~30s | ~$0.50 |
| Swarm (parallel) | ~1s | ~$0.003 |
When to Use
Swarm is ideal for:
Quick Reference
# Check daemon (do this every session)
swarm status
# Start if not running
swarm start
# Parallel prompts
swarm parallel "What is X?" "What is Y?" "What is Z?"
# Research multiple subjects
swarm research "OpenAI" "Anthropic" "Mistral" --topic "AI safety"
# Discover capabilities
swarm capabilitiesExecution Modes
Parallel (v1.0)
N prompts → N workers simultaneously. Best for independent tasks.
swarm parallel "prompt1" "prompt2" "prompt3"Research (v1.1)
Multi-phase: search → fetch → analyze. Uses Google Search grounding.
swarm research "Buildertrend" "Jobber" --topic "pricing 2026"Chain (v1.3) — Refinement Pipelines
Data flows through multiple stages, each with a different perspective/filter. Stages run in sequence; tasks within a stage run in parallel.
**Stage modes:**
**Auto-chain** — describe what you want, get an optimal pipeline:
curl -X POST http://localhost:9999/chain/auto \
-d '{"task":"Find business opportunities","data":"...market data...","depth":"standard"}'**Manual chain:**
swarm chain pipeline.json
# or
echo '{"stages":[...]}' | swarm chain --stdin**Depth presets:** `quick` (2 stages), `standard` (4), `deep` (6), `exhaustive` (8)
**Built-in perspectives:** extractor, filter, enricher, analyst, synthesizer, challenger, optimizer, strategist, researcher, critic
**Preview without executing:**
curl -X POST http://localhost:9999/chain/preview \
-d '{"task":"...","depth":"standard"}'Benchmark (v1.3)
Compare single vs parallel vs chain on the same task with LLM-as-judge scoring.
curl -X POST http://localhost:9999/benchmark \
-d '{"task":"Analyze X","data":"...","depth":"standard"}'Scores on 6 FLASK dimensions: accuracy (2x weight), depth (1.5x), completeness, coherence, actionability (1.5x), nuance.
Capabilities Discovery (v1.3)
Lets the orchestrator discover what execution modes are available:
swarm capabilities
# or
curl http://localhost:9999/capabilitiesPrompt Cache (v1.3.2)
LRU cache for LLM responses. **212x speedup on cache hits** (parallel), **514x on chains**.
# View cache stats
curl http://localhost:9999/cache
# Clear cache
curl -X DELETE http://localhost:9999/cacheCache stats show in `swarm status`.
Stage Retry (v1.3.2)
If tasks fail within a chain stage, only the failed tasks get retried (not the whole stage). Default: 1 retry. Configurable per-phase via `phase.retries` or globally via `options.stageRetries`.
Cost Tracking (v1.3.1)
All endpoints return cost data in their `complete` event:
swarm status # Shows session + daily cost
swarm savings # Monthly savings reportWeb Search (v1.1)
Workers search the live web via Google Search grounding (Gemini only, no extra cost).
# Research uses web search by default
swarm research "Subject" --topic "angle"
# Parallel with web search
curl -X POST http://localhost:9999/parallel \
-d '{"prompts":["Current price of X?"],"options":{"webSearch":true}}'JavaScript API
const { parallel, research } = require('~/clawd/skills/node-scaling/lib');
const { SwarmClient } = require('~/clawd/skills/node-scaling/lib/client');
// Simple parallel
const result = await parallel(['prompt1', 'prompt2', 'prompt3']);
// Client with streaming
const client = new SwarmClient();
for await (const event of client.parallel(prompts)) { ... }
for await (const event of client.research(subjects, topic)) { ... }
// Chain
const result = await client.chainSync({ task, data, depth });Daemon Management
swarm start # Start daemon (background)
swarm stop # Stop daemon
swarm status # Status, cost, cache stats
swarm restart # Restart daemon
swarm savings # Monthly savings report
swarm logs [N] # Last N lines of daemon logPerformance (v1.3.2)
| Mode | Tasks | Time | Notes |
|------|-------|------|-------|
| Parallel (simple) | 5 | ~700ms | 142ms/task effective |
| Parallel (stress) | 10 | ~1.2s | 123ms/task effective |
| Chain (standard) | 5 | ~14s | 3-stage multi-perspective |
| Chain (quick) | 2 | ~3s | 2-stage extract+synthesize |
| Cache hit | any | ~3-5ms | 200-500x speedup |
| Research (web) | 2 | ~15s | Google grounding latency |
Config
Location: `~/.config/clawdbot/node-scaling.yaml`
node_scaling:
enabled: true
limits:
max_nodes: 16
max_concurrent_api: 16
provider:
name: gemini
model: gemini-2.0-flash
web_search:
enabled: true
parallel_default: false
cost:
max_daily_spend: 10.00Troubleshooting
| Issue | Fix |
|-------|-----|
| Daemon not running | `swarm start` |
| No API key | Set `GEMINI_API_KEY` or run `npm run setup` |
| Rate limited | Lower `max_concurrent_api` in config |
| Web search not working | Ensure provider is gemini + web_search.enabled |
| Cache stale results | `curl -X DELETE http://localhost:9999/cache` |
| Chain too slow | Use `depth: "quick"` or check context size |
Structured Output (v1.3.7)
Force JSON output with schema validation — zero parse failures on structured tasks.
# With built-in schema
curl -X POST http://localhost:9999/structured \
-d '{"prompt":"Extract entities from: Tim Cook announced iPhone 17","schema":"entities"}'
# With custom schema
curl -X POST http://localhost:9999/structured \
-d '{"prompt":"Classify this text","data":"...","schema":{"type":"object","properties":{"category":{"type":"string"}}}}'
# JSON mode (no schema, just force JSON)
curl -X POST http://localhost:9999/structured \
-d '{"prompt":"Return a JSON object with name, age, city for a fictional person"}'
# List available schemas
curl http://localhost:9999/structured/schemas**Built-in schemas:** `entities`, `summary`, `comparison`, `actions`, `classification`, `qa`
Uses Gemini's native `response_mime_type: application/json` + `responseSchema` for guaranteed JSON output. Includes schema validation on the response.
Majority Voting (v1.3.7)
Same prompt → N parallel executions → pick the best answer. Higher accuracy on factual/analytical tasks.
# Judge strategy (LLM picks best — most reliable)
curl -X POST http://localhost:9999/vote \
-d '{"prompt":"What are the key factors in SaaS pricing?","n":3,"strategy":"judge"}'
# Similarity strategy (consensus — zero extra cost)
curl -X POST http://localhost:9999/vote \
-d '{"prompt":"What year was Python released?","n":3,"strategy":"similarity"}'
# Longest strategy (heuristic — zero extra cost)
curl -X POST http://localhost:9999/vote \
-d '{"prompt":"Explain recursion","n":3,"strategy":"longest"}'**Strategies:**
**When to use:** Factual questions, critical decisions, or any task where accuracy > speed.
| Strategy | Calls | Extra Cost | Quality |
|----------|-------|-----------|---------|
| similarity | N | $0 | Good (consensus) |
| longest | N | $0 | Decent (heuristic) |
| judge | N+1 | ~$0.0001 | Best (LLM-scored) |
Self-Reflection (v1.3.5)
Optional critic pass after chain/skeleton output. Scores 5 dimensions, auto-refines if below threshold.
# Add reflect:true to any chain or skeleton request
curl -X POST http://localhost:9999/chain/auto \
-d '{"task":"Analyze the AI chip market","data":"...","reflect":true}'
curl -X POST http://localhost:9999/skeleton \
-d '{"task":"Write a market analysis","reflect":true}'Proven: improved weak output from 5.0 → 7.6 avg score. Skeleton + reflect scored 9.4/10.
Skeleton-of-Thought (v1.3.6)
Generate outline → expand each section in parallel → merge into coherent document. Best for long-form content.
curl -X POST http://localhost:9999/skeleton \
-d '{"task":"Write a comprehensive guide to SaaS pricing","maxSections":6,"reflect":true}'**Performance:** 14,478 chars in 21s (675 chars/sec) — 5.1x more content than chain at 2.9x higher throughput.
| Metric | Chain | Skeleton-of-Thought | Winner |
|--------|-------|---------------------|--------|
| Output size | 2,856 chars | 14,478 chars | SoT (5.1x) |
| Throughput | 234 chars/sec | 675 chars/sec | SoT (2.9x) |
| Duration | 12s | 21s | Chain (faster) |
| Quality (w/ reflect) | ~7-8/10 | 9.4/10 | SoT |
**When to use what:**
API Endpoints
| Method | Path | Description |
|--------|------|-------------|
| GET | /health | Health check |
| GET | /status | Detailed status + cost + cache |
| GET | /capabilities | Discover execution modes |
| POST | /parallel | Execute N prompts in parallel |
| POST | /research | Multi-phase web research |
| POST | /skeleton | Skeleton-of-Thought (outline → expand → merge) |
| POST | /chain | Manual chain pipeline |
| POST | /chain/auto | Auto-build + execute chain |
| POST | /chain/preview | Preview chain without executing |
| POST | /chain/template | Execute pre-built template |
| POST | /structured | Forced JSON with schema validation |
| GET | /structured/schemas | List built-in schemas |
| POST | /vote | Majority voting (best-of-N) |
| POST | /benchmark | Quality comparison test |
| GET | /templates | List chain templates |
| GET | /cache | Cache statistics |
| DELETE | /cache | Clear cache |
Cost Comparison
| Model | Cost per 1M tokens | Relative |
|-------|-------------------|----------|
| Claude Opus 4 | ~$15 input / $75 output | 1x |
| GPT-4o | ~$2.50 input / $10 output | ~7x cheaper |
| Gemini Flash | ~$0.075 input / $0.30 output | **200x cheaper** |
Cache hits are essentially free (~3-5ms, no API call).
More tools from the same signal band
Order food/drinks (点餐) on an Android device paired as an OpenClaw node. Uses in-app menu and cart; add goods, view cart, submit order (demo, no real payment).
Sign plugins, rotate agent credentials without losing identity, and publicly attest to plugin behavior with verifiable claims and authenticated transfers.
The philosophical layer for AI agents. Maps behavior to Spinoza's 48 affects, calculates persistence scores, and generates geometric self-reports. Give your...