ExpertPack Eval
by brianhearn · published 2026-03-22
$ claw add gh:brianhearn/brianhearn-expertpack-eval
---
name: expertpack-eval
description: "Measure ExpertPack EK (Esoteric Knowledge) ratio and run automated quality evals. Use when: (1) Measuring what percentage of a pack's content frontier LLMs cannot produce on their own, (2) Running automated eval sets against a pack-powered agent with LLM-as-judge scoring. Requires OpenRouter API key (auto-resolved from OpenClaw auth or OPENROUTER_API_KEY env var). Companion to the main expertpack skill. Triggers on: 'EK ratio', 'measure EK', 'blind probe', 'eval expertpack', 'pack quality eval', 'run eval', 'esoteric knowledge ratio'."
metadata:
  openclaw:
    homepage: https://expertpack.ai
    requires:
      bins:
        - python3
---
# ExpertPack Eval
Measure and evaluate ExpertPack quality. Companion to the core [expertpack](https://clawhub.ai/skills/expertpack) skill.
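**Note:** This skill makes external API calls to OpenRouter for blind probing and LLM-as-judge scoring. It requires an OpenRouter API key, resolved automatically from OpenClaw auth or the `OPENROUTER_API_KEY` environment variable.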
## 1. Measure EK Ratio
Blind-probe frontier models to measure the fraction of a pack's propositions they cannot answer without the pack loaded:
```bash
python3 {skill_dir}/scripts/eval-ek.py <pack-path> [--models model1,model2] [--sample N] [--output FILE]
```

**Interpretation:**
| EK Ratio | Meaning |
|----------|---------|
| 0.80+ | Exceptional: almost entirely esoteric |
| 0.60–0.79 | Strong: majority esoteric |
| 0.40–0.59 | Mixed: significant general-knowledge (GK) padding |
| 0.20–0.39 | Weak: most content already in model weights |
| < 0.20 | Minimal value-add |
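The table reads the ratio against fixed bands. Below is a minimal sketch of how a ratio could be derived from blind-probe outcomes and mapped onto those bands. It assumes a proposition counts as esoteric only when none of the probed models answers it correctly. The actual aggregation in `eval-ek.py` is not documented here and may differ (for example, averaging per model). `ProbeResult`, `ek_ratio`, and `interpret` are hypothetical names, and the probe outcomes are toy data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProbeResult:
    """Blind-probe outcome for one pack proposition (hypothetical structure)."""
    proposition_id: str
    # Model name -> True if the model answered correctly WITHOUT the pack loaded.
    answered_by: dict[str, bool]


def ek_ratio(results: list[ProbeResult]) -> float:
    """Fraction of tested propositions that no probed model could answer blind.

    ASSUMPTION: a proposition is esoteric only if every model failed it.
    """
    if not results:
        raise ValueError("no propositions were tested")
    esoteric = sum(1 for r in results if not any(r.answered_by.values()))
    return esoteric / len(results)


# Lower bound of each band from the interpretation table, highest first.
BANDS = [(0.80, "Exceptional"), (0.60, "Strong"), (0.40, "Mixed"), (0.20, "Weak")]


def interpret(ratio: float) -> str:
    """Map an EK ratio onto the interpretation table's bands."""
    for lower, label in BANDS:
        if ratio >= lower:
            return label
    return "Minimal value-add"


# Toy outcomes for four propositions, each probed against two models.
results = [
    ProbeResult("p1", {"gpt-4.1-mini": False, "gemini-2.0-flash": False}),
    ProbeResult("p2", {"gpt-4.1-mini": True, "gemini-2.0-flash": False}),
    ProbeResult("p3", {"gpt-4.1-mini": False, "gemini-2.0-flash": False}),
    ProbeResult("p4", {"gpt-4.1-mini": False, "gemini-2.0-flash": False}),
]
ratio = ek_ratio(results)
print(f"{ratio:.2f} {interpret(ratio)}")  # prints "0.75 Strong"
```

Requiring every model to fail is the strictest reading. A per-model average of unanswered propositions is always at least as high, because each model fails at least the propositions that all models fail.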
Add the measured ratio to the pack's `manifest.yaml`:
```yaml
ek_ratio:
  value: 0.72
  measured: "2026-03-12"
  models: ["gpt-4.1-mini", "claude-sonnet-4-6", "gemini-2.0-flash"]
  propositions_tested: 142
```

## 2. Run Quality Eval
Run an automated eval set against a pack-powered agent endpoint, with LLM-as-judge scoring:
```bash
python3 {skill_dir}/scripts/run-eval.py \
  --questions <eval-set.yaml> \
  --endpoint <ws://host:port/path> \
  --output <results.yaml> \
  --label "baseline"
```

**Learn more:** [expertpack.ai](https://expertpack.ai) · [GitHub](https://github.com/brianhearn/ExpertPack)