API Quality Check
name: api-quality-check
by chekhovin · published 2026-04-01
$ claw add gh:chekhovin/chekhovin-api-quality-check

---
name: api-quality-check
description: Check coding-model API quality, capability fit, and drift with LT-lite and B3IT-lite. Use when Codex needs to verify whether an OpenAI/OpenAI-compatible/Anthropic endpoint can support first-token detection, logprob tracking, baseline-vs-current drift checks, or headless API quality smoke tests for coding CLIs, terminal agents, and OpenClaw-style workflows.
---
# API Quality Check
Use the bundled script to run headless API-quality checks. Treat this skill as script-first: do not recreate LT-lite/B3IT-lite logic inline unless the script is clearly insufficient.
Provider names such as Ark/Volcengine, GLM, DeepSeek, Kimi, SiliconFlow, and similar services are examples only. The primary decision is the endpoint protocol type: `OpenAI`, `OpenAI-Compatible`, or `Anthropic`.
## Quick start

Set the paths once:

```bash
export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export APIQ="$CODEX_HOME/skills/api-quality-check/scripts/api_quality_check.py"
export APIQ_BATCH="$CODEX_HOME/skills/api-quality-check/scripts/run_batch_checks.sh"
export APIQ_DAILY="$CODEX_HOME/skills/api-quality-check/scripts/run_daily_check.sh"
```

Run a capability smoke test first:
```bash
python "$APIQ" smoke \
  --provider "OpenAI-Compatible" \
  --base-url "https://ark.cn-beijing.volces.com/api/coding/v3" \
  --api-key "$API_KEY" \
  --model-id "ark-code-latest" \
  --html-output ./smoke.html
```

For many OpenAI-compatible endpoints, the same command also works if the user pastes the full `.../chat/completions` URL. The script will normalize it back to the API root automatically.
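That normalization can be sketched as follows. This is an illustrative re-implementation of the idea, not the script's actual code, and the set of stripped suffixes is an assumption:

```python
from urllib.parse import urlparse, urlunparse

def normalize_base_url(url: str) -> str:
    """Strip a trailing endpoint path (e.g. /chat/completions) so a
    pasted full URL collapses back to the API root."""
    parsed = urlparse(url)
    path = parsed.path.rstrip("/")
    # Assumed suffixes: OpenAI-style completions and Anthropic messages.
    for suffix in ("/chat/completions", "/completions", "/messages"):
        if path.endswith(suffix):
            path = path[: -len(suffix)]
            break
    return urlunparse(parsed._replace(path=path))

print(normalize_base_url("https://api.siliconflow.cn/v1/chat/completions"))
# → https://api.siliconflow.cn/v1
```

A URL that is already an API root passes through unchanged.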
If you want a ready-to-run `provider.json` first, generate it with:
```bash
python "$APIQ" init-config \
  --provider "OpenAI-Compatible" \
  --base-url "https://api.siliconflow.cn/v1/chat/completions" \
  --api-key "$API_KEY" \
  --model-id "deepseek-ai/DeepSeek-V3.2" \
  --name "siliconflow-v3-2" \
  --config-output ./provider.json
```

If an endpoint requires client-specific headers, put them in the config JSON as a `headers` object or pass them with `--headers-json`. For Kimi coding endpoints, use `{"User-Agent":"KimiCLI/2.0.0"}` only when the address is under `https://api.kimi.com/coding`; for the OpenAI-compatible Kimi path, use `https://api.kimi.com/coding/v1`.
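As an illustration, a config with a custom `headers` object might look like the fragment below. The field names are assumptions inferred from the `init-config` flags, and the model id is a placeholder; prefer a file generated by `init-config` as the source of truth:

```json
{
  "name": "kimi-coding",
  "provider": "OpenAI-Compatible",
  "base_url": "https://api.kimi.com/coding/v1",
  "api_key": "sk-...",
  "model_id": "kimi-latest",
  "headers": {
    "User-Agent": "KimiCLI/2.0.0"
  }
}
```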
If you already have multiple raw endpoint entries, normalize them into `providers.json` with:
```bash
python "$APIQ" init-batch-config \
  --configs ./raw-providers.json \
  --config-output ./providers.json
```

Or run the full batch pipeline:

```bash
"$APIQ_BATCH" ./providers.json ./api-quality-out
```

That command also creates `./api-quality-out/index.html` as the landing page for all generated reports.
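For illustration, a hand-written `raw-providers.json` input for `init-batch-config` might look like the following. The accepted shape is defined by the script itself; this sketch assumes each entry mirrors the `init-config` flags, so verify it against a file produced by `init-batch-config`:

```json
[
  {
    "name": "siliconflow-v3-2",
    "provider": "OpenAI-Compatible",
    "base_url": "https://api.siliconflow.cn/v1/chat/completions",
    "api_key": "sk-...",
    "model_id": "deepseek-ai/DeepSeek-V3.2"
  },
  {
    "name": "ark-coding",
    "provider": "OpenAI-Compatible",
    "base_url": "https://ark.cn-beijing.volces.com/api/coding/v3",
    "api_key": "sk-...",
    "model_id": "ark-code-latest"
  }
]
```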
For one endpoint that you want to check every day and archive by date:

```bash
bash "$APIQ_DAILY" ./provider.json ./daily-out my-endpoint
```

## Workflow
1. Run `smoke` before any baseline or detect run.
2. If you have many endpoints, run `batch-smoke` with a config list before choosing which ones deserve deeper LT/B3IT work.
3. Read the result:
- `b3it_supported=true`: the endpoint can return normal first-token text at `max_tokens=1`
- `lt_supported=true`: the endpoint also returns `logprobs`, so LT-lite can run
- `recommended_detector`: the script's direct recommendation for the next step
4. If `lt_supported=false`, do not force LT-lite; pivot to B3IT-lite or report that LT is unavailable.
5. Save baselines to explicit JSON files and reuse them for later detection.
6. Keep outputs file-based for coding CLIs and OpenClaw. Do not depend on GUI state.
7. For noisy endpoints, prefer the built-in B3IT defaults before tightening or loosening thresholds manually.
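The decision in steps 3-4 can be scripted against the smoke JSON. A minimal sketch, assuming the output file is a JSON object containing the `b3it_supported`, `lt_supported`, and `recommended_detector` fields described above:

```python
import json

def choose_detector(smoke_path: str) -> str:
    """Pick the next pipeline step from a smoke-test result file."""
    with open(smoke_path) as f:
        result = json.load(f)
    # Trust the script's own recommendation when it gives one.
    if result.get("recommended_detector"):
        return result["recommended_detector"]
    if result.get("lt_supported"):
        return "lt"        # logprobs available, so LT-lite can run
    if result.get("b3it_supported"):
        return "b3it"      # first-token text at max_tokens=1 works
    return "unsupported"   # report that neither detector applies
```

Per step 4, note that the fallback order never forces LT-lite when `lt_supported` is false.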
## Endpoint Types

The `provider` field selects the request/response protocol, not a specific vendor: `OpenAI`, `OpenAI-Compatible`, or `Anthropic`.
## Commands
### Capability smoke

```bash
python "$APIQ" smoke --config ./provider.json --output ./smoke.json
```

### Generate a provider config template
```bash
python "$APIQ" init-config \
  --provider "OpenAI-Compatible" \
  --base-url "https://api.siliconflow.cn/v1/chat/completions" \
  --api-key "$API_KEY" \
  --model-id "deepseek-ai/DeepSeek-V3.2" \
  --config-output ./provider.json
```

### Generate a batch providers.json template
```bash
python "$APIQ" init-batch-config \
  --configs ./raw-providers.json \
  --config-output ./providers.json
```

### Batch capability smoke

```bash
python "$APIQ" batch-smoke --configs ./providers.json --output ./batch-smoke.json --html-output ./batch-smoke.html
```

### Batch LT-lite baselines
```bash
python "$APIQ" batch-lt-baseline \
  --configs ./providers.json \
  --output-dir ./lt-baselines \
  --output ./batch-lt-baselines.json \
  --html-output ./batch-lt-baselines.html
```

### Batch LT-lite detect
```bash
python "$APIQ" batch-lt-detect \
  --configs ./providers.json \
  --baseline-manifest ./batch-lt-baselines.json \
  --output ./batch-lt-report.json \
  --html-output ./batch-lt-report.html
```

### Batch B3IT-lite baselines
```bash
python "$APIQ" batch-b3it-baseline \
  --configs ./providers.json \
  --output-dir ./b3it-baselines \
  --output ./batch-b3it-baselines.json \
  --html-output ./batch-b3it-baselines.html
```

### Batch B3IT-lite detect
```bash
python "$APIQ" batch-b3it-detect \
  --configs ./providers.json \
  --baseline-manifest ./batch-b3it-baselines.json \
  --output ./batch-b3it-report.json \
  --html-output ./batch-b3it-report.html \
  --detection-repeats 5 \
  --min-stable-count 2 \
  --min-stable-ratio 0.35 \
  --confirm-passes 1
```

### LT-lite baseline
```bash
python "$APIQ" lt-baseline --config ./provider.json --output ./lt-baseline.json
```

### LT-lite detect
```bash
python "$APIQ" lt-detect \
  --config ./provider.json \
  --baseline ./lt-baseline.json \
  --output ./lt-report.json
```

### B3IT-lite baseline
```bash
python "$APIQ" b3it-baseline --config ./provider.json --output ./b3it-baseline.json
```

### B3IT-lite detect
```bash
python "$APIQ" b3it-detect \
  --config ./provider.json \
  --baseline ./b3it-baseline.json \
  --output ./b3it-report.json \
  --detection-repeats 5 \
  --min-stable-count 2 \
  --min-stable-ratio 0.35 \
  --confirm-passes 1
```

### Daily single-endpoint drift run
```bash
bash "$APIQ_DAILY" ./provider.json ./daily-out my-endpoint
```

## Defaults and guardrails
## Resources
Open only what you need: