⚡

// Skill profile

A/B Test Setup Skill

Name: A/B Test Setup Skill
Author: amdf01-debug

Plan A/B tests with proper methodology — hypothesis, sample size, duration, variant design, statistical significance.

by amdf01-debug · published 2026-04-01

自定义

Total installs

Stars

★ 0

Last updated

2026-04

// Install command

$ claw add gh:amdf01-debug/amdf01-debug-sw-ab-test-setup

View on GitHub

// Full documentation

# A/B Test Setup Skill

Trigger

Plan A/B tests with proper methodology — hypothesis, sample size, duration, variant design, statistical significance.

**Trigger phrases:** "A/B test", "split test", "experiment", "test this change", "variant", "multivariate test", "hypothesis"

Process

1. **Hypothesis**: What are you testing and why?

2. **Metrics**: Primary metric, guardrail metrics, success criteria

3. **Design**: Control vs variant(s), what exactly changes

4. **Calculate**: Sample size, test duration, minimum detectable effect

5. **Plan**: Implementation, QA, analysis timeline

Output Format

# A/B Test Plan: [Name]

## Hypothesis
If we [change], then [metric] will [improve/increase] because [reason].

## Variants
- **Control (A):** [current experience]
- **Variant (B):** [proposed change — be specific]

## Metrics
- **Primary:** [metric] — current: [X%] — target: [Y%]
- **Guardrail:** [metric that should NOT decrease]

## Sample Size & Duration
- MDE: [minimum detectable effect, e.g., 10% relative]
- Sample needed: [N per variant]
- Current traffic: [X visitors/day to test area]
- Estimated duration: [Y days/weeks]
- Confidence level: 95%

## Implementation Notes
[What needs to change, where, any technical considerations]

## Decision Framework
- If primary metric improves ≥ MDE with p < 0.05 → ship variant
- If no significant difference after [duration] → keep control
- If guardrail metric drops > [threshold] → stop test immediately

Rules

Never run a test without a hypothesis

One change per test (unless multivariate with sufficient traffic)

Run for minimum 2 full business cycles (usually 2 weeks)

Don't peek at results daily — pre-commit to evaluation date

95% confidence minimum. 80% power minimum.

Document everything: future you needs to know why this was tested

// Comments

// Related skills

More tools from the same signal band

Order food/drinks (点餐) on an Android device paired as an OpenClaw node. Uses in-app menu and cart; add goods, view cart, submit order (demo, no real payment).

Sign plugins, rotate agent credentials without losing identity, and publicly attest to plugin behavior with verifiable claims and authenticated transfers.

The philosophical layer for AI agents. Maps behavior to Spinoza's 48 affects, calculates persistence scores, and generates geometric self-reports. Give your...

日历管理数据处理

1 installs★ 0