// Skill profile

Delx Ops Guardian

by davidmosiah · published 2026-03-22

Calendar management · Automation tasks · Cryptocurrency

Stars: 0

Last updated: 2026-03

// Install command

$ claw add gh:davidmosiah/davidmosiah-delx-ops-guardian

// Full documentation

---
name: delx-ops-guardian
summary: Incident handling and operational recovery guardrails for OpenClaw production agents.
owner: davidmosiah
status: active
---

# Delx Ops Guardian

Use this skill when handling incidents, degraded automations, or gateway/memory instability in production.

## Aliases

- `emergency_recovery`
- `handle_incident`
- `cron_guard`
- `memory_guard`
- `gateway_guard`

## Scope (strict)

This skill is **runbook-only** and must operate under least privilege.

Allowed read sources:

- OpenClaw cron state (`openclaw cron list --json`)
- Service health/status (`systemctl is-active <service>`)
- Recent logs for the incident window (`journalctl -u <service> --since ... --no-pager`)
- Workspace incident artifacts (`/root/.openclaw/workspace/docs/ops/`, `/root/.openclaw/workspace/memory/`)
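The read-only sources above can be pinned down as fixed argument lists so that evidence collection never drifts into ad hoc commands. This is a minimal sketch, not the skill's own implementation: the helper name `evidence_commands` and the example window `"15min ago"` are assumptions, and the function only builds the commands, it does not run them.

```python
from typing import Dict, List


def evidence_commands(service: str, since: str) -> Dict[str, List[str]]:
    """Build the read-only evidence commands for one service.

    `since` is handed to journalctl's --since option unchanged; the caller
    picks the incident window because the skill text does not fix a value.
    """
    # Refuse empty names and anything that could be parsed as an option.
    if not service or service.startswith("-"):
        raise ValueError("service must be a plain unit name")
    return {
        "cron_state": ["openclaw", "cron", "list", "--json"],
        "service_status": ["systemctl", "is-active", service],
        "recent_logs": ["journalctl", "-u", service, "--since", since, "--no-pager"],
    }


# Example window value is an assumption, chosen only for illustration.
cmds = evidence_commands("openclaw-gateway", "15min ago")
print(cmds["service_status"])
```

Keeping the commands as lists (rather than shell strings) means they can be passed to `subprocess.run` without shell interpolation if the operator chooses to execute them.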

Allowed remediation actions (safe set):

1. Retry a failed job once when the failure is transient.
2. Perform a controlled restart of the impacted service **only** (`openclaw-gateway`, `openclaw`, or a target explicitly named in the incident evidence).
3. Disable/enable only the directly impacted cron job when it is failing in a loop.
4. Add/adjust guardrails in runbook/config docs (non-secret, reversible).

Disallowed actions:

- No credential rotation/deletion.
- No firewall/network policy mutations.
- No package installs/upgrades during incident handling.
- No bulk cron rewrites unrelated to the incident.
- No edits to unrelated services/components.
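One way to enforce the safe set and the disallowed list is a default-deny check: anything not named in the safe set is refused. The sketch below is illustrative only. The action labels, the `is_allowed` helper, and the `evidence_targets` parameter are hypothetical names, and treating `openclaw-gateway` and `openclaw` as always-permitted restart targets is an assumption drawn from the wording of item 2.

```python
from enum import Enum
from typing import FrozenSet


class Action(Enum):
    """Hypothetical labels for the four safe remediation actions."""
    RETRY_ONCE = "retry_failed_job_once"
    RESTART_IMPACTED_SERVICE = "restart_impacted_service"
    TOGGLE_IMPACTED_CRON_JOB = "toggle_impacted_cron_job"
    ADJUST_RUNBOOK_GUARDRAIL = "adjust_runbook_guardrail"


# Restart targets named in the skill text; any other target must come from evidence.
DEFAULT_RESTART_TARGETS = frozenset({"openclaw-gateway", "openclaw"})


def is_allowed(action: str, target: str,
               evidence_targets: FrozenSet[str] = frozenset()) -> bool:
    """Default-deny check: True only for safe-set actions on impacted targets."""
    try:
        parsed = Action(action)
    except ValueError:
        return False  # not in the safe set, e.g. credential rotation
    if parsed is Action.ADJUST_RUNBOOK_GUARDRAIL:
        return True  # doc-only change, no runtime target to check
    if parsed is Action.RESTART_IMPACTED_SERVICE:
        return target in DEFAULT_RESTART_TARGETS or target in evidence_targets
    # Retry and cron toggle apply only to jobs named in the incident evidence.
    return target in evidence_targets
```

For example, `is_allowed("rotate_credentials", "openclaw")` returns `False` because the action is not in the safe set at all, while a cron toggle is accepted only when the job name appears in `evidence_targets`.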

## Approval policy (human-in-the-loop)

Require explicit human approval before:

- Restarting any production service more than once.
- Editing cron schedules/timezones.
- Disabling a job for more than one cycle.
- Any action with user-visible impact beyond the failing component.
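The approval conditions above can be expressed as a single predicate that is evaluated before any action runs. This is a sketch under stated assumptions: `ProposedAction`, its field names, and the `kind` labels are hypothetical, and "more than once" is read as "a restart was already performed during this incident".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    # Hypothetical labels: "retry", "restart", "edit_cron_schedule", "disable_job"
    kind: str
    prior_restarts: int = 0          # restarts of this service already done in this incident
    disable_cycles: int = 0          # cycles the job would stay disabled
    impact_beyond_component: bool = False


def needs_human_approval(action: ProposedAction) -> bool:
    """Return True when the approval policy requires an explicit human sign-off."""
    if action.impact_beyond_component:
        return True
    if action.kind == "restart" and action.prior_restarts >= 1:
        return True  # this would be the second (or later) restart
    if action.kind == "edit_cron_schedule":
        return True  # schedule and timezone edits always need approval
    if action.kind == "disable_job" and action.disable_cycles > 1:
        return True
    return False
```

A first restart (`prior_restarts=0`) passes without approval, while the same action with `prior_restarts=1` is held for a human.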

## Core workflow

1. Detect and classify severity (`info`, `degraded`, `critical`).
2. Collect evidence first (status, logs, last run, error streak).
3. Propose the smallest remediation from the allowed set.
4. Execute only approved/safe remediation.
5. Verify the stabilization window (at least one successful cycle).
6. Publish a concise incident report.
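Steps 1 and 5 are the two points in the workflow where a decision is made from data, so they are easy to sketch. The skill does not define severity thresholds; the ones below (service down means `critical`, an error streak of two or more means `degraded`) are assumptions for illustration, as are the function names. `is_stabilized` takes a slightly stricter reading of step 5 by requiring that the latest cycle after the fix succeeded.

```python
from typing import List

SEVERITIES = ("info", "degraded", "critical")


def classify_severity(error_streak: int, service_active: bool) -> str:
    """Map collected evidence to a severity level (thresholds are hypothetical)."""
    if not service_active:
        return "critical"
    if error_streak >= 2:
        return "degraded"
    return "info"


def is_stabilized(cycles_after_fix: List[bool]) -> bool:
    """True when at least one cycle ran after the fix and the latest one succeeded."""
    return bool(cycles_after_fix) and cycles_after_fix[-1]
```

With no cycles observed yet, `is_stabilized([])` is `False`, which keeps an incident from being closed before any verification has run.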

## Safety rules

- Never hide persistent failures as success.
- Never expose secrets/tokens in logs or reports.
- Prefer reversible actions and document the rollback path.
- Keep blast radius minimal and explicitly stated.
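The rule about secrets can be backed by a redaction pass over every log excerpt before it enters a report. The patterns below are hypothetical and deliberately narrow (key/value style secrets and `Bearer` tokens); a real deployment would need patterns matched to its own log formats.

```python
import re

# Hypothetical patterns: "token=...", "password: ...", "api_key=...", and "Bearer ...".
_SECRET_PATTERNS = [
    re.compile(r"(?i)\b(token|secret|password|api[_-]?key)\b(\s*[=:]\s*)(\S+)"),
    re.compile(r"(?i)\b(bearer)(\s+)(\S+)"),
]


def redact(line: str) -> str:
    """Replace the value part of any matched secret with a fixed marker."""
    for pattern in _SECRET_PATTERNS:
        # Keep the key name and separator so the excerpt stays readable.
        line = pattern.sub(lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", line)
    return line


print(redact("auth failed token=abc123 for gateway"))
# prints: auth failed token=[REDACTED] for gateway
```

Redacting at the point where excerpts are collected, rather than at publish time, keeps raw values out of intermediate artifacts as well.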

## Output contract

Always include:

- Incident id/time window
- Root signal and blast radius
- Actions executed (and approvals)
- Evidence (status, key metric, short log excerpt)
- Final state (`resolved`, `degraded`, `open`)
- Next check time
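The required fields above map naturally onto a small record type that refuses to be built with an unknown final state. This is a sketch, not a format the skill prescribes: the class name `IncidentReport`, the field names, and the rendered layout are all assumptions.

```python
from dataclasses import dataclass
from typing import List

FINAL_STATES = ("resolved", "degraded", "open")


@dataclass
class IncidentReport:
    incident_id: str
    time_window: str
    root_signal: str
    blast_radius: str
    actions: List[str]      # each entry should note its approval, if any
    evidence: List[str]     # status, key metric, short log excerpt
    final_state: str
    next_check: str

    def __post_init__(self) -> None:
        # Reject states outside the contract instead of silently accepting them.
        if self.final_state not in FINAL_STATES:
            raise ValueError(f"final_state must be one of {FINAL_STATES}")

    def render(self) -> str:
        """Return the report as plain text, one contract field per section."""
        lines = [
            f"Incident: {self.incident_id} ({self.time_window})",
            f"Root signal: {self.root_signal}",
            f"Blast radius: {self.blast_radius}",
            "Actions executed:",
            *[f"  - {a}" for a in self.actions],
            "Evidence:",
            *[f"  - {e}" for e in self.evidence],
            f"Final state: {self.final_state}",
            f"Next check: {self.next_check}",
        ]
        return "\n".join(lines)
```

Validating `final_state` at construction time supports the safety rule about never reporting a persistent failure as success: a report cannot carry an ad hoc state such as "fixed".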

## Example intents

- "Gateway is flapping, recover safely."
- "Cron timed out, stabilize and prove fix."
- "Memory guard firing repeatedly, root-cause and patch."
