
AWS ECS Monitor

name: aws-ecs-monitor

by briancolinger · published 2026-03-22

Tags: data processing · API integration · cryptocurrency
Total installs: 0 · Stars: ★ 0 · Last updated: 2026-03
Install:

```shell
claw add gh:briancolinger/briancolinger-aws-ecs-monitor
```

---
name: aws-ecs-monitor
version: 1.0.1
description: AWS ECS production health monitoring with CloudWatch log analysis — monitors ECS service health, ALB targets, SSL certificates, and provides deep CloudWatch log analysis for error categorization, restart detection, and production alerts.
metadata:
  openclaw:
    requires:
      bins: ["aws", "curl", "python3"]
      anyBins: ["openssl"]
---

# AWS ECS Monitor

Production health monitoring and log analysis for AWS ECS services.

## What It Does

- **Health Checks**: HTTP probes against your domain, ECS service status (desired vs running), ALB target group health, SSL certificate expiry
- **Log Analysis**: pulls CloudWatch logs, categorizes errors (panics, fatals, OOM, timeouts, 5xx), detects container restarts, filters health check noise
- **Auto-Diagnosis**: reads health status and automatically investigates failing services via log analysis

## Prerequisites

- `aws` CLI configured with appropriate IAM permissions:
  - `ecs:ListServices`, `ecs:DescribeServices`
  - `elasticloadbalancing:DescribeTargetGroups`, `elasticloadbalancing:DescribeTargetHealth`
  - `logs:FilterLogEvents`, `logs:DescribeLogGroups`

- `curl` for HTTP health checks
- `python3` for JSON processing and log analysis
- `openssl` for SSL certificate checks (optional)
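The IAM permissions above can be collected into a single read-only policy. A minimal sketch (the file name and the broad `"Resource": "*"` scope are illustrative; restrict to your cluster and log-group ARNs in production):

```shell
# Write a minimal policy covering the actions listed above.
# Illustrative only: narrow "Resource" for production use.
cat > ecs-monitor-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecs:ListServices",
        "ecs:DescribeServices",
        "elasticloadbalancing:DescribeTargetGroups",
        "elasticloadbalancing:DescribeTargetHealth",
        "logs:FilterLogEvents",
        "logs:DescribeLogGroups"
      ],
      "Resource": "*"
    }
  ]
}
EOF
# Sanity-check that the policy is valid JSON.
python3 -m json.tool ecs-monitor-policy.json > /dev/null && echo "policy OK"
```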
## Configuration

All configuration is via environment variables:

| Variable | Required | Default | Description |
|---|---|---|---|
| `ECS_CLUSTER` | **Yes** | — | ECS cluster name |
| `ECS_REGION` | No | `us-east-1` | AWS region |
| `ECS_DOMAIN` | No | — | Domain for HTTP/SSL checks (skipped if unset) |
| `ECS_SERVICES` | No | auto-detect | Comma-separated service names to monitor |
| `ECS_HEALTH_STATE` | No | `./data/ecs-health.json` | Path to write health state JSON |
| `ECS_HEALTH_OUTDIR` | No | `./data/` | Output directory for logs and alerts |
| `ECS_LOG_PATTERN` | No | `/ecs/{service}` | CloudWatch log group pattern (`{service}` is replaced) |
| `ECS_HTTP_ENDPOINTS` | No | — | Comma-separated `name=url` pairs for HTTP probes |
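Putting the table together, a typical setup might look like this (all values are illustrative; only `ECS_CLUSTER` is required):

```shell
# Illustrative configuration; unset variables fall back to defaults.
export ECS_CLUSTER=prod-cluster
export ECS_REGION=eu-west-1
export ECS_DOMAIN=example.com
export ECS_SERVICES=my-api,my-frontend        # omit to auto-detect
export ECS_LOG_PATTERN="/ecs/prod/{service}"
# Extra HTTP probes as comma-separated name=url pairs:
export ECS_HTTP_ENDPOINTS="api=https://example.com/api/health,web=https://example.com/"
```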

## Directories Written

- **`ECS_HEALTH_STATE`** (default: `./data/ecs-health.json`) — health state JSON file
- **`ECS_HEALTH_OUTDIR`** (default: `./data/`) — output directory for logs, alerts, and analysis reports

## Scripts

### `scripts/ecs-health.sh` — Health Monitor

```shell
# Full check
ECS_CLUSTER=my-cluster ECS_DOMAIN=example.com ./scripts/ecs-health.sh

# JSON output only
ECS_CLUSTER=my-cluster ./scripts/ecs-health.sh --json

# Quiet mode (no alerts, just status file)
ECS_CLUSTER=my-cluster ./scripts/ecs-health.sh --quiet
```

Exit codes: `0` = healthy, `1` = unhealthy/degraded, `2` = script error
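These exit codes make the monitor easy to wire into cron or CI. A sketch of that wiring (`handle_health` is hypothetical glue, not part of the skill; only the 0/1/2 contract comes from the docs):

```shell
# Map the documented exit codes (0/1/2) to follow-up actions.
handle_health() {
  case "$1" in
    0) echo "healthy" ;;
    1) echo "degraded: run cloudwatch-logs.sh auto-diagnose" ;;
    2) echo "monitor error: check script output" >&2 ;;
    *) echo "unexpected exit code: $1" >&2 ;;
  esac
}

# Typical wiring:
#   ECS_CLUSTER=my-cluster ./scripts/ecs-health.sh --quiet; handle_health $?
handle_health 1
```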

### `scripts/cloudwatch-logs.sh` — Log Analyzer

```shell
# Pull raw logs from a service
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh pull my-api --minutes 30

# Show errors across all services
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh errors all --minutes 120

# Deep analysis with error categorization
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh diagnose --minutes 60

# Detect container restarts
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh restarts my-api

# Auto-diagnose from health state file
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh auto-diagnose

# Summary across all services
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh summary --minutes 120
```

Options: `--minutes N` (default: 60), `--json`, `--limit N` (default: 200), `--verbose`

## Auto-Detection

When `ECS_SERVICES` is not set, both scripts auto-detect services from the cluster:

```shell
aws ecs list-services --cluster $ECS_CLUSTER
```

Log groups are resolved by pattern (default `/ecs/{service}`). Override with `ECS_LOG_PATTERN`:

```shell
# If your log groups are /ecs/prod/my-api, /ecs/prod/my-frontend, etc.
ECS_LOG_PATTERN="/ecs/prod/{service}" ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh diagnose
```
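The `{service}` placeholder is a plain token substitution. A bash sketch of the idea (not the script's actual code):

```shell
# Resolve a log group name from the pattern by replacing {service}.
pattern="${ECS_LOG_PATTERN:-/ecs/{service}}"   # default matches the docs
service="my-api"
log_group="${pattern/\{service\}/$service}"    # bash substring replacement
echo "$log_group"
```

With the default pattern this prints `/ecs/my-api`.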

## Integration

The health monitor can trigger the log analyzer for auto-diagnosis when issues are detected. Set `ECS_HEALTH_OUTDIR` to a shared directory and run both scripts together:

```shell
export ECS_CLUSTER=my-cluster
export ECS_DOMAIN=example.com
export ECS_HEALTH_OUTDIR=./data

# Run health check (auto-triggers log analysis on failure)
./scripts/ecs-health.sh

# Or run log analysis independently
./scripts/cloudwatch-logs.sh auto-diagnose --minutes 30
```
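For continuous monitoring the two scripts can be scheduled together; one possible crontab (install path, schedule, and log destinations are illustrative):

```cron
# Health check every 5 minutes; hourly log summary.
*/5 * * * *  cd /opt/ecs-monitor && ECS_CLUSTER=my-cluster ECS_DOMAIN=example.com ./scripts/ecs-health.sh --quiet >> /var/log/ecs-health.log 2>&1
0 * * * *    cd /opt/ecs-monitor && ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh summary --minutes 60 >> /var/log/ecs-logs.log 2>&1
```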

## Error Categories

The log analyzer classifies errors into:

- `panic` — Go panics
- `fatal` — fatal errors
- `oom` — out of memory
- `timeout` — connection/request timeouts
- `connection_error` — connection refused/reset
- `http_5xx` — HTTP 500-level responses
- `python_traceback` — Python tracebacks
- `exception` — generic exceptions
- `auth_error` — permission/authorization failures
- `structured_error` — JSON-structured error logs
- `error` — generic ERROR-level messages

Health check noise (GET/HEAD `/health` from the ALB) is automatically filtered from error counts and the HTTP status distribution.
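The noise filter amounts to excluding ALB health-check request lines before counting. A self-contained sketch with a fabricated log sample (the analyzer's real patterns may differ):

```shell
# Sample log: two ALB health-check lines and two real errors.
cat > sample.log <<'EOF'
GET /health 200 ELB-HealthChecker/2.0
ERROR: database connection refused
HEAD /health 200 ELB-HealthChecker/2.0
panic: runtime error: index out of range
EOF

# Drop health-check noise, then count what remains.
grep -Ev '(GET|HEAD) /health' sample.log
grep -Evc '(GET|HEAD) /health' sample.log   # count of non-noise lines
```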
