⚡

// Skill profile

Crawler

Name: Crawler
Author: bytesagain3

name: crawler

by bytesagain3 · published 2026-03-22

开发工具数据处理加密货币

Total installs

Stars

★ 0

Last updated

2026-03

// Install command

$ claw add gh:bytesagain3/bytesagain3-crawler

View on GitHub

// Full documentation

---

name: crawler

version: "2.0.0"

author: BytesAgain

homepage: https://bytesagain.com

source: https://github.com/bytesagain/ai-skills

license: MIT-0

tags: [crawler, tool, utility]

description: "Crawl websites, extract links, and scrape content with rate limiting. Use when scraping pages, extracting links, generating sitemaps."

---

# Crawler

Crawler v2.0.0 — a data toolkit for ingesting, transforming, querying, filtering, aggregating, and exporting structured data entries. Each command logs timestamped records locally, making Crawler ideal for tracking web scraping workflows, managing crawl data pipelines, and auditing data processing activity.

Commands

| Command | Description |

|---------|-------------|

| `crawler ingest <input>` | Ingest a data entry (or view recent ingests with no args) |

| `crawler transform <input>` | Log a transform operation (or view recent transforms) |

| `crawler query <input>` | Record a query (or view recent queries) |

| `crawler filter <input>` | Record a filter operation (or view recent filters) |

| `crawler aggregate <input>` | Record an aggregation (or view recent aggregations) |

| `crawler visualize <input>` | Log a visualization task (or view recent visualizations) |

| `crawler export <input>` | Log an export entry (or view recent exports) |

| `crawler sample <input>` | Record a sampling operation (or view recent samples) |

| `crawler schema <input>` | Log a schema definition (or view recent schemas) |

| `crawler validate <input>` | Record a validation check (or view recent validations) |

| `crawler pipeline <input>` | Log a pipeline step (or view recent pipeline entries) |

| `crawler profile <input>` | Record a profiling run (or view recent profiles) |

| `crawler stats` | Show summary statistics across all log files |

| `crawler search <term>` | Search all entries for a keyword (case-insensitive) |

| `crawler recent` | Show the 20 most recent activity entries |

| `crawler status` | Health check — version, entry count, disk usage, last activity |

| `crawler help` | Display all available commands |

| `crawler version` | Print version string |

Each data command (ingest, transform, query, filter, aggregate, visualize, export, sample, schema, validate, pipeline, profile) works identically:

**With arguments:** saves a timestamped entry to `~/.local/share/crawler/<command>.log` and logs to `history.log`

**Without arguments:** displays the 20 most recent entries from that command's log file

Data Storage

All data is stored locally in `~/.local/share/crawler/`:

| File | Contents |

|------|----------|

| `ingest.log` | Timestamped ingest records |

| `transform.log` | Transform operation records |

| `query.log` | Query records |

| `filter.log` | Filter operation records |

| `aggregate.log` | Aggregation records |

| `visualize.log` | Visualization task records |

| `export.log` | Export entry records |

| `sample.log` | Sampling records |

| `schema.log` | Schema definition records |

| `validate.log` | Validation check records |

| `pipeline.log` | Pipeline step records |

| `profile.log` | Profiling records |

| `history.log` | Unified activity log for all commands |

The `stats` command reads all `.log` files and reports line counts per file, total entries, data directory size, and the timestamp of the first recorded activity.

The `export` utility function (called internally via `_export`) can produce **JSON**, **CSV**, or **TXT** output files under the data directory.

Requirements

**Bash** (4.0+)

**coreutils** — `date`, `wc`, `du`, `head`, `tail`, `grep`, `basename`, `cat`

No external dependencies, API keys, or network access required

Works on Linux and macOS

When to Use

1. **Tracking web crawl data pipelines** — use `ingest` to log URLs or pages crawled, `transform` to record data cleaning steps, and `export` to track output generation

2. **Managing scraping workflows** — use `pipeline` to document each stage of a multi-step scraping pipeline, with `validate` to confirm data quality at each step

3. **Profiling and sampling crawl results** — use `profile` to log performance metrics of crawls and `sample` to record sampling decisions for large datasets

4. **Querying and filtering collected data** — use `query` and `filter` to log search criteria and filtering rules applied to crawled content

5. **Auditing and searching historical activity** — use `search <term>` to find specific entries across all logs, or `stats` for a high-level overview of all data categories

Examples

# Ingest a crawled URL
crawler ingest "https://example.com/products — 142 items found"

# View recent ingest entries
crawler ingest

# Log a transform step
crawler transform "strip HTML tags, extract product names and prices"

# Record a filtering rule
crawler filter "exclude pages with HTTP 404 status"

# Log an aggregation
crawler aggregate "total products by category across 3 domains"

# Validate data quality
crawler validate "all price fields are numeric and > 0"

# Search across all logs
crawler search "product"

# View summary statistics
crawler stats

# Check health status
crawler status

# View the 20 most recent activities
crawler recent

How It Works

Crawler uses a simple append-only log architecture. Every command writes a pipe-delimited record (`timestamp|value`) to its dedicated log file. The `history.log` file captures a unified timeline of all operations with the format `MM-DD HH:MM command: value`.

This design makes Crawler:

**Fast** — pure bash, no database overhead

**Transparent** — all data is human-readable plain text

**Portable** — works anywhere bash runs, no install needed

**Auditable** — every action is timestamped and traceable

---

// Comments

// Related skills

More tools from the same signal band

Order food/drinks (点餐) on an Android device paired as an OpenClaw node. Uses in-app menu and cart; add goods, view cart, submit order (demo, no real payment).

Sign plugins, rotate agent credentials without losing identity, and publicly attest to plugin behavior with verifiable claims and authenticated transfers.

The philosophical layer for AI agents. Maps behavior to Spinoza's 48 affects, calculates persistence scores, and generates geometric self-reports. Give your...

日历管理数据处理

1 installs★ 0