HomeBrowseUpload
← Back to registry
// Skill profile

Apify

name: apify

by bmestanov · published 2026-03-22

开发工具数据处理加密货币
Total installs
0
Stars
★ 0
Last updated
2026-03
// Install command
$ claw add gh:bmestanov/bmestanov-apify
View on GitHub
// Full documentation

---

name: apify

description: Run Apify Actors (web scrapers, crawlers, automation tools) and retrieve their results using the Apify REST API with curl. Use when the user wants to scrape a website, extract data from the web, run an Apify Actor, crawl pages, or get results from Apify datasets.

homepage: https://docs.apify.com/api/v2

metadata:

{

"openclaw":

{

"emoji": "🐝",

"primaryEnv": "APIFY_TOKEN",

"requires": { "anyBins": ["curl", "wget"], "env": ["APIFY_TOKEN"] },

},

}

---

# Apify

Run any of the 17,000+ Actors on [Apify Store](https://apify.com/store) and retrieve structured results via the REST API.

Full OpenAPI spec: [openapi.json](openapi.json)

Authentication

All requests need the `APIFY_TOKEN` env var. Use it as a Bearer token:

-H "Authorization: Bearer $APIFY_TOKEN"

Base URL: `https://api.apify.com`

Core workflow

1. Find the right Actor

Search the Apify Store by keyword:

curl -s "https://api.apify.com/v2/store?search=web+scraper&limit=5" \
  -H "Authorization: Bearer $APIFY_TOKEN" | jq '.data.items[] | {name: (.username + "/" + .name), title, description}'

Actors are identified by `username~name` (tilde) in API paths, e.g. `apify~web-scraper`.

2. Get Actor README and input schema

Before running an Actor, fetch its default build to get the README (usage docs) and input schema (expected JSON fields):

curl -s "https://api.apify.com/v2/acts/apify~web-scraper/builds/default" \
  -H "Authorization: Bearer $APIFY_TOKEN" | jq '.data | {readme, inputSchema}'

`inputSchema` is a JSON-stringified object — parse it to see required/optional fields, types, defaults, and descriptions. Use this to construct valid input for the run.

You can also get the Actor's per-build OpenAPI spec (no auth required):

curl -s "https://api.apify.com/v2/acts/apify~web-scraper/builds/default/openapi.json"

3. Run an Actor (async — recommended for most cases)

Start the Actor and get the run object back immediately:

curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/runs" \
  -H "Authorization: Bearer $APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"startUrls":[{"url":"https://example.com"}],"maxPagesPerCrawl":10}'

Response includes `data.id` (run ID), `data.defaultDatasetId`, `data.status`.

Optional query params: `?timeout=300&memory=4096&maxItems=100&waitForFinish=60`

  • `waitForFinish` (0-60): seconds the API waits before returning. Useful to avoid polling for short runs.
  • 4. Poll run status

    curl -s "https://api.apify.com/v2/actor-runs/RUN_ID?waitForFinish=60" \
      -H "Authorization: Bearer $APIFY_TOKEN" | jq '.data | {status, defaultDatasetId}'

    Terminal statuses: `SUCCEEDED`, `FAILED`, `ABORTED`, `TIMED-OUT`.

    5. Get results

    **Dataset items** (most common — structured scraped data):

    curl -s "https://api.apify.com/v2/datasets/DATASET_ID/items?clean=true&limit=100" \
      -H "Authorization: Bearer $APIFY_TOKEN"

    Or directly from the run (shortcut — same parameters):

    curl -s "https://api.apify.com/v2/actor-runs/RUN_ID/dataset/items?clean=true&limit=100" \
      -H "Authorization: Bearer $APIFY_TOKEN"

    Params: `format` (`json`|`csv`|`jsonl`|`xml`|`xlsx`|`rss`), `fields`, `omit`, `limit`, `offset`, `clean`, `desc`.

    **Key-value store record** (screenshots, HTML, OUTPUT):

    curl -s "https://api.apify.com/v2/key-value-stores/STORE_ID/records/OUTPUT" \
      -H "Authorization: Bearer $APIFY_TOKEN"

    **Run log:**

    curl -s "https://api.apify.com/v2/logs/RUN_ID" \
      -H "Authorization: Bearer $APIFY_TOKEN"

    6. Run Actor synchronously (short-running Actors only)

    For Actors that finish within 300 seconds, get dataset items in one call:

    curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/run-sync-get-dataset-items?timeout=120" \
      -H "Authorization: Bearer $APIFY_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"startUrls":[{"url":"https://example.com"}],"maxPagesPerCrawl":5}'

    Returns the dataset items array directly (not wrapped in `data`). Returns `408` if the run exceeds 300s.

    Alternative: `/run-sync` returns the KVS `OUTPUT` record instead of dataset items.

    Quick recipes

    Scrape a website

    curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/run-sync-get-dataset-items?timeout=120" \
      -H "Authorization: Bearer $APIFY_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"startUrls":[{"url":"https://example.com"}],"maxPagesPerCrawl":20}'

    Google search

    curl -s -X POST "https://api.apify.com/v2/acts/apify~google-search-scraper/run-sync-get-dataset-items?timeout=120" \
      -H "Authorization: Bearer $APIFY_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"queries":"site:example.com openai","maxPagesPerQuery":1}'

    Long-running Actor (async with polling)

    # 1. Start
    RUN=$(curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/runs?waitForFinish=60" \
      -H "Authorization: Bearer $APIFY_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"startUrls":[{"url":"https://example.com"}],"maxPagesPerCrawl":500}')
    RUN_ID=$(echo "$RUN" | jq -r '.data.id')
    
    # 2. Poll until done
    while true; do
      STATUS=$(curl -s "https://api.apify.com/v2/actor-runs/$RUN_ID?waitForFinish=60" \
        -H "Authorization: Bearer $APIFY_TOKEN" | jq -r '.data.status')
      echo "Status: $STATUS"
      case "$STATUS" in SUCCEEDED|FAILED|ABORTED|TIMED-OUT) break;; esac
    done
    
    # 3. Fetch results
    curl -s "https://api.apify.com/v2/actor-runs/$RUN_ID/dataset/items?clean=true" \
      -H "Authorization: Bearer $APIFY_TOKEN"

    Abort a run

    curl -s -X POST "https://api.apify.com/v2/actor-runs/RUN_ID/abort" \
      -H "Authorization: Bearer $APIFY_TOKEN"

    Paid / rental Actors

    Some Actors require a monthly subscription before they can be run. If the API returns a permissions or payment error for an Actor, ask the user to manually subscribe via the Apify Console:

    https://console.apify.com/actors/ACTOR_ID

    Replace `ACTOR_ID` with the Actor's ID (e.g. `AhEsMsQyLfHyMLaxz`). The user needs to click **Start** on that page to activate the subscription. Most rental Actors offer a free trial period set by the developer.

    You can get the Actor ID from the store search response (`data.items[].id`) or from `GET /v2/acts/username~name` (`data.id`).

    Error handling

  • **401**: `APIFY_TOKEN` missing or invalid.
  • **404 Actor not found**: check `username~name` format (tilde, not slash). Browse https://apify.com/store.
  • **400 run-failed**: check `GET /v2/logs/RUN_ID` for details.
  • **402/403 payment required**: the Actor likely requires a subscription. See "Paid / rental Actors" above.
  • **408 run-timeout-exceeded**: sync endpoints have a 300s limit. Use async workflow instead.
  • **429 rate-limit-exceeded**: retry with exponential backoff (start at 500ms, double each time).
  • Additional resources

  • API docs (LLM-friendly): https://docs.apify.com/api/v2.md
  • OpenAPI spec: [openapi.json](openapi.json)
  • Apify Store (browse Actors): https://apify.com/store
  • // Comments
    Sign in with GitHub to leave a comment.
    // Related skills

    More tools from the same signal band