TikTok Profile Scraper
A browser-based TikTok profile discovery and scraping tool.
by arulmozhiv · published 2026-04-01
$ claw add gh:arulmozhiv/arulmozhiv-tiktok-scraper-2
> Part of **[ScrapeClaw](https://www.scrapeclaw.cc/)** — a suite of production-ready, agentic social media scrapers for Instagram, YouTube, X/Twitter, TikTok, and Facebook built with Python & Playwright, no API keys required.
---
name: tiktok-scraper
description: Discover and scrape TikTok profiles from your browser.
emoji: 🎵
version: 1.0.0
author: influenza
tags:
  - tiktok
  - scraping
  - social-media
  - influencer-discovery
metadata:
  clawdbot:
    requires:
      bins:
        - python3
        - chromium
    config:
      stateDirs:
        - data/output
        - data/queue
        - thumbnails
      outputFormats:
        - json
        - csv
---

## Overview
This skill provides a two-phase TikTok scraping system:
1. **Profile Discovery**
2. **Browser Scraping**
## Features
#### Getting Google API Credentials (Optional)
1. Go to [Google Cloud Console](https://console.cloud.google.com/)
2. Create a new project or select existing
3. Enable "Custom Search API"
4. Create API credentials → API Key
5. Go to [Programmable Search Engine](https://programmablesearchengine.google.com/)
6. Create a search engine with `tiktok.com` as the site to search
7. Copy the Search Engine ID
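With the API key and Search Engine ID in hand, discovery queries can go through Google's Custom Search JSON API. The sketch below is not this skill's actual discovery code. It assumes the standard `https://www.googleapis.com/customsearch/v1` endpoint with its `key`, `cx`, `q`, and `num` parameters; the helper names `build_search_url` and `extract_usernames` and the query wording are hypothetical. It builds a request URL and pulls TikTok usernames out of the `items[].link` fields of a response.

```python
import re
from urllib.parse import urlencode

# Public endpoint of Google's Custom Search JSON API.
SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"

# Matches profile URLs such as https://www.tiktok.com/@example_creator
PROFILE_URL = re.compile(r"https?://(?:www\.)?tiktok\.com/@([A-Za-z0-9_.]+)")


def build_search_url(api_key: str, search_engine_id: str,
                     location: str, category: str) -> str:
    """Build a Custom Search request URL (the query wording is an assumption)."""
    params = {
        "key": api_key,           # API key from Google Cloud Console
        "cx": search_engine_id,   # Search Engine ID from Programmable Search Engine
        "q": f"{category} creator {location}",
        "num": 10,                # the API returns at most 10 results per request
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def extract_usernames(response: dict) -> list:
    """Collect unique usernames, in order, from a parsed JSON response."""
    seen = []
    for item in response.get("items", []):
        match = PROFILE_URL.match(item.get("link", ""))
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

Fetching the URL is left out on purpose, so the sketch stays independent of any HTTP client; pass the decoded JSON body to `extract_usernames`.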
## Usage

### Agent Tool Interface

For OpenClaw agent integration, the skill provides JSON output:

    # Discover profiles (returns JSON)
    discover --location "Miami" --category "dance" --output json

    # Scrape single profile (returns JSON)
    scrape --username charlidamelio --output json

## Output Data
### Profile Data Structure

    {
      "username": "example_creator",
      "full_name": "Example Creator",
      "nickname": "Example",
      "bio": "Dance creator | NYC 💃",
      "bio_link": "https://example.com",
      "followers": 250000,
      "following": 800,
      "likes": 5000000,
      "videos_count": 120,
      "is_verified": false,
      "is_private": false,
      "influencer_tier": "macro",
      "category": "dance",
      "location": "New York",
      "profile_url": "https://www.tiktok.com/@example_creator",
      "profile_pic_local": "thumbnails/example_creator/profile_abc123.jpg",
      "content_thumbnails": [
        "thumbnails/example_creator/content_1_def456.jpg",
        "thumbnails/example_creator/content_2_ghi789.jpg"
      ],
      "video_views": [
        {"display": "1.2M", "count": 1200000},
        {"display": "500K", "count": 500000}
      ],
      "scrape_timestamp": "2026-03-02T14:30:00"
    }

### Influencer Tiers
| Tier | Follower Range |
|-------|-------------------|
| nano | < 1,000 |
| micro | 1,000 - 10,000 |
| mid | 10,000 - 100,000 |
| macro | 100,000 - 1M |
| mega | > 1,000,000 |
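The tier table maps directly onto a threshold lookup. This is a minimal sketch rather than the scraper's own code: `influencer_tier` is an illustrative name, and because the table leaves the shared boundaries at 10,000 and 100,000 ambiguous, the sketch assumes each of those values belongs to the higher tier. The `< 1,000` and `> 1,000,000` rows are taken literally, so exactly 1,000,000 followers counts as `macro`.

```python
def influencer_tier(followers: int) -> str:
    """Map a follower count to a tier name from the table above."""
    if followers < 0:
        raise ValueError("followers must be non-negative")
    if followers < 1_000:
        return "nano"
    if followers < 10_000:      # assumption: 10,000 itself is "mid"
        return "micro"
    if followers < 100_000:     # assumption: 100,000 itself is "macro"
        return "mid"
    if followers <= 1_000_000:  # the table reserves "mega" for > 1,000,000
        return "macro"
    return "mega"
```

For the example profile above, `influencer_tier(250000)` returns `"macro"`, which matches its `influencer_tier` field.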
### File Outputs

## Configuration

Edit `config/scraper_config.json`:

    {
      "proxy": {
        "enabled": false,
        "provider": "brightdata",
        "country": "",
        "sticky": true,
        "sticky_ttl_minutes": 10
      },
      "google_search": {
        "enabled": true,
        "api_key": "",
        "search_engine_id": "",
        "queries_per_location": 3
      },
      "scraper": {
        "headless": false,
        "min_followers": 1000,
        "download_thumbnails": true,
        "max_thumbnails": 6
      },
      "cities": ["New York", "Los Angeles", "Miami", "Chicago"],
      "categories": ["fashion", "beauty", "fitness", "food", "travel", "tech", "comedy", "dance", "music", "gaming"]
    }

### Filters Applied
The scraper automatically filters out:
## Troubleshooting

### No Profiles Discovered

### Rate Limiting

### CAPTCHA / Bot Detection
---
## 🌐 Residential Proxy Support

### Why Use a Residential Proxy?
Running a scraper at scale **without** a residential proxy will get your IP blocked fast. Here's why proxies are essential for long-running scrapes:
| Advantage | Description |
|-----------|-------------|
| **Avoid IP Bans** | Residential IPs look like real household users, not data-center bots. TikTok is far less likely to flag them. |
| **Automatic IP Rotation** | Each request (or session) gets a fresh IP, so rate-limits never stack up on one address. |
| **Geo-Targeting** | Route traffic through a specific country/city so scraped content matches the target audience's locale. |
| **Sticky Sessions** | Keep the same IP for a configurable window (e.g. 10 min) — critical for maintaining a consistent browsing session. |
| **Higher Success Rate** | Rotating residential IPs deliver 95%+ success rates compared to ~30% with data-center proxies on TikTok. |
| **Long-Running Scrapes** | Scrape thousands of profiles over hours or days without interruption. |
| **Concurrent Scraping** | Run multiple browser instances across different IPs simultaneously. |
### Recommended Proxy Providers
We have affiliate partnerships with top residential proxy providers. Using these links supports continued development of this skill:
| Provider | Best For | Sign Up |
|----------|----------|---------|
| **Bright Data** | World's largest network, 72M+ IPs, enterprise-grade | 👉 [**Get Bright Data**](https://get.brightdata.com/o1kpd2da8iv4) |
| **IProyal** | Pay-as-you-go, 195+ countries, no traffic expiry | 👉 [**Get IProyal**](https://iproyal.com/?r=ScrapeClaw) |
| **Storm Proxies** | Fast & reliable, developer-friendly API, competitive pricing | 👉 [**Get Storm Proxies**](https://stormproxies.com/clients/aff/go/scrapeclaw) |
| **NetNut** | ISP-grade network, 52M+ IPs, direct connectivity | 👉 [**Get NetNut**](https://netnut.io?ref=mwrlzwv) |
### Setup Steps
#### 1. Get Your Proxy Credentials
Sign up with any provider above, then grab:
#### 2. Configure via Environment Variables

    export PROXY_ENABLED=true
    export PROXY_PROVIDER=brightdata    # brightdata | iproyal | stormproxies | netnut | custom
    export PROXY_USERNAME=your_user
    export PROXY_PASSWORD=your_pass
    export PROXY_COUNTRY=us             # optional: two-letter country code
    export PROXY_STICKY=true            # optional: keep same IP per session

#### 3. Provider-Specific Host/Port Defaults
These are auto-configured when you set the `provider` name:
| Provider | Host | Port |
|----------|------|------|
| Bright Data | `brd.superproxy.io` | `22225` |
| IProyal | `proxy.iproyal.com` | `12321` |
| Storm Proxies | `rotating.stormproxies.com` | `9999` |
| NetNut | `gw-resi.netnut.io` | `5959` |
Override with `PROXY_HOST` / `PROXY_PORT` env vars if your plan uses a different gateway.
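The override rule can be pictured as a lookup with an environment fallback. The sketch below is an assumption about how such resolution might work, not the skill's `ProxyManager` internals; `PROVIDER_DEFAULTS` and `resolve_gateway` are hypothetical names, and only the host/port values and the `PROXY_PROVIDER`, `PROXY_HOST`, and `PROXY_PORT` variable names come from this document.

```python
import os
from typing import Mapping, Optional, Tuple

# Gateway defaults copied from the table above.
PROVIDER_DEFAULTS = {
    "brightdata": ("brd.superproxy.io", 22225),
    "iproyal": ("proxy.iproyal.com", 12321),
    "stormproxies": ("rotating.stormproxies.com", 9999),
    "netnut": ("gw-resi.netnut.io", 5959),
}


def resolve_gateway(env: Optional[Mapping[str, str]] = None) -> Tuple[str, int]:
    """Return (host, port); PROXY_HOST / PROXY_PORT win over provider defaults."""
    env = os.environ if env is None else env
    provider = env.get("PROXY_PROVIDER", "").lower()
    default_host, default_port = PROVIDER_DEFAULTS.get(provider, (None, None))
    host = env.get("PROXY_HOST") or default_host
    port = env.get("PROXY_PORT") or default_port
    if host is None or port is None:
        # e.g. provider "custom" with no explicit host/port supplied
        raise ValueError(
            f"No gateway known for provider {provider!r}; "
            "set PROXY_HOST and PROXY_PORT"
        )
    return host, int(port)
```

Passing a plain dict instead of `os.environ` makes the behaviour easy to check, for example `resolve_gateway({"PROXY_PROVIDER": "netnut", "PROXY_PORT": "6000"})`.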
#### 4. Custom Proxy Provider
For any other proxy service, set provider to `custom` and supply host/port manually:

    {
      "proxy": {
        "enabled": true,
        "provider": "custom",
        "host": "your.proxy.host",
        "port": 8080,
        "username": "user",
        "password": "pass"
      }
    }

### Running the Scraper with Proxy
Once configured, the scraper picks up the proxy automatically — no extra flags needed:

    # Discover and scrape as usual; the proxy is applied automatically
    python main.py discover --location "Miami" --category "dance"
    python main.py scrape --username charlidamelio

    # The log will confirm proxy is active:
    # INFO - Proxy enabled: <ProxyManager provider=brightdata enabled host=brd.superproxy.io:22225>

### Using the Proxy Manager Programmatically

    from proxy_manager import ProxyManager

    # From config (auto-reads config/scraper_config.json)
    pm = ProxyManager.from_config()

    # From environment variables
    pm = ProxyManager.from_env()

    # Manual construction
    pm = ProxyManager(
        provider="brightdata",
        username="your_user",
        password="your_pass",
        country="us",
        sticky=True
    )

    # For Playwright browser context
    proxy = pm.get_playwright_proxy()
    # → {"server": "http://brd.superproxy.io:22225", "username": "user-country-us-session-abc123", "password": "pass"}

    # For requests / aiohttp
    proxies = pm.get_requests_proxy()
    # → {"http": "http://user:pass@host:port", "https": "http://user:pass@host:port"}

    # Force new IP (rotates session ID)
    pm.rotate_session()

    # Debug info
    print(pm.info())

### Best Practices for Long-Running Scrapes
1. **Use sticky sessions** — TikTok requires consistent IPs during a browsing session. Set `"sticky": true`.
2. **Target the right country** — Set `"country": "us"` (or your target region) so TikTok serves content in the expected locale.
3. **Combine with existing anti-detection** — This scraper already has fingerprinting, stealth scripts, and human behavior simulation. The proxy is the final layer.
4. **Rotate sessions between batches** — Call `pm.rotate_session()` between large batches of profiles to get a fresh IP.
5. **Use delays** — Even with proxies, respect `delay_between_profiles` in config to avoid aggressive patterns.
6. **Monitor your proxy dashboard** — All providers have dashboards showing bandwidth usage and success rates.
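Practices 4 and 5 can be combined in one loop. The sketch below is illustrative only: `scrape_in_batches`, `scrape_one`, `batch_size`, and `delay_range` are hypothetical names and values, not part of this skill's API. It takes the rotation step as a callable, so `pm.rotate_session` can be passed in directly, and uses a randomized pause as a stand-in for the configured `delay_between_profiles`.

```python
import random
import time
from typing import Callable, Iterable, List, Tuple


def scrape_in_batches(
    usernames: Iterable[str],
    scrape_one: Callable[[str], dict],      # hypothetical per-profile scraper
    rotate_session: Callable[[], None],     # e.g. pm.rotate_session
    batch_size: int = 25,                   # assumed value, tune for your plan
    delay_range: Tuple[float, float] = (3.0, 8.0),  # assumed seconds between profiles
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Scrape profiles one by one, pausing between them and rotating per batch."""
    results = []
    for index, username in enumerate(usernames):
        if index and index % batch_size == 0:
            rotate_session()                # fresh IP at the start of each new batch
        if index:
            sleep(random.uniform(*delay_range))  # pause before every profile but the first
        results.append(scrape_one(username))
    return results
```

A call might look like `scrape_in_batches(names, scrape_one=my_scraper, rotate_session=pm.rotate_session)`, where `my_scraper` is whatever function returns a profile dict for a username. Injecting `sleep` keeps the loop testable without real waiting.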