Chinese NLP Toolkit
by 371166758-qq · published 2026-04-01
$ claw add gh:371166758-qq/371166758-qq-chinese-nlp-toolkit
---
name: Chinese NLP Toolkit
description: Specialized natural language processing for Chinese text. Covers segmentation (jieba), sentiment analysis, keyword extraction, text summarization, tone detection, readability scoring, and format conversion (simplified/traditional, pinyin annotation). Use when processing, analyzing, or transforming Chinese text content.
---
# Chinese NLP Toolkit
Process and analyze Chinese text with specialized NLP capabilities.
## Core Capabilities
### 1. Text Segmentation (分词)
Written Chinese does not mark word boundaries with spaces, so segmentation is the foundation of most Chinese NLP tasks.
**Approach**: Use rule-based heuristics when no library is available:
**Common Ambiguities**:
| Text | Wrong Split | Correct Split |
|------|-------------|---------------|
| 雨伞 | 雨/伞 | 雨伞 (compound) |
| 结婚的和尚未结婚的 | 结婚/的/和尚/未/结婚/的 | 结婚/的/和/尚未/结婚/的 |
| 项目部 | 项目/部 | 项目部 (compound) |
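One common rule-based heuristic is dictionary maximum matching. The sketch below is an illustration, not necessarily the toolkit's own method: `VOCAB` is a tiny hypothetical dictionary, and `forward_max_match` / `backward_max_match` are names introduced here. On the ambiguous sentence from the table, forward matching reproduces the wrong split, while backward matching happens to recover the correct one.

```python
# Hypothetical mini-dictionary; a real system would load a large lexicon.
VOCAB = {"结婚", "和尚", "尚未", "雨伞", "项目", "项目部"}


def forward_max_match(text, vocab, max_len=4):
    """Greedy left-to-right segmentation: take the longest dictionary word."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when nothing in the dictionary matches.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, vocab, max_len=4):
    """Greedy right-to-left segmentation: take the longest dictionary word."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


sentence = "结婚的和尚未结婚的"
print("/".join(forward_max_match(sentence, VOCAB)))   # 结婚/的/和尚/未/结婚/的
print("/".join(backward_max_match(sentence, VOCAB)))  # 结婚/的/和/尚未/结婚/的
```

Backward matching is not a general fix for ambiguity; when the two directions disagree, that disagreement is mainly a useful flag that the span needs closer inspection.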
### 2. Sentiment Analysis (情感分析)
Go beyond binary positive/negative labels; Chinese sentiment is nuanced:
**Intensity levels**: 强烈负面 < 偏负面 < 中性 < 偏正面 < 强烈正面
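As a rough illustration of the five-level scale, the sketch below maps a numeric polarity score to one of the labels. The score range of -1 to 1, the thresholds `strong` and `weak`, and the function name `intensity_label` are all assumptions made for this example; the source does not specify how scores are produced or where the cut-offs lie.

```python
# Labels in the order given above, from most negative to most positive.
LEVELS = ["强烈负面", "偏负面", "中性", "偏正面", "强烈正面"]


def intensity_label(score, strong=0.6, weak=0.2):
    """Map a polarity score in [-1, 1] to one of the five intensity levels.

    The thresholds are illustrative defaults, not values from the toolkit.
    """
    if not -1 <= score <= 1:
        raise ValueError("score must be in [-1, 1]")
    if score <= -strong:
        return LEVELS[0]
    if score <= -weak:
        return LEVELS[1]
    if score < weak:
        return LEVELS[2]
    if score < strong:
        return LEVELS[3]
    return LEVELS[4]


print(intensity_label(-0.9))  # 强烈负面
print(intensity_label(0.3))   # 偏正面
```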
**Chinese-specific signals**:
**Emoji contribution** (critical for social media):
### 3. Keyword Extraction (关键词提取)
For Chinese text, prioritize:
**Method**: TF-IDF adapted for Chinese + positional weighting (first/last sentences carry more weight in Chinese writing).
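A minimal sketch of that method, assuming the text has already been segmented into one token list per sentence. The function name `extract_keywords`, the `edge_boost` factor, the `background_docs` corpus used to estimate IDF, and the smoothed IDF formula are choices made for this example, not details confirmed by the source.

```python
import math
from collections import Counter


def extract_keywords(sentences, background_docs, top_k=5, edge_boost=1.5):
    """Rank tokens by position-weighted TF multiplied by a smoothed IDF.

    sentences: list of token lists for one document (already segmented).
    background_docs: list of token sets used to estimate document frequency.
    edge_boost: extra weight for tokens in the first and last sentences.
    """
    weighted_tf = Counter()
    last = len(sentences) - 1
    for idx, tokens in enumerate(sentences):
        weight = edge_boost if idx in (0, last) else 1.0
        for tok in tokens:
            weighted_tf[tok] += weight

    total = sum(weighted_tf.values())
    n_docs = len(background_docs)
    scores = {}
    for tok, wtf in weighted_tf.items():
        df = sum(1 for doc in background_docs if tok in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF
        scores[tok] = (wtf / total) * idf

    # Sort by descending score, then by token for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


doc = [["分词", "是", "基础"], ["情感", "分析", "是", "难点"], ["分词", "影响", "结果"]]
background = [{"是", "的"}, {"是", "我们"}, {"分析", "是"}]
for token, score in extract_keywords(doc, background, top_k=3):
    print(token, round(score, 3))
```

Tokens that appear in every background document (such as 是 here) receive the lowest IDF, so they fall in the ranking without a separate stop-word list.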
### 4. Text Summarization (文本摘要)
**Chinese-specific rules**:
### 5. Readability Scoring (可读性评分)
Rate Chinese text on a 1-10 scale considering:
| Score | Level | Target Audience |
|-------|-------|-----------------|
| 1-3 | Easy | General public |
| 4-6 | Moderate | Educated readers |
| 7-8 | Hard | Domain experts |
| 9-10 | Very Hard | Academic specialists |
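The band lookup in the table can be expressed directly in code. This sketch covers only the mapping from a score to its level and audience; how the score itself is computed is not shown. The function name `readability_level` is hypothetical, and treating a fractional score as belonging to the next band up (3.5 counts as Moderate) is an assumption, since the table defines integer bands only.

```python
def readability_level(score):
    """Map a 1-10 readability score to (level, target audience)."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    if score <= 3:
        return ("Easy", "General public")
    if score <= 6:
        return ("Moderate", "Educated readers")
    if score <= 8:
        return ("Hard", "Domain experts")
    return ("Very Hard", "Academic specialists")


print(readability_level(5))  # ('Moderate', 'Educated readers')
```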
### 6. Format Conversion
| Conversion | Example |
|---|---|
| Simplified → Traditional | 体验 → 體驗 |
| Traditional → Simplified | 體驗 → 体验 |
| Chinese → Pinyin | 你好 → nǐ hǎo |
| Chinese → Zhuyin | 你好 → ㄋㄧˇ ㄏㄠˇ |
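For the simplified/traditional rows, a character-level lookup table is the simplest possible approach. The sketch below uses a toy table holding only the two characters from the example (`S2T_TABLE` and the function names are hypothetical); a real converter needs a full mapping and context handling, because some simplified characters correspond to more than one traditional form (for example 发 can be 發 or 髮).

```python
# Toy mapping covering only the example above; not a complete table.
S2T_TABLE = {"体": "體", "验": "驗"}

S2T = str.maketrans(S2T_TABLE)
T2S = str.maketrans({trad: simp for simp, trad in S2T_TABLE.items()})


def to_traditional(text):
    """Convert simplified characters found in the table; leave others unchanged."""
    return text.translate(S2T)


def to_simplified(text):
    """Convert traditional characters found in the table; leave others unchanged."""
    return text.translate(T2S)


print(to_traditional("体验"))  # 體驗
print(to_simplified("體驗"))   # 体验
```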
## Workflow
### When Processing Chinese Text:
1. **Detect variant**: Simplified (简体) or Traditional (繁体)?
2. **Segment**: Break into meaningful units
3. **Analyze**: Apply the requested analysis type(s)
4. **Report**: Present results with Chinese annotations
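Step 1 of the workflow above can be approximated by counting characters that exist in only one script variant. This is a sketch under strong assumptions: `SIMPLIFIED_ONLY` and `TRADITIONAL_ONLY` are tiny hand-picked sets for illustration, and `detect_variant` is a hypothetical name. Text built entirely from characters shared by both variants cannot be classified this way.

```python
# Illustrative character pairs (simplified / traditional): 体/體 验/驗 这/這 说/說 读/讀 书/書
SIMPLIFIED_ONLY = set("体验这说读书")
TRADITIONAL_ONLY = set("體驗這說讀書")


def detect_variant(text):
    """Guess the script variant by counting variant-specific characters."""
    simplified_hits = sum(ch in SIMPLIFIED_ONLY for ch in text)
    traditional_hits = sum(ch in TRADITIONAL_ONLY for ch in text)
    if simplified_hits > traditional_hits:
        return "simplified"
    if traditional_hits > simplified_hits:
        return "traditional"
    return "unknown"  # no distinguishing characters, or an even mix


print(detect_variant("这本书的体验很好"))  # simplified
print(detect_variant("這本書的體驗很好"))  # traditional
```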
## Output Format
原文:[original text]
分词:[segmented text with / separators]
关键词:[top 5-10 keywords with relevance scores]
情感:[sentiment label + confidence + key signals]
摘要:[summarized text]
可读性:[score/10 + brief explanation]
## Edge Cases
## Common Tasks & Prompts