
Markdown Documentation Full-Text Search

name: md-docs-search

by carev01 · published 2026-03-22

Tags: data processing, security tools, cryptocurrency
Total installs: 0 · Stars: 0 · Last updated: 2026-03
// Install command
$ claw add gh:carev01/carev01-md-docs-search
// Full documentation

---

name: md-docs-search

description: Full-text search across structured Markdown documentation archives using SQLite FTS5. Use when you need to search large collections of Markdown articles that are separated by "---" delimiters and contain source URLs (marked with "*Source:" pattern). Provides fast BM25-ranked search with automatic source URL extraction for citations. Ideal for research, documentation lookups, and knowledge base exploration. Requires indexing documentation first with `docs.py index`.

---

# Markdown Documentation Full-Text Search

Fast, indexed full-text search across Markdown documentation archives using SQLite FTS5 with BM25 relevance ranking.

## When to Use

  • Searching documentation archives for specific features, capabilities, or information
  • Finding official source URLs to cite in reports
  • Looking up technical specifications or configuration details
  • Research across multiple documentation sources
## Document Format Expected

Articles are separated by a `---` delimiter line, and each article carries a `*Source:` line with its URL:

    # Article Title
    
    *Source: https://docs.example.com/path/to/article.html*
    
    Article content here...
    
    ---
    
    # Next Article Title
    
    *Source: https://docs.example.com/another/article.html*
    
    More content...
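
The format above can be parsed with a few lines of Python. This is a minimal sketch of one way to split an archive and pull out each article's title and source URL; it is not the skill's actual indexing code, and the names `split_articles`, `SOURCE_RE`, and `TITLE_RE` are illustrative assumptions.

```python
import re

# Assumed patterns, derived only from the sample format shown above.
SOURCE_RE = re.compile(r"^\*Source:\s*(\S+?)\*[ \t]*$", re.MULTILINE)
TITLE_RE = re.compile(r"^#\s+(.+?)[ \t]*$", re.MULTILINE)
DELIMITER_RE = re.compile(r"^---[ \t]*$", re.MULTILINE)


def split_articles(text):
    """Split an archive on '---' lines and extract title and source per article."""
    articles = []
    for chunk in DELIMITER_RE.split(text):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip empty pieces, e.g. a trailing delimiter
        title = TITLE_RE.search(chunk)
        source = SOURCE_RE.search(chunk)
        articles.append({
            "title": title.group(1) if title else None,
            "source": source.group(1) if source else None,
            "content": chunk,
        })
    return articles


sample = """# Article Title

*Source: https://docs.example.com/path/to/article.html*

Article content here...

---

# Next Article Title

*Source: https://docs.example.com/another/article.html*

More content...
"""

articles = split_articles(sample)
for article in articles:
    print(article["title"], "->", article["source"])
```

Articles that lack a title or a `*Source:` line still come back, with `None` in the missing field, so a caller can decide whether to index or skip them.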

## Quick Start

    # 1. Index the documentation (one-time or when docs change)
    scripts/docs.py index ./docs
    
    # 2. Search
    scripts/docs.py search "kubernetes backup" --max 5
    
    # 3. Check index status
    scripts/docs.py status

## Primary Tool: docs.py

The unified CLI handles all operations:

### Indexing

    # Index documentation directory
    scripts/docs.py index ./docs
    
    # Force full rebuild
    scripts/docs.py index ./docs --rebuild
    
    # Custom database location
    scripts/docs.py index ./docs --db /path/to/custom.db

### Searching

    # Basic search
    scripts/docs.py search "kubernetes backup"
    
    # Boolean operators
    scripts/docs.py search "AWS AND S3 AND snapshot"
    
    # Phrase search
    scripts/docs.py search '"exact phrase match"'
    
    # Prefix search
    scripts/docs.py search "kube*"
    
    # Exclude terms
    scripts/docs.py search "backup NOT restore"
    
    # Title-only search
    scripts/docs.py search "kubernetes" --title-only
    
    # Output formats
    scripts/docs.py search "kubernetes" --format json
    scripts/docs.py search "kubernetes" --format markdown
    
    # More context around matches
    scripts/docs.py search "kubernetes" --context 400
    
    # Include full content in JSON
    scripts/docs.py search "kubernetes" --format json --full-content

### FTS5 Query Syntax

| Syntax | Meaning |
|--------|---------|
| `term1 term2` | Documents with term1 OR term2 (ranked) |
| `term1 AND term2` | Documents with both terms |
| `term1 OR term2` | Documents with either term |
| `"exact phrase"` | Exact phrase match |
| `prefix*` | Words starting with prefix |
| `term1 NOT term2` | term1 without term2 |
| `title:term` | Search only titles |

Note: stock SQLite FTS5 treats space-separated terms as an implicit AND, so the OR behavior in the first row holds only if `docs.py` rewrites the query before passing it to FTS5.
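
To make the query syntax concrete, here is a small sketch that runs a few of these queries against an in-memory FTS5 table using Python's `sqlite3` module. The table name `articles`, its columns, and the sample rows are assumptions for illustration, not the schema `docs.py` actually uses. It also assumes the local SQLite build includes FTS5, which is common but not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the source URL is stored but not searched (UNINDEXED).
conn.execute(
    "CREATE VIRTUAL TABLE articles USING fts5(title, source UNINDEXED, content)"
)
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [
        ("Kubernetes Backup", "https://docs.example.com/k8s-backup.html",
         "How to back up Kubernetes workloads with snapshots."),
        ("AWS S3 Snapshots", "https://docs.example.com/aws-s3.html",
         "Storing snapshot data in AWS S3 buckets."),
        ("Restore Guide", "https://docs.example.com/restore.html",
         "Restore a backup to a new cluster."),
    ],
)


def search(query, limit=5):
    """Run an FTS5 MATCH query, best match first.

    bm25() returns lower values for better matches, so ascending order
    puts the most relevant row at the top.
    """
    sql = """
        SELECT title, source, bm25(articles) AS score
        FROM articles
        WHERE articles MATCH ?
        ORDER BY score
        LIMIT ?
    """
    return conn.execute(sql, (query, limit)).fetchall()


for query in ("kubernetes AND backup", "backup NOT restore",
              "title:snapshots", '"aws s3"', "kubernetes restore"):
    print(query, "->", [row[0] for row in search(query)])
```

The last query, `kubernetes restore`, returns nothing in this sketch because no single row contains both words, which is the implicit AND behavior of raw FTS5.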

### Getting Specific Articles

    # Get article by partial URL or title
    scripts/docs.py get "system_requirements" --full
    
    # Find all matching articles
    scripts/docs.py get "backup" --all

### Status

    # Check index statistics
    scripts/docs.py status

## Workflow for Research Tasks

### Discovery Phase

    # Check what's indexed
    scripts/docs.py status
    
    # Explore topics with broad searches
    scripts/docs.py search "<feature>" --max 20

### Research Phase

    # Narrow down with boolean operators
    scripts/docs.py search "<feature> AND <platform>"
    
    # Find specific information
    scripts/docs.py search 'limitation OR restriction OR "not supported"'

### Citation Phase

Every search result includes the `Source:` URL. Use it in your reports:

    According to documentation, [finding]...
    
    Source: https://docs.example.com/path/to/article.html

## Multi-Source Setup

Each agent or project can have its own documentation and index:

    ~/docs/VendorA/
        ├── docs_part_01.md
        ├── docs.db      # Index lives with docs
        └── ...
    
    ~/docs/VendorB/
        ├── docs.md
        ├── docs.db
        └── ...

The `docs.py` script auto-detects the database location.
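
The source does not say how the auto-detection works. One plausible approach, sketched below purely as an assumption, is to honor an explicit path first and otherwise look for `docs.db` in the starting directory and then in each parent directory. The function name `find_db` and the lookup order are hypothetical.

```python
import tempfile
from pathlib import Path

DB_NAME = "docs.db"  # file name taken from the directory layout above


def find_db(start, explicit=None):
    """Return a database path, or None if nothing is found.

    Hypothetical lookup order: an explicit path wins; otherwise search
    `start` and then each of its parent directories for DB_NAME.
    """
    if explicit is not None:
        return Path(explicit)
    start = Path(start).resolve()
    for directory in [start, *start.parents]:
        candidate = directory / DB_NAME
        if candidate.is_file():
            return candidate
    return None


# Demonstration in a throwaway directory that mimics ~/docs/VendorA/.
with tempfile.TemporaryDirectory() as tmp:
    vendor = Path(tmp) / "VendorA"
    (vendor / "sub").mkdir(parents=True)
    (vendor / DB_NAME).touch()
    found = find_db(vendor / "sub")
    print(found.name, "found in", found.parent.name)
```

Keeping the index next to the documents, as in the layout above, means a lookup like this picks the right database for whichever vendor directory a search starts from.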

## Advanced Scripts

For specialized needs:

  • `scripts/fts_search.py` — Direct FTS5 search with more options
  • `scripts/index_docs.py` — Standalone indexing
  • `scripts/list_sources.py` — List all source URLs
  • `scripts/get_article.py` — Direct article retrieval
  • `scripts/search_docs.py` — Regex-based search (no index needed)
## Research Patterns

For common search patterns (feature research, architecture, security, etc.), see [references/search-patterns.md](references/search-patterns.md).

## Example Session

    # What's available?
    scripts/docs.py status
    # Output: Files indexed: 37, Articles indexed: 32065
    
    # Find information
    scripts/docs.py search "kubernetes backup" --max 5
    
    # Narrow to specific platform
    scripts/docs.py search "kubernetes AND AWS" --max 5
    
    # Find limitations
    scripts/docs.py search 'limitation OR "not supported"'
    
    # Get full article for citation
    scripts/docs.py get "system_requirements" --full

## Best Practices

1. **Index once, search many times**: FTS5 is fast because queries run against a prebuilt index
2. **Use boolean operators**: `AND`, `OR`, `NOT` for precision
3. **Phrase search for exact terms**: wrap the phrase in double quotes, as in `"exact match"`
4. **Always cite sources**: include `Source:` URLs in reports
5. **Rebuild periodically**: re-index when the documentation changes
6. **Use JSON for analysis**: pipe to `jq` or other tools for processing
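
As an illustration of the last point, the sketch below post-processes JSON search results in Python to build a de-duplicated citation list. The source does not document the JSON schema, so the `title` and `source` keys and the sample payload are placeholders; adjust them to whatever `--format json` actually emits.

```python
import json

# Hypothetical sample of what `--format json` might emit. The real field
# names may differ, so adapt the keys used below to the actual output.
raw = """
[
  {"title": "Kubernetes Backup", "source": "https://docs.example.com/k8s-backup.html"},
  {"title": "Backup FAQ", "source": "https://docs.example.com/k8s-backup.html"},
  {"title": "AWS S3 Snapshots", "source": "https://docs.example.com/aws-s3.html"}
]
"""


def unique_sources(results):
    """Return source URLs in first-seen order, without duplicates."""
    seen = set()
    ordered = []
    for item in results:
        url = item.get("source")
        if url and url not in seen:
            seen.add(url)
            ordered.append(url)
    return ordered


sources = unique_sources(json.loads(raw))
for url in sources:
    print(f"Source: {url}")
```

In practice the `raw` string would be replaced by the captured output of a search command.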
