Rag Evaluator
version: "2.0.0"
by bytesagain1 · published 2026-03-22
$ claw add gh:bytesagain1/bytesagain1-rag-evaluator

---
version: "2.0.0"
name: Ragaai Catalyst
description: "Python SDK for agent AI observability, monitoring, and evaluation. Tags: ragaai-catalyst, python, agentic-ai."
---
# Rag Evaluator
AI-powered RAG (Retrieval-Augmented Generation) evaluation toolkit. Configure, benchmark, compare, and optimize your RAG pipelines from the command line. Track prompts, evaluations, fine-tuning experiments, costs, and usage — all with persistent local logging and full export capabilities.
## Commands
Run `rag-evaluator <command> [args]` to use.
| Command | Description |
|---------|-------------|
| `configure` | Configure RAG evaluation settings and parameters |
| `benchmark` | Run benchmarks against your RAG pipeline |
| `compare` | Compare results across different RAG configurations |
| `prompt` | Log and manage prompt templates and variations |
| `evaluate` | Evaluate RAG output quality and relevance |
| `fine-tune` | Track fine-tuning experiments and parameters |
| `analyze` | Analyze evaluation results and identify patterns |
| `cost` | Track and log API/inference costs |
| `usage` | Monitor token usage and API call volumes |
| `optimize` | Log optimization strategies and results |
| `test` | Run test cases against RAG configurations |
| `report` | Generate evaluation reports |
| `stats` | Show summary statistics across all categories |
| `export <fmt>` | Export data in json, csv, or txt format |
| `search <term>` | Search across all logged entries |
| `recent` | Show recent activity from history log |
| `status` | Health check — version, data dir, disk usage |
| `help` | Show help and available commands |
| `version` | Show version (v2.0.0) |
Each domain command (configure, benchmark, compare, etc.) works in two modes: pass a quoted string to log a new entry, or run the command with no arguments to view the entries already logged for that category.
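The two-mode pattern can be sketched with a small shell function. This is a hypothetical illustration of the log-then-view behavior, not the tool's actual implementation:

```shell
# Demo log file (the real tool writes under ~/.local/share/rag-evaluator/)
DEMO_LOG="$(mktemp)"

rag_demo() {
  if [ "$#" -gt 0 ]; then
    # log mode: timestamp the entry and append it
    printf '%s %s\n' "$(date +%F)" "$*" >> "$DEMO_LOG"
  else
    # view mode: print everything logged so far
    cat "$DEMO_LOG"
  fi
}

rag_demo "faithfulness=0.91"   # logs an entry
rag_demo                       # lists logged entries
```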
## Data Storage
All data is stored locally in `~/.local/share/rag-evaluator/`.
## Requirements
## When to Use
1. **Evaluating RAG pipeline quality** — log evaluation scores, compare retrieval strategies, and track improvements over time
2. **Benchmarking different configurations** — run benchmarks across embedding models, chunk sizes, or retrieval methods and compare results side by side
3. **Tracking costs and usage** — monitor API costs and token usage across experiments to stay within budget
4. **Managing prompt engineering** — log prompt variations, test them against your pipeline, and analyze which templates perform best
5. **Generating reports for stakeholders** — export evaluation data as JSON/CSV for dashboards, or generate text reports summarizing RAG performance
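An exported CSV can be post-processed with standard shell tools for quick summaries. A minimal sketch, assuming a hypothetical `category,date,entry` column layout (the real export columns may differ, so inspect your own export file first):

```shell
# Hypothetical sample in an assumed layout: category,date,entry
demo_csv="$(mktemp)"
cat > "$demo_csv" <<'EOF'
evaluate,2026-03-20,faithfulness=0.91
evaluate,2026-03-21,faithfulness=0.88
cost,2026-03-21,run-042: $0.23
EOF

# Count logged entries per category with awk
awk -F, '{n[$1]++} END {for (c in n) printf "%s %d\n", c, n[c]}' "$demo_csv" | sort
# → cost 1
# → evaluate 2
```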
## Examples
```bash
# Configure a new evaluation run
rag-evaluator configure "model=gpt-4 chunks=512 overlap=50 top_k=5"

# Run a benchmark and log results
rag-evaluator benchmark "latency=230ms recall@5=0.82 precision@5=0.71"

# Compare two retrieval strategies
rag-evaluator compare "bm25 vs dense: bm25 recall=0.78, dense recall=0.85"

# Track evaluation scores
rag-evaluator evaluate "faithfulness=0.91 relevance=0.87 coherence=0.93"

# Log API cost for a run (single quotes keep the shell from expanding "$0")
rag-evaluator cost 'run-042: $0.23 (1.2k tokens input, 800 tokens output)'

# View summary statistics
rag-evaluator stats

# Export all data as CSV
rag-evaluator export csv

# Search for specific entries
rag-evaluator search "gpt-4"

# Check recent activity
rag-evaluator recent

# Health check
rag-evaluator status
```
## Output
All commands output to stdout. Redirect to a file if needed:
```bash
rag-evaluator report "weekly summary" > report.txt
rag-evaluator export json   # saves to ~/.local/share/rag-evaluator/export.json
```
## Configuration
Set `DATA_DIR` by modifying the script, or use the default: `~/.local/share/rag-evaluator/`
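If you edit the script, an XDG-aware default is a common choice. A minimal sketch of deriving the path rather than hard-coding it (the shipped script may simply hard-code `~/.local/share/rag-evaluator/`):

```shell
# Honor XDG_DATA_HOME if set, otherwise fall back to ~/.local/share
DATA_DIR="${XDG_DATA_HOME:-$HOME/.local/share}/rag-evaluator"
mkdir -p "$DATA_DIR"
echo "$DATA_DIR"
```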
---
Powered by BytesAgain | bytesagain.com | hello@bytesagain.com