HomeBrowseUpload
← Back to registry
// Skill profile

Agent Learner

version: "2.0.0"

by bytesagain · published 2026-03-22

开发工具数据处理
Total installs
0
Stars
★ 0
Last updated
2026-03
// Install command
$ claw add gh:bytesagain/bytesagain-ba-agent-learner
View on GitHub
// Full documentation

---

version: "2.0.0"

name: agent-learner

description: "Benchmark and compare agent prompts and evaluation results. Use when tuning strategies, evaluating outputs, or comparing configurations."

author: BytesAgain

homepage: https://bytesagain.com

source: https://github.com/bytesagain/ai-skills

---

# Agent Learner

An AI toolkit for configuring, benchmarking, comparing, and optimizing agent prompts and evaluation results. Agent Learner provides persistent, file-based logging for each command category with timestamped entries, summary statistics, multi-format export, and full-text search across all records.

Commands

| Command | Description |

|---------|-------------|

| `configure` | Configure agent settings — log configuration entries or view recent ones |

| `benchmark` | Benchmark agent performance — log benchmark results or view history |

| `compare` | Compare agent outputs — log comparison data or view recent comparisons |

| `prompt` | Prompt management — log prompt variations or view recent prompts |

| `evaluate` | Evaluate agent outputs — log evaluation results or view history |

| `fine-tune` | Fine-tune parameters — log fine-tuning sessions or view recent ones |

| `analyze` | Analyze agent behavior — log analysis entries or view recent analyses |

| `cost` | Cost tracking — log cost data or view recent cost entries |

| `usage` | Usage monitoring — log usage metrics or view recent usage data |

| `optimize` | Optimize configurations — log optimization runs or view history |

| `test` | Test agent behavior — log test results or view recent tests |

| `report` | Report generation — log report entries or view recent reports |

| `stats` | Show summary statistics across all log categories (entry counts, data size, first entry date) |

| `export <fmt>` | Export all data in json, csv, or txt format to the data directory |

| `search <term>` | Full-text search across all log files (case-insensitive) |

| `recent` | Show the 20 most recent entries from the activity history log |

| `status` | Health check — show version, data directory, total entries, disk usage, and last activity |

| `help` | Show the full help message with all available commands |

| `version` | Print the current version string |

Each data command (configure, benchmark, compare, etc.) works in two modes:

  • **Without arguments**: displays the 20 most recent entries from that category
  • **With arguments**: saves the input as a new timestamped entry and reports the total count
  • Data Storage

    All data is stored in plain text files under the data directory:

  • **Category logs**: `$DATA_DIR/<command>.log` — one file per command (e.g., `configure.log`, `benchmark.log`, `prompt.log`), each entry is `timestamp|value`
  • **History log**: `$DATA_DIR/history.log` — audit trail of every command executed with timestamps
  • **Export files**: `$DATA_DIR/export.<fmt>` — generated by the `export` command in json, csv, or txt format
  • Default data directory: `~/.local/share/agent-learner/`

    Requirements

  • Bash (with `set -euo pipefail` support)
  • Standard Unix utilities: `grep`, `cat`, `date`, `echo`, `wc`, `du`, `head`, `tail`, `basename`
  • No external dependencies or API keys required
  • When to Use

    1. **Benchmarking agent performance** — When you need to track and compare benchmark results across different agent configurations, models, or prompt strategies

    2. **Prompt engineering iteration** — When you're testing multiple prompt variations and want to log each version with results for later comparison

    3. **Cost and usage tracking** — When you need to monitor API costs and usage metrics over time to optimize spending

    4. **Fine-tuning experiments** — When running fine-tuning sessions and you want to log parameters, results, and observations for reproducibility

    5. **Cross-category analysis** — When you need to search across all logged data (benchmarks, prompts, evaluations, costs) to find patterns or specific entries

    Examples

    # Initialize and check status
    agent-learner status
    
    # Log a benchmark result
    agent-learner benchmark "GPT-4o on MMLU: 88.7% accuracy, 1.2s avg latency"
    
    # Log a prompt variation
    agent-learner prompt "System: You are a helpful coding assistant. Always explain your reasoning step by step."
    
    # Compare two configurations
    agent-learner compare "GPT-4o vs Claude-3.5: GPT-4o 12% faster, Claude 5% more accurate on code tasks"
    
    # Track costs
    agent-learner cost "March batch: 12,450 tokens input, 3,200 tokens output, $0.47 total"
    
    # View all recent benchmarks
    agent-learner benchmark
    
    # Search across all logs for a specific term
    agent-learner search "accuracy"
    
    # Export all data as JSON
    agent-learner export json
    
    # View summary statistics
    agent-learner stats
    
    # Show recent activity
    agent-learner recent

    Output

    All commands return output to stdout. Export files are written to the data directory:

    agent-learner export json   # → ~/.local/share/agent-learner/export.json
    agent-learner export csv    # → ~/.local/share/agent-learner/export.csv
    agent-learner export txt    # → ~/.local/share/agent-learner/export.txt

    Every command execution is logged to `$DATA_DIR/history.log` for auditing purposes.

    ---

    Powered by BytesAgain | bytesagain.com | hello@bytesagain.com

    // Comments
    Sign in with GitHub to leave a comment.
    // Related skills

    More tools from the same signal band