ARIS — Autonomous Research In Sleep

Name: ARIS — Autonomous Research In Sleep
Author: adisinghstudent

by adisinghstudent · published 2026-04-01

Developer tools · Data processing

Stars: 0 · Last updated: 2026-04

// Install command

$ claw add gh:adisinghstudent/adisinghstudent-aris-autonomous-research

---
name: aris-autonomous-research
description: ARIS (Auto-Research-In-Sleep) — Markdown-only autonomous ML research workflows using cross-model review loops, idea discovery, experiment automation, and paper writing with Claude Code or any LLM agent.
triggers:
  - run autonomous research pipeline
  - set up ARIS research workflow
  - use claude code for ML research
  - automate paper writing with AI
  - cross-model research review loop
  - run experiment automation with ARIS
  - install ARIS skills for claude code
  - generate research ideas while sleeping
---

# ARIS — Autonomous Research In Sleep

> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.

ARIS is a **zero-dependency, Markdown-only** autonomous ML research system. Each "skill" is a plain `SKILL.md` file that any LLM agent can read and execute. The system orchestrates **cross-model collaboration**: one model executes (Claude Code / Codex) while another critiques (GPT-5.4 / Gemini / GLM / MiniMax), which catches the blind spots of self-review without tying you to a framework or vendor.

Core capabilities:
- 🔬 **Idea discovery** from a research direction or existing paper
- 🧪 **Experiment automation** with GPU-ready code generation and W&B tracking
- 📝 **Paper writing** (LaTeX, Beamer slides, A0 poster)
- 🔁 **Cross-model review loops** with score progression
- 📬 **Rebuttal drafting** with safety gates (no fabrication, no overpromise, full coverage)
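
The cross-model review loop with score progression can be pictured as a small control loop. The sketch below is illustrative only: ARIS skills are Markdown instructions, not Python, and `review_loop`, the `execute` and `critique` callables, the score scale, `target_score`, and `max_rounds` are all assumptions made for this example.

```python
from typing import Callable, List, Tuple


def review_loop(
    draft: str,
    execute: Callable[[str, str], str],             # (draft, feedback) -> revised draft
    critique: Callable[[str], Tuple[float, str]],   # draft -> (score, feedback)
    target_score: float = 8.0,                      # hypothetical acceptance threshold
    max_rounds: int = 4,                            # hypothetical round limit
) -> Tuple[str, List[float]]:
    """Alternate reviewer critique and executor revision, recording each score."""
    scores: List[float] = []
    for _ in range(max_rounds):
        score, feedback = critique(draft)
        scores.append(score)
        if score >= target_score:
            break
        draft = execute(draft, feedback)
    return draft, scores


# Stand-in models: the "reviewer" rewards length, the "executor" appends text.
def toy_critique(draft: str) -> Tuple[float, str]:
    return float(min(10, len(draft.split()))), "add more evidence"


def toy_execute(draft: str, feedback: str) -> str:
    return draft + " evidence evidence"


final_draft, progression = review_loop("short initial draft", toy_execute, toy_critique)
print(progression)  # [3.0, 5.0, 7.0, 9.0]
```

Keeping the executor and the reviewer as separate callables mirrors the idea that the two roles are played by different models.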

---

## Installation

### 1. Clone the repository

```bash
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
cd Auto-claude-code-research-in-sleep
```


### 2. Install skills into Claude Code

Copy the skills directory to your project, or symlink it:

```bash
# Option A: copy skills to your project
cp -r skills/ /your/project/.claude/skills/

# Option B: symlink (keeps skills up to date)
ln -s /path/to/Auto-claude-code-research-in-sleep/skills /your/project/.claude/skills
```


Claude Code auto-discovers `SKILL.md` files in `.claude/skills/**`. No registration step needed.
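
To check what is discoverable under that path, a short script can list every `SKILL.md` it finds. This is a minimal sketch, not part of ARIS or Claude Code; `find_skills` is a hypothetical helper, and it only mirrors the `.claude/skills/**` layout described above.

```python
from pathlib import Path
from typing import List


def find_skills(project_root: Path) -> List[str]:
    """Return the folder name of every SKILL.md found under .claude/skills/."""
    skills_dir = project_root / ".claude" / "skills"
    return sorted(path.parent.name for path in skills_dir.rglob("SKILL.md"))


# List the skills visible from the current working directory.
for name in find_skills(Path.cwd()):
    print(name)
```

An empty listing usually means the `skills/` directory was copied or linked to the wrong location.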

### 3. Configure the MCP reviewer (cross-model review)

ARIS uses the `llm-chat` MCP server so the executor model can call a second model for review. Install it:

```bash
cd mcp-servers/llm-chat
pip install -r requirements.txt  # or: uv pip install -r requirements.txt
```


Add to your `claude_desktop_config.json` (or Claude Code MCP config):

```json
{
  "mcpServers": {
    "llm-chat": {
      "command": "python",
      "args": ["/path/to/Auto-claude-code-research-in-sleep/mcp-servers/llm-chat/server.py"],
      "env": {
        "OPENAI_API_KEY": "$OPENAI_API_KEY",
        "LLM_MODEL": "gpt-4o"
      }
    }
  }
}
```


> For alternative reviewers (Kimi, GLM, MiniMax, DeepSeek) set `LLM_BASE_URL` and `LLM_MODEL` to the provider's OpenAI-compatible endpoint. No Claude or OpenAI API required.

### 4. (Optional) Codex MCP for OpenAI Codex as executor

```json
{
  "mcpServers": {
    "codex": {
      "command": "npx",
      "args": ["@openai/codex-mcp"],
      "env": {
        "OPENAI_API_KEY": "$OPENAI_API_KEY"
      }
    }
  }
}
```


---

## Environment Variables

| Variable | Required | Description |
|---|---|---|
| `OPENAI_API_KEY` | For GPT reviewer | OpenAI API key |
| `ANTHROPIC_API_KEY` | For Claude executor | Anthropic API key |
| `LLM_BASE_URL` | Alternative reviewer | OpenAI-compatible base URL |
| `LLM_MODEL` | Alternative reviewer | Model name at that endpoint |
| `WANDB_API_KEY` | Experiment tracking | Weights & Biases key |
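
A quick preflight check can report which of these variables are missing before a long run starts. The sketch below is an assumption-laden illustration: the variable names come from the table, but the role names in `REQUIRED_BY_ROLE` and the helper `missing_variables` are hypothetical and not part of ARIS.

```python
import os
from typing import Dict, List, Mapping

# Variable names come from the table above; the grouping by role is an assumption.
REQUIRED_BY_ROLE: Dict[str, List[str]] = {
    "gpt_reviewer": ["OPENAI_API_KEY"],
    "claude_executor": ["ANTHROPIC_API_KEY"],
    "alternative_reviewer": ["LLM_BASE_URL", "LLM_MODEL"],
    "experiment_tracking": ["WANDB_API_KEY"],
}


def missing_variables(roles: List[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """List variables that the selected roles need but the environment lacks."""
    return [
        name
        for role in roles
        for name in REQUIRED_BY_ROLE[role]
        if not env.get(name)
    ]


# Report what is still unset for a Claude executor with W&B tracking.
print(missing_variables(["claude_executor", "experiment_tracking"]))
```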

---

## Workflows & Commands

### Full pipeline (idea → paper)

```
/research-pipeline "factorized gap in discrete diffusion LMs"
```


With a reference paper and base codebase:

```
/research-pipeline "improve positional encoding in transformers" — ref paper: https://arxiv.org/abs/2104.09864, base repo: https://github.com/facebookresearch/fairseq
```


Parameters:

| Flag | Default | Effect |
|---|---|---|
| `ref paper` | — | ARIS reads this paper, finds weaknesses, targets them |
| `base repo` | — | Clone and use this repo as experiment base |
| `compact` | `false` | Set `compact: true` to generate lean summary files (good for short-context models) |
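
The invocation syntax (a slash command, an optional quoted argument, then comma-separated `key: value` parameters after a dash) can be decomposed mechanically. ARIS skills are Markdown read by the agent, so the source implies no such parser; the sketch below only illustrates the structure, and `parse_invocation` is a hypothetical helper name.

```python
import re
from typing import Dict, Optional, Tuple

INVOCATION = re.compile(
    r'^/(?P<skill>[\w-]+)'                 # slash command, e.g. /research-pipeline
    r'(?:\s+"(?P<arg>[^"]*)")?'            # optional quoted argument
    r'(?:\s+\u2014\s+(?P<params>.+))?$'    # optional parameters after the dash separator
)


def parse_invocation(line: str) -> Tuple[str, Optional[str], Dict[str, str]]:
    """Split an invocation into (skill name, quoted argument, parameters)."""
    match = INVOCATION.match(line.strip())
    if match is None:
        raise ValueError(f"not a skill invocation: {line!r}")
    params: Dict[str, str] = {}
    if match["params"]:
        # Split only on commas that are followed by another "key: " pair,
        # so colons and slashes inside URLs stay intact.
        for pair in re.split(r',\s*(?=[\w ]+:\s)', match["params"]):
            key, _, value = pair.partition(":")
            params[key.strip()] = value.strip()
    return match["skill"], match["arg"], params


skill, arg, params = parse_invocation(
    '/research-pipeline "improve positional encoding in transformers" \u2014 '
    "ref paper: https://arxiv.org/abs/2104.09864, "
    "base repo: https://github.com/facebookresearch/fairseq"
)
print(skill)
print(arg)
print(params)
```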

---

### Workflow 1 — Idea Discovery

```
/idea-discovery "sparse attention in long-context LLMs"
```


What it does:
1. Searches recent arXiv papers in the direction
2. Identifies open gaps and contradiction clusters
3. Generates 3–5 novel ideas with novelty scores
4. Runs `/research-refine` to sharpen the best idea into a problem statement

---

### Workflow 1.5 — Experiment Bridge

```
/experiment-bridge "idea_proposal.md" — base repo: https://github.com/huggingface/diffusers
```


What it does:
1. Reads the sharpened idea from Workflow 1
2. Generates GPU-ready experiment code
3. Runs **GPT cross-model code review** before deployment (`code review: true` by default)
4. Executes training loop with W&B logging
5. Saves results to `experiment_results/`

Example generated experiment scaffold:

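```python
# experiment_results/run_001/train.py (auto-generated by /experiment-bridge)
import os

import torch
import wandb
from torch.utils.data import DataLoader, TensorDataset

wandb.init(
    project=os.environ.get("WANDB_PROJECT", "aris-experiment"),
    config={
        "method": "factorized_discrete_diffusion",
        "lr": 3e-4,
        "epochs": 50,
        "batch_size": 32,
    },
)
config = wandb.config  # read hyperparameters back from the run config

# Placeholder model, data, and loss; a generated scaffold would supply the real ones.
model = torch.nn.Linear(16, 1)
dataset = TensorDataset(torch.randn(256, 16), torch.randn(256, 1))
dataloader = DataLoader(dataset, batch_size=config.batch_size, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=config.lr)

for epoch in range(config.epochs):
    for inputs, targets in dataloader:
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        wandb.log({"loss": loss.item(), "epoch": epoch})
```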


---

### Workflow 2 — Literature Review

```
/literature-review "discrete diffusion language models"
```


Anti-hallucination: ARIS checks every citation against DBLP, then CrossRef, and marks anything it cannot verify as `[VERIFY]`. It never fabricates BibTeX.
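
The verification order described above (DBLP first, CrossRef second, `[VERIFY]` as the last resort) amounts to a fallback chain. The sketch below shows that chain with in-memory stand-ins instead of network calls; `verify_citation`, the `Lookup` type, and the sample titles are hypothetical and do not reflect how ARIS queries either service.

```python
from typing import Callable, Iterable, Optional, Tuple

Lookup = Callable[[str], Optional[str]]  # title -> BibTeX string, or None if not found


def verify_citation(title: str, sources: Iterable[Tuple[str, Lookup]]) -> Tuple[str, str]:
    """Try each bibliographic source in order; tag the entry if none confirms it."""
    for source_name, lookup in sources:
        bibtex = lookup(title)
        if bibtex is not None:
            return source_name, bibtex
    # No source confirmed the paper: emit a tagged placeholder, never invented BibTeX.
    return "unverified", f"[VERIFY] {title}"


# Stand-in indexes; real lookups would query DBLP and CrossRef.
dblp_index = {"Known Paper A": "@article{known_a}"}
crossref_index = {"Known Paper B": "@article{known_b}"}
sources = [("DBLP", dblp_index.get), ("CrossRef", crossref_index.get)]

for title in ("Known Paper A", "Known Paper B", "Unknown Paper C"):
    print(verify_citation(title, sources))
```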

---

### Workflow 3 — Paper Writing

```
/paper-write "experiment_results/ + idea_proposal.md" — venue: NeurIPS
```


Supported venue templates: `CVPR`, `NeurIPS`, `ICML`, `ICLR`, `ACL`, `AAAI`, `ACM MM`

---

### Workflow 4 — Rebuttal

```
/rebuttal "paper/ + reviews/" — venue: ICML, character limit: 5000
```


Parameters:

| Parameter | Default | Description |
|---|---|---|
| `venue` | `ICML` | Target conference |
| `character limit` | **Required** | Hard character limit |
| `quick mode` | false | Stop after strategy (Phase 0–3), no draft |
| `auto experiment` | false | Auto-run experiments when reviewers ask for new evidence |
| `max stress test rounds` | 1 | GPT-5.4 stress-test passes on draft |
| `max followup rounds` | 3 | Per-reviewer follow-up round limit |

Three safety gates — rebuttal will NOT finalize if any fails:
- 🔒 No fabrication — every claim maps to paper/review/confirmed result
- 🔒 No overpromise — every promise is user-approved
- 🔒 Full coverage — every reviewer concern is tracked
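
The three gates can be read as three set-style checks over the rebuttal's bookkeeping. The sketch below is one possible formalization under stated assumptions: `RebuttalState`, its fields, and `failed_gates` are hypothetical names, and the actual gates are defined by the skill's Markdown instructions rather than by code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class RebuttalState:
    # claim text -> where it is grounded ("paper", "review", "result"), "" if ungrounded
    claims: Dict[str, str] = field(default_factory=dict)
    promises: Set[str] = field(default_factory=set)
    approved_promises: Set[str] = field(default_factory=set)
    reviewer_concerns: Set[str] = field(default_factory=set)
    tracked_concerns: Set[str] = field(default_factory=set)


def failed_gates(state: RebuttalState) -> List[str]:
    """Return the names of the gates that block finalization."""
    failures: List[str] = []
    if any(not source for source in state.claims.values()):
        failures.append("no fabrication")
    if not state.promises <= state.approved_promises:
        failures.append("no overpromise")
    if not state.reviewer_concerns <= state.tracked_concerns:
        failures.append("full coverage")
    return failures


state = RebuttalState(
    claims={"Our method improves accuracy": "result", "Baseline X diverges": ""},
    promises={"add ablation on depth"},
    approved_promises={"add ablation on depth"},
    reviewer_concerns={"R1: missing baseline", "R2: unclear notation"},
    tracked_concerns={"R1: missing baseline"},
)
print(failed_gates(state))  # ['no fabrication', 'full coverage']
```

An empty list means all three gates pass and the rebuttal may be finalized.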

Outputs:
- `PASTE_READY.txt` — exact character count, ready to paste to venue portal
- `REBUTTAL_DRAFT_rich.md` — extended version for manual editing

---

### Presentation & Poster

```
/paper-slides "paper/"   # Beamer PDF + PPTX + speaker notes + Q&A prep
/paper-poster "paper/"   # A0/A1 poster PDF + PPTX + SVG (venue colors)
```


---

## Standalone Utility Skills

| Skill | Command | What it does |
|---|---|---|
| `training-check` | `/training-check "train.py"` | Diagnose training instability, NaN, slow convergence |
| `result-to-claim` | `/result-to-claim "results.json"` | Convert raw numbers into paper-ready claims |
| `ablation-planner` | `/ablation-planner "idea.md"` | Design minimal ablation study for a method |
| `experiment-plan` | `/experiment-plan "idea.md"` | Claim-driven experiment roadmap |
| `research-refine` | `/research-refine "idea.md"` | Sharpen vague idea into problem-anchored proposal |
| `formula-derivation` | `/formula-derivation "method.md"` | Develop and verify research formulas |
| `paper-illustration` | `/paper-illustration "paper/"` | Generate figures (Gemini-assisted) |
| `grant-proposal` | `/grant-proposal "idea.md"` | Draft grant proposal from research idea |

---

## Alternative Model Combinations

ARIS requires only an OpenAI-compatible endpoint for the reviewer. Set environment variables:

```bash
# Kimi as reviewer
export LLM_BASE_URL="https://api.moonshot.cn/v1"
export LLM_MODEL="moonshot-v1-128k"
export LLM_API_KEY=$MOONSHOT_API_KEY

# DeepSeek as reviewer
export LLM_BASE_URL="https://api.deepseek.com/v1"
export LLM_MODEL="deepseek-chat"
export LLM_API_KEY=$DEEPSEEK_API_KEY

# MiniMax as reviewer
export LLM_BASE_URL="https://api.minimax.chat/v1"
export LLM_MODEL="abab6.5s-chat"
export LLM_API_KEY=$MINIMAX_API_KEY
```


`mcp-servers/llm-chat/server.py` reads `LLM_BASE_URL` and uses it in place of the OpenAI default, so no code changes are needed.
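
The override behaviour can be sketched as a small resolution function. This is not the code in `server.py`; `reviewer_config` is a hypothetical helper, the `gpt-4o` default is taken from the MCP config example earlier in this document, and the fallback from `LLM_API_KEY` to `OPENAI_API_KEY` is an assumption.

```python
import os
from typing import Dict, Mapping

OPENAI_DEFAULT_BASE_URL = "https://api.openai.com/v1"


def reviewer_config(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Resolve reviewer settings, letting LLM_* variables override the OpenAI defaults."""
    return {
        "base_url": env.get("LLM_BASE_URL", OPENAI_DEFAULT_BASE_URL),
        "model": env.get("LLM_MODEL", "gpt-4o"),
        # Assumption: fall back to OPENAI_API_KEY when LLM_API_KEY is unset.
        "api_key": env.get("LLM_API_KEY") or env.get("OPENAI_API_KEY", ""),
    }


deepseek_env = {
    "LLM_BASE_URL": "https://api.deepseek.com/v1",
    "LLM_MODEL": "deepseek-chat",
    "LLM_API_KEY": "example-key",  # placeholder value, not a real key
}
print(reviewer_config(deepseek_env)["base_url"])  # https://api.deepseek.com/v1
print(reviewer_config({})["base_url"])            # https://api.openai.com/v1
```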

---

## Using with Codex CLI (no Claude)

ARIS ships a parallel `skills-codex/` directory with the same workflows adapted for OpenAI Codex CLI:

```bash
# Install Codex CLI
npm install -g @openai/codex

# Run a workflow
codex --skill skills/skills-codex/research-pipeline/SKILL.md \
  "improve contrastive learning in vision transformers"
```


---

## Using with Cursor

1. Open Cursor settings → Rules → paste content of `docs/CURSOR_ADAPTATION.md`
2. Copy `skills/` to `.cursorrules-skills/` in your project
3. In chat: `@research-pipeline "your research direction"`

---

## Using with Trae (ByteDance IDE)

See [`docs/TRAE_ARIS_RUNBOOK_EN.md`](docs/TRAE_ARIS_RUNBOOK_EN.md) for full setup. Trae supports SKILL.md natively via its plugin system.

---

## Input Templates

Pre-filled templates for every workflow live in `templates/`:

```
templates/
  research-pipeline.md   # Full pipeline input
  idea-discovery.md
  experiment-bridge.md
  literature-review.md
  paper-write.md
  rebuttal.md
  paper-slides.md
  paper-poster.md
```


Use a template:

```
/research-pipeline — template: templates/research-pipeline.md
```


---

## Project Structure

```
Auto-claude-code-research-in-sleep/
├── skills/
│   ├── research-pipeline/SKILL.md     # Main orchestration workflow
│   ├── idea-discovery/SKILL.md        # Workflow 1
│   ├── experiment-bridge/SKILL.md     # Workflow 1.5
│   ├── literature-review/SKILL.md     # Workflow 2
│   ├── paper-write/SKILL.md           # Workflow 3
│   ├── rebuttal/SKILL.md              # Workflow 4
│   ├── paper-slides/SKILL.md
│   ├── paper-poster/SKILL.md
│   ├── training-check/SKILL.md
│   ├── result-to-claim/SKILL.md
│   ├── ablation-planner/SKILL.md
│   ├── experiment-plan/SKILL.md
│   ├── research-refine/SKILL.md
│   ├── formula-derivation/SKILL.md
│   └── skills-codex/                  # Codex CLI variants
├── mcp-servers/
│   └── llm-chat/                      # OpenAI-compatible reviewer MCP
├── templates/                         # Input templates per workflow
├── docs/
│   ├── CURSOR_ADAPTATION.md
│   ├── TRAE_ARIS_RUNBOOK_EN.md
│   ├── ANTIGRAVITY_ADAPTATION.md
│   ├── MODELSCOPE_GUIDE.md            # Free tier setup
│   ├── MiniMax-GLM-Configuration.md
│   └── CODEX_GEMINI_REVIEW_GUIDE.md
└── README.md
```


---

## Common Patterns

### Pattern 1: Start from an arXiv paper you want to beat

```
/research-pipeline "improve method" — ref paper: https://arxiv.org/abs/2406.04329, base repo: https://github.com/org/repo
```


ARIS reads the paper → identifies weaknesses → clones repo → generates targeted ideas → runs experiments → writes paper.

### Pattern 2: Resume interrupted session

Add `compact: true` to any workflow. ARIS writes a lean `SESSION_SUMMARY.md`. On resume:

```
/research-pipeline — resume: SESSION_SUMMARY.md
```


### Pattern 3: Jump into the middle of a pipeline

Already have results? Jump to paper writing:

```
/paper-write "my_results/ + my_idea.md" — venue: NeurIPS
```


Already have a paper? Jump to rebuttal:

```
/rebuttal "paper/ + reviews/" — venue: ICML, character limit: 5000
```


### Pattern 4: Free tier via ModelScope

```bash
export LLM_BASE_URL="https://api-inference.modelscope.cn/v1"
export LLM_MODEL="Qwen/Qwen2.5-72B-Instruct"
export LLM_API_KEY=$MODELSCOPE_API_KEY
```


See `docs/MODELSCOPE_GUIDE.md` for zero-cost setup.

---

## Troubleshooting

**Skills not discovered by Claude Code**

Ensure SKILL.md files are under `.claude/skills/` relative to your project root. Claude Code scans this path at startup.

**MCP reviewer not connecting**

```bash
# Test the llm-chat server directly
cd mcp-servers/llm-chat
python server.py --test
# Should print: {"status": "ok", "model": "gpt-4o"}
```


**W&B logging not working in experiment-bridge**

```bash
wandb login  # uses WANDB_API_KEY env var, or prompts for manual entry
```


**Citation hallucination in literature-review**

All unverified citations are tagged `[VERIFY]` in the output. Search DBLP manually for each flagged entry before including it in your paper, and never remove a `[VERIFY]` tag without confirming the reference.

**Rebuttal exceeds character limit**

ARIS tracks character count per section. If a draft exceeds the limit, it automatically trims supporting evidence (keeps claims, removes elaboration). You can also pass `quick mode: true` to get the strategy without the draft, then write targeted sections manually.
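
The trimming rule (keep claims, remove elaboration) can be illustrated with a small budget function. This is a sketch under assumptions, not ARIS's trimming logic: `fit_to_limit` is a hypothetical helper, each section is modelled as a `(claim, elaboration)` pair, and elaboration is dropped from the last section backwards.

```python
from typing import List, Tuple

Section = Tuple[str, str]  # (claim, elaboration)


def fit_to_limit(sections: List[Section], limit: int) -> str:
    """Drop elaboration, last section first, until the text fits the limit."""
    keep_elaboration = [True] * len(sections)

    def render() -> str:
        parts = [
            f"{claim} {detail}" if keep and detail else claim
            for (claim, detail), keep in zip(sections, keep_elaboration)
        ]
        return "\n".join(parts)

    text = render()
    for index in reversed(range(len(sections))):
        if len(text) <= limit:
            break
        keep_elaboration[index] = False
        text = render()
    if len(text) > limit:
        raise ValueError("claims alone exceed the character limit")
    return text


sections = [("Claim A.", "Detail one."), ("Claim B.", "Detail two.")]
print(fit_to_limit(sections, limit=30))
```

With a limit of 30 characters the second section loses its elaboration while both claims survive.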

**Cross-model review loop not running (self-review fallback)**

If the `llm-chat` MCP is unreachable, ARIS falls back to single-model review with a warning in the output. Check MCP server logs:

```bash
tail -f ~/.claude/mcp-logs/llm-chat.log
```
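
The fallback itself follows a familiar try-then-degrade pattern. The sketch below is only an illustration of that pattern: `review_with_fallback` and the `Reviewer` type are hypothetical, and treating an unreachable MCP server as a `ConnectionError` is an assumption.

```python
import warnings
from typing import Callable

Reviewer = Callable[[str], str]  # draft -> review text


def review_with_fallback(draft: str, cross_model: Reviewer, self_review: Reviewer) -> str:
    """Prefer the external reviewer; fall back to self-review if it is unreachable."""
    try:
        return cross_model(draft)
    except ConnectionError as error:
        warnings.warn(f"llm-chat reviewer unreachable ({error}); using single-model review")
        return self_review(draft)


def unreachable_reviewer(draft: str) -> str:
    raise ConnectionError("connection refused")


def local_reviewer(draft: str) -> str:
    return f"self-review of: {draft}"


print(review_with_fallback("draft v1", unreachable_reviewer, local_reviewer))
```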


**Session context overflow**

Use `compact: true` on any workflow invocation to produce a compressed `SESSION_SUMMARY.md` that fits in a fresh context window.

---

## Extending ARIS

Every skill is a plain Markdown file. To create a custom skill:

```markdown
# my-custom-skill

## Trigger
When the user says "run my custom analysis"...

## Steps
1. Read input files
2. Call `mcp__llm-chat__chat` with the review prompt
3. Write output to `custom_output/`

## Output
- `custom_output/analysis.md`
- `custom_output/score.json`
```


Save as `.claude/skills/my-custom-skill/SKILL.md` and Claude Code will discover it automatically.
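
Creating the folder and file by hand is enough, but the step can also be scripted. The sketch below is a hypothetical helper, not an ARIS tool: `scaffold_skill` and `SKILL_TEMPLATE` are illustrative names, and the `##` heading levels in the template are an assumption about the intended layout.

```python
import tempfile
from pathlib import Path

SKILL_TEMPLATE = """# {name}

## Trigger
When the user says "{trigger}"...

## Steps
1. Read input files
2. Call `mcp__llm-chat__chat` with the review prompt
3. Write output to `custom_output/`

## Output
- `custom_output/analysis.md`
- `custom_output/score.json`
"""


def scaffold_skill(project_root: Path, name: str, trigger: str) -> Path:
    """Create .claude/skills/<name>/SKILL.md from the template and return its path."""
    skill_file = project_root / ".claude" / "skills" / name / "SKILL.md"
    skill_file.parent.mkdir(parents=True, exist_ok=True)
    skill_file.write_text(SKILL_TEMPLATE.format(name=name, trigger=trigger), encoding="utf-8")
    return skill_file


# Demonstrate in a throwaway directory so no real project is touched.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_skill(Path(tmp), "my-custom-skill", "run my custom analysis")
    print(created.relative_to(tmp))
```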
