HomeBrowseUpload
← Back to registry
// Skill profile

Chemistry Query Agent v1.4.1

name: pharmaclaw-chemistry-query

by cheminem · published 2026-03-22

图像生成数据处理加密货币
Total installs
0
Stars
★ 0
Last updated
2026-03
// Install command
$ claw add gh:cheminem/cheminem-pharmaclaw-chemistry-query
View on GitHub
// Full documentation

---

name: pharmaclaw-chemistry-query

description: Chemistry agent skill for PubChem API queries (compound info/properties, structures/SMILES/images, synthesis routes/references) + RDKit cheminformatics (SMILES to molecule props/logP/TPSA, 2D PNG/SVG viz, Morgan fingerprints, retrosynthesis/BRICS disconnects, multi-step synth planning). Use for chemistry tasks involving compounds, molecules, structures, PubChem data, RDKit analysis, SMILES processing, synthesis routes, retrosynthesis, reaction simulation. Triggers on chemistry, compounds, molecules, chemical data/properties, PubChem, RDKit, SMILES, structures, synthesis, reactions, retrosynthesis, synth plan/route.

type: code

dependencies:

python: ">=3.10"

python-packages:

- rdkit

- gradio

- pandas

- Pillow

system:

- java (JRE 8+ for OPSIN, optional — only needed for IUPAC name→SMILES conversion)

external-apis:

- PubChem REST API (public, no key required)

- ChEMBL API (public, no key required)

- PubMed/NCBI E-utilities (public, no key required)

notes: >

OPSIN JAR (13.8MB) is auto-downloaded on first use of IUPAC name conversion

with pinned SHA-256 checksum verification (d25bc08f...). All other functionality

works without Java/OPSIN.

---

# Chemistry Query Agent v1.4.1

Overview

Full-stack chemistry toolkit combining PubChem data retrieval with RDKit molecule processing, visualization, analysis, retrosynthesis, and synthesis planning. All outputs are structured JSON for easy downstream chaining. Generates PNG/SVG images on demand.

**Key capabilities:**

  • PubChem compound lookup (info, structure, synthesis refs, similarity search)
  • RDKit molecular properties (MW, logP, TPSA, HBD/HBA, rotatable bonds, aromatic rings)
  • 2D molecule visualization (PNG/SVG)
  • BRICS retrosynthesis with recursive depth control
  • Multi-step synthesis route planning
  • Forward reaction simulation with SMARTS templates
  • Morgan fingerprints and similarity/substructure search
  • 21 named reaction templates (Suzuki, Heck, Grignard, Wittig, Diels-Alder, etc.)
  • Quick Start

    # PubChem compound info
    exec python scripts/query_pubchem.py --compound "aspirin" --type info
    
    # Molecular properties from SMILES
    exec python scripts/rdkit_mol.py --smiles "CC(=O)Oc1ccccc1C(=O)O" --action props
    
    # Retrosynthesis
    exec python scripts/rdkit_mol.py --target "CC(=O)Oc1ccccc1C(=O)O" --action retro --depth 2
    
    # Full chain (name → props + draw + retro)
    exec python scripts/chain_entry.py --input-json '{"name": "caffeine", "context": "user"}'

    Scripts

    `scripts/query_pubchem.py`

    PubChem REST API queries with automatic name→CID resolution and timeout handling.

    --compound <name|CID> --type <info|structure|synthesis|similar> [--format smiles|inchi|image|json] [--threshold 80]
  • **info:** Formula, MW, IUPAC name, InChIKey (JSON)
  • **structure:** SMILES, InChI, image URL, or full JSON
  • **synthesis:** Synonyms/references for a compound
  • **similar:** Similar compounds by 2D fingerprint (top 20)
  • `scripts/rdkit_mol.py`

    RDKit cheminformatics engine. Resolves names via PubChem automatically.

    --smiles <SMILES> --action <props|draw|fingerprint|similarity|substruct|xyz|react|retro|plan>

    | Action | Description | Key Args |

    |--------|-------------|----------|

    | props | MW, logP, TPSA, HBD, HBA, rotB, aromRings | `--smiles` |

    | draw | 2D PNG/SVG (300×300) | `--smiles --output file.png --format png\|svg` |

    | retro | BRICS recursive retrosynthesis | `--target <SMILES\|name> --depth N` |

    | plan | Multi-step retro route | `--target <SMILES\|name> --steps N` |

    | react | Forward reaction via SMARTS | `--reactants "smi1 smi2" --smarts "<SMARTS>"` |

    | fingerprint | Morgan fingerprint bitvector | `--smiles --radius 2` |

    | similarity | Tanimoto similarity scoring | `--query_smiles --target_smiles "smi1,smi2"` |

    | substruct | Substructure matching | `--query_smiles --target_smiles "smi1,smi2"` |

    | xyz | 3D coordinates (MMFF optimized) | `--smiles` |

    `scripts/chain_entry.py`

    Standard agent chain interface. Accepts `{"smiles": "...", "context": "..."}` or `{"name": "...", "context": "..."}`. Returns unified JSON with props, visualization, and retrosynthesis.

    python scripts/chain_entry.py --input-json '{"name": "sotorasib", "context": "user"}'

    Output schema:

    {
      "agent": "chemistry-query",
      "version": "1.4.0",
      "smiles": "<canonical>",
      "status": "success|error",
      "report": {"props": {...}, "draw": {...}, "retro": {...}},
      "risks": [],
      "viz": ["path/to/image.png"],
      "recommend_next": ["pharmacology", "toxicology"],
      "confidence": 0.95,
      "warnings": [],
      "timestamp": "ISO8601"
    }

    `scripts/templates.json`

    21 named reaction templates with SMARTS, expected yields, conditions, and references. Includes: Suzuki, Heck, Buchwald-Hartwig, Grignard, Wittig, Diels-Alder, Click, Sonogashira, Negishi, and more.

    Chaining

    1. **Name → Full Profile:** `chain_entry.py` with `{"name": "ibuprofen"}` → props + draw + retro

    2. **Chemistry → Pharmacology:** Output feeds directly into `pharma-pharmacology-agent`

    3. **Retro + Viz:** Get precursors, then draw each one

    4. **Suzuki Test:** `--action react --reactants "c1ccccc1Br c1ccccc1B(O)O" --smarts "[c:1][Br:2].[c:3][B]([c:4])(O)O>>[c:1][c:3]"`

    Tested With

    All features verified end-to-end with RDKit 2024.03+:

    | Molecule | SMILES | Tests Passed |

    |----------|--------|-------------|

    | Caffeine | `CN1C=NC2=C1C(=O)N(C(=O)N2C)C` | info, structure, props, draw, retro, plan, chain |

    | Aspirin | `CC(=O)Oc1ccccc1C(=O)O` | info, structure, props, draw, retro, plan, chain |

    | Sotorasib | PubChem name lookup | info, structure, props, draw, retro, chain |

    | Ibuprofen | PubChem name lookup | info, structure, props, chain |

    | Invalid SMILES | `XXXINVALID` | Graceful JSON error |

    | Empty input | `{}` | Graceful JSON error |

    Resources

  • `references/api_endpoints.md` — PubChem API endpoint reference and rate limits
  • `scripts/rdkit_reaction.py` — Legacy reaction module
  • `scripts/chembl_query.py`, `scripts/pubmed_search.py`, `scripts/admet_predict.py` — Additional query modules
  • `scripts/advanced_chem.py`

    Advanced cheminformatics engine with 6 Tier 1 capabilities.

    --action <standardize|descriptors|scaffold|mcs|mmpa|chemspace> --smiles <SMILES> [options]

    | Action | Description | Key Args |

    |--------|-------------|----------|

    | standardize | Salt stripping, charge normalization, tautomer enumeration | `--smiles` |

    | descriptors | 217+ molecular descriptors (RDKit full set), QED, SA Score, Lipinski/Veber rules | `--smiles --descriptor_set all\|druglike\|physical\|topological` |

    | scaffold | Murcko scaffold extraction, generic scaffolds, diversity analysis, R-group decomposition | `--smiles` or `--target_smiles "smi1,smi2,..."` `--rgroup_core <SMARTS>` |

    | mcs | Maximum Common Substructure across 2+ molecules | `--target_smiles "smi1,smi2,..."` |

    | mmpa | Matched Molecular Pair Analysis — find single-point transformations | `--target_smiles "smi1,smi2,..."` |

    | chemspace | Chemical space visualization (PCA/t-SNE/UMAP scatter plot PNG) | `--target_smiles "smi1,smi2,..." --method pca\|tsne\|umap --output plot.png` |

    **Examples:**

    # Standardize a salt form
    python scripts/advanced_chem.py --action standardize --smiles "[Na+].CC(=O)[O-]"
    
    # Full descriptors (217+)
    python scripts/advanced_chem.py --action descriptors --smiles "CC(=O)Oc1ccccc1C(=O)O" --descriptor_set all
    
    # Scaffold diversity of a set
    python scripts/advanced_chem.py --action scaffold --target_smiles "CC(=O)Oc1ccccc1C(=O)O,CN1C=NC2=C1C(=O)N(C(=O)N2C)C,CC(C)Cc1ccc(cc1)C(C)C(=O)O"
    
    # MCS of aspirin and salicylic acid
    python scripts/advanced_chem.py --action mcs --target_smiles "CC(=O)Oc1ccccc1C(=O)O,c1ccccc1C(=O)O"
    
    # Matched molecular pairs
    python scripts/advanced_chem.py --action mmpa --target_smiles "c1ccc(CC(=O)O)cc1,c1ccc(CCC(=O)O)cc1"
    
    # Chemical space PCA plot
    python scripts/advanced_chem.py --action chemspace --target_smiles "CC(=O)Oc1ccccc1C(=O)O,CN1C=NC2=C1C(=O)N(C(=O)N2C)C,c1ccccc1" --method pca --output space.png

    Changelog

    **v2.0.0** (2026-02-28)

  • NEW: `advanced_chem.py` with 6 Tier 1 cheminformatics capabilities
  • - Molecular Standardization & Tautomer Enumeration (salt stripping, charge normalization, canonical tautomers)

    - Extended Descriptors (217+ RDKit descriptors, QED, SA Score, Lipinski, Veber)

    - Scaffold Analysis (Murcko, generic scaffolds, diversity ratio, R-group decomposition)

    - Maximum Common Substructure (rdFMCS with coverage per molecule)

    - Matched Molecular Pair Analysis (rdMMPA fragmentation, transformation detection)

    - Chemical Space Visualization (PCA/t-SNE/UMAP with matplotlib scatter plots)

  • Dependencies: scikit-learn, matplotlib (added)
  • **v1.4.1** (2026-02-25)

  • Security hardening: input sanitization for all subprocess calls (SMILES, compound names, output paths)
  • Added `_sanitize_input()` — length limits, null-byte rejection for all user inputs
  • Added `_sanitize_output_path()` — prevents path traversal, restricts extensions, blocks arbitrary file writes
  • Added shell metacharacter rejection in `resolve_target()`
  • Added SMILES validation via RDKit in `chem_ui.py` before subprocess calls
  • Added compound input validation in `query_pubchem.py` (length/null-byte checks)
  • Added timeout to `resolve_target()` PubChem subprocess call
  • Addresses VirusTotal "suspicious" classification for argument injection vectors
  • **v1.4.0** (2026-02-14)

  • Fixed PubChem SMILES/InChI endpoint (property/CanonicalSMILES/TXT)
  • Fixed chain_entry.py HTML entity corruption
  • Fixed brics_retro to handle BRICSDecompose string output correctly
  • Added request timeouts (15s) to all PubChem calls
  • Graceful error handling for invalid SMILES and empty input
  • Updated chain output version and schema
  • Comprehensive end-to-end testing
  • **v1.3.0**

  • RDKit props NoneType fixes, invalid SMILES graceful errors
  • React fix: ReactionFromSmarts import
  • Name resolution via PubChem for all RDKit actions
  • **v1.2.0**

  • BRICS retrosynthesis + 21 reaction templates library
  • Multi-step synthesis planning
  • // Comments
    Sign in with GitHub to leave a comment.
    // Related skills

    More tools from the same signal band