⚡

// Skill profile

Awesome Open Source AI

Name: Awesome Open Source AI
Author: adisinghstudent

```markdown

by adisinghstudent · published 2026-04-01

开发工具数据处理

Total installs

Stars

★ 0

Last updated

2026-04

// Install command

$ claw add gh:adisinghstudent/adisinghstudent-awesome-opensource-ai

View on GitHub

// Full documentation

---
name: awesome-opensource-ai
description: Curated guide to the best open-source AI projects, models, tools, and infrastructure across the full ML stack
triggers:
  - show me open source AI tools
  - what are the best open source LLMs
  - recommend open source ML frameworks
  - find open source alternatives to closed AI APIs
  - what open source models should I use for my project
  - help me pick an open source inference engine
  - what are good open source RAG tools
  - open source AI stack for production
---

# Awesome Open Source AI

> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.

A curated reference for open-source AI models, libraries, infrastructure, and developer tools spanning the full ML/LLM stack — from training frameworks to production deployment.

---

## What This Resource Covers

The [awesome-opensource-ai](https://github.com/alvinunreal/awesome-opensource-ai) list organizes the open-source AI ecosystem into 14 categories:

1. Core Frameworks & Libraries
2. Open Foundation Models
3. Inference Engines & Serving
4. Agentic AI & Multi-Agent Systems
5. Retrieval-Augmented Generation (RAG) & Knowledge
6. Generative Media Tools
7. Training & Fine-tuning Ecosystem
8. MLOps / LLMOps & Production
9. Evaluation, Benchmarks & Datasets
10. AI Safety, Alignment & Interpretability
11. Specialized Domains
12. User Interfaces & Self-hosted Platforms
13. Developer Tools & Integrations
14. Resources & Learning

---

## Quick Decision Guide by Use Case

### "I need to run an LLM locally"

| Need | Recommended Tool |
|------|-----------------|
| Simple local chat | [Ollama](https://github.com/ollama/ollama) |
| Max performance inference | [llama.cpp](https://github.com/ggerganov/llama.cpp) or [vLLM](https://github.com/vllm-project/vllm) |
| OpenAI-compatible API | [LocalAI](https://github.com/mudler/LocalAI) or [LM Studio](https://lmstudio.ai) |
| Production serving | [vLLM](https://github.com/vllm-project/vllm) or [TGI](https://github.com/huggingface/text-generation-inference) |

### "I need to train or fine-tune a model"

| Need | Recommended Tool |
|------|-----------------|
| LoRA/QLoRA fine-tuning | [Unsloth](https://github.com/unslothai/unsloth) or [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) |
| Full training at scale | [DeepSpeed](https://github.com/microsoft/DeepSpeed) + [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) |
| Quick experiments | [Hugging Face Transformers](https://github.com/huggingface/transformers) + [Accelerate](https://github.com/huggingface/accelerate) |

### "I need to build a RAG pipeline"

| Need | Recommended Tool |
|------|-----------------|
| Full RAG framework | [LlamaIndex](https://github.com/run-llama/llama_index) or [Haystack](https://github.com/deepset-ai/haystack) |
| Vector store | [Chroma](https://github.com/chroma-core/chroma), [Qdrant](https://github.com/qdrant/qdrant), or [Weaviate](https://github.com/weaviate/weaviate) |
| Embeddings model | [sentence-transformers](https://github.com/UKPLab/sentence-transformers) |

### "I need to build an AI agent"

| Need | Recommended Tool |
|------|-----------------|
| General agents | [LangChain](https://github.com/langchain-ai/langchain) or [LlamaIndex Workflows](https://github.com/run-llama/llama_index) |
| Multi-agent orchestration | [AutoGen](https://github.com/microsoft/autogen) or [CrewAI](https://github.com/joaomdmoura/crewAI) |
| Code agents | [OpenHands](https://github.com/All-Hands-AI/OpenHands) or [SWE-agent](https://github.com/princeton-nlp/SWE-agent) |

---

## Model Selection Guide

### Open LLMs by Size & Use Case

Small (1B–7B) — Edge, mobile, low-resource:

- Phi-4-Mini (Microsoft) — best reasoning per parameter

- Gemma 3 2B/7B (Google) — strong efficiency

- Qwen3.5-3B/7B — excellent multilingual

Medium (8B–30B) — Balanced production use:

- Llama 4 8B — general purpose workhorse

- Qwen3.5-14B — coding + math

- Mistral Small — multilingual, tool use

Large (70B+) — Max capability open:

- Llama 4 405B — frontier open model

- DeepSeek-V3.2 (MoE 671B active 37B) — math/reasoning

- Qwen3.5-72B — top open coding/math

Coding Specialists:

- Qwen2.5-Coder-32B — #1 open coding

- DeepSeek-Coder-V2 — MoE coding powerhouse

- StarCoder2-15B — 600+ languages, transparent

Vision-Language:

- Qwen2.5-VL-72B — top open VLM

- InternVL 2.5 — charts, OCR, video

- LLaVA-Next — most popular/documented


---

## Core Framework Examples

### PyTorch — Basic Training Loop

import torch

import torch.nn as nn

from torch.utils.data import DataLoader

# Define model

class SimpleNet(nn.Module):

def __init__(self, input_dim, hidden_dim, output_dim):

super().__init__()

self.layers = nn.Sequential(

nn.Linear(input_dim, hidden_dim),

nn.ReLU(),

nn.Dropout(0.1),

nn.Linear(hidden_dim, output_dim)

)

def forward(self, x):

return self.layers(x)

model = SimpleNet(784, 256, 10).to("cuda")

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

criterion = nn.CrossEntropyLoss()

# Training loop

for epoch in range(10):

for batch_x, batch_y in dataloader:

batch_x, batch_y = batch_x.to("cuda"), batch_y.to("cuda")

optimizer.zero_grad()

logits = model(batch_x)

loss = criterion(logits, batch_y)

loss.backward()

optimizer.step()


### Hugging Face Transformers — Load & Inference

from transformers import AutoTokenizer, AutoModelForCausalLM

import torch

model_id = "meta-llama/Llama-3.1-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(

model_id,

torch_dtype=torch.bfloat16,

device_map="auto", # auto-distributes across available GPUs

)

messages = [

{"role": "system", "content": "You are a helpful assistant."},

{"role": "user", "content": "Explain gradient descent in simple terms."},

]

input_ids = tokenizer.apply_chat_template(

messages,

add_generation_prompt=True,

return_tensors="pt"

).to(model.device)

with torch.inference_mode():

outputs = model.generate(

input_ids,

max_new_tokens=512,

temperature=0.7,

do_sample=True,

)

response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)

print(response)


### Hugging Face Accelerate — Multi-GPU Training

from accelerate import Accelerator

from transformers import AutoModelForSequenceClassification, AutoTokenizer

from torch.utils.data import DataLoader

import torch

accelerator = Accelerator(mixed_precision="bf16")

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Accelerate handles device placement, mixed precision, distributed training

model, optimizer, train_dataloader = accelerator.prepare(

model, optimizer, train_dataloader

)

for epoch in range(3):

for batch in train_dataloader:

outputs = model(**batch)

loss = outputs.loss

accelerator.backward(loss)

optimizer.step()

optimizer.zero_grad()

# Save — handles unwrapping DistributedDataParallel automatically

accelerator.wait_for_everyone()

unwrapped = accelerator.unwrap_model(model)

unwrapped.save_pretrained("./output", save_function=accelerator.save)


---

## Inference Engine Examples

### vLLM — Production OpenAI-Compatible Server

# Install

pip install vllm

# Start server (OpenAI-compatible)

python -m vllm.entrypoints.openai.api_server \

--model meta-llama/Llama-3.1-8B-Instruct \

--dtype bfloat16 \

--tensor-parallel-size 2 \

--max-model-len 8192 \

--port 8000

# Use with OpenAI client

from openai import OpenAI

client = OpenAI(

base_url="http://localhost:8000/v1",

api_key="not-needed" # vLLM doesn't require auth by default

)

response = client.chat.completions.create(

model="meta-llama/Llama-3.1-8B-Instruct",

messages=[{"role": "user", "content": "Write a Python function to reverse a string."}],

temperature=0.7,

max_tokens=512,

)

print(response.choices[0].message.content)


### Ollama — Local Model Management

# Install (macOS/Linux)

curl -fsSL https://ollama.ai/install.sh | sh

# Pull and run models

ollama pull llama3.1:8b

ollama pull qwen2.5-coder:14b

ollama pull mistral:7b

# Interactive chat

ollama run llama3.1:8b

# Serve API (default port 11434)

ollama serve

import ollama

# Simple generation

response = ollama.chat(

model="llama3.1:8b",

messages=[{"role": "user", "content": "What is RAG in AI?"}]

)

print(response["message"]["content"])

# Streaming

for chunk in ollama.chat(

model="qwen2.5-coder:14b",

messages=[{"role": "user", "content": "Write a FastAPI CRUD app"}],

stream=True

print(chunk["message"]["content"], end="", flush=True)

# Embeddings

embedding = ollama.embeddings(

model="nomic-embed-text",

prompt="Represent this document for retrieval:"

)

vector = embedding["embedding"] # list of floats


### llama.cpp — CPU/GPU Inference

# Build

git clone https://github.com/ggerganov/llama.cpp

cd llama.cpp

make -j$(nproc) # CPU only

make LLAMA_CUDA=1 -j$(nproc) # NVIDIA GPU

make LLAMA_METAL=1 -j$(nproc) # Apple Silicon

# Download a GGUF model (e.g. from HuggingFace)

# Then run:

./llama-cli -m ./models/llama-3.1-8b-instruct.Q4_K_M.gguf \

-p "You are a helpful assistant." \

--chat-template llama3 \

-n 512 \

--temp 0.7

# Start OpenAI-compatible server

./llama-server -m ./models/llama-3.1-8b-instruct.Q4_K_M.gguf \

--host 0.0.0.0 --port 8080 \

-ngl 35 # layers to offload to GPU


---

## RAG Pipeline Examples

### LlamaIndex — Complete RAG Setup

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings

from llama_index.llms.ollama import Ollama

from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Configure models

Settings.llm = Ollama(model="llama3.1:8b", request_timeout=120.0)

Settings.embed_model = HuggingFaceEmbedding(

model_name="BAAI/bge-small-en-v1.5"

)

# Load documents

documents = SimpleDirectoryReader("./data").load_data()

# Build index

index = VectorStoreIndex.from_documents(

documents,

show_progress=True

)

# Persist index

index.storage_context.persist(persist_dir="./storage")

# Query

query_engine = index.as_query_engine(similarity_top_k=5)

response = query_engine.query("What are the main findings?")

print(response)


### Chroma — Vector Store

import chromadb

from chromadb.utils import embedding_functions

# Initialize client (persistent)

client = chromadb.PersistentClient(path="./chroma_db")

# Use sentence-transformers embeddings

ef = embedding_functions.SentenceTransformerEmbeddingFunction(

model_name="all-MiniLM-L6-v2"

)

collection = client.get_or_create_collection(

name="documents",

embedding_function=ef,

metadata={"hnsw:space": "cosine"}

)

# Add documents

collection.add(

documents=[

"PyTorch is a machine learning framework.",

"LangChain helps build LLM applications.",

"Vector databases store embeddings for similarity search.",

ids=["doc1", "doc2", "doc3"],

metadatas=[

{"source": "ml_docs", "category": "framework"},

{"source": "llm_docs", "category": "framework"},

{"source": "db_docs", "category": "database"},

]

)

# Query

results = collection.query(

query_texts=["how do I train neural networks?"],

n_results=2,

where={"category": "framework"} # optional metadata filter

)

for doc, score in zip(results["documents"][0], results["distances"][0]):

print(f"Score: {1 - score:.3f} | {doc[:80]}...")


---

## Agentic AI Examples

### LangChain — ReAct Agent with Tools

from langchain_community.llms import Ollama

from langchain.agents import create_react_agent, AgentExecutor

from langchain.tools import tool

from langchain import hub

llm = Ollama(model="llama3.1:8b")

@tool

def search_docs(query: str) -> str:

"""Search internal documentation for information."""

# Replace with your actual search logic

return f"Documentation results for: {query}"

@tool

def run_python(code: str) -> str:

"""Execute Python code and return the output."""

import io, contextlib

output = io.StringIO()

try:

with contextlib.redirect_stdout(output):

exec(code, {})

return output.getvalue() or "Code executed successfully (no output)"

except Exception as e:

return f"Error: {str(e)}"

tools = [search_docs, run_python]

prompt = hub.pull("hwchase17/react")

agent = create_react_agent(llm, tools, prompt)

executor = AgentExecutor(agent=agent, tools=tools, verbose=True, max_iterations=5)

result = executor.invoke({

"input": "Search for how to use pandas groupby, then write a code example."

})

print(result["output"])


### AutoGen — Multi-Agent Conversation

import autogen

config_list = [{

"model": "llama3.1:8b",

"base_url": "http://localhost:11434/v1",

"api_key": "ollama",

}]

llm_config = {"config_list": config_list, "temperature": 0.7}

# Create agents

assistant = autogen.AssistantAgent(

name="Assistant",

llm_config=llm_config,

system_message="You are a helpful AI. Solve tasks step by step."

)

code_reviewer = autogen.AssistantAgent(

name="CodeReviewer",

llm_config=llm_config,

system_message="You review code for bugs, security issues, and best practices."

)

user_proxy = autogen.UserProxyAgent(

name="User",

human_input_mode="NEVER",

max_consecutive_auto_reply=5,

code_execution_config={"work_dir": "workspace", "use_docker": False},

)

# Group chat

groupchat = autogen.GroupChat(

agents=[user_proxy, assistant, code_reviewer],

messages=[],

max_round=10

)

manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)

user_proxy.initiate_chat(

manager,

message="Write a Python script that scrapes headlines from a news RSS feed and summarizes them."

)


---

## Fine-tuning Examples

### Unsloth — Fast LoRA Fine-tuning

from unsloth import FastLanguageModel

from trl import SFTTrainer

from transformers import TrainingArguments

from datasets import load_dataset

# Load model with Unsloth optimizations (2x faster, 60% less VRAM)

model, tokenizer = FastLanguageModel.from_pretrained(

model_name="unsloth/Meta-Llama-3.1-8B-Instruct",

max_seq_length=2048,

dtype=None, # auto-detect

load_in_4bit=True,

)

# Add LoRA adapters

model = FastLanguageModel.get_peft_model(

model,

r=16, # LoRA rank

target_modules=["q_proj", "k_proj", "v_proj", "o_proj",

"gate_proj", "up_proj", "down_proj"],

lora_alpha=16,

lora_dropout=0,

bias="none",

use_gradient_checkpointing="unsloth",

random_state=42,

)

dataset = load_dataset("yahma/alpaca-cleaned", split="train")

trainer = SFTTrainer(

model=model,

tokenizer=tokenizer,

train_dataset=dataset,

dataset_text_field="text",

max_seq_length=2048,

args=TrainingArguments(

per_device_train_batch_size=4,

gradient_accumulation_steps=4,

num_train_epochs=1,

learning_rate=2e-4,

fp16=True,

output_dir="./output",

save_steps=100,

logging_steps=10,

)

trainer.train()

# Save LoRA weights

model.save_pretrained("./lora_model")

tokenizer.save_pretrained("./lora_model")

# Optionally merge and export to GGUF

model.save_pretrained_gguf("./gguf_model", tokenizer, quantization_method="q4_k_m")


---

## MLOps Examples

### MLflow — Experiment Tracking

import mlflow

import mlflow.pytorch

from mlflow.models import infer_signature

mlflow.set_experiment("llm-fine-tuning")

with mlflow.start_run(run_name="llama3-lora-v1"):

# Log hyperparameters

mlflow.log_params({

"model": "llama3.1-8b",

"lora_rank": 16,

"learning_rate": 2e-4,

"epochs": 3,

"batch_size": 4,

})

# Log metrics during training

for step, loss in enumerate(training_losses):

mlflow.log_metric("train_loss", loss, step=step)

mlflow.log_metric("eval_perplexity", 12.4)

mlflow.log_metric("eval_bleu", 0.38)

# Log artifacts

mlflow.log_artifact("./lora_model", artifact_path="model")

mlflow.log_artifact("./training_config.yaml")

# Tag the run

mlflow.set_tags({

"task": "instruction-tuning",

"dataset": "alpaca-cleaned",

"framework": "unsloth",

})

# Query runs programmatically

runs = mlflow.search_runs(

experiment_names=["llm-fine-tuning"],

filter_string="metrics.eval_perplexity < 15",

order_by=["metrics.eval_perplexity ASC"],

)

print(runs[["run_id", "params.model", "metrics.eval_perplexity"]].head())


---

## Common Patterns

### Pattern 1: Local LLM with Fallback

import os

from openai import OpenAI

def get_llm_client(prefer_local: bool = True):

"""Returns OpenAI-compatible client, preferring local vLLM/Ollama."""

if prefer_local:

try:

client = OpenAI(

base_url=os.getenv("LOCAL_LLM_URL", "http://localhost:11434/v1"),

api_key="local"

)

# Test connection

client.models.list()

return client, os.getenv("LOCAL_MODEL", "llama3.1:8b")

except Exception:

pass

# Fallback to OpenAI

return OpenAI(api_key=os.environ["OPENAI_API_KEY"]), "gpt-4o-mini"

client, model = get_llm_client()


### Pattern 2: Embeddings + Similarity Search (No Vector DB)

from sentence_transformers import SentenceTransformer

import numpy as np

model = SentenceTransformer("BAAI/bge-small-en-v1.5")

def build_index(texts: list[str]) -> np.ndarray:

return model.encode(texts, normalize_embeddings=True)

def search(query: str, corpus_embeddings: np.ndarray, texts: list[str], top_k: int = 5):

query_emb = model.encode([query], normalize_embeddings=True)

scores = (query_emb @ corpus_embeddings.T)[0]

top_indices = np.argsort(scores)[::-1][:top_k]

return [(texts[i], float(scores[i])) for i in top_indices]

# Usage

texts = ["doc 1 content", "doc 2 content", "doc 3 content"]

embeddings = build_index(texts)

results = search("my query", embeddings, texts)


### Pattern 3: Structured Output with Pydantic

from pydantic import BaseModel

from transformers import pipeline

import json

class CodeReview(BaseModel):

has_bugs: bool

severity: str # "low" | "medium" | "high" | "critical"

issues: list[str]

suggestions: list[str]

def review_code(code: str, llm_pipeline) -> CodeReview:

prompt = f"""Review this code and respond with ONLY valid JSON matching this schema:

{CodeReview.model_json_schema()}

Code to review:

{code}

output = llm_pipeline(prompt, max_new_tokens=512)[0]["generated_text"]

# Extract JSON from output

json_start = output.rfind("{")

json_end = output.rfind("}") + 1

json_str = output[json_start:json_end]

return CodeReview.model_validate_json(json_str)


---

## Troubleshooting

### CUDA Out of Memory

# Reduce memory usage:

# 1. Use 4-bit quantization

from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(

load_in_4bit=True,

bnb_4bit_compute_dtype=torch.bfloat16,

bnb_4bit_quant_type="nf4",

bnb_4bit_use_double_quant=True,

)

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)

# 2. Enable gradient checkpointing

model.gradient_checkpointing_enable()

# 3. Use smaller batch + gradient accumulation

# Instead of batch_size=32, use batch_size=4, grad_accum=8

# 4. Clear cache between operations

import gc

torch.cuda.empty_cache()

gc.collect()


### vLLM Slow First Response

# Pre-warm the model after startup

curl -s http://localhost:8000/v1/completions \

-H "Content-Type: application/json" \

-d '{"model": "meta-llama/Llama-3.1-8B-Instruct", "prompt": "hi", "max_tokens": 1}'


### Hugging Face Download Issues

# Use environment variables for auth and caching

export HUGGING_FACE_HUB_TOKEN="your_token_here" # use env var, not hardcoded

export HF_HOME="/path/to/large/disk/.cache/huggingface"

export HF_HUB_OFFLINE=1 # use cached files only (after download)

# Download model files explicitly

huggingface-cli download meta-llama/Llama-3.1-8B-Instruct \

--local-dir ./models/llama3.1-8b \

--include "*.safetensors" "*.json" "tokenizer*"


### Ollama Model Not Found

# List available models

ollama list

# Search for models

ollama search llama

# Pull specific version/quantization

ollama pull qwen2.5-coder:14b-instruct-q4_K_M

# Check running status

ollama ps


---

## Environment Setup

# Minimal ML environment

conda create -n ai-dev python=3.11

conda activate ai-dev

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

pip install transformers accelerate datasets peft trl

pip install vllm # production inference

pip install llama-index chromadb # RAG

pip install langchain langchain-community # agents

pip install mlflow # experiment tracking

pip install sentence-transformers # embeddings

pip install unsloth # fast fine-tuning

# Environment variables (add to .env or shell profile)

export HUGGING_FACE_HUB_TOKEN="${HUGGING_FACE_HUB_TOKEN}"

export OPENAI_API_KEY="${OPENAI_API_KEY}" # if using OpenAI fallback

export ANTHROPIC_API_KEY="${ANTHROPIC_API_KEY}" # if using Anthropic

export HF_HOME="${HF_HOME:-~/.cache/huggingface}"

export TRANSFORMERS_CACHE="${HF_HOME}/hub"


---

## Key Resources

- **Awesome List**: https://github.com/alvinunreal/awesome-opensource-ai
- **Hugging Face Hub**: https://huggingface.co/models (model downloads)
- **Ollama Library**: https://ollama.com/library (curated GGUF models)
- **Open LLM Leaderboard**: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
- **LMSYS Chatbot Arena**: https://chat.lmsys.org (human preference rankings)
- **Papers With Code**: https://paperswithcode.com/sota (benchmark tracking)

// Comments

// Related skills

More tools from the same signal band

Order food/drinks (点餐) on an Android device paired as an OpenClaw node. Uses in-app menu and cart; add goods, view cart, submit order (demo, no real payment).

Sign plugins, rotate agent credentials without losing identity, and publicly attest to plugin behavior with verifiable claims and authenticated transfers.

The philosophical layer for AI agents. Maps behavior to Spinoza's 48 affects, calculates persistence scores, and generates geometric self-reports. Give your...

日历管理数据处理

1 installs★ 0