GitHub Leaderboard · RL Top 100

OpenPipe/ART

Open-source multimodal model: Agent Trainer train mult, etc.; combines tool calling with reasoning.

Created: 2025-03-10 · Python · agent, agentic-ai, grpo

⭐ 9,892

rllm-org/rllm

Open-source document understanding: Democratizing LLMs; combines tool calling with reasoning.

Created: 2025-01-26 · Python · agent-framework, agentic-workflow, coding-agent

⭐ 5,592

Gen-Verse/OpenClaw-RL

Open-source multimodal alignment: OpenClaw-RL Train any ag, etc.; combines tool calling with reasoning.

Created: 2026-02-26 · Python · async, coding, grpo

⭐ 5,444

Kiln-AI/Kiln

Open-source multimodal alignment: Build Evaluate Optimize, etc.; releases data to support training and evaluation.

Created: 2024-07-23 · Python · ai, chain-of-thought, collaboration

⭐ 4,864

PKU-Alignment/align-anything

Open-source multimodal alignment: Align Anything Training, etc.; improves preference alignment and safety.

Created: 2024-07-14 · Python · chameleon, dpo, large-language-models

⭐ 4,656

rasbt/reasoning-from-scratch

Open-source multimodal alignment: Implement reasoning LLM, etc.; improves preference alignment and safety.

Created: 2025-03-04 · Jupyter Notebook · ai, artificial-intelligence, chain-of-thought

⭐ 4,441

RLinf/RLinf

Open-source multimodal model: RLinf Infrastructure Emb, etc.; combines tool calling with reasoning.

Created: 2025-08-14 · Python · agentic-ai, embodied-ai, reinforcement-learning

⭐ 3,642

alibaba/ROLL

Open-source multimodal alignment: Efficient User-Friendly, etc.; combines tool calling with reasoning.

Created: 2025-05-28 · Python · agentic, rlhf, rlvr

⭐ 3,205

mll-lab-nu/RAGEN

Open-source multimodal model: RAGEN leverages train LL, etc.; combines tool calling with reasoning.

Created: 2025-01-25 · Python

⭐ 2,688

#10

walkinglabs/hands-on-modern-rl

Open-source multimodal alignment: open-source hands-on cur, etc.; combines tool calling with reasoning.

Created: 2026-04-10 · Python · agent, agentic, agentic-ai

⭐ 2,688

#11

qibin0506/Cortex

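Open-source multimodal alignment: a complete hands-on walkthrough of building a large model from scratch, from pretraining to RLHF; improves preference alignment and safety.

Created: 2025-06-24 · Python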

⭐ 2,663

#12

TsinghuaC3I/Awesome-RL-for-LRMs

Open-source multimodal model: Survey Large Reasoning; surveys taxonomy, challenges, and open problems.

Created: 2025-03-20 · TeX · awesome-list, deepseek-r1, llm

⭐ 2,464

#13

zai-org/GLM-V

Open-source vision-language model: Versatile Multimodal Rea, etc.; extends long-video spatiotemporal understanding.

Created: 2025-06-28 · Python · image2text, reasoning, video-understanding

⭐ 2,321

#14

PRIME-RL/SimpleVLA-RL

Open-source multimodal model: ICLR SimpleVLA-RL Scalin, etc.; alignment training.

Created: 2025-05-25 · Python · reasoning, rl, vla

⭐ 1,709

#15

AgibotTech/agibot_x1_train

Open-source multimodal model: training code AgiBot; demonstrates code implementation and reproduction.

Created: 2024-10-23 · Python · open-source, reinforcement-learning, robotics

⭐ 1,673

#16

radixark/miles

Open-source vision-language model: Miles is enterprise-faci, etc.; alignment training.

Created: 2025-10-09 · Python

⭐ 1,509

#17

thinkwee/AgentsMeetRL

Open-source multimodal alignment: Awesome List Agentic RL; combines tool calling with reasoning.

Created: 2025-06-09 · HTML · agent, agentic-ai, agentic-coding

⭐ 1,493

#18

AgentR1/Agent-R1

Open-source multimodal model: Agent-R1 Training Powerf, etc.; combines tool calling with reasoning.

Created: 2025-03-04 · Python · agent, agentic-rl, llm

⭐ 1,454

#19

open-thought/reasoning-gym

Open-source multimodal model: NeurIPS Spotlight Reason, etc.; alignment training.

Created: 2025-01-23 · Python · gym, large-language-models, reinforcement-learning

⭐ 1,437

#20

Agent-RL/ReCall

Open-source multimodal model: ReSearch Reason Search L, etc.; combines tool calling with reasoning.

Created: 2025-03-03 · Python · agent, function-calling, llm

⭐ 1,387

#21

NousResearch/atropos

Open-source multimodal model: Atropos is Language Envi, etc.; alignment training.

Created: 2025-04-29 · Python

⭐ 1,258

#22

whwangovo/pyre-code

Open-source multimodal alignment: self-hosted ML coding pr, etc.; improves preference alignment and safety.

Created: 2026-04-09 · Python

⭐ 1,120

#23

xid32/SoundMind

Open-source multimodal model: We introduce Audio Logic, etc.; releases data to support training and evaluation.

Created: 2025-06-13 · Python · audio-language-model, audio-reasoning, dataset

⭐ 1,109

#24

unitreerobotics/unitree_rl_lab

Open-source multimodal model: is repository implementa, etc.; demonstrates code implementation and reproduction.

Created: 2025-06-05 · Python

⭐ 1,082

#25

PRIME-RL/TTRL

Open-source multimodal model: NeurIPS TTRL Test-Time; alignment training.

Created: 2025-04-23 · Python · llm, reasoning, rl

⭐ 1,078

#26

zai-org/GLM-TTS

Open-source multimodal model: GLM-TTS Controllable Emo, etc.; targets low-cost on-device deployment.

Created: 2025-12-06 · Python · edge-computing, llm, tts

⭐ 1,017

#27

TIGER-AI-Lab/verl-tool

Open-source multimodal model: version verl support div, etc.; combines tool calling with reasoning.

Created: 2025-03-21 · Python · agent, learning, llm

⭐ 992

#28

lasgroup/SDPO

Open-source multimodal alignment: Self-Distillation SDPO; alignment training.

Created: 2026-01-24 · Python · distillation, llm, reasoning

⭐ 928

#29

stepfun-ai/Step-Audio-EditX

Open-source multimodal model: powerful 3B-parameter LL, etc.; alignment training.

Created: 2025-10-29 · Python · audio-editing, cross-lingual, emotion-control

⭐ 926

#30

datawhalechina/diy-llm

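Open-source multimodal alignment: a systematic course on building large language models, covering pretraining data engineering T, etc.; improves preference alignment and safety.

Created: 2025-11-24 · Jupyter Notebook · gpu-programming, llm, nlp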

⭐ 870

#31

aiming-lab/SkillRL

Open-source multimodal model: SkillRL Evolving Agents, etc.; combines tool calling with reasoning.

Created: 2026-02-08 · Python

⭐ 812

#32

tingaicompass/AI-Compass

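Open-source multimodal alignment: AI-Compass will guide the community in AI technology, etc.; combines tool calling with reasoning.

Created: 2025-05-27 · Python · agent, ai, llm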

⭐ 772

#33

ModalMinds/MM-EUREKA

Open-source multimodal model: MM-EUREKA Exploring Fron, etc.; alignment training.

Created: 2025-03-07 · Python

⭐ 771

#34

WooooDyy/AgentGym-RL

Open-source multimodal model: Code implementations pap, etc.; combines tool calling with reasoning.

Created: 2025-09-10 · Python · agent, llm, llm-based-agent

⭐ 770

#35

GAIR-NLP/DeepResearcher

Open-source multimodal model: Scaling Research Real-wo, etc.; alignment training.

Created: 2025-04-02 · Python

⭐ 760

#36

SkyworkAI/Skywork-OR1

Open-source multimodal model: Unleashing Power Math Co, etc.; demonstrates code implementation and reproduction.

Created: 2025-04-11 · Python

⭐ 744

#37

Denghaoyuan123/Awesome-RL-VLA

Open-source vision-language model: Survey Robotic Manipulat, etc.; surveys taxonomy, challenges, and open problems.

Created: 2025-11-13

⭐ 734

#38

zhaorw02/DeepMesh

Open-source spatial scene understanding: ICCV Official code DeepM, etc.; demonstrates code implementation and reproduction.

Created: 2025-03-18 · Python · 3d, aigc, dpo

⭐ 720

#39

RUCAIBox/R1-Searcher

Open-source multimodal model: R1-searcher Incentivizin, etc.; alignment training.

Created: 2025-02-26 · Python

⭐ 715

#40

facebookresearch/swe-rl

Open-source multimodal model: NeurIPS Official codebas, etc.; demonstrates code implementation and reproduction.

Created: 2025-02-23 · Python

⭐ 696

#41

dCaples/AutoDidact

Open-source multimodal model: Autonomously train resea, etc.; combines tool calling with reasoning.

Created: 2025-03-08 · Jupyter Notebook

⭐ 689

#42

sail-sg/oat

Open-source multimodal alignment: OAT research-friendly fr, etc.; improves preference alignment and safety.

Created: 2024-10-15 · Python · alignment, distributed-rl, distributed-training

⭐ 660

#43

rlresearch/dr-tulu

Open-source multimodal model: Official repository DR T, etc.; alignment training.

Created: 2025-11-18 · Python · deepresearch, rl, rubrics

⭐ 655

#44

agentscope-ai/Trinity-RFT

Open-source multimodal alignment: Trinity-RFT is general-p, etc.; combines tool calling with reasoning.

Created: 2025-04-09 · Python · agent, llm, rlhf

⭐ 646

#45

leggedrobotics/robotic_world_model

Open-source multimodal model: Repository our papers Ro, etc.; breaks down the paper's methods and experiments.

Created: 2025-11-24 · Python

⭐ 645

#46

agentscope-ai/OpenJudge

Open-source multimodal alignment: OpenJudge Unified Framew, etc.; combines tool calling with reasoning.

Created: 2025-07-08 · Python · agent, agent-skills, ai-agent

⭐ 638

#47

LeCAR-Lab/BFM-Zero

Open-source multimodal model: BFM Zero Promptable Beha, etc.; alignment training.

Created: 2025-11-04 · Python · humanoid, reinforcement-learning, sim2real

⭐ 596

#48

inclusionAI/ASearcher

Open-source multimodal model: Open-Source Large-Scale, etc.; combines tool calling with reasoning.

Created: 2025-08-05 · Python

⭐ 592

#49

uclaml/SPPO

Open-source multimodal alignment: official implementation, etc.; improves preference alignment and safety.

Created: 2024-06-13 · Python · deep-learning, fine-tuning, large-language-models

⭐ 586

#50

antirez/ttt-rl

Open-source multimodal model: example playing tic tac, etc.; alignment training.

Created: 2025-03-10 · C

⭐ 583

#51

yongliang-wu/DFT

Open-source multimodal model: ICLR Generalization SFT, etc.; instruction tuning.

Created: 2025-08-01 · Python

⭐ 572

#52

LHRLAB/Graph-R1

Open-source multimodal model: ICML Official resources, etc.; combines tool calling with reasoning.

Created: 2025-03-11 · Python · chain-of-thought, graphrag, hypergraph

⭐ 559

#53

X-GenGroup/Flow-Factory

Open-source multimodal model: unified framework easy F, etc.; extends long-video spatiotemporal understanding.

Created: 2025-12-18 · Python · diffusion, flow-matching, image-generation

⭐ 557

#54

Gen-Verse/Open-AgentRL

Open-source multimodal alignment: ICML RLAnything DemyAgen, etc.; combines tool calling with reasoning.

Created: 2025-10-13 · Python · agent-rl, coding-agent, entropy-method

⭐ 539

#55

Junvate/LLM-Algorithm-Intern-Guide

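Open-source multimodal alignment: internship interview notes for LLM algorithm roles (class of 2026), including DeepS, etc.; improves preference alignment and safety.

Created: 2026-01-25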

⭐ 539

#56

wendell0218/Awesome-RL-for-Video-Generation

Open-source multimodal alignment: curated list papers vide, etc.; extends long-video spatiotemporal understanding.

Created: 2025-02-13 · dpo, grpo, ppo

⭐ 534

#57

Gen-Verse/dLLM-RL

Open-source multimodal alignment: ICLR Official code Trace, etc.; improves preference alignment and safety.

Created: 2025-08-26 · Python · code-generation, diffusion-language-models, large-language-models

⭐ 508

#58

erwinmsmith/SOMAS

Open-source multimodal model: Trusted Human-Multi-Agen, etc.; combines tool calling with reasoning.

Created: 2025-05-13 · Python

⭐ 503

#59

mll-lab-nu/VAGEN

Open-source vision-language model: Training VLM agents mult, etc.; combines tool calling with reasoning.

Created: 2025-03-04 · Python

⭐ 468

#60

alibaba/ROCK

Open-source multimodal model: construction kit environ, etc.; alignment training.

Created: 2025-11-07 · Python

⭐ 451

#61

dllm-reasoning/d1

Open-source multimodal model: Official Implementation, etc.; breaks down the paper's methods and experiments.

Created: 2025-04-10 · Python

⭐ 447

#62

weijiawu/Awesome-RL-for-Multimodal-Foundation-Models

Open-source multimodal model: is repository organizing, etc.; breaks down the paper's methods and experiments.

Created: 2025-06-06

⭐ 447

#63

deepreinforce-ai/CUDA-L2

Open-source multimodal model: CUDA-L2 Surpassing cuBLA, etc.; alignment training.

Created: 2025-12-02 · Cuda · cublas, cuda-kernels, large-language-models

⭐ 442

#64

PRIME-RL/Entropy-Mechanism-of-RL

Open-source multimodal model: Entropy Mechanism Large, etc.; alignment training.

Created: 2025-05-28 · Python · llm, reasoning, rl

⭐ 442

#65

chi2liu/ABC-GRPO

Open-source multimodal alignment: Code GRPO; improves preference alignment and safety; centered on chi2liu/ABC-GRPO.

Created: 2026-01-07 · Python · grpo, llm, reinforcement-learning

⭐ 441

#66

nvidia-cosmos/cosmos-rl

Open-source multimodal model: Cosmos-RL is flexible sc, etc.; alignment training.

Created: 2025-06-10 · Python

⭐ 439

#67

ypwang61/One-Shot-RLVR

Open-source multimodal model: NeurIPS Reasoning Large, etc.; alignment training.

Created: 2025-04-30 · Python

⭐ 437

#68

zai-org/VisionReward

Open-source multimodal alignment: AAAI VisionReward Fine-G, etc.; extends long-video spatiotemporal understanding.

Created: 2024-12-12 · Python · diffusion, preference, rlhf

⭐ 403

#69

LYL1015/JarvisEvo

Open-source multimodal alignment: CVPR JarvisEvo Self-Evol, etc.; combines tool calling with reasoning.

Created: 2025-11-23 · Python · agents, editing-image, lightroom

⭐ 400

#70

Goekdeniz-Guelmez/mlx-lm-lora

Open-source multimodal alignment: Train Large Language MLX; improves preference alignment and safety.

Created: 2025-05-10 · Python · apple, deep-learning, dpo

⭐ 374

#71

openpsi-project/ReaLHF

Open-source multimodal alignment: Super-Efficient RLHF Tra, etc.; targets low-cost on-device deployment.

Created: 2024-06-18 · Python · deepspeed, distributed-computing, distributed-systems

⭐ 335

#72

benstaf/FinRL_DeepSeek

Open-source multimodal model: Code paper FinRL-DeepSee, etc.; combines tool calling with reasoning.

Created: 2025-02-06 · Jupyter Notebook

⭐ 329

#73

mihirp1998/VADER

Open-source document understanding: Video Diffusion Alignmen, etc.; extends long-video spatiotemporal understanding.

Created: 2024-06-23 · Python · alignment, diffusion, reinforcement-learning

⭐ 313

#74

HarderThenHarder/RLLoggingBoard

Open-source multimodal alignment: visualization tool make, etc.; improves preference alignment and safety.

Created: 2025-01-06 · Python · rlhf, visualization

⭐ 295

#75

chrisliu298/awesome-on-policy-distillation

Open-source multimodal alignment: curated collection paper, etc.; targets low-cost on-device deployment.

Created: 2026-03-22 · Shell · awesome, awesome-list, distillation

⭐ 264

#76

nick7nlp/Awesome-LLM-On-Policy-Distillation

Open-source multimodal alignment: curated collection paper, etc.; surveys taxonomy, challenges, and open problems.

Created: 2026-04-05 · Python · awesome-list, awesome-opd, awesomeopd

⭐ 256

#77

TYH-labs/unsloth-buddy

Open-source multimodal alignment: Zero-friction LLM fine-t, etc.; combines tool calling with reasoning.

Created: 2026-03-15 · Python · apple-silicon, claude-code, dpo

⭐ 248

#78

ash80/RLHF_in_notebooks

Open-source multimodal alignment: RLHF Supervised fine-tun, etc.; improves preference alignment and safety.

Created: 2025-06-13 · Jupyter Notebook

⭐ 247

#79

MiroMindAI/MiroRL

Open-source multimodal model: MiroRL is MCP-first fram, etc.; combines tool calling with reasoning.

Created: 2025-08-08 · Python

⭐ 246

#80

shufangxun/LLaVA-MoD

Open-source multimodal alignment: ICLR LLaVA-MoD Making LL, etc.; analyzes hallucination causes and mitigation.

Created: 2024-08-26 · Python · distillation, hallucination, kd

⭐ 227

#81

YangLing0818/IterComp

Open-source multimodal alignment: ICLR IterComp Iterative, etc.; improves preference alignment and safety.

Created: 2024-10-09 · Python · dpo, reward-modeling, rlhf

⭐ 203

#82

Kwai-YuanQi/MM-RLHF

Open-source multimodal alignment: Next Step Forward Multim, etc.; improves preference alignment and safety.

Created: 2025-02-16 · Python

⭐ 200

#83

hyunwoongko/nanoRLHF

Open-source multimodal alignment: nanoRLHF from-scratch jo, etc.; improves preference alignment and safety.

Created: 2025-09-28 · Python · llm, pytorch, rlhf

⭐ 189

#84

Jerry-XDL/AIDoctor

Open-source multimodal alignment: AIDoctor training medica, etc.; targets medical imaging and reports.

Created: 2025-02-23 · Python

⭐ 188

#85

alphadl/AdaRubrics

Open-source multimodal alignment: AdaRubric Adaptive Dynam, etc.; combines tool calling with reasoning.

Created: 2026-02-22 · Python · agent-evaluation, llm-evaluation, reward-model

⭐ 175

#86

CIntellifusion/VideoDPO

Open-source multimodal alignment: Official Implementation, etc.; extends long-video spatiotemporal understanding.

Created: 2024-12-19 · Python · aigc, diffusion-models, generative-ai

⭐ 170

#87

wanshuiyin/ARIS-in-AI-Offer

Open-source multimodal alignment: Bilingual Chinese+EN ML LLM d, etc.; surveys taxonomy, challenges, and open problems.

Created: 2026-05-19 · Python · ai-interview, aris, autumn-recruiting

⭐ 167

#88

OpenRLHF/OpenRLHF-M

Open-source multimodal alignment: Easy-to-use Scalable Hig, etc.; improves preference alignment and safety.

Created: 2025-03-05 · Python

⭐ 163

#89

yihedeng9/rlhf-summary-notes

Open-source multimodal alignment: brief partial summary RL, etc.; improves preference alignment and safety.

Created: 2024-11-15 · deep-learning, large-language-models, post-training

⭐ 151

#90

bcefghj/learn-MedicalGPT

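Open-source multimodal alignment: from zero background to passing the interview, 100+ frequently tested interview topics; targets medical imaging and reports.

Created: 2026-03-27 · TypeScript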

⭐ 145

#91

dengxianghua888-ops/ecoalign-forge

Open-source multimodal alignment: Multi-Agent DPO Data Syn, etc.; combines tool calling with reasoning.

Created: 2026-04-10 · Python · content-moderation, data-quality, dpo

⭐ 139

#92

rkinas/reasoning_models_how_to

Open-source multimodal alignment: repository serves as col, etc.; improves preference alignment and safety.

Created: 2025-02-17 · Python · llm, rl, rlhf

⭐ 137

#93

qwenpilot/FIPO

Open-source multimodal model: code implements algorith, etc.; demonstrates code implementation and reproduction.

Created: 2026-03-22 · Python · llm-training, post-training, reasoning-language-models

⭐ 123

#94

NiuTrans/Vision-LLM-Alignment

Open-source multimodal alignment: repository contains code, etc.; improves preference alignment and safety.

Created: 2024-06-29 · Python · alignment, dpo, llama3-vision

⭐ 122

#95

ashworks1706/rlhf-from-scratch

Open-source multimodal alignment: theoretical practical di, etc.; improves preference alignment and safety.

Created: 2025-09-14 · Jupyter Notebook · artificial-intelligence, library-development

⭐ 114

#96

laoshan-song/Awesome-LLM-Interview

Open-source multimodal alignment: LLM interview prep notes, etc.; improves preference alignment and safety.

Created: 2026-05-28 · HTML

⭐ 114

#97

CROBOT974/WaterLily-RL

Open-source multimodal model: project contains simulat, etc.; alignment training.

Created: 2025-09-11 · Julia

⭐ 108

#98

wlll123456/study_rlhf

study_rlhf: multimodal alignment repository; improves preference alignment and safety; centered on wlll123456/study.

Created: 2025-03-27 · Jupyter Notebook

⭐ 106

#99

wxhcore/bumblecore

Open-source multimodal alignment: LLM training framework b, etc.; improves preference alignment and safety.

Created: 2025-11-23 · Python · ai, deepseek, fine-tuning

⭐ 98

#100

ZinYY/Online_RLHF

Open-source multimodal alignment: PyTorch implementation p, etc.; targets low-cost on-device deployment.

Created: 2025-02-07 · Python · large-language-model, llm, post-training

⭐ 94