RL in Production | 8 Week Bootcamp to Master Reinforcement Learning in Production
Vizuara: Deep RL topic, improving preference alignment and safety.
9 AI Concepts Explained in 7 minutes: AI Agents, RAGs, Tokenization, RLHF, Diffusion, LoRA...
ByteByteAI: RLHF topic, combining tool calling and reasoning.
Reinforcement Learning Trading Bot in Python | Train an AI Agent on Forex (EURUSD)
CodeTradin: Deep RL topic, training a trading agent in Python on EURUSD forex data.
OpenAI’s Deep Research Team on Why Reinforcement Learning is the Future for AI Agents
Sequoia Ca: Deep RL topic, combining tool calling and reasoning.
LLMs from Scratch – Practical Engineering from Base Model to PPO RLHF
freeCodeCa: PPO topic, improving preference alignment and safety.
Stanford CS234 Reinforcement Learning I Introduction to Reinforcement Learning I 2024 I Lecture 1
Stanford O: Deep RL topic, explains the principles and how to get started.
Reinforcement learning sim-to-real policy trained in mujoco rotary inverted pendulum
Kevin Wood: Deep RL topic, sim-to-real transfer of a policy trained in MuJoCo to a rotary inverted pendulum.
AI Fish Evolve through Reinforcement Learning with Genetic Algorithms #programming #ai
CodeCrafte: Deep RL topic, evolving simulated fish by combining reinforcement learning with genetic algorithms.
From Sim to Real: Can Reinforcement Learning Control a Real Robot - Powered by RDK X5 (Edge AI)
AI Researc: Deep RL topic, aimed at low-cost on-device (edge) deployment.
[Full Workshop] Reinforcement Learning, Kernels, Reasoning, Quantization & Agents — Daniel Han
AI Enginee: Deep RL topic, combining tool calling and reasoning.
Reinforcement Learning for Agents - Will Brown, ML Researcher at Morgan Stanley
AI Enginee: Deep RL topic, combining tool calling and reasoning.
Stanford CS224R Deep Reinforcement Learning | Spring 2025 | Lecture 1: Class Intro
Stanford O: Deep RL topic, introductory lecture of the course.
LLM Fine-Tuning Course – From Supervised FT to RLHF, LoRA, and Multimodal
freeCodeCa: RLHF topic, improving preference alignment and safety.
Paper: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Umar Jamil: Deep RL topic, breaks down the paper's method and experiments.
Reinforcement Learning with Human Feedback (RLHF), Clearly Explained!!!
StatQuest: RLHF topic, improving preference alignment and safety.
Stanford CS234 Reinforcement Learning I Tabular MDP Planning I 2024 I Lecture 2
Stanford O: Deep RL topic, planning in tabular MDPs.
DeepSeek's GRPO (Group Relative Policy Optimization) | Reinforcement Learning for LLMs
Julia Turc: Deep RL topic, explains GRPO as used for reinforcement learning on LLMs.
SESSION 1 | Multi-Agent Reinforcement Learning: Foundations and Modern Approaches | IIIA-CSIC Course
IIIA-CSIC: Multi-agent topic, foundations and modern approaches in multi-agent RL (session 1).
Stanford CS230 | Autumn 2025 | Lecture 5: Deep Reinforcement Learning
Stanford O: Deep RL topic, a lecture on deep reinforcement learning.
Implement Deep Q-Learning with PyTorch and Train Flappy Bird! | DQN PyTorch Beginners Tutorial #1
Johnny Cod: DQN topic, explains the principles and how to get started.
Direct Preference Optimization (DPO) - How to fine-tune LLMs directly without reinforcement learning
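Luis Serra: Preference alignment and safety; DPO fine-tunes LLMs directly on preference data, without a reinforcement learning loop.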
How to finetune LLMs to THINK with Reinforcement Learning (GRPO from scratch!)
Neural Bre: Deep RL topic, implementing GRPO from scratch to fine-tune LLMs for reasoning.
Simply Explaining Proximal Policy Optimization (PPO) | Deep Reinforcement Learning
Johnny Cod: PPO topic, explains the principles of PPO.
Stanford CS224R Deep Reinforcement Learning | Spring 2025 | Lecture 2: Imitation Learning
Stanford O: Deep RL topic, imitation learning.
Reinforcement Learning explained in simple and easy way! Important Machine Learning topics to know!
Keerti Pur: Deep RL topic, explains the principles and how to get started.
Trying to teach my new model to run using reinforcement learning
Goatstream: Deep RL topic, an experiment in teaching a model to run with reinforcement learning.
Stanford CS234 Reinforcement Learning I Q learning and Function Approximation I 2024 I Lecture 4
Stanford O: Deep RL topic, Q-learning and function approximation.
Deep Reinforcement Learning–Based Bipedal Robot Walking in MATLAB #RL #matlab #simulink #bipedal
TODAYS TEC: Deep RL topic, bipedal robot walking in MATLAB/Simulink.
AI Bot That LEARNS to Trade | Q-Learning vs Manual Trading (Free)
Ignacio Ay: DQN topic, compares a Q-learning trading bot with manual trading.
Reinforcement Learning Algorithms | Machine Learning Tutorial | TutorialsPoint
TutorialsP: Deep RL topic, explains the principles and how to get started.
1. The First Lecture on Reinforcement Learning, in Arabic | Basic Concepts
ELPRINCE: Deep RL topic, an introductory lecture (in Arabic) on basic concepts.
The ONLY DeepSeek GRPO/PPO video you'll EVER need (with examples and exercises) | RL Foundations
Depth Firs: PPO topic, explains PPO and GRPO with examples and exercises.
LLM Training & Reinforcement Learning from Google Engineer | SFT + RLHF | PPO vs GRPO vs DPO
Martin Is: PPO topic, improving preference alignment and safety.
Experimenting with Reinforcement Learning with Verifiable Rewards (RLVR)
Nathan Lam: Deep RL topic, experiments with reinforcement learning with verifiable rewards (RLVR).
Reinforcement Learning #1: Multi-Armed Bandits, Explore vs Exploit, Epsilon-Greedy, UCB
Zachary Hu: Deep RL topic, multi-armed bandits and the exploration-exploitation trade-off, with epsilon-greedy and UCB.
Intrusion Detection using Machine Learning | Multi-Agent Reinforcement Learning | Final Year Project
Ieee Xpert: Multi-agent topic, intrusion detection using multi-agent reinforcement learning (final-year project).
Agentic AI MOOC | UC Berkeley CS294-196 Fall 2025 | Multi-Agent AI by Noam Brown
Berkeley R: Multi-agent topic, combining tool calling and reasoning.
PPO - Proximal Policy Optimization paper explained in a min. #ppo #trpo #llm #trendingshorts #ainews
Paper in a: PPO topic, explains the principles and how to get started.
Lecture 4 - Reinforcement Learning - Basics | Reasoning LLMs from Scratch
Vizuara: Deep RL topic, reinforcement learning basics for building reasoning LLMs.
Sergey Levine - Reinforcement Learning in the Age of Foundation Models - RLC 2024
Reinforcem: Deep RL topic, reinforcement learning in the era of foundation models.
Agentic AI MOOC | UC Berkeley CS294-196 F25 | Multi-Agent Systems in Era of LLMs by Oriol Vinyals
Berkeley R: Multi-agent topic, combining tool calling and reasoning.
Stanford CS234 Reinforcement Learning I Multi-Agent Game Playing I 2024 I Lecture 14
Stanford O: Multi-agent topic, multi-agent game playing.
LangGraph:17 Introduction to Multi-Agent System #llm #genai #aiagents #ai #genai #agent
Sunny Savi: Multi-agent topic, combining tool calling and reasoning.
Building AI Agents at Scale: Open Claw, ClawMax & Multi-Agent Systems Explained
The AI All: Multi-agent topic, combining tool calling and reasoning.
SESSION 2 | Multi-Agent Reinforcement Learning: Foundations and Modern Approaches | IIIA-CSIC Course
IIIA-CSIC: Multi-agent topic, foundations and modern approaches in multi-agent RL (session 2).
Proximal Policy Optimization (PPO) & Group Relative Policy Optimization (GRPO) | Paper Explained
Outlier: PPO topic, explains the principles and how to get started.
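[Artificial Intelligence] Compute Was Never the Only Bottleneck | The Father of PPO | John Schulman | RLHF Architect | Thinking Machines | Tinker Fine-Tuning API | The Return of Value Functions | Algorithmic Ingenuity | Reinforcement Learning
最佳拍档: PPO topic, improving preference alignment and safety, centered on the theme "Compute Was Never the Only Bottleneck".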
Superhuman Safe and Agile Racing through Multi-Agent Reinforcement Learning
UZH Roboti: Multi-agent topic, safe and agile racing through multi-agent reinforcement learning.
Coding Ninjas 6 Month Advanced Certification in GenAI & Multi-Agent Systems
Coding Nin: Multi-agent topic, combining tool calling and reasoning.
AI Agent Lands Lunar on the Moon! | Deep Q-Learning | PyTorch | Reinforcement Learning | Gymnasium
Tutorial H: DQN topic, training a Deep Q-Learning agent with PyTorch to land a lunar lander in Gymnasium.