#reinforcement-learning
2 items total · page 1 of 1
-
From DeepSeek V3 to V3.2: Architecture, Sparse Attention, and RL Updates
Understanding How DeepSeek's Flagship Open-Weight Models Evolved
-
The State of Reinforcement Learning for LLM Reasoning
Understanding GRPO and New Insights from Reasoning Model Papers