#reinforcement-learning
6 posts · Page 1 of 1
-
Giving Agents Computers — Ivan Burazin, Daytona
We chat with Daytona's CEO about their insane 74% MoM Growth, 850K Daily Runs, Bare Metal Sandboxes, RL Evals, and the New Agent Cloud
-
vLLM V0 to V1: Correctness Before Corrections in RL
-
Ecom-RLVE: Adaptive Verifiable Environments for E-Commerce Conversational Agents
-
TRL v1.0: Post-Training Library Built to Move with the Field
-
From DeepSeek V3 to V3.2: Architecture, Sparse Attention, and RL Updates
Understanding How DeepSeek's Flagship Open-Weight Models Evolved
-
The State of Reinforcement Learning for LLM Reasoning
Understanding GRPO and New Insights from Reasoning Model Papers