-
Turing Award winner Richard Sutton says pure generative AI can't do real science
-
Giving Agents Computers — Ivan Burazin, Daytona
We chat with Daytona's CEO about their insane 74% MoM Growth, 850K Daily Runs, Bare Metal Sandboxes, RL Evals, and the New Agent Cloud
-
vLLM V0 to V1: Correctness Before Corrections in RL
-
Ecom-RLVE: Adaptive Verifiable Environments for E-Commerce Conversational Agents
-
TRL v1.0: Post-Training Library Built to Move with the Field
-
From DeepSeek V3 to V3.2: Architecture, Sparse Attention, and RL Updates
Understanding How DeepSeek's Flagship Open-Weight Models Evolved
-
The State of Reinforcement Learning for LLM Reasoning
Understanding GRPO and New Insights from Reasoning Model Papers
-
[AINews] OpenAI o3, o4-mini, and Codex CLI • Buttondown
10x compute on RL is all you need. AI News for 4/15/2025-4/16/2025. We checked 9 subreddits, <a href="https://twitter.com/i/lists/1585430245…