-
Designing Efficient Verifiers for Legal Agents
A Harvey and LangChain Labs study on making LLM verifiers cheaper and more reliable for legal-agent evaluation and post-training.
-
Turing Award winner Richard Sutton says pure generative AI can't do real science
-
A shared playbook for trustworthy third party evaluations | OpenAI
OpenAI shares guidance on third-party AI evaluations, covering how to assess model capabilities, safeguards, and validity for frontier systems.
-
Improving Memory Retrieval: How New Computer achieved 50% higher recall with LangSmith
New Computer used LangSmith to improve their memory retrieval system, achieving 50% higher recall by tracking regressions in comparison view and adjusting conversation prompts acco…
-
Agent Observability: How to Monitor and Evaluate LLM Agents in Production
Production monitoring for LLM agents requires new observability tools. Learn how to trace, evaluate, and improve AI agents at scale.
-
Weights & Biases LLM-Evaluator Hackathon - Hackathon Judge
Being a human judge at the Weights & Biases LLM-as-a-Judge Hackathon
-
Evaluating Claude’s bioinformatics research capabilities with BioMysteryBench
Apr 29, 2026 · Science