총 1건 · 1/1 페이지
Task-Specific LLM Evals that Do & Don't Work
Evals for classification, summarization, translation, copyright regurgitation, and toxicity.