#llm-evaluation

7 posts in total

Eugene Yan · 2025-11-23

Product Evals in Three Simple Steps

Label some data, align LLM-evaluators, and run the eval harness with each change.

#machine-learning #ai-testing #llm-evaluation #data-labeling #product-evaluation #eval-harness

Eugene Yan · 2025-06-22

Evaluating Long-Context Question & Answer Systems

Evaluation metrics, how to build eval datasets, eval methodology, and a review of several benchmarks.

#llm-evaluation #long-context #question-answering #evaluation-metrics #benchmark #eval-dataset

Eugene Yan · 2025-04-20

An LLM-as-Judge Won't Save The Product—Fixing Your Process Will

Applying the scientific method, building via eval-driven development, and monitoring AI output.

#llm-evaluation #eval-driven-development #ai-product-development #process-improvement #ai-monitoring #quality-assurance

Eugene Yan · 2024-10-27

AlignEval: Building an App to Make Evals Easy, Fun, and Automated

Look at and label your data, build and evaluate your LLM-evaluator, and optimize it against your labels.

#ai-tools #model-optimization #llm-evaluation #data-labeling #eval-automation

Eugene Yan · 2024-09-22

Weights & Biases LLM-Evaluator Hackathon - Hackathon Judge

Being a human judge at the Weights & Biases LLM-as-a-Judge Hackathon

#machine-learning #ai-evaluation #llm-evaluation #hackathon #llm-judge #weights-biases

Eugene Yan · 2024-08-18

Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge)

Use cases, techniques, alignment, finetuning, and critiques against LLM-evaluators.

#language-models #llm-as-judge #llm-evaluation #fine-tuning #model-alignment #evaluation-techniques

Eugene Yan · 2024-03-31

Task-Specific LLM Evals that Do & Don't Work

Evals for classification, summarization, translation, copyright regurgitation, and toxicity.

#llm-evaluation #task-specific-evals #classification #summarization #translation #toxicity-detection
