LangChain Blog · 4 days ago · View original

https://www.langchain.com/blog/financial-ai-that-investigates-macro-trends-eu-economic-analysis-with-you-com-and-langchain

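Macro research desks regularly need to know which countries in a given set are performing anomalously, and why. The underlying data exists, but it is fragmented. A single GDP figure might require reconciling a Eurostat release against a national statistics office publication that arrived on a different schedule and uses a different methodology. Getting from raw data to a usable, sourced briefing can take as long as the analysis itself. To show this in practice, we built an agent and ran it against 2025 GDP data for all 27 EU member states.

Ireland came back as the single largest outlier, with 12.3% GDP growth that looked like a boom. Per-country investigation traced it to a pharma-led export surge front-loaded ahead of US tariffs, with the industrial sector alone contributing +6.55pp to the print. Modified GNI showed a far more modest number. Germany was flagged for the opposite reason: structural contraction driven by automotive exposure and a construction collapse, not a cyclical dip. The agent produced that distinction, sourced and cited, in 45 minutes, at a cost of $2.20 in API calls.

The findings are only half the story. In financial services, the ability to explain how a conclusion was reached matters as much as the conclusion itself. AI agents create a gap here: without explicit instrumentation, the decisions an agent makes during a run are lost once the run completes. This architecture preserves the decision log: every query issued, every response received, and every intermediate result produced before the final report is written. LangSmith captures the complete execution trace as the agent runs, so anyone reviewing the output can follow any data point in the final report back to the source that produced it.

The prompt

Using the latest available GDP data for 2025, analyze each country within the EU economic zone. Highlight those that are increasing or decreasing at an anomalous rate. Specify and break down which industries are causing these shifts and investigate macroeconomic trends within each country that are contributing.

What the output looks like

The query asks two primary questions: which EU-27 countries are growing or contracting anomalously, and what structural and cyclical forces are driving those deviations?

You get a structured briefing: GDP trajectory, anomaly drivers, and second-order implications including rates sensitivity, FX exposure, sovereign risk signals, and sector positioning. Every step is visible and auditable.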

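The report follows a standard format:

Executive Summary: Headline numbers, key patterns, most important finding
Methodology & Data Notes: Sources used, data vintage, known caveats
Regional Overview: Aggregate GDP, average growth rate, macro context
Country-by-Country GDP Table: All countries ranked by growth rate, anomaly flags, delta from mean
Multi-Year Growth Context: 3-5 year growth trajectory
Anomaly Analysis, High Growth: Per-country deep dives
Anomaly Analysis, Low Growth / Contraction: Per-country deep dives
GDP Decomposition: Expenditure-side and sector-side breakdown tables
Structural vs Cyclical Analysis: Classification of each anomaly
Macroeconomic Themes & Root Causes: Cross-cutting forces
Policy Context: Monetary, fiscal, EU-level
Risks & Forward-Looking Assessment: Implications for the next 1-2 years
Sources: Unified sequential [[n]] numbering from all workpapers

Summary of final output, as seen in the LangSmith UI. See the complete report in the GitHub repo.

The landscape-scanner subagent writing its report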

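Key findings:

Ireland's 12.3% is driven by multinational pharma output and IP effects, not domestic activity. Modified GNI would show a far more modest number.
Major laggards share a common thread: exposure to US tariffs, Chinese competition in manufacturing, and the lagged drag of high interest rates on construction.
Spain, Poland, Bulgaria, and Croatia outperformed on real wage recovery and EU fund disbursements.

See the full report, subagent workpapers, country-by-country breakdown, industry attribution, macroeconomic root causes, and all cited sources. View in the GitHub repository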

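What Deep Agents and LangSmith make possible here

The Finance Research API handles data retrieval, reasoning, and synthesis. Give it a complex research query and it returns an answer grounded in public and private data, with inline citations. Deep Agents and LangSmith provide the engineering tools and infrastructure to build around it: context engineering, subagent management, tool execution, observability, and production deployment.

Context engineering. System prompts, subagents, Skills, and file system management (via Backends) ensure that each subagent receives only the context it strictly needs. This makes it possible to design repeatable, reliable subagent behavior.

Subagent management. This agent uses five predefined subagents plus the general-purpose subagent that Deep Agents includes by default. Some run once; others fan out in multiples. The country-investigator runs one instance per anomalous country. Define it once; Deep Agents handles delegation, concurrency, failure isolation, and result aggregation.

Tool execution. The Finance Research API is one tool call. MCP servers, REST endpoints, and internal data feeds plug in the same way, scoped per subagent. In a few lines of code, you can add new tools or data sources to a particular subagent.

Production deployment. LangSmith Deployment handles scaling, persistent storage via StoreBackend, and environment management. The same agent runs in local dev and in production without changes.

Observability. Every you_finance_research call, built-in tool call (todo list, file reads, workpaper writes), and orchestrator decision is captured in LangSmith. The trace is the audit trail. It is accessible via the CLI, MCP, JSON export, and the LangSmith UI.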

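Implementation

Defining the Finance Research API tool

Each subagent gets one tool: the Finance Research API. This API is itself an agent: it runs multi-step research, ingests structured public data (World Bank, IMF, OECD, Eurostat, FRED) and licensed private data, verifies sources across parallel branches, and returns cited answers with [[n]] source tags. Wrapping it as a LangChain tool makes it callable by Deep Agents.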

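import os
from typing import Literal

import httpx
from langchain_core.tools import tool

# HTTP_ENDPOINT and HTTP_API_TIMEOUT are module-level constants defined in the
# reference implementation; they are not shown in this excerpt.


@tool(parse_docstring=True)
async def you_finance_research(
    input: str,
    research_effort: Literal["deep", "exhaustive"] = "deep",
) -> str:
    """Research financial and macroeconomic topics with cited sources.

    Args:
        input: The research question (max 40,000 characters).
        research_effort: How thorough the research should be.
    """
    body = {"input": input, "research_effort": research_effort}
    headers = {"Content-Type": "application/json", "X-API-Key": os.environ["YDC_API_KEY"]}

    async with httpx.AsyncClient(timeout=HTTP_API_TIMEOUT) as client:
        response = await client.post(HTTP_ENDPOINT, headers=headers, json=body)
        data = response.json()

    # The answer text carries inline [[n]] tags; sources arrive as a separate array.
    output = data.get("output", {})
    content = output.get("content", "")
    sources = output.get("sources", [])
    result = content

    # Append a numbered source list so each [[n]] tag can be resolved downstream.
    if sources:
        result += "\n\n### Sources\n"
        for i, src in enumerate(sources, 1):
            title = src.get("title", "Untitled")
            url = src.get("url", "")
            result += f"[[{i}]] {title}: {url}\n"

    return result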

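The tool sends a research question with an effort level, pulls out the content field (with inline [[n]] citation tags) and the sources array, and appends the sources in a format the agent carries through to the final report. The timeout passed to the client (HTTP_API_TIMEOUT) is deliberately configured with read=None, since the API can take several minutes on complex queries. The reference implementation also retries with exponential backoff on transient connection failures.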
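The retry behavior mentioned above is not shown in the excerpt. As a minimal sketch of what exponential backoff could look like, and not the reference implementation's actual code, the helper below retries an async call and doubles the wait after each transient failure. The name `retry_with_backoff`, its parameters, and the default exception types are all assumptions made for illustration.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    call: Callable[[], Awaitable[T]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Await call(), retrying transient failures with exponential backoff.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # The delay doubles each attempt (1s, 2s, 4s, ...), plus up to 10% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            await sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

With httpx, one would probably pass `retry_on=(httpx.ConnectError, httpx.ConnectTimeout)` and wrap only the `client.post(...)` call, so that a slow but healthy response is never retried.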


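You can also load the tool via MCP instead of direct HTTP. You.com exposes a hosted MCP server at https://api.you.com/mcp?tools=you-finance that works with langchain-mcp-adapters.

Understanding the API's budget model

The Finance Research API has a finite compute and retrieval budget per call. It splits that budget across everything you ask in a single query:

Focused queries (one entity, one analytical question) get the full budget and return rich, quantitative answers
Overloaded queries (many entities, many analytical dimensions) split the budget and return thin, qualitative-only answers
Data-retrieval queries (e.g., "GDP growth for all 27 EU countries") are cheap per entity once the API finds the right database endpoint, so batching many countries works well

This is why the agent issues focused queries rather than batching everything into a single call. Each focused call handles one analytical job and produces a discrete, attributable result, which matters as much for traceability (and hence compliance) as it does for result quality.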

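Query shapes that work

Three query shapes work reliably with this budget model. Each is encoded in its subagent's system prompt. The full prompts appear in the subagent definitions below.

Shape A (Data Tables): "[Metric] for all [N] countries in [year(s)]"

Structured data across many countries in a single call. Data retrieval is cheap per entity, so batching all 27 EU member states works fine:

"Provide real GDP growth rates (annual percent change, chain-linked volumes) for all 27 EU member states for each year from 2020 to 2025."

"Provide current account balances as a percentage of GDP for all 27 EU member states in 2025."

Shape A calls are also your primary source layer for compliance purposes. When the Finance Research API returns GDP figures from Eurostat or IMF databases, those source URLs are included in the response and carried forward into the workpaper. The claim chain that a MiFID II records review or an EU AI Act audit requires starts here.

Shape B (Per-Country Qualitative Context): "Here are the numbers from Eurostat. What explains them?"

The story behind the numbers. The agent feeds in the Eurostat data it already has from Shape A and asks for causal explanations from well-indexed sources:

"Ireland's Industry (B-E) GVA grew 29.1% in 2025 and GFCF contributed +6.32pp to GDP growth. What explains this? Was there front-loading of pharma exports ahead of US tariffs?"

"Germany's manufacturing GVA fell 0.8% and construction fell 2.9% in 2025. What specific factors explain this? Focus on: automotive production levels vs 2019, VW Group restructuring announcements."

Shape C (Mechanism Comparisons): "Compare [mechanism] across [2-3 closely related countries]"

How a shared mechanism played out differently across 2-3 related countries:

"How did ECB rate hikes in 2022-2023 affect Sweden and Denmark through their variable-rate mortgage markets? Compare with France's fixed-rate market."

Each subagent's system prompt also specifies what to avoid: don't batch 4+ countries into a single analytical query, don't combine data retrieval with interpretation in one call, and don't escalate to exhaustive when deep fails. Narrow the scope or rephrase instead.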
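In the article's design the three shapes and the "what to avoid" rules live in the subagent prompts, so the LLM composes the queries itself. Purely to make the rules concrete, here is a small sketch that builds each shape as a string and enforces the country limit in code. The function names, the wording templates, and the `MAX_ANALYTICAL_COUNTRIES` constant are illustrative assumptions and are not part of the reference implementation.

```python
from typing import List, Optional

# Assumption: the prompts warn against 4+ countries per analytical query,
# so an analytical comparison is capped at 3 here.
MAX_ANALYTICAL_COUNTRIES = 3


def shape_a(metric: str, n_countries: int, years: str) -> str:
    """Shape A: a pure data-table request that batches many countries."""
    return f"Provide {metric} for all {n_countries} EU member states for {years}."


def shape_b(country: str, facts: List[str], focus: Optional[List[str]] = None) -> str:
    """Shape B: feed known figures for one country and ask what explains them."""
    query = f"{country}: " + " ".join(facts) + " What explains this?"
    if focus:
        query += " Focus on: " + ", ".join(focus) + "."
    return query


def shape_c(mechanism: str, countries: List[str]) -> str:
    """Shape C: compare one mechanism across 2-3 closely related countries."""
    if not 2 <= len(countries) <= MAX_ANALYTICAL_COUNTRIES:
        raise ValueError("Shape C expects 2-3 countries; narrow the scope instead")
    return f"Compare {mechanism} across {', '.join(countries)}."
```

Keeping retrieval (Shape A) and interpretation (Shapes B and C) in separate builders mirrors the rule that the two should never share a call.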

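Defining the research subagents

Subagents don't inherit tools from the orchestrator. Each one is explicitly configured with an LLM of our choosing, a specific task, and exclusively the Finance Research API tool. Scoping the main task down to smaller units of work via subagents reduces context bloat, improves predictability, and optimizes overall cost and speed.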

landscape_scanner_subagent = {
    "name": "landscape-scanner",
    "description": "Retrieve structured macroeconomic data tables for all EU member states via Shape A queries.",
    "system_prompt": """You are a macroeconomic data specialist...

    Run 2-4 Shape A queries at `deep` effort to build complete data tables.
    Write ALL results to /workpapers/landscape_scan.md
    as structured markdown tables with all citations preserved.""",
    "tools": [you_finance_research],
    "model": "fireworks:accounts/fireworks/models/minimax-m2p5",  # subagents can use a different model than the orchestrator
}

anomaly_analyst_subagent = {
    "name": "anomaly-analyst",
    "description": "Analyze landscape data to compute regional mean, flag anomalous countries, and recommend investigation targets. Pure statistical analysis; no Finance Research API calls.",
    "system_prompt": """You are a quantitative analyst...

    Read /workpapers/landscape_scan.md. Compute the unweighted mean.
    Flag countries deviating by >=2.0 percentage points. Group by mechanism.
    Write your full analysis to /workpapers/anomaly_analysis.md,
    including a fenced JSON block at the end with investigation_targets.""",
    "tools": [],  # Only uses filesystem (provided by middleware)
    "model": "fireworks:accounts/fireworks/models/minimax-m2p5",
}
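The anomaly-analyst prompt above asks the model for an unweighted mean and a 2.0 percentage-point threshold. In the article the LLM performs this step from the workpaper; as a rough sketch of the same arithmetic in plain Python, with a hypothetical function name and made-up placeholder figures (not data from the run):

```python
from statistics import mean
from typing import Dict, Tuple


def flag_anomalies(
    growth: Dict[str, float], threshold_pp: float = 2.0
) -> Tuple[float, Dict[str, float]]:
    """Return the unweighted mean and the deviation of each flagged country.

    growth maps country -> real GDP growth in percent. A country is flagged
    when it sits at least threshold_pp percentage points from the mean.
    """
    regional_mean = mean(growth.values())
    flagged = {
        country: round(rate - regional_mean, 2)
        for country, rate in growth.items()
        if abs(rate - regional_mean) >= threshold_pp
    }
    return regional_mean, flagged


# Placeholder inputs for illustration only.
toy_growth = {"A": 1.0, "B": 1.5, "C": 2.0, "D": 7.5, "E": -2.0}
regional_mean, flagged = flag_anomalies(toy_growth)
print(regional_mean, flagged)
```

Because the mean is unweighted, a single extreme value pulls it toward itself and can push otherwise ordinary countries over the threshold, which is one reason to revisit the threshold in volatile periods.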


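The remaining subagents (expenditure-decomposer, sector-decomposer, country-investigator) each follow the same structure: a focused system prompt, tools=[you_finance_research], and a dedicated workpaper path. The country-investigator is fanned out once per anomalous country identified by anomaly-analyst. See the full subagent definitions in prompts.py →.

Creating the orchestrator agent

With the subagents defined, the orchestrator is assembled with create_deep_agent(). The orchestrator's system prompt holds the workflow coordination logic and the analytical frameworks. Query construction knowledge lives in the subagent prompts.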

from deepagents import create_deep_agent
from deepagents.backends import CompositeBackend, StateBackend
from deepagents.backends.filesystem import FilesystemBackend
from langgraph.checkpoint.memory import MemorySaver

# reports_dir (output directory) and system_prompt (orchestrator prompt) are
# defined elsewhere in the reference implementation.
backend = CompositeBackend(
    default=StateBackend(),  # agent-internal state stays in memory
    routes={"/": FilesystemBackend(root_dir=reports_dir, virtual_mode=True)},  # file writes go to disk
)

agent = create_deep_agent(
    model="fireworks:accounts/fireworks/models/minimax-m2p7",  # swap any LangChain-compatible model string here
    tools=[],  # the orchestrator calls no research tools itself; it delegates to subagents
    system_prompt=system_prompt,
    subagents=[
        landscape_scanner_subagent,
        anomaly_analyst_subagent,
        expenditure_decomposer_subagent,
        sector_decomposer_subagent,
        country_investigator_subagent,
    ],
    backend=backend,
    checkpointer=MemorySaver(),
)

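Two things to note:

CompositeBackend routes agent-internal state to StateBackend (in-memory); file writes go to FilesystemBackend on disk. Subagents write workpapers and the final report there, which keeps that content out of the message history, where it would quickly become unwieldy. The orchestrator reads the workpapers back during synthesis; the final report lands at /final_report.md.

subagents gives the orchestrator a task tool. It dispatches subagents by calling task(subagent_type="landscape-scanner", description="..."). To run subagents in parallel, the orchestrator emits multiple task calls in a single message and Deep Agents runs them concurrently.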
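Deep Agents handles this concurrency internally, so none of the following is its API. As a conceptual sketch only, the pattern of running several task calls at once while isolating failures resembles `asyncio.gather` with `return_exceptions=True`. Here `run_subagent` is a hypothetical stand-in for a subagent run.

```python
import asyncio
from typing import List, Tuple


async def run_subagent(name: str, description: str) -> str:
    """Hypothetical stand-in for one subagent run (not a Deep Agents API)."""
    await asyncio.sleep(0.01)  # simulate work
    if "fail" in description:
        raise RuntimeError(f"{name} could not complete its task")
    return f"{name} finished"


async def dispatch_parallel(tasks: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Run all (name, description) tasks concurrently.

    return_exceptions=True means one failing task does not cancel the others;
    each result comes back as (name, "ok" | "error", detail), in input order.
    """
    outcomes = await asyncio.gather(
        *(run_subagent(name, description) for name, description in tasks),
        return_exceptions=True,
    )
    results = []
    for (name, _), outcome in zip(tasks, outcomes):
        if isinstance(outcome, Exception):
            results.append((name, "error", str(outcome)))
        else:
            results.append((name, "ok", outcome))
    return results
```

The point of the sketch is the shape of the result: every dispatched task yields an entry, so a failed country investigation shows up as an explicit gap and not as a missing workpaper.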

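The multi-layer workflow

The orchestrator starts each run by calling write_todos to lay out a research plan as a checklist: an explicit artifact it tracks against throughout the run, so it does not rely on the system prompt alone.

1. [ ] Layer 1: Dispatch landscape-scanner for data tables
2. [ ] Layer 2: Dispatch anomaly-analyst to flag outliers
3. [ ] Layer 3a: Fan out expenditure-decomposer and sector-decomposer in parallel
4. [ ] Layer 3b: Fan out country-investigator per anomalous country
5. [ ] Cross-reference: Check workpapers for contradictions
6. [ ] Synthesize: Write final report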

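Each layer's results feed the next:

Layer 1: Landscape Scan. landscape-scanner fires 2-4 Shape A calls to the Finance Research API, building data tables for all 27 EU member states. Results go to /workpapers/landscape_scan.md.

Layer 2: Anomaly Detection. anomaly-analyst reads the landscape workpaper, computes the regional mean, flags countries deviating by 2+ percentage points, and writes a full analysis to /workpapers/anomaly_analysis.md with a fenced JSON block at the end. This step involves no Finance Research API calls; it is pure analysis and computation. The orchestrator reads the workpaper and parses investigation_targets from the JSON block to decide which countries get deep follow-ups.

Layer 3a: Quantitative Decomposition. expenditure-decomposer and sector-decomposer run in parallel: two task calls in a single message. Each fires one Shape A query and writes its workpaper. This gives the agent the numerical backbone for the whole analysis.

Layer 3b: Country Investigation Fan-Out. The orchestrator reads the decomposition workpapers, picks the most interesting anomalies, and dispatches one country-investigator per country, all in parallel. Each investigator gets the country name, its key data points, and a workpaper path in the task description. Each runs Shape B queries independently and writes to its own file (/workpapers/country_ireland.md, /workpapers/country_germany.md, etc.).

Cross-Reference. The orchestrator reads all the workpapers and checks for contradictions: does the Ireland GDP figure agree across the landscape scan, the expenditure decomposition, and the country investigation? If not, it dispatches the general-purpose subagent with a targeted verification query.

Synthesis. The orchestrator applies its analytical frameworks (expenditure decomposition, structural vs cyclical classification, policy channel analysis), classifies each anomaly, identifies macro themes, and writes the final report to /final_report.md with unified [[n]] citation numbering.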
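The hand-off from Layer 2 to Layer 3b depends on the fenced JSON block at the end of the anomaly workpaper. In the article the orchestrator LLM reads that block itself; if you wanted to do the same step deterministically, a sketch might look like the following. The function name is hypothetical, and the shape of each target (a dict with a "country" key) is an assumption, since the article does not show the JSON schema.

```python
import json
import re
from typing import List

# Built programmatically so this example never contains a literal code fence.
FENCE = "`" * 3


def parse_investigation_targets(workpaper_md: str) -> List[dict]:
    """Return investigation_targets from the last fenced JSON block in a workpaper."""
    pattern = re.compile(FENCE + r"json\s*\n(.*?)\n" + FENCE, re.DOTALL)
    blocks = pattern.findall(workpaper_md)
    if not blocks:
        raise ValueError("no fenced JSON block found in workpaper")
    payload = json.loads(blocks[-1])  # the prompt puts the block at the end
    return payload["investigation_targets"]
```

Taking the last block, not the first, follows the prompt's instruction that the JSON sits at the end of the file, after any tables or examples in the analysis.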

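How the agent runs

A full run takes about 45 minutes and roughly 20 API calls. The agent builds complete quantitative coverage cheaply in Layers 1 and 3a, where Shape A queries batch well, identifies the interesting stories in Layer 2, and concentrates its budget on those stories in Layer 3b. Every country gets hard numbers in the final report; only the real outliers get the deep treatment.

Running the agent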

import asyncio
from finance_research.agent import run_finance_research

report = asyncio.run(run_finance_research(
    query="Using the latest available GDP data for 2025, analyze each country "
          "within the EU economic zone. Highlight those that are increasing "
          "or decreasing at an anomalous rate.",
    preset="gdp",
))

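Why observability matters for this agent

A full run involves roughly 20 Finance Research API calls and dozens of orchestrator decisions: which Shape A queries to issue, which countries get Shape B follow-ups, and whether Ireland's GDP print reflects domestic activity or MNC distortion.

The trace is the record that outlives the run. Every claim in the final report traces back to the specific Finance Research API call that produced it, and from there to the underlying source URL. Three regulatory frameworks make this non-negotiable for deployments in the financial services industry (FSI):

MiFID II: Record-keeping obligations require firms to document the basis for investment recommendations, including AI-assisted research inputs
DORA: Third-party ICT oversight requires ongoing monitoring of what each vendor returned, for what input, and with what reliability; incident-reporting windows require fast root-cause access
EU AI Act (Article 12): High-risk AI systems must maintain automatic event logs sufficient for post-hoc review

Filter for you_finance_research calls and create datasets for evaluation

What LangSmith captures

Tracing is built in automatically; you write no instrumentation code. Every run produces a nested trace tree: orchestrator → task dispatch → subagent → you_finance_research call → response. At each node, LangSmith records input/output content, token counts (input, output, cache read, cache creation), latency, and cost. For LLM calls, that includes the full prompt, completion, and model parameters. For tool calls, it includes the arguments and return value.

The practical upshot: you can click into a subagent's you_finance_research call and see the exact query, effort level, full cited response, and source URLs. Then click up one level to see how the subagent used that response in its workpaper. This is how a claim in the final report is followed back to the API call, and then to the source URL, behind it.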
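The same walk can be done offline on an exported trace. The sketch below assumes a trace exported as nested JSON in which each node has `name`, `run_type`, `inputs`, and `child_runs` keys; that layout is an assumption for illustration and may not match the actual export format, so adjust the key names to what your export contains.

```python
from typing import List, Tuple


def find_tool_calls(run: dict, tool_name: str, path: Tuple[str, ...] = ()) -> List[dict]:
    """Collect every call to tool_name in a nested trace, with its ancestry.

    Assumed (hypothetical) node schema: name, run_type, inputs, child_runs.
    """
    here = path + (run["name"],)
    hits = []
    if run.get("run_type") == "tool" and run["name"] == tool_name:
        hits.append({"path": " > ".join(here), "inputs": run.get("inputs", {})})
    for child in run.get("child_runs", []):
        hits.extend(find_tool_calls(child, tool_name, here))
    return hits
```

Each hit records the chain of ancestors, so a reviewer can see which subagent issued a given query without opening the UI.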


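LangSmith's dashboards aggregate this view across runs, not just within a single trace. Out of the box, every project gets charts for trace count, latency percentiles (p50/p90/p99), error rate, total cost, token breakdown, and tool call frequency by name. You can build custom dashboards on top: for example, tracking the cost of Layer 3b country investigations over time, or the error rate of you_finance_research calls grouped by subagent.

What a typical LangSmith trace looks like

What the trace shows

The first thing in every run is the orchestrator's write_todos plan: the research strategy, laid out before any subagent fires.

The todo list tool. Essential for planning.

Layer 1 dispatches landscape-scanner. Click into the task node and you see the subagent's Shape A queries, the data tables that came back, and the write_file call that committed the results to /workpapers/landscape_scan.md.

The orchestrator then dispatches anomaly-analyst. Its trace shows a read_file loading the landscape workpaper, the statistical computation, and a write_file saving the analysis with the JSON block the orchestrator parses.

Layer 3a shows expenditure-decomposer and sector-decomposer running concurrently, each with its own Finance Research API call. Layer 3b shows the country-investigator fan-out: multiple task nodes in parallel, one per country, each with its own Shape B queries and workpaper write.

The cross-reference step shows the orchestrator reading all the workpapers, comparing figures, and deciding whether to fire a verification query. Synthesis shows the final write_file to /final_report.md.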
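The comparison in the cross-reference step is done by the orchestrator LLM reading the workpapers. To make the check itself concrete, here is a sketch that takes figures already extracted per workpaper and reports countries whose values disagree. The function name, the input layout, and the 0.1 pp tolerance are assumptions; the article does not specify a tolerance.

```python
from typing import Dict


def find_contradictions(
    figures: Dict[str, Dict[str, float]], tolerance_pp: float = 0.1
) -> Dict[str, Dict[str, float]]:
    """Return countries whose reported GDP growth differs across workpapers.

    figures maps country -> {workpaper name: reported growth in percent}.
    A country is returned when its highest and lowest reported values are
    more than tolerance_pp apart.
    """
    conflicts = {}
    for country, by_source in figures.items():
        values = list(by_source.values())
        if max(values) - min(values) > tolerance_pp:
            conflicts[country] = dict(by_source)
    return conflicts
```

Anything returned here is a candidate for the targeted verification query described above; an empty result means the workpapers agree within the tolerance.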

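What lands in /workpapers/

Every run produces 14 files:

landscape_scan.md: GDP table for all 27 member states (with inline citations and a JSON block the orchestrator parses to select investigation targets).
anomaly_analysis.md: Outlier classification, structural vs cyclical flags, and a ranked list of the countries dispatched to Layer 3.
expenditure_decomposition.md and sector_decomposition.md: The parallel Layer 3a workpapers, each containing a full breakdown table.
country_[name] (one per anomalous country): GDP trajectory, expenditure and sector decomposition, key findings with named mechanisms, forward-looking risk assessment, and [[n]] citations throughout.

See the full example workpapers in the GitHub repository →.

File system access is handled by Deep Agents' Backends: a pluggable file system interface that provides read_file, write_file, edit_file, ls, glob, and grep, backed by whatever storage you configure. For local development: FilesystemBackend(root_dir="."). For production: StoreBackend routes to Redis or Postgres through LangGraph's store interface.

Writing a country investigation report to file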

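Evaluating the agent

Research teams run this agent on a regular schedule: weekly GDP updates, monthly sector rotations, ad hoc deep dives before allocation meetings. Across dozens of runs, pattern-level questions emerge that no single trace answers: Is the Finance Research API returning thinner results for certain countries? Does the 2.0 pp anomaly threshold flag too many countries in volatile quarters? Would Sonnet perform as well as Opus for the orchestrator at half the cost?

LangSmith's evaluation framework is built for this, and it applies in five ways:

Offline experiments. Build a dataset of test queries with reference outputs; these can include past reports your team has validated. Run the agent against the dataset, score it with evaluators, and get aggregate results. Then swap the orchestrator's core LLM, or change the anomaly threshold from 2.0 pp to 1.5 pp, and rerun against the same dataset. LangSmith's comparison view shows the two experiments side by side, highlighting regressions in red and improvements in green. Drill into any row to see the traces from both runs side by side.

Custom evaluators. Write scoring functions tailored to this workflow: for example, checking that every [[n]] citation in the final report maps to a valid source URL, or counting how many you_finance_research calls returned thin results. These run as part of an experiment and produce scores you can track over time.

Online evaluation. Attach evaluators to production traffic. LangSmith can automatically score a sample of live runs: for example, checking that a report contains the required sections and that citation numbers are sequential, or flagging runs where a subagent hit a rate limit. Runs that match the evaluation criteria get extended retention for investigation.

Annotation queues. When human review is needed, such as for reports going to a risk committee or outputs where the agent's structural vs cyclical classification looks borderline, runs can be routed to an annotation queue. Reviewers score against a rubric and add corrections, and those corrections feed back into the evaluation dataset for future runs. Pairwise queues let reviewers compare two versions of the same report side by side.

Tool-level analysis. Filter runs by tool name across a project to aggregate you_finance_research performance: how often it returns useful results vs rate limits vs thin responses, average latency by query shape, and cost per call. This is how you notice that Shape B queries consistently come back thin for Nordic countries, or that one subagent is consuming a disproportionate share of the API budget.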
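The citation checks mentioned under custom evaluators and online evaluation can be written as an ordinary scoring function before being wired into any framework. The sketch below assumes the report ends with the "### Sources" section in the `[[n]] title: url` format that the you_finance_research tool appends; the function name and the returned keys are illustrative, and the LangSmith evaluator wiring is deliberately left out.

```python
import re

CITATION = re.compile(r"\[\[(\d+)\]\]")
SOURCE_LINE = re.compile(r"\[\[(\d+)\]\]\s+(.*?):\s+(https?://\S+)\s*$")


def check_citations(report: str) -> dict:
    """Score how many inline [[n]] citations resolve to a source with a URL."""
    body, _, sources_section = report.partition("### Sources")
    cited = {int(n) for n in CITATION.findall(body)}

    # Only source lines that carry an http(s) URL count as valid.
    sources = {}
    for line in sources_section.splitlines():
        match = SOURCE_LINE.match(line.strip())
        if match:
            sources[int(match.group(1))] = match.group(3)

    missing = sorted(cited - set(sources))
    numbers = sorted(sources)
    sequential = numbers == list(range(1, len(numbers) + 1))
    score = 1.0 if not cited else (len(cited) - len(missing)) / len(cited)
    return {"score": score, "missing": missing, "sequential": sequential}
```

A score below 1.0 together with a non-empty `missing` list points at the exact citation numbers to chase in the trace.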

For production, you can set alerts on these metrics: flag when the error rate exceeds 5% in a 15-minute window, when average latency spikes, or when cost per run exceeds a threshold. Alerts go to Slack, PagerDuty, or a custom webhook.
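LangSmith evaluates these alert rules for you. To show what the first rule means in practice, here is a sketch of a sliding-window error-rate check using the 5% threshold and 15-minute window from the text. The class name and interface are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class ErrorRateAlert:
    """Track runs in a sliding time window and flag a high error rate."""

    def __init__(self, window_seconds: float = 15 * 60, threshold: float = 0.05):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._events = deque()  # (timestamp, is_error), oldest first

    def record(self, timestamp: float, is_error: bool) -> bool:
        """Record one run; return True if the windowed error rate exceeds the threshold."""
        self._events.append((timestamp, is_error))
        # Drop runs that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()
        errors = sum(1 for _, failed in self._events if failed)
        return errors / len(self._events) > self.threshold
```

Note that "exceeds 5%" is a strict comparison here: one error in twenty runs is exactly 5% and does not fire, while two in twenty-one does.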


Getting started

# Clone the reference template
git clone https://github.com/youdotcom-oss/langchain-deepagents-finance-research

The Finance Research API is available through the langchain-youdotcom package, or as a hosted MCP server at https://api.you.com/mcp (docs). Get an API key → View integration docs →

# Your You.com API key
export YDC_API_KEY=you.com_api_key

# At least one model provider
# Choose from many different LLM providers
export FIREWORKS_API_KEY="your_api_key_here"

# Enable LangSmith tracing
export LANGCHAIN_API_KEY=langchain_api_key
export LANGSMITH_TRACING=true
export LANGSMITH_ENDPOINT=https://aws.api.smith.langchain.com
export LANGSMITH_PROJECT="My LangSmith project"

# Install dependencies
pip install deepagents langchain-youdotcom langchain-mcp-adapters langchain-fireworks

# Run the agent
python examples/eu_gdp_analysis.py

To run this example you need a LangSmith account (start for free →), a You.com API key (sign up at you.com →; every new account includes $100 in free API credits), and a Fireworks API key.

Many other models are already available within LangChain. To swap in a different model, set its API key, install the corresponding LangChain package, and update the model strings in the agent and subagent definitions.

Full documentation, including how to configure the anomaly detection threshold and customize the country scope, is in the integration docs →.

‍

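Who this is for

This architecture fits any team running structured, multi-step research on financial topics: deal screening at PE firms, credit underwriting at banks, KYB onboarding for compliance teams, macro positioning for asset managers. The five-subagent structure here is a starting point. Add or swap tracks to fit your workflow: executive background checks for compliance-heavy due diligence, IP portfolio analysis for M&A screening, earnings signal aggregation for equity research. Each new track is one additional subagent dict with a focused system prompt and the same you_finance_research tool.

A no-code version is coming soon to Fleet, LangChain's UI-based platform for building and managing agents.

For benchmark methodology and accuracy details, see the Finance Research API overview →

Ready to build? Get an API key → Finance Research API docs → Reference implementation on GitHub →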

추가 자료

‍

Macro research desks need to know, on a regular basis, which countries in a given set are performing anomalously and why. The underlying data exists but it is fragmented. A single GDP figure might require reconciling a Eurostat release against a national statistics office publication that arrived on a different schedule and uses a different methodology. Getting from raw data to a usable, sourced briefing can be as time consuming as the analysis itself. To show this in practice, we built an agent and ran it against 2025 GDP data for all 27 EU member states.
‍
Ireland came back as the single largest outlier, with 12.3% GDP growth that looked like a boom. Per-country investigation identified it as a pharma-led export surge front-loaded ahead of US tariffs, with the industrial sector alone contributing +6.55pp to the print. Modified GNI showed a far more modest number. Germany was flagged for the opposite reason: structural contraction driven by automotive exposure and construction collapse, not a cyclical dip. The agent produced that distinction, sourced and cited, in 45 minutes costing $2.20 in API calls.
‍
The findings are only half the story. In financial services, the ability to explain how a conclusion was reached matters as much as the conclusion itself. AI agents create a gap here: without explicit instrumentation, the decisions an agent makes during a run are lost once the run completes. This architecture preserves the decision log: every query issued, every response received, and every intermediate result produced before the final report is written. LangSmith captures the complete execution trace as the agent runs, so anyone reviewing the output can follow any data point in the final report back to the source that produced it.
‍
The prompt
Using the latest available GDP data for 2025, analyze each country within the EU economic zone. Highlight those that are increasing or decreasing at an anomalous rate. Specify and break down which industries are causing these shifts and investigate macroeconomic trends within each country that are contributing.
What the output looks like
The query has two primary questions: which EU-27 countries are growing or contracting anomalously, and what structural and cyclical forces are driving those deviations?
You get a structured briefing: GDP trajectory, anomaly drivers, and second-order implications including rates sensitivity, FX exposure, sovereign risk signals, and sector positioning. Every step is visible and auditable.
‍
The report follows a standard format:
Executive Summary: Headline numbers, key patterns, most important finding
Methodology & Data Notes: Sources used, data vintage, known caveats
Regional Overview: Aggregate GDP, average growth rate, macro context
Country-by-Country GDP Table: All countries ranked by growth rate, anomaly flags, delta from mean
Multi-Year Growth Context: 3–5 year growth trajectory
Anomaly Analysis, High Growth: Per-country deep dives
Anomaly Analysis, Low Growth / Contraction: Per-country deep dives
GDP Decomposition: Expenditure-side and sector-side breakdown tables
Structural vs Cyclical Analysis: Classification of each anomaly
Macroeconomic Themes & Root Causes: Cross-cutting forces
Policy Context: Monetary, fiscal, EU-level
Risks & Forward-Looking Assessment: Implications for the next 1-2 years‍
Sources: Unified sequential [[n]] numbering from all workpapers
‍
Summary of final output, as seen in LangSmith UI. See complete report in GitHub repo.
‍
The landscape-scanner subagent writing its report
‍
Key findings:
Ireland's 12.3% is driven by multinational pharma output and IP effects, not domestic activity. Modified GNI would show a far more modest number.
Major laggards share a common thread: exposure to US tariffs, Chinese competition in manufacturing, and high-rate-lag drag on construction.
Spain, Poland, Bulgaria, and Croatia outperformed on real wage recovery and EU fund disbursements.
‍See the full report, subagent workpapers, country-by-country breakdown, industry attribution, macroeconomic root causes, and all cited sources. View in the GitHub repository
‍
What Deep Agents and LangSmith make possible here
The Finance Research API handles data retrieval, reasoning, and synthesis. Give it a complex research query and it returns an answer grounded in public and private data, with inline citations. Deep Agents and LangSmith provide the engineering tools and infrastructure to build around it: context engineering, subagent management, tool execution, observability, and production deployment.
‍
Context engineering. System prompts, subagents, Skills and file system management (via Backends) ensure that each subagent strictly receives only the context it needs. This allows for designing repeatable and reliable subagent behaviors.
Subagent management. Five predefined subagents and one general-purpose subagent built into Deep Agents by default. Some run once; others fan out in multiples. The country-investigator runs one instance per anomalous country. Define it once; Deep Agents handles delegation, concurrency, failure isolation, and result aggregation.
Tool execution. The Finance Research API is one tool call. MCP servers, REST endpoints, and internal data feeds plug in the same way, scoped per subagent. In a few lines of code, you can add new tools or data sources to a particular subagent.
Production deployment. LangSmith Deployment handles scaling, persistent storage via StoreBackend, and environment management. The same agent runs in local dev and in production without changes.
‍Observability. Every you_finance_research call, built-in tool call (todo list, file reads, workpaper write) and orchestrator decision is captured in LangSmith. The trace is the audit trail. It is easily accessible via CLI, MCP, as JSON export and the LangSmith UI.
‍
Implementation
Defining the Finance Research API tool
Each subagent gets one tool: the Finance Research API. This API is itself an agent: it runs multi-step research, ingests structured public data (World Bank, IMF, OECD, Eurostat, FRED) and licensed private data, verifies sources across parallel branches, and returns cited answers with [[n]] source tags. Wrapping it as a LangChain tool makes it callable by Deep Agents.
@tool(parse_docstring=True) async def you_finance_research( input: str, research_effort: Literal["deep", "exhaustive"] = "deep", ) -> str: """Research financial and macroeconomic topics with cited sources. Args: input: The research question (max 40,000 characters). research_effort: How thorough the research should be. """ body = {"input": input, "research_effort": research_effort} headers = {"Content-Type": "application/json", "X-API-Key": os.environ["YDC_API_KEY"]} async with httpx.AsyncClient(timeout=HTTP_API_TIMEOUT) as client: response = await client.post(HTTP_ENDPOINT, headers=headers, json=body) data = response.json() output = data.get("output", {}) content = output.get("content", "") sources = output.get("sources", []) result = content if sources: result += "\n\n### Sources\n" for i, src in enumerate(sources, 1): title = src.get("title", "Untitled") url = src.get("url", "") result += f"[[{i}]] {title}: {url}\n" return result
‍
The tool sends a research question with an effort level, pulls out the content field (with inline [[n]] citation tags) and the sources array, and appends them in a format the agent carries through to the final report. The read=None timeout is deliberate since the API can take several minutes on complex queries. The reference implementation also retries with exponential backoff on transient connection failures.
‍
You can also load the tool via MCP instead of direct HTTP. You.com exposes a hosted MCP server at https://api.you.com/mcp?tools=you-finance that works with langchain-mcp-adapters.
Understanding the API's budget model
The Finance Research API has a finite compute and retrieval budget per call. It splits that budget across everything you ask in a single query:
Focused queries (one entity, one analytical question) get the full budget and return rich, quantitative answers
Overloaded queries (many entities, many analytical dimensions) split the budget and return thin, qualitative-only answers
Data-retrieval queries (e.g., "GDP growth for all 27 EU countries") are cheap per entity once the API finds the right database endpoint, so batching many countries works well
‍
This is why the agent issues focused queries rather than batching everything into a single call. Each focused call handles one analytical job and produces a discrete, attributable result, which matters as much for traceability (and hence compliance) as it does for result quality.
Query shapes that work
Three query shapes work reliably with this budget model. Each is encoded in its subagent's system prompt. The full prompts appear in the subagent definitions below.
Shape A — Data Tables: "[Metric] for all [N] countries in [year(s)]"
Structured data across many countries in a single call. Data retrieval is cheap per entity, so batching all 27 EU member states works fine:
"Provide real GDP growth rates (annual percent change, chain-linked volumes) for all 27 EU member states for each year from 2020 to 2025." "Provide current account balances as a percentage of GDP for all 27 EU member states in 2025."
‍
Shape A calls are also your primary source layer for compliance purposes. When the Finance Research API returns GDP figures from Eurostat or IMF databases, those source URLs are included in the response and carried forward into the workpaper. The claim chain that a MiFID II records review or an EU AI Act audit requires starts here.
‍
Shape B — Per-Country Qualitative Context: "Here are the numbers from Eurostat. What explains them?"
The story behind the numbers. The agent feeds in the Eurostat data it already has from Shape A and asks for causal explanations from well-indexed sources:
"Ireland's Industry (B-E) GVA grew 29.1% in 2025 and GFCF contributed +6.32pp to GDP growth. What explains this? Was there front-loading of pharma exports ahead of US tariffs?" "Germany's manufacturing GVA fell -0.8% and construction fell -2.9% in 2025. What specific factors explain this? Focus on: automotive production levels vs 2019, VW Group restructuring announcements."
‍
Shape C — Mechanism Comparisons: "Compare [mechanism] across [2-3 closely related countries]"
‍
How a shared mechanism played out differently across 2-3 related countries:
"How did ECB rate hikes in 2022-2023 affect Sweden and Denmark through their variable-rate mortgage markets? Compare with France's fixed-rate market."
Each subagent's system prompt also specifies what to avoid: don't batch 4+ countries into a single analytical query, don't combine data retrieval with interpretation in one call, and don't escalate to exhaustive when deep fails. Narrow the scope or rephrase instead.
Defining the research subagents
Subagents don't inherit tools from the orchestrator. Each one is explicitly configured with an LLM of our choosing, a specific task, and exclusively the Finance Research API tool. By scoping the main task down to smaller units of work via subagents, we reduce context bloat, improve predictability and optimize overall cost and speed.
landscape_scanner_subagent = { "name": "landscape-scanner", "description": "Retrieve structured macroeconomic data tables for all EU member states via Shape A queries.", "system_prompt": """You are a macroeconomic data specialist... Run 2-4 Shape A queries at `deep` effort to build complete data tables. Write ALL results to /workpapers/landscape_scan.md as structured markdown tables with all citations preserved.""", "tools": [you_finance_research], "model": "fireworks:accounts/fireworks/models/minimax-m2p5", # subagents can use a different model than the orchestrator } anomaly_analyst_subagent = { "name": "anomaly-analyst", "description": "Analyze landscape data to compute regional mean, flag anomalous countries, and recommend investigation targets. Pure statistical analysis; no Finance Research API calls.", "system_prompt": """You are a quantitative analyst... Read /workpapers/landscape_scan.md. Compute the unweighted mean. Flag countries deviating by >=2.0 percentage points. Group by mechanism. Write your full analysis to /workpapers/anomaly_analysis.md, including a fenced JSON block at the end with investigation_targets.""", "tools": [], # Only uses filesystem (provided by middleware) "model": "fireworks:accounts/fireworks/models/minimax-m2p5", }
‍
The remaining subagents (expenditure-decomposer, sector-decomposer, country-investigator) each follow the same structure: a focused system prompt, tools=[you_finance_research], and a dedicated workpaper path. The country-investigator is fanned out once per anomalous country identified by anomaly-analyst. See the full subagent definitions in prompts.py →.
Creating the orchestrator agent
With the subagents defined, the orchestrator is assembled with create_deep_agent(). The orchestrator's system prompt has the workflow coordination logic and analytical frameworks. Query construction knowledge lives in the subagent prompts.
from deepagents import create_deep_agent from deepagents.backends import CompositeBackend, StateBackend from deepagents.backends.filesystem import FilesystemBackend from langgraph.checkpoint.memory import MemorySaver backend = CompositeBackend( default=StateBackend(), routes={"/": FilesystemBackend(root_dir=reports_dir, virtual_mode=True)}, ) agent = create_deep_agent( model="fireworks:accounts/fireworks/models/minimax-m2p7", # swap any LangChain-compatible model string here tools=[], system_prompt=system_prompt, subagents=[ landscape_scanner_subagent, anomaly_analyst_subagent, expenditure_decomposer_subagent, sector_decomposer_subagent, country_investigator_subagent, ], backend=backend, checkpointer=MemorySaver(), )
Two things to note:
‍
CompositeBackend routes agent-internal state to StateBackend (in-memory); file writes go to FilesystemBackend on disk. Subagents write workpapers and the final report there, keeping that content out of message history, which would get unwieldy. The orchestrator reads the workpapers back during synthesis; the final report lands at /final_report.md.
‍
subagents gives the orchestrator a task tool. It dispatches subagents by calling task(subagent_type="landscape-scanner", description="..."). To run subagents in parallel, the orchestrator emits multiple task calls in a single message and Deep Agents runs them concurrently.
The multi-layer workflow
The orchestrator starts each run by calling write_todos to lay out a research plan as a checklist, an explicit artifact it tracks against throughout the run rather than relying solely on the system prompt.
1. [ ] Layer 1: Dispatch landscape-scanner for data tables
2. [ ] Layer 2: Dispatch anomaly-analyst to flag outliers
3. [ ] Layer 3a: Fan out expenditure-decomposer and sector-decomposer in parallel
4. [ ] Layer 3b: Fan out country-investigator per anomalous country
5. [ ] Cross-reference: Check workpapers for contradictions
6. [ ] Synthesize: Write final report
Each layer's results feed the next:
Layer 1: Landscape Scan. landscape-scanner fires 2-4 Shape A calls to the Finance Research API, building data tables for all 27 EU member states. Results go to /workpapers/landscape_scan.md.
Layer 2: Anomaly Detection. anomaly-analyst reads the landscape workpaper, computes the regional mean, flags countries deviating by 2+ percentage points, and writes a full analysis to /workpapers/anomaly_analysis.md with a fenced JSON block at the end. The step involves no Finance Research API calls; it is pure analysis and computation. The orchestrator reads the workpaper and parses investigation_targets from the JSON block to decide which countries get deep follow-ups.
Layer 3a: Quantitative Decomposition. expenditure-decomposer and sector-decomposer run in parallel: two task calls in a single message. Each fires one Shape A query and writes its workpaper. This gives the agent the numerical backbone for the whole analysis.
Layer 3b: Country Investigation Fan-Out. The orchestrator reads the decomposition workpapers, picks the most interesting anomalies, and dispatches one country-investigator per country, all in parallel. Each investigator gets the country name, its key data points, and a workpaper path in the task description. Each runs Shape B queries independently and writes to its own file (/workpapers/country_ireland.md, /workpapers/country_germany.md, etc.).
Cross-Reference. The orchestrator reads every workpaper and checks for contradictions: does the Ireland GDP figure match across the landscape scan, the expenditure decomposition, and the country investigation? If not, it dispatches the general-purpose subagent with a targeted verification query.
Synthesis. The orchestrator applies its analytical frameworks (expenditure decomposition, structural vs. cyclical classification, policy channel analysis), classifies each anomaly, identifies macro themes, and writes the final report to /final_report.md with unified [[n]] citation numbering.
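The Layer 2 step described above (compute the regional mean, flag countries deviating by 2+ percentage points, append a fenced JSON block, then parse investigation_targets back out) can be sketched in plain Python. This is a minimal illustration, not the reference implementation: the function names, the JSON field names other than investigation_targets, and every growth figure except Ireland's 12.3% are placeholders invented for the example.

```python
import json
import re
from statistics import mean

# Built indirectly so the sketch can itself live inside a fenced block.
FENCE = "`" * 3


def flag_anomalies(growth_by_country, threshold_pp=2.0):
    """Return the regional mean and the countries deviating from it by >= threshold_pp."""
    regional_mean = mean(growth_by_country.values())
    targets = [
        {"country": name, "growth_pct": growth, "deviation_pp": round(growth - regional_mean, 2)}
        for name, growth in growth_by_country.items()
        if abs(growth - regional_mean) >= threshold_pp
    ]
    # Largest absolute deviation first, so the most extreme outliers lead the list.
    targets.sort(key=lambda t: abs(t["deviation_pp"]), reverse=True)
    return regional_mean, targets


def render_workpaper(regional_mean, targets):
    """Write a short analysis followed by a fenced JSON block at the end."""
    payload = json.dumps({"investigation_targets": targets}, indent=2)
    return (
        f"# Anomaly analysis\n\nRegional mean growth: {regional_mean:.2f}%\n\n"
        f"{FENCE}json\n{payload}\n{FENCE}\n"
    )


def parse_investigation_targets(workpaper_text):
    """Extract investigation_targets from the last fenced JSON block in the workpaper."""
    pattern = re.compile(FENCE + r"json\n(.*?)\n" + FENCE, re.DOTALL)
    blocks = pattern.findall(workpaper_text)
    if not blocks:
        raise ValueError("no fenced JSON block found in workpaper")
    return json.loads(blocks[-1])["investigation_targets"]


# Placeholder data: only the Ireland figure comes from the article.
sample = {"Ireland": 12.3, "Country B": 1.4, "Country C": 1.6, "Country D": 2.5, "Country E": -1.8}
regional_mean, targets = flag_anomalies(sample)
workpaper = render_workpaper(regional_mean, targets)
parsed = parse_investigation_targets(workpaper)
print([t["country"] for t in parsed])
```

Keeping the machine-readable block at the end of an otherwise human-readable workpaper means one file serves both the reviewer and the orchestrator, and taking the last fenced block guards against earlier JSON snippets quoted in the analysis text.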
How the agent runs
A full run takes 45 minutes and ~20 API calls. The agent builds complete quantitative coverage cheaply in Layers 1 and 3a, where Shape A queries batch well; identifies the interesting stories in Layer 2; and concentrates its budget on those in Layer 3b. Every country gets hard numbers in the final report; only the real anomalies get the deep treatment.
Running the agent
import asyncio

from finance_research.agent import run_finance_research

report = asyncio.run(run_finance_research(
    query="Using the latest available GDP data for 2025, analyze each country "
          "within the EU economic zone. Highlight those that are increasing or "
          "decreasing at an anomalous rate.",
    preset="gdp",
))
Why observability matters for this agent
A full run involves roughly 20 Finance Research API calls and dozens of orchestrator decisions: which Shape A queries to fire, which countries get Shape B follow-ups, whether Ireland's GDP print reflects domestic activity or MNC distortion.

The trace is the record that survives the run. Any claim in the final report traces backward to the specific Finance Research API call that produced it, and from there to the primary source URL. Three regulatory frameworks make this non-negotiable for financial services industry (FSI) deployments:
MiFID II: record-keeping obligations require firms to document the basis for investment recommendations, including AI-assisted research inputs
DORA: third-party ICT oversight requires ongoing monitoring of what each vendor returned, on what input, and with what confidence; incident reporting windows require fast root-cause access
EU AI Act (Article 12): high-risk AI systems must maintain automatic event logs sufficient for post-hoc review
Filter for every call to you_finance_research and create a dataset for evaluations
What LangSmith captures
LangSmith builds the trace automatically; no instrumentation code is required. Every run produces a nested trace tree: orchestrator → task dispatch → subagent → you_finance_research call → response. At each node, LangSmith records input/output content, token counts (input, output, cache read, cache creation), latency, and cost. For LLM calls, that includes the full prompt, completion, and model parameters. For tool calls, it's the arguments and return value.
The practical upshot: you can click into any subagent's you_finance_research call and see the exact query, the effort level, the full cited response, and the source URLs. You can then click up one level to see how the subagent used that response in its workpaper.
LangSmith's dashboards give you this view aggregated across runs, not just within a single trace. Out of the box, every project gets charts for trace count, latency percentiles (p50/p90/p99), error rates, total cost, token breakdown, and tool call frequency by name. You can build custom dashboards on top, for example to track the cost of Layer 3b country investigations over time or the error rate of you_finance_research calls grouped by subagent.
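To make the "error rate grouped by subagent" idea concrete, here is a small sketch of the aggregation itself. It does not call the LangSmith API: it assumes you have already exported tool-call records into plain dicts, and the record fields (subagent, tool, error) are hypothetical names chosen for the example.

```python
from collections import defaultdict


def error_rate_by_subagent(tool_calls, tool_name="you_finance_research"):
    """Return {subagent: error_rate} for calls to one tool, ignoring all other tools."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for call in tool_calls:
        if call["tool"] != tool_name:
            continue
        totals[call["subagent"]] += 1
        if call["error"]:
            errors[call["subagent"]] += 1
    return {name: errors[name] / totals[name] for name in totals}


# Hypothetical exported records, not real trace data.
records = [
    {"subagent": "landscape-scanner", "tool": "you_finance_research", "error": False},
    {"subagent": "landscape-scanner", "tool": "you_finance_research", "error": False},
    {"subagent": "landscape-scanner", "tool": "write_file", "error": True},
    {"subagent": "country-investigator", "tool": "you_finance_research", "error": False},
    {"subagent": "country-investigator", "tool": "you_finance_research", "error": True},
    {"subagent": "country-investigator", "tool": "you_finance_research", "error": False},
    {"subagent": "country-investigator", "tool": "you_finance_research", "error": False},
]
rates = error_rate_by_subagent(records)
print(rates)
```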
What a typical LangSmith trace looks like
What the trace shows
The first thing in any run is the orchestrator's write_todos plan, the research strategy laid out before any subagent fires.
The todo list tool. Essential for planning.
Layer 1 dispatches landscape-scanner. Click into the task node and you see the subagent's Shape A queries, the data tables that came back, and the write_file call that committed results to /workpapers/landscape_scan.md.
Then, the orchestrator dispatches anomaly-analyst. Its trace shows the read_file loading the landscape workpaper, the statistical computation, and the write_file saving the analysis with a JSON block that the orchestrator parses for investigation targets.
Layer 3a shows expenditure-decomposer and sector-decomposer running concurrently, each with their own Finance Research API calls. Layer 3b shows the country-investigator fan-out: multiple task nodes in parallel, one per country, each with its own Shape B queries and workpaper write.
The cross-referencing step shows the orchestrator reading every workpaper, comparing figures, and deciding whether to fire verification queries. Synthesis shows the final write_file to /final_report.md.
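In the article the orchestrator performs the cross-reference by reading the workpapers itself. As a rough deterministic analogue, the sketch below compares the figure each workpaper reports for a country and flags any country whose figures disagree by more than a tolerance. The function name, the 0.1 pp tolerance, and all figures other than Ireland's 12.3% are assumptions made for illustration.

```python
def find_contradictions(figures, tolerance_pp=0.1):
    """figures maps country -> {workpaper name: reported GDP growth in %}.

    Returns only the countries whose reported figures differ by more than tolerance_pp.
    """
    contradictions = {}
    for country, by_workpaper in figures.items():
        values = list(by_workpaper.values())
        spread = max(values) - min(values)
        if spread > tolerance_pp:
            contradictions[country] = {
                "spread_pp": round(spread, 2),
                "figures": by_workpaper,
            }
    return contradictions


# Placeholder figures; "Country E" is deliberately inconsistent.
reported = {
    "Ireland": {
        "landscape_scan.md": 12.3,
        "expenditure_decomposition.md": 12.3,
        "country_ireland.md": 12.3,
    },
    "Country E": {
        "landscape_scan.md": -1.8,
        "sector_decomposition.md": -1.2,
    },
}
conflicts = find_contradictions(reported)
print(sorted(conflicts))
```

Each flagged country would then be a candidate for the targeted verification query the orchestrator sends to the general-purpose subagent.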
What lands in /workpapers/
Every run produces 14 files:
landscape_scan.md: GDP tables for all 27 member states with inline citations.
anomaly_analysis.md: Outlier classification, structural vs. cyclical flags, the ranked list of countries dispatched to Layer 3, and the JSON block the orchestrator parses to select investigation targets.
expenditure_decomposition.md and sector_decomposition.md: Parallel Layer 3a workpapers, each with a full breakdown table.
country_[name].md (one per anomalous country): GDP trajectory, expenditure and sector decomposition, principal finding with named mechanism, forward-looking risk assessment, and [[n]] citations throughout.
See a full example workpaper in the GitHub repository →.
File system access is handled by Deep Agents' Backends: a pluggable filesystem interface that gives each agent read_file, write_file, edit_file, ls, glob, and grep tools backed by whatever storage you configure. For local development, use FilesystemBackend(root_dir="."). For production, StoreBackend routes to Redis or Postgres via LangGraph's store interface.
Writing a country investigation report to file
Evaluating the agent
A research desk runs this agent on a recurring schedule: weekly GDP updates, monthly sector rotations, ad-hoc deep dives before allocation meetings. Over dozens of runs, pattern-level questions emerge that no single trace resolves: is the Finance Research API returning thinner results on certain countries? Does the 2.0 pp anomaly threshold flag too many countries in volatile quarters? Would Sonnet perform as well as Opus for the orchestrator at half the cost?
LangSmith's evaluation framework is built for this, and applies in five ways:
Offline experiments. Build a dataset of test queries with reference outputs. This can include past reports the team has validated. Run the agent against the dataset, score with evaluators, and get aggregate results. Then swap the orchestrator's core LLM, or change the anomaly threshold from 2.0 pp to 1.5 pp, and run the same dataset again. LangSmith's comparison view shows the two experiments side by side, with regressions highlighted in red and improvements in green. You can drill into any row to see the traces from both runs next to each other.
Custom evaluators. You can write scoring functions tailored to this workflow, for example checking whether every [[n]] citation in the final report maps to a valid source URL, or counting how many you_finance_research calls returned "insufficient" results. These run as part of the experiment and produce scores you can track over time.
Online evaluation. Attach evaluators to production traffic. LangSmith can automatically score a sample of live runs, for example checking that the report includes required sections and that citation numbering is sequential, or flagging runs where a subagent hit a rate limit. Runs that match evaluation criteria get extended retention for investigation.
Annotation queues. When human review is necessary, like a report that's going to a risk committee, or an output where the agent's structural-vs-cyclical classification looks borderline, runs can be routed to an annotation queue. Reviewers score against a rubric, add corrections, and those corrections feed back into the evaluation dataset for future runs. Pairwise queues let reviewers compare two versions of the same report side by side.
Tool-level analytics. Filter runs across the project by tool name to aggregate you_finance_research performance: how often it returns useful results vs. rate limits vs. "insufficient" responses, average latency by query shape, and cost per call. This is how you'd notice that Shape B queries on Nordic countries consistently come back thin, or that one subagent is burning a disproportionate share of the API budget.
For production, you can set alerts on these metrics: flag if error rate exceeds 5% in a 15-minute window, if average latency spikes, or if per-run cost crosses a threshold. Alerts go to Slack, PagerDuty, or a custom webhook.
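As an illustration of the custom-evaluator idea above, the sketch below scores a report on two of the checks mentioned: whether every [[n]] citation maps to a source URL, and whether citation numbers first appear in sequential order. It is plain Python under stated assumptions: the sources mapping, the function name, and the returned key and score fields are hypothetical, and wiring the function into a LangSmith experiment is left out.

```python
import re

CITATION = re.compile(r"\[\[(\d+)\]\]")


def check_citations(report_text, sources):
    """Score a report's [[n]] citations against a {n: url} mapping."""
    cited = [int(n) for n in CITATION.findall(report_text)]
    unique = sorted(set(cited))
    # A citation counts as resolved only if it maps to an http(s) URL.
    missing = [
        n for n in unique
        if not str(sources.get(n, "")).startswith(("http://", "https://"))
    ]
    # Sequential means the first appearances read 1, 2, 3, ... with no gaps.
    first_seen = list(dict.fromkeys(cited))
    sequential = first_seen == list(range(1, len(first_seen) + 1))
    score = 1.0 if not unique else 1 - len(missing) / len(unique)
    return {
        "key": "citation_validity",
        "score": score,
        "missing": missing,
        "sequential": sequential,
    }


# Hypothetical report fragment and source list.
report = (
    "Growth was strong [[1]]. Exports rose [[2]], consistent with [[1]]. "
    "One claim has no listed source [[3]]."
)
source_urls = {1: "https://example.com/a", 2: "https://example.com/b"}
result = check_citations(report, source_urls)
print(result["missing"], result["sequential"])
```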
Getting started
# Clone the reference template
git clone https://github.com/youdotcom-oss/langchain-deepagents-finance-research
The Finance Research API is available via the langchain-youdotcom package, or as a hosted MCP server at https://api.you.com/mcp (docs). Get your API key → See the integration docs →
# Your You.com API key
export YDC_API_KEY=you.com_api_key

# At least one model provider
# Choose from several other LLM providers
export FIREWORKS_API_KEY="your_api_key_here"

# Enable LangSmith traces
export LANGCHAIN_API_KEY=langchain_api_key
export LANGSMITH_TRACING=true
export LANGSMITH_ENDPOINT=https://aws.api.smith.langchain.com
export LANGSMITH_PROJECT="My LangSmith project"

# Install dependencies
pip install deepagents langchain-youdotcom langchain-mcp-adapters langchain-fireworks

# Run the agent.
python examples/eu_gdp_analysis.py
To run this example, you’ll need a LangSmith account (start for free →), a You.com API key (sign up at you.com →, all new accounts come with $100 in free API credits), and a Fireworks API key.
You can use several other models already available inside LangChain. To swap in a different model, set its API key, install the corresponding LangChain package, and update the model string in the agent and subagent definitions.
Full documentation, including how to configure the anomaly detection threshold and customize the country scope, is in the integration docs →.
Who this is for
This architecture fits any team running structured multi-step research on financial subjects: deal screening at PE firms, credit underwriting at banks, KYB onboarding for compliance teams, macro positioning at asset managers. The five-subagent structure here is a starting point. Add or swap tracks to fit your workflow: management background checks for compliance-heavy diligence, IP portfolio analysis for M&A screening, earnings signal aggregation for equity research. Each new track is one additional subagent dict with a focused system prompt and the same you_finance_research tool.
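Adding a track might look roughly like the sketch below, which builds one more subagent dict for the IP portfolio example. Treat it as an assumption-laden illustration: the dict keys (name, description, system_prompt, tools) are assumed to match the subagent definitions in prompts.py, the helper make_track_subagent and the ip-portfolio-analyst name are invented here, and you_finance_research is replaced by a stub so the snippet runs on its own.

```python
def make_track_subagent(name, description, focus, workpaper_path, research_tool):
    """Build a subagent dict: focused system prompt, one research tool, one workpaper path."""
    system_prompt = (
        f"You are the {name} subagent. {focus} "
        f"Write your findings, with [[n]] citations, to {workpaper_path}."
    )
    return {
        "name": name,
        "description": description,
        "system_prompt": system_prompt,
        "tools": [research_tool],
    }


def you_finance_research(query: str) -> str:
    """Stub standing in for the real tool so this sketch is self-contained."""
    return f"(stub) no results for: {query}"


# Hypothetical new track for M&A screening.
ip_portfolio_subagent = make_track_subagent(
    name="ip-portfolio-analyst",
    description="Investigates a target company's patent and IP portfolio.",
    focus="Research patent filings, licensing income, and IP-related litigation.",
    workpaper_path="/workpapers/ip_portfolio.md",
    research_tool=you_finance_research,
)
print(ip_portfolio_subagent["name"])
```

The resulting dict would be appended to the subagents list passed to create_deep_agent(), alongside the five tracks defined in this post.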

A no-code version is coming to Fleet, LangChain's UI-driven platform for building and managing agents.

For benchmark methodology and accuracy details, see the Finance Research API overview →

Ready to build? Get your API key → Finance Research API docs → Reference implementation on GitHub →