LangChain Blog

New in Deep Agents v0.6

The latest Deep Agents release is centered around performance at the model layer, the agent layer, at scale, and over time. Five things in this release contribute:

Code interpreter: a lightweight runtime for agents to compose tools, manage state, and control what reaches model context — without the overhead of a full sandbox.
Harness profiles: per-model tuning so your harness gets the most out of whichever model you're running, including open-weight models like Kimi, Qwen, and DeepSeek.
Streaming: typed projections for messages, tool calls, subagents, and custom application events — subscribe to exactly what your application needs instead of parsing raw stream output.
DeltaChannel: efficient checkpoint storage as agents run longer and context accumulates, without sacrificing the durable execution guarantees that make agents resumable, observable, and resilient.
ContextHubBackend: backed by LangSmith Context Hub, it stores the skills, policies, and memories that shape agent behavior in a versioned, collaborative home, so what your agent learns from one run can improve the next.

Code interpreter

We’re releasing an installable code interpreter in Deep Agents, which gives agents a programmable workspace where they can transform data, coordinate tool calls, and keep intermediate work out of the model context. The agent writes code to express its intent, then an in-memory runtime executes that code and returns the relevant results.

Where sandboxes are a code-first way of acting on an environment (such as running commands, installing dependencies, and editing files), interpreters are a code-first way of acting inside the agent loop: composing tools, preserving state, and deciding what information should be returned to the model.

// Agent can write code like this:
const topics = ["retrieval", "memory", "evaluation"];

const reports = await Promise.all(
  topics.map((topic) =>
    tools.task({
      description:
        `Research ${topic} in Deep Agents and return three concise findings.`,
      subagent_type: "general-purpose",
    }),
  ),
);

reports.join("\n\n");

This enables a few novel capabilities for agents that we’re particularly excited about:

Model-agnostic PTC

Standard tool calling loops make the model the traffic controller for every step. The model asks for a tool, receives the full result in context, reasons over that result, and repeats. Even when an intermediate result is only needed to compute the next input, it still has to chain through multiple model calls.

Programmatic Tool Calling (PTC) changes that workflow. The model writes code that calls tools from inside an execution runtime, so workflows can run without a round-trip to a model for every individual tool invocation. Intermediate results can stay in runtime state where the interpreter can filter noisy outputs, process data, retry failures, and return only the relevant context back to the model.

// `urls` is assumed to be defined earlier in the interpreter session.
const pages = await Promise.all(
  urls.map((url) => tools.fetchUrl({ url })),
);

const relevant = pages
  .filter((page) => page.includes("interpreter"))
  .slice(0, 3);

relevant.map((page) => page.slice(0, 500));

This tool-calling pattern reduces token consumption, cuts down on avoidable model round trips, and makes the agent’s reasoning step smaller.
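To make the savings concrete, here is a minimal Python sketch that contrasts the two loops on toy data. Everything in it is an assumption for illustration: `fetch_url` is a stand-in tool, `PAGES` is made-up content, and characters entering context serve as a crude proxy for tokens. It is not how Deep Agents measures or implements PTC.

```python
# Toy comparison of a standard tool-calling loop and programmatic tool calling (PTC).
# Hypothetical throughout: fetch_url stands in for a real tool, and
# "context characters" is only a rough proxy for token usage.

PAGES = {
    "https://example.com/a": "interpreter middleware notes " * 40,
    "https://example.com/b": "unrelated changelog text " * 40,
    "https://example.com/c": "interpreter runtime design " * 40,
}


def fetch_url(url: str) -> str:
    """Stand-in tool: returns the full page body."""
    return PAGES[url]


def standard_loop(urls: list[str]) -> tuple[int, int]:
    """One model round trip per tool call; every full result enters context."""
    round_trips = 0
    context_chars = 0
    for url in urls:
        page = fetch_url(url)
        round_trips += 1
        context_chars += len(page)
    return round_trips, context_chars


def ptc(urls: list[str]) -> tuple[int, int]:
    """One round trip: code fetches, filters, and truncates before returning."""
    pages = [fetch_url(url) for url in urls]
    relevant = [page[:100] for page in pages if "interpreter" in page]
    return 1, sum(len(snippet) for snippet in relevant)


urls = list(PAGES)
print("standard (round trips, context chars):", standard_loop(urls))
print("ptc      (round trips, context chars):", ptc(urls))
```

In this toy setup the filtering and truncation happen in code, so only the trimmed snippets would ever be shown to the model.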

Anthropic helped popularize this pattern by adding it as an API behavior for their model family, but with an interpreter this can now be achieved by any agent with any model (including open source models).

Recursive workflows

Interpreters let agents interact with the harness in new ways. Because tools and subagents are callable from code, an agent can take the output of one subagent, inspect it, transform it, and feed it into another step without routing every intermediate artifact back through the main model.

That makes recursive workflows possible: the agent can keep a queue of questions, call a subagent on the next question, store the result, generate follow-up work from that result, and continue until it has enough evidence to synthesize an answer. (This is more than just “call another LLM on the full input context”: the key is maintaining working state outside the model context and controlling what gets passed into each next call.)

const frontier = ["What changed in interpreter middleware?"];
const findings = [];

while (frontier.length && findings.length < 6) {
  const question = frontier.shift();

  const report = await tools.task({
    description:
      `Answer this question. If there is a useful next question, ` +
      `include it as "Follow-up: ..."\n\n${question}`,
    subagent_type: "general-purpose",
  });

  findings.push(report);

  const next = report.match(/Follow-up: (.*)/)?.[1];
  if (next) frontier.push(next);
}

findings.join("\n\n");

This is adjacent to the idea behind Recursive Language Models (RLM): keep working state outside the model context, call models or subagents on selected branches, and control what enters the next model call.

In Deep Agents, the interpreter becomes the working runtime for that pattern — without claiming we “do RLM” as originally defined at the model layer.

All of this can be enabled by installing deepagents[quickjs] from PyPI, or @langchain/quickjs from npm, and adding it as a middleware:

from deepagents import create_deep_agent
from langchain_quickjs import REPLMiddleware

agent = create_deep_agent(
    model="baseten:zai-org/GLM-5",
    middleware=[REPLMiddleware()],
)

See the docs for more information on interpreters.

Harness profiles

Open-weight models like Kimi K2.6, GLM 5.1, and DeepSeek V4 are now viable for production agent work, often at 20×+ lower cost than closed frontier models. But models are post-trained on different tool-calling formats and prompt conventions, while most harnesses are tuned for the closed model their authors built against. Drop one in cold, and you might see only a fraction of its true capability because the model is speaking a dialect the harness doesn’t understand.

That gap is large and measurable. In our own testing, harness-layer changes alone moved gpt-5.2-codex from 52.8% → 66.5% on Terminal-Bench 2.0 (Top 30 → Top 5), lifted gpt-5.3-codex 20% on tau2-bench, and opus-4.7 10%. Across tau2-bench, prompts and middleware can move scores by 10 to 20 points without changing the model.

The "harness" is everything around the model: the base system prompt, the tools and their descriptions, and the middleware that shapes each turn. A harness profile captures the per-model overrides to these as a named, versionable unit.

Deep Agents v0.6 makes harness profiles a first-class abstraction. You can diff, version, and swap a profile alongside the model, so tuning work carries forward. We're shipping built-in profiles for major models so strong performance is the default, and the same machinery is available for your own stack.
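As a rough mental model, a profile can be pictured as a small record of overrides that is merged onto a base harness. The sketch below is hypothetical: `HarnessProfile`, `build_harness`, `diff_profiles`, and every field name are invented for illustration and are not the Deep Agents API, which is described in the docs.

```python
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class HarnessProfile:
    """Hypothetical per-model overrides; NOT the deepagents API."""
    name: str
    version: str
    system_prompt_suffix: str = ""
    tool_description_overrides: dict = field(default_factory=dict)
    middleware: tuple = ()


# Base harness shared by every model (illustrative values).
BASE_SYSTEM_PROMPT = "You are a careful agent."
BASE_TOOLS = {"read_file": "Read a file.", "task": "Delegate to a subagent."}


def build_harness(profile: HarnessProfile) -> dict:
    """Merge a profile's overrides onto the base prompt, tools, and middleware."""
    prompt = f"{BASE_SYSTEM_PROMPT} {profile.system_prompt_suffix}".strip()
    return {
        "system_prompt": prompt,
        "tools": {**BASE_TOOLS, **profile.tool_description_overrides},
        "middleware": list(profile.middleware),
    }


def diff_profiles(old: HarnessProfile, new: HarnessProfile) -> dict:
    """Return {field: (old_value, new_value)} for every field that changed."""
    a, b = asdict(old), asdict(new)
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}


v1 = HarnessProfile(
    name="example-open-weight",
    version="1",
    system_prompt_suffix="Call one tool at a time.",
)
v2 = HarnessProfile(
    name="example-open-weight",
    version="2",
    system_prompt_suffix="Call one tool at a time.",
    tool_description_overrides={"task": "Delegate a self-contained task to a subagent."},
    middleware=("retry_malformed_tool_calls",),
)

harness = build_harness(v2)
changes = diff_profiles(v1, v2)
print(sorted(changes))
```

Treating the profile as plain data is what makes it easy to diff two versions or swap one in alongside a model change.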

More in tuning deep agents across different models. See the docs to write your own.

Streaming

Agents do a lot of work before they return a final answer. For a good user experience, you want to surface that work as it happens, and give users the ability to steer the agent along the way: streaming is the primitive that makes this possible. LangChain’s new release makes streaming a first-class application primitive. With stream_events(..., version="v3"), agents and graphs now emit a unified event stream with ergonomic projections for primitives developers actually want to render: message text, reasoning blocks, tool calls, state updates, subgraphs, subagents, custom channels, and final output. The stream is content-block-centric, which means UIs no longer need to guess whether a chunk is text, reasoning, media, or tool-call data. Everything is organized around typed events, namespaces, and channels, all aligned with the new Agent Streaming Protocol.

stream = agent.stream_events(
    {"messages": [{"role": "user", "content": "Research LangChain streaming"}]},
    version="v3",
)

for message in stream.messages:
    for delta in message.text:
        print(delta, end="", flush=True)

for subagent in stream.subagents:
    print(f"\n[{subagent.name}] {subagent.status}")

    for message in subagent.messages:
        print(f"[{subagent.name}] ", end="")
        for delta in message.text:
            print(delta, end="", flush=True)
        print()

This streaming model also carries over the wire through new Agent Server endpoints and SDK support. The LangGraph SDK exposes remote event streaming through client.threads.stream(...), with support for multimodal content, reconnect/replay behavior, and transport-agnostic delivery over SSE or WebSockets. Because local and remote streams now follow the same protocol, developers get a consistent way to observe agent runs across scripts, backend services, and production frontends. Applications can subscribe to exactly the parts of a run they need, such as messages from a specific subagent, updates from a custom channel, or events within a particular namespace.
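The subscription idea can be sketched independently of any SDK. The toy below assumes each event carries a channel name and a namespace path, and filters on both. The `Event` shape and the `subscribe` helper are hypothetical and are not the Agent Streaming Protocol or the LangGraph SDK interface.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical typed event; the real protocol's fields may differ."""
    channel: str      # e.g. "messages", "tools", "custom:progress"
    namespace: tuple  # path of the agent or subagent that emitted the event
    data: str


def subscribe(
    events: Iterable[Event],
    *,
    channel: Optional[str] = None,
    namespace_prefix: tuple = (),
) -> Iterator[Event]:
    """Yield only events on the given channel and under the given namespace."""
    for event in events:
        if channel is not None and event.channel != channel:
            continue
        if event.namespace[: len(namespace_prefix)] != namespace_prefix:
            continue
        yield event


events = [
    Event("messages", ("main",), "Planning research"),
    Event("tools", ("main",), "fetch_url called"),
    Event("messages", ("main", "researcher"), "Found three sources"),
    Event("custom:progress", ("main", "researcher"), "50%"),
]

# Subscribe only to messages emitted by the "researcher" subagent.
researcher_messages = list(
    subscribe(events, channel="messages", namespace_prefix=("main", "researcher"))
)
for event in researcher_messages:
    print(event.data)
```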

On the frontend, this release brings v1 framework integrations for @langchain/react, @langchain/vue, @langchain/svelte, and @langchain/angular, giving teams idiomatic hooks and utilities for building rich streamed experiences without hand-rolling event parsers. To make the new stack easy to explore, we’re also publishing the Streaming Cookbook: a collection of runnable examples covering message streaming, subgraphs, subagents, custom stream transformers, multimodal UI, reconnect behavior, and framework-specific patterns. The result is a streaming foundation that is lower-level where you need precision, higher-level where you want productivity, and consistent from agent runtime to user interface.

Delta channels

Deep Agents is built on the LangGraph runtime, which checkpoints agent progress at every step. That's what makes observability, human-in-the-loop, and failure recovery possible: you always know exactly where an agent is and can resume from any point.

As agents get more capable:

They run longer, with message histories that grow across dozens or hundreds of steps.
They use more context, utilizing the filesystem for context management and offloading.

For deepagents, message history and files live in agent state, and with a snapshot-every-step approach, checkpoint storage grows at O(N²).

Delta channels are how we're evolving the runtime to keep up. Rather than serializing a full snapshot at every checkpoint, we store only the diff. For Deep Agents, that means delta-based storage for message histories and files.

You still get a complete history of agent progress, just at a fraction of the storage cost. This also helps to mitigate the bottleneck of writes to the checkpointer (database) for long-running agents, and storage costs at scale are much more manageable.
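The scaling argument can be checked with a toy model. If each of N steps appends roughly s bytes, snapshot-every-step storage totals s·N(N+1)/2 while delta storage totals s·N, a ratio of about (N+1)/2. The sketch below is an illustration under that assumption, with made-up helper names and JSON as the serializer. It is not LangGraph's DeltaChannel implementation.

```python
import json


def snapshot_checkpoints(messages: list) -> list:
    """Serialize the full history at every step (snapshot-every-step)."""
    return [json.dumps(messages[: step + 1]) for step in range(len(messages))]


def delta_checkpoints(messages: list) -> list:
    """Serialize only what each step appended."""
    return [json.dumps(message) for message in messages]


def restore(deltas: list, step: int) -> list:
    """Rebuild the history after `step` steps by replaying the stored deltas."""
    return [json.loads(delta) for delta in deltas[:step]]


# 200 steps, each appending a message of roughly 1 KB (illustrative sizes).
messages = [f"turn {i}: " + "x" * 1000 for i in range(200)]

snapshots = snapshot_checkpoints(messages)
deltas = delta_checkpoints(messages)

snapshot_bytes = sum(len(checkpoint) for checkpoint in snapshots)
delta_bytes = sum(len(checkpoint) for checkpoint in deltas)

print(f"snapshot storage: {snapshot_bytes:,} bytes")
print(f"delta storage:    {delta_bytes:,} bytes")
print(f"ratio:            {snapshot_bytes / delta_bytes:.0f}x")

# The full history at any step is still recoverable from the deltas.
assert restore(deltas, 50) == messages[:50]
```

The trade-off visible in `restore` is that reading a past state means replaying deltas rather than loading a single snapshot.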

Depending on the conversation length and context size, swapping to delta channels can reasonably bring 10-100x reductions in checkpointer storage.

Consider, for example, an experiment: a simulated multi-file coding session where an agent writes files, retrieves documentation, and reasons through its work — 200 turns of the kind of sustained, context-heavy work a capable coding agent actually does. Without delta channels, that session accumulates 5.27 GB of checkpoint storage. With delta channels: 129 MB.
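For scale, the two figures quoted for this session work out to roughly a 41x reduction, inside the 10-100x range mentioned earlier. The arithmetic below assumes 1 GB = 1000 MB; with 1024 MB per GB the result is slightly higher.

```python
# Storage figures quoted for the simulated 200-turn session.
without_delta_mb = 5.27 * 1000  # 5.27 GB, assuming 1 GB = 1000 MB
with_delta_mb = 129.0

reduction = without_delta_mb / with_delta_mb
print(f"reduction: {reduction:.1f}x")
```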

[Table: checkpointer storage for the same agent with and without delta channels]

[Figure: checkpoint storage growth with and without delta channels]

Long-running agents with deep context are where the field is heading, and delta channels are how our runtime scales to meet their needs.

See the full writeup for more details.

ContextHub Backend

Context Hub is a LangSmith-backed filesystem for Deep Agents. It gives you a versioned place for the files that shape agent behavior, so improvements to prompts, skills, and other context can carry forward across runs.

Under the hood, your agent reads from (and can write to) a Hub repo. Those writes land as commits with history, review, and environment tagging—so you can iterate in staging and promote to production without wiring up a separate storage layer.

To use it as your agent's filesystem backend:

from deepagents import create_deep_agent
from deepagents.backends import ContextHubBackend

agent = create_deep_agent(
    model="google_genai:gemini-3.1-pro-preview",
    backend=ContextHubBackend("my-agent"),
)

Or scope just /memories/ to Hub while keeping the rest of the filesystem thread-scoped:

from deepagents.backends import CompositeBackend, StateBackend, ContextHubBackend

agent = create_deep_agent(
    model="google_genai:gemini-3.1-pro-preview",
    backend=CompositeBackend(
        default=StateBackend(),
        routes={
            "/memories/": ContextHubBackend("my-agent"),
        },
    ),
)

Reads are served from cache, and writes are committed back to the Hub repo. If the repo doesn’t exist yet, the first write creates it—after that, you can diff, review, and tag changes like any other piece of versioned context.
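The commit-and-tag workflow described above can be pictured with a small in-memory model. This is a sketch under stated assumptions: `ToyHubRepo` and its methods are invented for illustration, each commit stores a full file tree for simplicity, and none of it reflects the real Context Hub storage format or API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyHubRepo:
    """In-memory stand-in for a versioned repo; NOT the Context Hub API."""
    commits: list = field(default_factory=list)  # each commit: {path: content}
    tags: dict = field(default_factory=dict)     # tag name -> commit index

    def write(self, path: str, content: str) -> int:
        """Record a new commit with one file changed; return its index."""
        tree = dict(self.commits[-1]) if self.commits else {}
        tree[path] = content
        self.commits.append(tree)
        return len(self.commits) - 1

    def tag(self, name: str, commit: int) -> None:
        """Point an environment tag (e.g. "staging") at a commit."""
        self.tags[name] = commit

    def read(self, path: str, tag: Optional[str] = None) -> str:
        """Read a file at a tagged commit, or at the latest commit."""
        commit = self.tags[tag] if tag is not None else len(self.commits) - 1
        return self.commits[commit][path]

    def diff(self, old: int, new: int) -> dict:
        """Return {path: (old_content, new_content)} for changed files."""
        before, after = self.commits[old], self.commits[new]
        paths = sorted(set(before) | set(after))
        return {
            p: (before.get(p), after.get(p))
            for p in paths
            if before.get(p) != after.get(p)
        }


repo = ToyHubRepo()  # starts empty; the first write creates the first commit
first = repo.write("/memories/style.md", "Prefer short answers.")
repo.tag("production", first)

second = repo.write("/memories/style.md", "Prefer short answers with sources.")
repo.tag("staging", second)

production_view = repo.read("/memories/style.md", tag="production")
changes = repo.diff(first, second)

# Promote: move the production tag to the commit that staging points at.
repo.tag("production", repo.tags["staging"])
promoted_view = repo.read("/memories/style.md", tag="production")
print(production_view, "->", promoted_view)
```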

Set LANGSMITH_API_KEY before using ContextHubBackend. See the full docs for conflict handling and limits.

Wrapping up

The through-line across our Deep Agents May release is performance:

Harness profiles help you squeeze performance out of a model with an optimal harness and unlock viable agent runs on open-weight models at a fraction of the cost of frontier APIs.
Code interpreter gives an agent more autonomy to write and execute code, helping it accomplish complex tasks and optimize context window usage.
Streaming enables support for highly parallelized systems with a subscription model for tool and subagent progress.
DeltaChannel introduces a storage primitive that supports checkpoints for long-running, long-context agents.
ContextHubBackend gives the files that power agent behavior a versioned home, backed by LangSmith Context Hub, so context improvements carry over from one run to the next.

We’re excited for you to give the latest deepagents a spin. Let us know what you think!

Release notes:

Python

TypeScript
