LangChain Blog · 12 days ago

From Token Streams to Agent Streams

The agents people are building now do a lot. A single Deep Agents run can plan, delegate to subagents, call tools, pause for human approval, and produce text, structured data, or media along the way. Every one of those steps is something a user might want to see as it happens.

Streaming APIs designed for one model call and one stream of tokens can't carry that. Once an agent fans out across a graph, the frontend needs more than token deltas. It needs to know which step produced each event, how to subscribe to just the subagent on screen, and how to reconnect after a browser refresh without replaying everything.

The latest Deep Agents, LangChain, and LangGraph streaming work is designed around application events instead of raw chunks. Each event is typed and tagged by where in the agent tree it came from; applications iterate projections like messages, tool calls, or subagent statuses; the same model carries from local runs to remote threads to React, Vue, Svelte, and Angular SDKs. Alongside the release, we're publishing a streaming cookbook with runnable Python and TypeScript examples.

The Requirements

Consider a research agent that delegates to three subagents, each calling tools, updating state, and streaming intermediate findings. A useful product UI might want to render the main answer token-by-token, each subagent's status, tool calls as they assemble, and any media the agent generates.

Flattening all of that into one stream pushes too much work onto application developers. If the streaming layer absorbs that complexity, the questions worth asking move past "can I show tokens?" to:

Can I render a live tree of agent work?
Can I subscribe to one subagent without downloading every other subagent's output?
Can I stream reasoning, tool calls, state, and media with explicit structure rather than concatenated chunks?
Can I surface human approval requests as first-class events?
Can I reconnect to a running thread and pick up where it left off?
Can I add custom domain-specific streams without forking the runtime?
Can I use the same concepts locally, remotely, and in a frontend framework?

Typed streams turn agent execution into structured application events, so text, tools, media, reasoning, code, and subagents can each stream independently.

Streaming for chat completions and single model calls is a solved problem. The next layer is streaming for graph-shaped, tool-using, stateful, interruptible, multimodal agents that run across backends and frontends.

The Solution

The new streaming primitives are built around four ideas:

Typed events instead of raw chunks.
Each event arrives labelled with what kind of work it describes (a message, a tool call, a state change, a subagent status) and where in the agent tree it came from.
Projections instead of parsing.
Applications iterate the views they want to render: messages, tool calls, subagent statuses, custom channels. The runtime handles assembly, reordering, and reconnection.
Scoped subscriptions.
Clients ask only for the channels and parts of the agent tree they're rendering, so a subagent inspector doesn't pull every subagent's tokens.
The same model across runtimes.
Local graph runs, remote threads, and React/Vue/Svelte/Angular components all speak the same protocol, with projections at the bottom and hooks at the top.

A Typed Event Protocol

The new streaming foundation starts with a common event envelope. Instead of opaque stream tuples, you get structured events labelled with what kind of work they describe and where in the agent tree they came from.

Channels describe the concern being streamed:

messages for transcript and content-block deltas
values and updates for graph state
tools for tool execution lifecycle
lifecycle for runs, subgraphs, and subagents
checkpoints for branching and time travel
custom:* for application-defined projections

Namespaces describe where the event happened in the agent tree. The root graph, a nested subgraph, and a Deep Agents subagent can all emit the same channel type without losing their identity.

That separation is the key design choice: channels are reusable concerns, while namespaces identify the part of the run producing them.
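To make the channel/namespace split concrete, here is a minimal sketch using a simplified envelope of our own. The `Event` dataclass, its `channel`, `namespace`, and `data` fields, and the `select` helper are hypothetical illustrations, not the actual LangGraph wire format. It shows how two producers can emit on the same `messages` channel and still be told apart by namespace.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Event:
    """Hypothetical envelope: what kind of work (channel) and where it came from (namespace)."""
    channel: str                # e.g. "messages", "tools", "custom:citations"
    namespace: tuple[str, ...]  # path in the agent tree; () stands for the root graph
    data: Any = None


def select(events, channel: str, namespace: tuple[str, ...]) -> list[Event]:
    """Return events on one channel emitted by exactly one node of the agent tree."""
    return [e for e in events if e.channel == channel and e.namespace == namespace]


events = [
    Event("messages", (), "root answer"),
    Event("messages", ("researcher",), "subagent finding"),
    Event("tools", ("researcher",), {"name": "search"}),
]

# Same channel type, two different producers; the namespace keeps them apart.
root_messages = select(events, "messages", ())
researcher_messages = select(events, "messages", ("researcher",))
```

Because the channel only says *what* is being streamed, the same filtering logic works for the root graph, a nested subgraph, or a subagent without any channel-specific naming scheme.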

Projections: The API Developers Actually Want

Most application code should not iterate over raw protocol events. It should ask for the thing it wants to render.

Runs do exactly that, exposing typed projections on top of the event stream:

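import sys

# Top-level `await` assumes an async context, such as an async REPL or `async def main()`.
run = await graph.astream_events(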
    {"messages": [{"role": "user", "content": "Research LangChain streaming"}]},
    version="v3",
)

async for message in run.messages:
    async for delta in message.text:
        sys.stdout.write(delta)

final_state = await run.output()

Each message arrives as typed content blocks such as text, reasoning, tool-call arguments, and usage data, instead of a stream of strings that applications need to stitch back together.

That matters for modern model output. Reasoning should be rendered differently from final answer text. Tool-call arguments need to be assembled as structured data. Usage and output metadata should survive the streaming path. Multimodal data should not be forced through a text-only interface.
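The benefit of typed blocks shows up on the client side. The sketch below assumes a hypothetical delta format in which each fragment carries a block `index`, a block `type` (`text`, `reasoning`, or `tool_call`), and a string `fragment`; this is not the LangChain message schema. Grouping by block keeps reasoning separate from answer text, and tool-call arguments are parsed as JSON only once their block is complete.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Block:
    type: str  # "text", "reasoning", or "tool_call" (hypothetical block types)
    parts: list[str] = field(default_factory=list)

    @property
    def value(self):
        joined = "".join(self.parts)
        # Tool-call arguments stream as JSON fragments; parse only the full string.
        return json.loads(joined) if self.type == "tool_call" else joined


def assemble(deltas):
    """Group streamed fragments into typed blocks, ordered by block index."""
    blocks: dict[int, Block] = {}
    for d in deltas:
        block = blocks.setdefault(d["index"], Block(d["type"]))
        block.parts.append(d["fragment"])
    return [blocks[i] for i in sorted(blocks)]


deltas = [
    {"index": 0, "type": "reasoning", "fragment": "User wants "},
    {"index": 0, "type": "reasoning", "fragment": "a search."},
    {"index": 1, "type": "tool_call", "fragment": '{"query": "Lang'},
    {"index": 1, "type": "tool_call", "fragment": 'Chain streaming"}'},
    {"index": 2, "type": "text", "fragment": "Here is what I found."},
]

for block in assemble(deltas):
    print(block.type, "->", block.value)
```

A renderer can then style `reasoning` blocks differently from `text` and hand the parsed `tool_call` dictionary straight to a tool-call card, without any string splicing.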

Subagents and Subgraphs

The same projection pattern applies beyond messages. LangGraph is the runtime layer that lets developers structure agents as graphs of nodes, including nested subgraphs. Deep Agents sits on top of it and adds a higher-level delegation model where an agent can hand work off to a subagent. Streaming needs to make both visible without collapsing them into one flat transcript.

The new primitives distinguish subgraphs from subagents:

Subgraphs surface for any nested graph execution.
Subagents surface today when an agent delegates via the Deep Agents task call.

async for subagent in run.subagents:
    print(f"{subagent.name}: {subagent.status}")

    async for message in subagent.messages:
        async for delta in message.text:
            sys.stdout.write(delta)

Both arrive as lightweight handles that expose identity, position, and status. The detailed messages, tool calls, and state changes only stream when something in your UI asks for them.

This enables UIs that scale with agent complexity. A dashboard built on Deep Agents can show a list of running subagents for free, then open message and tool streams only for the selected one. A research product can show high-level progress across a tree of work without paying the wire cost for every token produced by every worker.
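The handle pattern can be sketched in a few lines. `SubagentHandle`, its `opener` callback, and the `open_messages` generator below are hypothetical stand-ins, not the Deep Agents API. The point is that listing subagents reads only cheap metadata, and a detail stream is opened lazily for the one the user selects.

```python
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class SubagentHandle:
    """Hypothetical lightweight handle: cheap metadata plus an on-demand detail stream."""
    name: str
    namespace: tuple[str, ...]
    status: str
    opener: Callable[[tuple[str, ...]], Iterator[str]]

    def messages(self) -> Iterator[str]:
        # Returns a generator; nothing is pulled until the caller iterates it.
        return self.opener(self.namespace)


opened: list[tuple[str, ...]] = []  # records which detail streams were actually opened


def open_messages(namespace):
    opened.append(namespace)
    yield f"tokens from {namespace[-1]}"


handles = [
    SubagentHandle(name, (name,), "running", open_messages)
    for name in ("researcher", "critic", "writer")
]

# A dashboard lists every subagent from handle metadata alone...
summary = [f"{h.name}: {h.status}" for h in handles]
# ...and opens a detail stream only for the one the user selected.
selected = next(h for h in handles if h.name == "critic")
detail = list(selected.messages())
```

After this runs, only the `critic` stream has been opened, even though all three subagents appear in the summary.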

For developers, this is the practical shift: streaming is no longer a low-level transport detail that each app has to parse. It is an application API.

Custom Channels

Not every useful stream is a built-in channel. Applications often need domain-specific projections: citations, progress events, structured plans, UI descriptions, media handles, workflow metrics, or anything else the product wants to render live.

Streaming transformers are small classes that filter protocol events and push derived items into a named channel. The ToolActivityTransformer here watches the messages channel for tool-call starts and exposes the result as a toolActivity extension. See "Build your own projection" for the full pattern.

run = await graph.astream_events(
    input,
    version="v3",
    transformers=[ToolActivityTransformer],
)

async for activity in run.extensions["toolActivity"]:
    print(activity)
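The post does not show `ToolActivityTransformer` itself. The following is a hypothetical sketch of the same idea, assuming a plain-dict event shape, a `process(event)` hook, and a `tool_call_start` block type; the real transformer interface (see "Build your own projection") may differ. It filters `messages` events for tool-call starts and routes derived items into a `toolActivity` channel.

```python
from collections import defaultdict


class ToolActivitySketch:
    """Hypothetical transformer: filter protocol events, emit derived items."""
    channel = "toolActivity"

    def process(self, event: dict) -> list[dict]:
        # Only tool-call starts on the messages channel are of interest here.
        if event.get("channel") != "messages":
            return []
        block = event.get("data", {})
        if block.get("type") != "tool_call_start":
            return []
        return [{"tool": block["name"], "namespace": event.get("namespace", ())}]


def run_transformers(events, transformers):
    """Fan every event through each transformer into its named extension channel."""
    extensions = defaultdict(list)
    for event in events:
        for t in transformers:
            extensions[t.channel].extend(t.process(event))
    return dict(extensions)


events = [
    {"channel": "messages", "namespace": (), "data": {"type": "text", "text": "Hi"}},
    {"channel": "messages", "namespace": ("researcher",),
     "data": {"type": "tool_call_start", "name": "web_search"}},
    {"channel": "values", "namespace": (), "data": {"step": 2}},
]

extensions = run_transformers(events, [ToolActivitySketch()])
```

Keeping the transformer a pure event-in, items-out function means it can be added or removed without touching the runtime or the built-in channels.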

On the frontend, the same idea appears as extension selectors:

const stream = useStream({
  assistantId: "agent",
  apiUrl: "http://localhost:2024",
});

const uiEvent = useExtension(stream, "a2ui");

The cookbook includes a generative UI example where an agent emits declarative A2UI messages over a `custom:a2ui` channel. The React app subscribes to that extension and renders live interface surfaces as the agent produces them. This is the pattern we expect to become common: agents streaming product-specific state, not just assistant text.

One Event Log, Many Views

Agent UIs often need multiple live views of the same run. A chat panel renders the main answer. A side panel renders subagent activity. A debugger renders raw events. A progress bar watches state. An analytics layer records tool usage.

Those views should not compete to drain the same stream.

The new runtime model supports multiple projections over the same underlying event log. You can consume messages, values, tool calls, subgraphs, and custom extensions independently. Adding a progress view does not require rewriting the chat stream. Adding a subagent inspector does not mean shipping every subagent token to every component.
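One way to picture "many views over one log" is an append-only list where each consumer keeps its own cursor, rather than several consumers popping from a shared queue. The `EventLog` class and its `view` method below are a hypothetical illustration, not the LangGraph implementation.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Append-only log; each view reads with its own cursor instead of draining a queue."""
    events: list[dict] = field(default_factory=list)

    def append(self, event: dict) -> None:
        self.events.append(event)

    def view(self, predicate):
        """Return a reader that yields the new matching events each time it is called."""
        cursor = 0

        def read() -> list[dict]:
            nonlocal cursor
            new = [e for e in self.events[cursor:] if predicate(e)]
            cursor = len(self.events)
            return new

        return read


log = EventLog()
chat = log.view(lambda e: e["channel"] == "messages")
progress = log.view(lambda e: e["channel"] == "values")
debugger = log.view(lambda e: True)

log.append({"channel": "messages", "data": "Hello"})
log.append({"channel": "values", "data": {"step": 1}})

chat_batch = chat()
progress_batch = progress()
debug_batch = debugger()
```

Because reading never removes anything from the log, adding the debugger view does not take events away from the chat or progress views.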

Remote streaming uses the same idea. A client can subscribe to exactly the channels and namespaces it needs:

const thread = client.threads.stream({
  assistantId: "research-agent",
});

await thread.subscribe({
  channels: ["messages", "tools", "values", "lifecycle"],
  namespaces: [["researcher"]],
  depth: 2,
});
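The post does not define how `namespaces` and `depth` combine. The sketch below assumes one plausible reading: an event matches if its channel is subscribed, its namespace starts with a subscribed prefix, and it sits at most `depth` levels below that prefix. The `Subscription` class is hypothetical, and the namespace list is converted to tuples for hashing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    """Hypothetical server-side filter for a scoped subscription."""
    channels: frozenset[str]
    namespaces: tuple[tuple[str, ...], ...]
    depth: int  # assumed: how many levels below a subscribed namespace are included

    def matches(self, channel: str, namespace: tuple[str, ...]) -> bool:
        if channel not in self.channels:
            return False
        return any(
            namespace[: len(prefix)] == prefix
            and len(namespace) - len(prefix) <= self.depth
            for prefix in self.namespaces
        )


# Mirrors the TypeScript call above under the stated assumption about `depth`.
sub = Subscription(
    channels=frozenset({"messages", "tools", "values", "lifecycle"}),
    namespaces=(("researcher",),),
    depth=2,
)
```

Under this reading, a tool event two levels inside the researcher is delivered, one three levels down is not, and nothing from a sibling subagent or the root is sent at all.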

Events carry ordering metadata so clients can reconnect and replay from the last seen point. If a browser refreshes while an agent is still running, the UI can reattach to the thread, catch up on buffered events, and continue with live updates instead of restarting the run or duplicating content.
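The reconnect-and-replay behavior can be sketched with a monotonically increasing sequence number per event. `ThreadBuffer`, `Client`, `emit`, and `replay_after` are hypothetical names, not SDK calls. The sketch assumes the client's `last_seen` value survives the refresh (for example, persisted by the page), and it drops anything already rendered so replays never duplicate content.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadBuffer:
    """Hypothetical server-side buffer: each event gets the next sequence number."""
    events: list[tuple[int, str]] = field(default_factory=list)

    def emit(self, payload: str) -> None:
        self.events.append((len(self.events) + 1, payload))

    def replay_after(self, last_seen: int) -> list[tuple[int, str]]:
        return [(seq, p) for seq, p in self.events if seq > last_seen]


@dataclass
class Client:
    last_seen: int = 0  # assumed to be persisted across the browser refresh
    rendered: list[str] = field(default_factory=list)

    def receive(self, seq: int, payload: str) -> None:
        if seq <= self.last_seen:  # already rendered; drop the duplicate
            return
        self.rendered.append(payload)
        self.last_seen = seq


buffer = ThreadBuffer()
client = Client()

for token in ("The ", "agent "):
    buffer.emit(token)
    client.receive(*buffer.events[-1])

# Browser refresh: the page misses these, but the run keeps going.
buffer.emit("is ")
buffer.emit("still running.")

# Reattach from the last seen point and replay only what was missed.
for seq, payload in buffer.replay_after(client.last_seen):
    client.receive(seq, payload)
```

The run is never restarted: the server keeps emitting, and the client only asks for the tail it has not yet seen.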

This is especially important for production agent applications. Long-running agents are not edge cases anymore; they are the workloads developers are building toward.

Multimodal Streams

The protocol is designed around content blocks rather than plain strings, which makes multimodal streaming a natural extension of the same model.

Each page subscribes only to the media it needs, rendering text, images, and audio as they become available.

In the cookbook's multimodal storybook demo, a graph generates a bedtime story, page images, audio narration, and video. The UI scopes media selectors to the graph nodes responsible for each page, so each page can render as soon as its assets arrive.

const visualizer = useNodeRun(`visualizer_${index}`);
const images = useImages(stream, visualizer?.namespace);
const imageURL = useMediaURL(images[0]);
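The demo scopes React media hooks to each page's node. As a language-neutral illustration of why that lets pages render out of order, here is a hypothetical Python sketch: media blocks carry the namespace of the node that produced them, and each page reacts only to images from its own `visualizer_{index}` node. The event shape and the `narrator_0` and `cover` node names are assumptions for the example.

```python
def render_progressively(events, pages):
    """Yield (page, url) the moment a page's own image arrives, ignoring other nodes."""
    for e in events:
        block = e["block"]
        if block["type"] == "image" and e["namespace"] in pages:
            yield pages[e["namespace"]], block["url"]


# Map each page's node namespace to its page index.
pages = {(f"visualizer_{i}",): i for i in range(2)}

# Assets can arrive in any order and from unrelated nodes.
events = [
    {"namespace": ("visualizer_1",), "block": {"type": "image", "url": "page1.png"}},
    {"namespace": ("narrator_0",), "block": {"type": "audio", "url": "page0.mp3"}},
    {"namespace": ("cover",), "block": {"type": "image", "url": "cover.png"}},
    {"namespace": ("visualizer_0",), "block": {"type": "image", "url": "page0.png"}},
]

render_order = list(render_progressively(events, pages))
```

Page 1 renders before page 0 here simply because its image arrived first; no page waits for the whole story to finish.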

The important part is not the demo itself. It is that text, reasoning, tool activity, images, audio, video, and custom data all fit into the same streaming architecture: typed blocks, named channels, namespaces, and projections.

Framework SDKs for Real Applications

The release also brings v1 framework packages for building streamed agent applications in React, Vue, Svelte, and Angular.

Each package exposes the same streaming concepts in the idioms of the framework. React uses hooks, Vue uses composables, Svelte uses reactive helpers, and Angular uses injectors and signals.

The core mental model is shared:

One root hook: useStream or injectStream in Angular
Top-level projections: messages, values, tool calls, and interrupts are available without setup.
Component-level selectors: useMessages, useToolCalls, useExtension, and friends open scoped subscriptions only when a component mounts.
Subagents and subgraphs show up immediately: their detailed streams open only when you reach for them.
Remounting on the same thread reattaches to the in-flight run without replay or duplication.

In React, a basic streamed chat can stay small:

const stream = useStream({
  assistantId: "agent",
  apiUrl: "http://localhost:2024",
  threadId,
});

const messages = useMessages(stream);

And a subagent-aware component can subscribe only to the data it renders:

function SubagentCard({ stream, subagent }) {
  const messages = useMessages(stream, subagent);
  const toolCalls = useToolCalls(stream, subagent);

  return <AgentTrace messages={messages} toolCalls={toolCalls} />;
}

That is the difference between callback-heavy streaming and render-driven streaming. Components mount the projections they need; the SDK handles subscription lifetimes, reconnection, and assembly.
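A common way to manage subscription lifetimes in render-driven UIs is reference counting: open one upstream subscription per (channel, namespace) on the first mount and close it on the last unmount. The sketch below is a hypothetical illustration of that technique, not the SDK's implementation; `SubscriptionManager`, `mount`, and `unmount` are invented names.

```python
from collections import Counter


class SubscriptionManager:
    """Hypothetical ref-counting: one upstream subscription per (channel, namespace)."""

    def __init__(self):
        self.refs = Counter()
        self.log = []  # records upstream open/close calls for inspection

    def mount(self, channel, namespace):
        key = (channel, namespace)
        if self.refs[key] == 0:  # first consumer: open the upstream stream
            self.log.append(("open", key))
        self.refs[key] += 1

    def unmount(self, channel, namespace):
        key = (channel, namespace)
        if self.refs[key] == 0:  # ignore unbalanced unmounts
            return
        self.refs[key] -= 1
        if self.refs[key] == 0:  # last consumer gone: close the upstream stream
            del self.refs[key]
            self.log.append(("close", key))


mgr = SubscriptionManager()
mgr.mount("messages", ("researcher",))    # a SubagentCard mounts
mgr.mount("messages", ("researcher",))    # a second view of the same subagent
mgr.unmount("messages", ("researcher",))  # one view unmounts: stream stays open
mgr.unmount("messages", ("researcher",))  # last view unmounts: stream closes
```

Two components showing the same subagent share one stream, and the stream disappears as soon as nothing on screen needs it.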

What Comes Next

This release establishes the foundation: typed events, content blocks, projections, scoped subscriptions, reconnect semantics, framework SDKs, and custom channels.

We are already extending the protocol further. One near-term direction is richer file streaming: modules that let agents stream arbitrary files connected to a backend, such as a sandbox or other managed execution environment. As agents generate and modify more artifacts, those artifacts should flow through the same streaming model as messages, tools, state, and media.

The goal is simple: your application should subscribe only to the agent work on the screen, render it with the right abstraction, and stay connected as runs grow longer and outputs get richer.

Streaming agents should feel like building applications, not parsing logs.
