LangSmith Sandboxes are Generally Available

Today, LangSmith Sandboxes are Generally Available: secure, scalable environments built for agent code execution, and integrated with the Deep Agents SDK and the LangSmith platform.

Each sandbox is a hardware-virtualized microVM, kernel-isolated from your services and from other sandboxes. Sandboxes use the same SDK and API key as the rest of LangSmith and work with any framework or custom code.

Try LangSmith Sandboxes

Why do agents need sandboxes?

Over the past year, a new class of agents has started to use code execution as part of their core workflow. Systems like Cursor, Claude Code, Open SWE, and Deep Agents don’t just call predefined tools. They generate code, install dependencies, run tests, inspect failures, and edit files.

A few common workloads that need code execution:

A coding assistant that runs and validates its own output before responding
A CI-style agent that clones a repo, installs dependencies, runs tests, and opens a PR (like Open SWE)
A data analysis agent that runs Python against a dataset

These agents need a computer-like environment with a filesystem, package manager, shell, and persistent state. They also need isolation, because the code they run may be generated by a model, pulled from an external dependency, or supplied by a user.

Most teams start by running agent code directly on a laptop. That works for a prototype, but it breaks down in production.

Agent code needs strong isolation

The risks of running agent code outside a real isolation boundary aren't theoretical:

Supply-chain attacks can reach into your runtime: In September 2025, the self-replicating Shai-Hulud npm worm backdoored 500+ packages including @ctrl/tinycolor, executing in preinstall before any tests ran. A second wave in November hit 796 more packages (20M+ weekly downloads) and 25,000+ GitHub repos in hours.
"Sandbox" features aren't always sandboxes: n8n had six RCE CVEs disclosed in a single day, including CVE-2026-1470 (CVSS 9.9) bypassing the JS expression sandbox and CVE-2026-0863 breaking out of the Python task executor. A JS eval boundary is not isolation.
Containers share a kernel, and kernels break: Copy Fail (CVE-2026-31431) is a 732-byte Python script that gains root on every major Linux distribution going back to 2017 via the kernel crypto API. AI tooling surfaced it in about an hour. Containers can't help here because they share a kernel with the host, so an agent that runs the wrong script can escape to the host.

Containers weren’t built for agent workloads. They’re designed to run known, vetted application code statelessly, such as a web server that handles fixed operations and disappears. Agents are the opposite. They want stateful little computers where they can install packages, edit files, follow long-running threads of work, and come back to where they left off. And the code they run is untrusted by definition. LangSmith Sandboxes are built for that execution model.

LangSmith Sandboxes

LangSmith Sandboxes give agents a computer-like environment they can use without putting your infrastructure at risk. Each sandbox runs as an ephemeral microVM with its own filesystem, shell, package manager, and network boundary. Agents can write code, install dependencies, run tests, and keep working across long-running sessions, while the sandbox stays isolated from your services and from other sandboxes.

They’re managed through the same LangSmith SDK and API key teams already use, so you can attach secure code execution to an agent workflow without building the runtime layer yourself. Sandboxes work with Deep Agents, Open SWE, LangSmith Deployment, LangSmith Fleet, and custom code. They also include the production controls teams need around credentials, resource limits, lifecycle, and access, with GA adding new capabilities for parallel workloads, snapshotting, and enterprise security.

New features with the GA release

Snapshots and cheap forks: Capture a running sandbox, or build one from a Docker image, then boot new sandboxes from it. Forks share state via copy-on-write, so spinning up ten parallel branches costs about the same as one. When your agent goes down a wrong path, you can restore and try a different branch.
Pause when inactive: Idle sandboxes pause automatically, so you don’t pay for resources that aren’t doing anything.
Service URLs: Authenticated HTTP access to anything running inside a sandbox. Open a sandbox-hosted preview in a browser, hit it from a script, or share the URL with a teammate. No port forwarding needed.
Sandbox CLI: Build snapshots from Dockerfiles, manage sandboxes, open interactive consoles, tunnel raw TCP, and use standard tools (ssh, scp, rsync, sftp) against a sandbox like any Linux box.
Creator-private by default: Sandboxes ship with creator-specific auth, so only the user who launched a sandbox (and workspace admins) can shell into it or open its Service URLs. Grant access to other workspace members when you are ready to share.
Auth Proxy with custom callbacks: Outbound requests from a sandbox flow through a proxy that injects credentials at the network layer, so secrets never touch the runtime. New in GA: callbacks let you plug in custom secret resolution for advanced setups (per-tenant tokens, vault lookups, audit hooks). You can also allowlist or denylist domains to control the sandbox's access boundary.
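
To make the Auth Proxy idea concrete, here is a minimal conceptual sketch in Python. It is not the LangSmith SDK or its configuration format: `AuthProxy`, `OutboundRequest`, `vault_lookup`, the tenant IDs, and the `api.example.com` host are all hypothetical names invented for illustration. The sketch only shows the general pattern, in which the sandbox sends a request with no credentials, and a proxy outside the sandbox checks the destination against an allowlist and asks a callback for the secret to attach.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional
from urllib.parse import urlsplit

# Hypothetical callback type: (tenant_id, host) -> token, or None if no secret applies.
SecretResolver = Callable[[str, str], Optional[str]]


@dataclass
class OutboundRequest:
    """A request as the sandbox emits it: no credentials attached."""
    url: str
    headers: dict[str, str] = field(default_factory=dict)


@dataclass
class AuthProxy:
    """Illustrative proxy that sits outside the sandbox (not a real LangSmith class)."""
    resolve_secret: SecretResolver
    allowed_hosts: frozenset[str]

    def forward(self, tenant_id: str, request: OutboundRequest) -> OutboundRequest:
        host = urlsplit(request.url).hostname or ""
        # Enforce the access boundary before any secret is looked up.
        if host not in self.allowed_hosts:
            raise PermissionError(f"host not on allowlist: {host!r}")
        # Copy the headers so the object held by the sandbox is never mutated.
        headers = dict(request.headers)
        token = self.resolve_secret(tenant_id, host)
        if token is not None:
            headers["Authorization"] = f"Bearer {token}"
        return OutboundRequest(url=request.url, headers=headers)


def vault_lookup(tenant_id: str, host: str) -> Optional[str]:
    """Stand-in for a per-tenant vault lookup; a real callback would call a secret store."""
    vault = {("tenant-a", "api.example.com"): "token-a"}
    return vault.get((tenant_id, host))


proxy = AuthProxy(
    resolve_secret=vault_lookup,
    allowed_hosts=frozenset({"api.example.com"}),
)

sandbox_request = OutboundRequest("https://api.example.com/v1/items")
upstream_request = proxy.forward("tenant-a", sandbox_request)
```

In this sketch the token exists only on `upstream_request`, which is built on the proxy side. The `sandbox_request` object that agent code can see never carries it, and a request to a host outside `allowed_hosts` is rejected before the callback runs.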

How teams are using Sandboxes

Sandboxes are already helping teams move from agents that answer questions to agents that can do work safely. At monday.com, that means giving Sidekick a secure environment to write and run code for more advanced user workflows.

LangSmith Sandboxes are helping us make our Sidekick, our AI assistant, much more capable for monday.com users. With secure environments, Sidekick can write and run code, and use the results to create richer workflows, like running data analysis and generating multimedia.

- Omri Bruchim, AI Platform Group Manager, monday.com

What's Coming Next

Local-to-cloud agents. Develop an agent against a sandbox on your laptop, then promote the same agent to a cloud-hosted sandbox with no code changes.
Shared volumes so agents can collaborate. Agent 1 writes to a volume, then Agent 2 picks up where it left off.
Volume mounts. Mount your own blob storage or git repository for instant access on startup.
Full execution tracing of every process and network call inside the VM, doubling as an audit log.

Join our Slack community to share what matters most for your workflows.

Get Started

You can start using LangSmith Sandboxes with one line of code, using your existing SDK and API key.

Try LangSmith Sandboxes or read the docs.
