How Rippling built production AI in 6 months with Deep Agents and LangSmith

Rippling is a workforce management platform that manages everything from onboarding and benefits to device provisioning and spend management. Their data model spans HR, IT, payroll, finance, and global operations: thousands of tables, hundreds of thousands of fields, and concepts that share names across domains. Building an AI layer that reasons across all of it required a new architecture.

Rippling AI, now in production for more than a million users globally, runs on LangChain Deep Agents and LangSmith. The team shipped it in roughly 6 months.

The Problem: Cross-Domain AI on a Massive Ontology

Rippling users ask a lot of questions about a lot of things, across many domains. “What’s my balance?” is a question that could pertain to a health savings account, a credit card, a contractor payment account, even a time-off policy. A manager might ask about headcount, pivot to spend analysis, then check a new hire's device provisioning status. Rippling’s AI layer needs to be able to disambiguate and reason effectively across a huge, amorphous surface.

The data model made that difficult. With thousands of tables and overlapping entity names across domains, passing schema chunks to an LLM doesn't work. The team needed an architecture that could quickly reason across domains without drowning in context.

As we embedded AI into individual products, it became clear that siloed, vertical-specific models couldn't scale. Rippling's data model spans thousands of tables across HR, IT, finance, and global ops — with overlapping entities and shared concepts that mean entirely different things depending on context. We needed an AI-native reasoning layer that could disambiguate and operate across that entire ontology, not just optimize for one domain.

— Laks Srini, Product Owner, Rippling AI

Rippling AI: Built On Deep Agents + LangSmith

Rippling was able to integrate AI agents this quickly because the team built with LangChain from the start, using composable agent primitives and a shared observability layer in LangSmith. When Deep Agents launched, they adopted it for Rippling AI's core reasoning loop.

As soon as Deep Agents came out, we wanted to use it to see how good it would be for us to have a really strong agentic reasoning loop. It was a continuation of the relationship we had and the tech that we trusted.

— Sahin Olut, Principal Engineer, Rippling AI

The architecture they landed on is a supervisor agent coordinating 5 to 7 specialized subagents, with LangSmith handling tracing, evaluations, and production monitoring.

How It Works: A Multi-Agent System of Deep Agents

Customers interact with Rippling AI through a chat interface inside the Rippling portal and mobile app, but it goes well beyond a text box. Structured data renders as sortable, filterable tables. Multi-choice clarifications surface as selection UIs. Action confirmations have dedicated interaction patterns.

Under the hood, Rippling AI is a multi-agent system. Three types of specialized Deep Agents sit beneath a supervisor agent:

Read agents query structured data across all of Rippling's product areas (HR, payroll, IT, finance) and connected platforms like Salesforce, Carta, and GitHub.
RAG agents retrieve from unstructured sources: help center docs, company handbooks, and HR policy documents hosted in Rippling.
Action agents execute write operations within Rippling: for example, uploading bonuses, normalizing job titles and leveling structures, or triggering new hires pre-populated from prior employee profiles.

The supervisor agent sits on top, running the primary reasoning loop that analyzes incoming queries and decides which specialized agent (or combination of agents) to invoke.
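The routing step described above can be pictured with a minimal sketch. It is an illustration under stated assumptions, not Rippling's implementation and not the Deep Agents API: `SubAgent`, `ROUTING_HINTS`, `plan`, and `supervise` are hypothetical names, and simple keyword rules stand in for the supervisor's LLM reasoning.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SubAgent:
    """Hypothetical subagent: a name plus a handler that answers a query."""
    name: str
    handle: Callable[[str], str]


# Placeholder handlers; real subagents would call tools and models.
SUBAGENTS = {
    "read": SubAgent("read", lambda q: f"[read] structured lookup for: {q}"),
    "rag": SubAgent("rag", lambda q: f"[rag] document search for: {q}"),
    "action": SubAgent("action", lambda q: f"[action] write operation for: {q}"),
}

# Assumption: keyword rules stand in for the supervisor's LLM-based decision.
ROUTING_HINTS = {
    "read": ("how many", "balance", "headcount", "status"),
    "rag": ("policy", "handbook", "how do i"),
    "action": ("upload", "update", "create", "normalize"),
}


def plan(query: str) -> list[str]:
    """Pick one or more subagents for a query; fall back to document search."""
    q = query.lower()
    chosen = [name for name, hints in ROUTING_HINTS.items()
              if any(hint in q for hint in hints)]
    return chosen or ["rag"]


def supervise(query: str) -> list[str]:
    """Invoke every chosen subagent and collect the results."""
    return [SUBAGENTS[name].handle(query) for name in plan(query)]
```

A query such as "Upload bonuses per the bonus policy" selects both the RAG and action subagents in this sketch, which mirrors the "or combination" case in the text.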

Context Engineering: The Hardest Problem

At Rippling’s complexity and scale, context engineering was the core technical challenge. The team developed three patterns to solve it.

Dynamic skill injection

Rippling uses Deep Agents middleware to reduce context bloat. When a user asks a question, a search step uses Rippling's semantic layer to identify the relevant domain first, then injects a skill scoped to that domain (payroll, devices, ATS, spend, etc.). Re-rankers prune aggressively, reducing context size by 100x to 500x.

If you put the whole thing in context, even a chunk of it, there are so many conflicting entities that it just won't fit in the context window in the timeframe Rippling's customers expect.

— Sahin Olut, Principal Engineer
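As a rough sketch of the pattern (identify the domain, then inject only the skills scoped to it, then prune), the example below uses keyword overlap in place of a semantic layer and a real re-ranker. Everything here is an assumption for illustration: the `Skill` catalog, `DOMAIN_KEYWORDS`, and the function names are hypothetical and do not come from Rippling or from Deep Agents.

```python
import re
from dataclasses import dataclass


@dataclass
class Skill:
    domain: str
    name: str
    text: str  # instructions or schema notes that would be injected into the prompt


# Hypothetical skill catalog; a real one would be far larger.
SKILLS = [
    Skill("payroll", "payroll_balances", "How to query payroll runs and balances."),
    Skill("spend", "card_balances", "How to query corporate card balances and spend."),
    Skill("devices", "device_status", "How to check device provisioning status."),
    Skill("ats", "candidates", "How to look up candidates and requisitions."),
]

# Assumption: keyword sets stand in for a semantic search over domains.
DOMAIN_KEYWORDS = {
    "payroll": {"payroll", "paycheck", "salary"},
    "spend": {"card", "spend", "expense"},
    "devices": {"laptop", "device", "provisioning"},
    "ats": {"candidate", "requisition", "interview"},
}


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def identify_domains(query: str) -> list[str]:
    """Step 1: decide which domains the question touches."""
    tokens = tokenize(query)
    return [domain for domain, words in DOMAIN_KEYWORDS.items() if tokens & words]


def rerank(query: str, skills: list[Skill], top_k: int) -> list[Skill]:
    """Step 2: keep only the best-matching skills (token overlap as a toy score)."""
    tokens = tokenize(query)
    ranked = sorted(skills, key=lambda s: len(tokens & tokenize(s.text)), reverse=True)
    return ranked[:top_k]


def build_context(query: str, top_k: int = 1) -> str:
    """Step 3: inject only the surviving skills into the agent's context."""
    domains = identify_domains(query)
    candidates = [s for s in SKILLS if s.domain in domains]
    return "\n".join(f"## {s.name}\n{s.text}" for s in rerank(query, candidates, top_k))
```

The point of the design is that the agent never sees skills for unrelated domains, so name collisions across domains cannot confuse it.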

Code execution for write operations

Rather than asking the LLM to manipulate data directly, Rippling’s action agents use sandboxed code execution to normalize inputs (say, a CSV from a client) into the format Rippling's internal tools expect. This separates "what to do" (LLM reasoning) from "how to format it" (deterministic code), keeping data normalization reliable and auditable.
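One way to picture that split: the model's reasoning produces a small, reviewable artifact (here, a column mapping), and plain code applies it. The sketch below is illustrative only. The column names, `COLUMN_MAPPING`, and `normalize_bonus_csv` are hypothetical, and the target format is invented for the example rather than taken from Rippling's internal tools.

```python
import csv
import io
from decimal import Decimal, InvalidOperation

# Assumption: the LLM's "what to do" output is a mapping from client columns
# to internal field names. The deterministic code below does the formatting.
COLUMN_MAPPING = {
    "Employee E-mail": "employee_email",
    "Bonus ($)": "amount",
    "Note": "memo",
}

SAMPLE_CSV = (
    "Employee E-mail,Bonus ($),Note\n"
    'Ada@Example.com,"$1,500",Q3 bonus\n'
    "grace@example.com,2250.5,\n"
)


def normalize_bonus_csv(raw_csv: str, mapping: dict[str, str]) -> list[dict]:
    """Turn a client CSV into uniform rows; fail loudly on bad amounts."""
    rows = []
    reader = csv.DictReader(io.StringIO(raw_csv))
    for line_no, record in enumerate(reader, start=2):  # line 1 is the header
        row = {target: (record.get(source) or "").strip()
               for source, target in mapping.items()}
        row["employee_email"] = row["employee_email"].lower()
        amount_text = row["amount"].replace("$", "").replace(",", "")
        try:
            # Fixed two-decimal strings avoid float rounding surprises.
            row["amount"] = str(Decimal(amount_text).quantize(Decimal("0.01")))
        except InvalidOperation as exc:
            raise ValueError(f"line {line_no}: bad amount {row['amount']!r}") from exc
        rows.append(row)
    return rows
```

Because the transformation is ordinary code, the same input always yields the same rows, and a rejected row points at a specific line instead of silently producing a guessed value.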

Variable pinning via a REPL

One of the team's sharpest insights came from watching LLMs hallucinate when reciting long alphanumeric IDs. Their fix: a REPL maintains a runtime variable store between agent steps. The agent refers to named variables instead of passing raw entity strings across tool calls.
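A minimal sketch of the idea, assuming a simple in-memory store: tool results are pinned under short names, the model only ever writes those names, and the runtime swaps the real values back in before a tool is called. `VariableStore`, `pin`, and `resolve` are hypothetical names for illustration, not Rippling's or Deep Agents' API.

```python
import itertools
from typing import Any


class VariableStore:
    """Keeps raw values out of the prompt; the model only sees short names."""

    def __init__(self) -> None:
        self._values: dict[str, Any] = {}
        self._counter = itertools.count(1)

    def pin(self, value: Any, prefix: str = "var") -> str:
        """Store a value and return the short name the model should use."""
        name = f"{prefix}_{next(self._counter)}"
        self._values[name] = value
        return name

    def resolve(self, args: dict[str, Any]) -> dict[str, Any]:
        """Replace any argument that is a pinned name with its real value."""
        return {
            key: self._values.get(value, value) if isinstance(value, str) else value
            for key, value in args.items()
        }


store = VariableStore()
# A lookup tool returned a long ID; pin it instead of showing it to the model.
name = store.pin("emp_7f3a9c2e41b84d0e9a6c5d2f18b7e4a3", prefix="employee")
# The model's next tool call refers to the variable by name.
tool_args = store.resolve({"employee_id": name, "limit": 5})
```

The model never has to reproduce the 36-character identifier, so there is nothing for it to mistype between steps.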

Observability and Evals with LangSmith

With all engineers working on a single AI system, a shared, queryable trace store is essential to how the team collaborates.

The ability to pull and analyze all conversations at scale… LangSmith makes that possible. We have a bunch of automated analysis running on top of it.

— Laks Srini, Product Owner

Self-Healing Eval Loop

The team built a semi-automated loop that catches regressions and closes them. First, failing production traces get pulled from LangSmith. An agent analyzes the failures, proposes fixes, and re-runs evals to confirm improvement, iterating until regressions close. Finally, a human reviews and merges the resulting PRs.

We pull failing traces, have an agent understand what's going on, propose a few solutions, run the evals again to see if it improves, and loop until it's complete. LangSmith makes this possible because there's an API at every point in the system.

— Sahin Olut, Principal Engineer
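The control flow of such a loop can be sketched as follows. This is a hedged illustration: fetching traces from LangSmith and opening PRs are left out, and `propose_fix` and `run_evals` are hypothetical callables standing in for the agent and the eval runner. Accepted fixes are only returned, leaving the merge decision to a human reviewer.

```python
from typing import Callable


def self_healing_loop(
    failing_cases: list[str],
    propose_fix: Callable[[list[str], int], str],
    run_evals: Callable[[str, list[str]], list[str]],
    max_iterations: int = 5,
) -> tuple[list[str], list[str]]:
    """Return (fixes that reduced failures, cases still failing)."""
    remaining = list(failing_cases)
    accepted: list[str] = []
    for attempt in range(max_iterations):
        if not remaining:
            break
        fix = propose_fix(remaining, attempt)      # agent analyzes and proposes
        still_failing = run_evals(fix, remaining)  # re-run evals with the fix
        if len(still_failing) < len(remaining):    # keep only real improvements
            accepted.append(fix)
            remaining = still_failing
    return accepted, remaining


# Toy stand-ins so the sketch runs end to end.
def demo_propose(cases: list[str], attempt: int) -> str:
    return f"fix-for-{cases[0]}"


def demo_run(fix: str, cases: list[str]) -> list[str]:
    return [case for case in cases if fix != f"fix-for-{case}"]


accepted, remaining = self_healing_loop(["trace-a", "trace-b"], demo_propose, demo_run)
```

The iteration cap matters in practice: without it, a regression the agent cannot fix would loop indefinitely instead of being escalated.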

The Eval Pipeline

The team runs a layered eval system, with all results uploaded to LangSmith:

Offline evals: Pre-recorded mocks and fixtures that run locally on every commit without external dependencies.
Post-merge integration evals (online): 300 to 400 queries against a full Rippling sandbox (live API calls) to validate system health before deployment.
Deploy-blocking evals (online): ~10 critical scenarios against real systems that gate every deployment.
Continuous evals (online): Scheduled runs against production data, multiple times daily, monitoring live system health.
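The four tiers can be written down as plain configuration, which makes the gating rule explicit. The sketch below is an assumption-laden illustration: `EvalTier`, `Trigger`, and `can_deploy` are hypothetical names, and treating only the deploy-blocking tier as a hard gate is an interpretation of the list above, not a documented Rippling rule.

```python
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    COMMIT = "every commit"
    POST_MERGE = "after merge"
    DEPLOY = "before deploy"
    SCHEDULE = "scheduled"


@dataclass(frozen=True)
class EvalTier:
    name: str
    trigger: Trigger
    online: bool         # True if the tier calls live systems
    blocks_deploy: bool  # True if a failure stops the deployment


TIERS = [
    EvalTier("offline", Trigger.COMMIT, online=False, blocks_deploy=False),
    EvalTier("post_merge_integration", Trigger.POST_MERGE, online=True, blocks_deploy=False),
    EvalTier("deploy_blocking", Trigger.DEPLOY, online=True, blocks_deploy=True),
    EvalTier("continuous", Trigger.SCHEDULE, online=True, blocks_deploy=False),
]


def tiers_for(trigger: Trigger) -> list[str]:
    """Names of the tiers that should run for a given event."""
    return [tier.name for tier in TIERS if tier.trigger is trigger]


def can_deploy(results: dict[str, bool]) -> bool:
    """Only blocking tiers gate a deploy; a missing result counts as a failure."""
    return all(results.get(tier.name, False) for tier in TIERS if tier.blocks_deploy)
```

Keeping the cheap, dependency-free tier on every commit and reserving live-system runs for later stages is what lets the expensive checks stay small (around 10 scenarios at the deploy gate).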

What's Next: Continuous Improvement With LangSmith

More than one million people use Rippling AI globally. Every conversation flows through LangSmith, feeding a continuous loop of quality tracking, user feedback, and improvement.

For teams building AI on complex, permission-sensitive platforms, the Rippling team's advice is direct:

Build the systems that LLMs are already familiar with. Think of agents as your co-workers and build the best tools for them to be successful: enable code execution, enable writing SQL, don't obscure details from the LLM. And have a tight self-debugging loop.

— Sahin Olut, Principal Engineer

