Stop Looping Tool Calls: Search as Code Cut Tokens 85% on a 200-CVE Task

In ~7 mins: the CVE result (85% fewer tokens at 100% accuracy), the 5-system benchmark table, the architecture’s 3 layers, the 6 design principles behind it, and a Hermes Agent walkthrough to run the pattern yourself.

Perplexity just stopped letting its agents call search and started letting them program it.

Search as Code (SaC) is its new search architecture. Instead of one fixed pipeline behind a query, the model writes Python that composes the individual pieces of the search stack into a retrieval pipeline built for each task.

On a 200-CVE research task, that cut token use 85.1%, from 288.7K to 42.9K, while scoring 100% accuracy. Every non-Perplexity system tested scored below 25%.

It ships today: default in Perplexity Computer, available in the Agent API. There is a Hermes Agent walkthrough at the end if you want to run the pattern yourself.

The research is published by Perplexity and titled “Rethinking Search as Code Generation.” It landed June 1, 2026, and builds on the first overview of Perplexity’s search stack from September 2025.

Introducing Search as Code, our new search architecture for AI agents. It writes Python that calls our search stack directly, instead of looping through function calls one at a time. Available in the Perplexity Agent API, and now default in Computer.


Perplexity’s search serves thousands of queries each second. The old contract was simple: the model issues a query, the engine runs a predefined pipeline, and the model reads the results. That works for a single question.

It breaks for agents. Today’s agents run tasks that take hours, span thousands of retrieval operations, and need a different search strategy at each step. A fixed pipeline cannot bend to all of them. SaC is Perplexity’s answer.

The old way hands the model a search box. It types a query, gets back a ranked page of results, and works with whatever the pipeline decided to return. The model never touches the steps in between.

SaC hands the model the steps themselves. Retrieval, ranking, filtering, fan-out, rendering: each is a primitive in an SDK, and the model writes Python that wires them into a pipeline for the task in front of it. A single inference turn can drive up to thousands of these operations inside a sandbox, then return only the slice worth reading.

The real shift is not that search now uses Python. It is that the agent stops spending its context window on deterministic grunt work. Loops, deduplication, filtering, and joins move into code, where they belong, and the model stays on strategy. Perplexity calls these the twin levers of control and legibility: the model steers every step, and it can inspect the intermediate state instead of guessing at it.
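To make the grunt-work point concrete, here is a minimal sketch in plain Python. The hit records and the `summarize` helper are hypothetical stand-ins, not Perplexity's SDK. The only claim is the shape: dedup, filtering, and counting happen in code, and the model reads a compact summary instead of every raw hit.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical raw hits, standing in for what many search calls might return.
raw_hits = [
    {"url": "https://vendor.example/advisory/1", "title": "Advisory 1"},
    {"url": "https://vendor.example/advisory/1", "title": "Advisory 1 (repeat hit)"},
    {"url": "https://blog.example/post", "title": "Commentary"},
    {"url": "https://vendor.example/advisory/2", "title": "Advisory 2"},
]

def summarize(hits, allowed_host):
    """Dedupe by URL, keep one host, and return only a compact summary."""
    seen = {}
    for hit in hits:
        seen.setdefault(hit["url"], hit)  # first hit per URL wins
    kept = [h for h in seen.values() if urlparse(h["url"]).hostname == allowed_host]
    return {
        "raw": len(hits),
        "unique": len(seen),
        "kept": len(kept),
        "hosts": dict(Counter(urlparse(u).hostname for u in seen)),
        "sample": [h["url"] for h in kept[:2]],
    }

summary = summarize(raw_hits, "vendor.example")
print(summary)
```

Only `summary` would go back into the model's context; the raw hits stay in the sandbox.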

The flagship demo is a security research task: identify and characterize more than 200 high-severity CVEs from 2023 to 2025, each record citing the vendor’s own advisory, the affected product, and the fix version. SaC scored 100% accuracy and used 85.1% fewer tokens than the same stack without it, 42.9K against 288.7K. The other systems Perplexity tested all landed below 25%.
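The headline percentage follows directly from the two token counts. A quick check, using only the figures quoted above:

```python
baseline_tokens = 288_700  # 288.7K, same stack without SaC
sac_tokens = 42_900        # 42.9K with SaC

reduction = (baseline_tokens - sac_tokens) / baseline_tokens
ratio = baseline_tokens / sac_tokens

print(f"reduction: {reduction:.1%}")  # reduction: 85.1%
print(f"ratio: {ratio:.2f}x")         # ratio: 6.73x
```

Put another way, the non-SaC run spent roughly 6.7 tokens for every one SaC spent.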

A stylized slice of what the model wrote shows the shape: build a query plan across official advisory formats, fan the queries out concurrently, and keep only vendor-owned pages.

templates = [
    ("Mozilla", 'site:mozilla.org/.../mfsa{year} "CVE-{year}-" "Fixed in" "Impact high"'),
    ("Jenkins", 'site:jenkins.io/security/advisory/{year} "CVE-{year}" "High" "Fix"'),
    # ...more vendors
]
queries = [pattern.format(year=y) for y in (2023, 2024, 2025) for _, pattern in templates]
seed_hits = sdk.search.web_many(queries, limit_per_query=8, concurrency=12)
pages = [h for q, hits in zip(queries, seed_hits) for h in hits
         if official_vendor_advisory(h.url)]

The fan-out, the concurrency, and the filter to official sources all run in code, in one turn, without round-tripping through the model.
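The snippet calls official_vendor_advisory without showing it. One plausible shape, assuming a hand-maintained allowlist: the `OFFICIAL_HOSTS` set and the function body below are illustrative guesses, not Perplexity's code.

```python
from urllib.parse import urlparse

# Assumed allowlist for illustration; a real run would need the full vendor set.
OFFICIAL_HOSTS = {"mozilla.org", "jenkins.io"}

def official_vendor_advisory(url: str) -> bool:
    """True when the URL's host is an allowlisted vendor domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in OFFICIAL_HOSTS)

checks = {
    "https://www.jenkins.io/security/advisory/2024-01-24/": None,
    "https://mozilla.org.evil.example/mfsa2024-01": None,
    "https://notmozilla.org/advisory": None,
}
for url in checks:
    checks[url] = official_vendor_advisory(url)
print(checks)
```

Matching on the parsed hostname rather than a substring is the design choice that matters: a lookalike host such as mozilla.org.evil.example fails the check.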

Across the full suite, SaC leads four of five benchmarks.

The systems are SaC on Perplexity’s Agent API (GPT 5.5, high reasoning), OpenAI’s Responses API with web_search and code_interpreter, Anthropic Managed Agents (Opus 4.7, high reasoning), Exa, and Parallel.

Each is a single run, not best-of-N. OpenAI edges SaC on HLE by 0.002. The benchmarks come from Google (DeepSearchQA), ByteDance Seed (WideSearch), and OpenAI (BrowseComp), plus Humanity’s Last Exam.

Two caveats live inside these numbers. Every score is a single run, and WANDR, where SaC’s lead stretches to 2.5x the next-best system, is Perplexity’s own benchmark and is not released yet.

The cleanest comparison is the ablation against Perplexity’s own non-SaC pipeline on the same infrastructure: the largest absolute gain is +19.77 points on DSQA, the largest relative gain +45% on WANDR.

On cost, medium-reasoning SaC beats every non-SaC system at under $1 per task, and low-reasoning SaC is cheaper than all of them while staying competitive.

Strip SaC to its frame and you get three layers with one job each, plus six design choices that make the stack pay off.

Atomize the stack, don’t wrap the API.

The SDK is not a search endpoint dropped into a shell. Perplexity rearchitected its search stack into modular primitives and exposed them at the lowest level it could, from raw retrieval up to semantic parsing. High-level end-to-end pipelines still exist, but only as shorthand the model can use or skip.

Three layers do three jobs.

The model is the control plane: it reads the directive, decides which pipelines each task needs, and writes the code. The compute sandbox handles deterministic work: control flow, batching, retries, filtering, joins, aggregation. The Agentic Search SDK lives in the sandbox runtime and exposes the primitives, so one inference turn can drive thousands of operations.

Code orchestrates, and fills gaps.

When the SDK lacks a capability, the model builds it in code instead of waiting for a new function. Need a precise regex the query syntax cannot express? The model fans out to collect a superset, dedupes, then narrows the results deterministically. The CVE fan-out above is this principle in action.
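A minimal sketch of that superset-then-narrow move, in plain Python. The snippets are made up, and the regex is an assumption about what "precise" might mean for a 2023 to 2025 CVE task.

```python
import re

# Hypothetical snippets, standing in for the superset a broad fan-out returns.
superset = [
    "Fixed in 115.6: CVE-2023-6856 heap overflow",
    "Roundup mentions cve-2023-6856 again and CVE-2024-0741",
    "Release notes with no identifiers",
    "CVE-2022-1234 is out of range",
]

# Only IDs from the target years; query syntax alone could not express this.
CVE_RE = re.compile(r"\bCVE-(2023|2024|2025)-\d{4,}\b", re.IGNORECASE)

def narrow(snippets):
    """Extract in-range CVE IDs, normalize case, and dedupe in first-seen order."""
    found = {}
    for text in snippets:
        for match in CVE_RE.finditer(text):
            found.setdefault(match.group(0).upper(), text)
    return list(found)

cve_ids = narrow(superset)
print(cve_ids)
```

The search step is allowed to be sloppy because the narrowing step is exact.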

Control and legibility are the point.

A fixed pipeline owns everything downstream of the query, which creates three failure modes: context bloated with irrelevant hits, domain knowledge the model cannot apply, and serial control flow that pollutes the context with intermediate state. Programmable search fixes all three by handing the model both the steps and the state.

State lives on disk, not in tokens.

Across turns, SaC persists intermediate state to a filesystem with explicit serialization rather than holding it in a REPL. Perplexity tested both. They performed similarly day to day, but the filesystem approach proved more reliable on long trajectories, where an in-memory namespace turns into a cluttered hundred-cell notebook.
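Perplexity has not published its serialization format, so the following is only a sketch of the general idea: explicit JSON on disk, reloaded at the start of each turn. The file name, the `save_state` and `load_state` helpers, and the state layout are all assumptions.

```python
import json
import os
import tempfile
from pathlib import Path

def save_state(path: Path, state: dict) -> None:
    """Write JSON to a temp file, then swap it in, so a crash cannot leave a half-written file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state, indent=2, sort_keys=True))
    os.replace(tmp, path)

def load_state(path: Path) -> dict:
    """Load state from disk, or start empty when no earlier turn has written it."""
    return json.loads(path.read_text()) if path.exists() else {}

workdir = Path(tempfile.mkdtemp())
state_file = workdir / "sac-state" / "progress.json"

# Turn 1: record what was done.
state = load_state(state_file)
state["done"] = ["CVE-2023-6856"]
save_state(state_file, state)

# Turn 2: a fresh process reloads and continues; nothing lives in memory between turns.
resumed = load_state(state_file)
resumed["done"].append("CVE-2024-0741")
save_state(state_file, resumed)

final_state = load_state(state_file)
print(final_state)
```

Because each turn starts from a file, the state can be inspected, diffed, or repaired by hand, which a live namespace does not allow.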

Teach the SDK with small skills.

A custom SDK appears in no model’s pretraining data, so Perplexity wrote Agent Skills to teach it. The root SKILL.md files stay under 2,000 tokens and spend most of that budget on few-shot examples for composing primitives, not on listing them. Continuous autoresearch loops tune the SDK and the skills against latency, codegen quality, and task performance.

The catch: the Agentic Search SDK and Perplexity’s retrieval infrastructure are internal. You cannot download them. The appendix at the end turns the idea into a runnable skill anyway, because what travels is the orchestration pattern, not the engine.

Any agent runtime that exposes code execution plus search primitives can host it. The clearest open option is Hermes Agent from Nous Research, MIT-licensed, which gives you execute_code alongside web_search and web_extract.

The move is the one SaC makes: push fan-out, extraction, filtering, deduplication, and evidence assembly into a single sandboxed code step, and return a compact summary to the parent agent.

Full install-to-run commands are in the appendix below.

The SDK is the moat, and it is private.

Every result here rests on the atomized Agentic Search SDK and Perplexity’s retrieval infrastructure, and neither ships. This is an architecture disclosure, not a reproducible package. You can copy the shape of the idea, not the engine that makes its numbers.

The headline leans on a benchmark no one else can see.

WANDR, where SaC’s 2.5x lead is widest, is Perplexity’s own and unreleased. Every score in the table is a single run rather than best-of-N, and none has been independently replicated. Read them as directional.

The gains come with lock-in.

SaC is cloud-only and bound to Perplexity’s models and stack. There is no swapping in your own LLM and no self-hosting, the usual price for production search quality you do not maintain yourself.

What keeps this from being a vendor story is that the core idea is not Perplexity’s alone. Executable-code actions (CodeAct, ICML 2024), the broader move from tool-call loops to generated code, and recent wide-search systems all point the same way. The architecture is sound even where the evidence is self-reported.

So the best recommendation is to adopt the pattern, not wait for the product. Build the code-orchestrated version on an open runtime now, measure it against your current serial setup, and treat Perplexity’s numbers as a target to verify rather than a result to trust.

If your agent could write its own search pipeline, what is the first workflow you would hand it?


This appendix builds a small Hermes skill that approximates the SaC pattern: plan a bounded search, run the deterministic work inside execute_code, persist inspectable artifacts, and return a compact summary.

It is an approximation, not a clone. Hermes exposes higher-level tools than Perplexity’s private SDK, so the value is in moving loops and filtering out of the parent context, not in matching Perplexity’s retrieval quality.

The running example is a migration scout: collect the official migration guidance for a library’s last three major versions, returning for each the source URL, breaking changes, required code edits, and unresolved questions.

1. Install Hermes Agent.

git clone https://github.com/NousResearch/hermes-agent
cd hermes-agent
# follow the quickstart in the docs to install and configure the runtime

Hermes is MIT-licensed and Python-based. Configure web access per the docs before running: web_search and web_extract need search and extraction credentials set first.

2. Create the skill.

mkdir -p ~/.hermes/skills/research/migration-scout

Write ~/.hermes/skills/research/migration-scout/SKILL.md with a contract the model can follow. Keep it under 2,000 tokens, the same discipline Perplexity uses:

---
name: migration-scout
description: Collect official migration guidance for a library's last 3 major versions.
---

# Migration scout

For each of the target library's last three major versions, return: official source URL,
breaking changes, required code edits, and unresolved questions.

Process:
1. Define the output schema before searching (one record per version).
2. Split the work into per-version query branches.
3. Run one scout web_search and inspect the payload before writing an extractor.
4. In execute_code, fan out web_search across branches, web_extract the official docs,
   filter to the vendor's own domain, dedupe by URL, and write artifacts.
5. Persist raw and normalized JSON under sac-state/.
6. Return only counts, artifact paths, unresolved rows, and a small evidence sample.
7. Verify weak rows in a second pass.

3. Scout the payload shape.

Inside execute_code, run one search and look at the response before building an extractor. Do not assume an undocumented schema.

# web_search, web_extract, write_file are available inside execute_code (see docs)
import json
sample = web_search("site:docs.example.com upgrade guide v3", limit=5)
print(json.dumps(sample, indent=2)[:4000])
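Dumping 4,000 characters works, but a shape summary is often easier to read. The helper below is a hypothetical addition, and the `sample` payload is invented for the demo; the real response schema is whatever the scout call prints.

```python
def shape(node, depth=0, max_depth=3):
    """Summarize a nested payload as keys and type names, without printing its contents."""
    if depth >= max_depth:
        return type(node).__name__
    if isinstance(node, dict):
        return {k: shape(v, depth + 1, max_depth) for k, v in node.items()}
    if isinstance(node, list):
        # Describe only the first element; lists are assumed homogeneous.
        return [shape(node[0], depth + 1, max_depth)] if node else []
    return type(node).__name__

# Invented payload for illustration only.
sample = {
    "results": [{"url": "https://docs.example.com/a", "title": "A", "score": 0.9}],
    "total": 1,
}
payload_shape = shape(sample)
print(payload_shape)
```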

4. Fan out and persist.

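import json
from urllib.parse import urlparse

OFFICIAL_HOST = "docs.example.com"  # the vendor's own docs domain

branches = [
    "site:docs.example.com v3 migration breaking changes",
    "site:docs.example.com v2 migration breaking changes",
    "site:docs.example.com v1 to v2 upgrade guide",
]

def urls_in(node):
    """Recursively collect every URL-like string in a payload of unknown shape."""
    out = []
    if isinstance(node, dict):
        for v in node.values():
            out += urls_in(v)
    elif isinstance(node, list):
        for v in node:
            out += urls_in(v)
    elif isinstance(node, str) and node.startswith("http"):
        out.append(node)
    return out

# One search per branch, all inside a single execute_code step.
hits = {b: web_search(b, limit=5) for b in branches}
# Persist raw hits before the longer extraction pass, so a failure there loses nothing.
write_file("sac-state/hits.json", json.dumps(hits, indent=2))

# Dedupe by URL, then keep only pages on the vendor's own domain.
unique = sorted(set(urls_in(hits)))
urls = [u for u in unique if urlparse(u).hostname == OFFICIAL_HOST]
pages = web_extract(urls[:10]) if urls else {"results": []}
write_file("sac-state/pages.json", json.dumps(pages, indent=2))

# Return only a compact summary to the parent agent.
print(json.dumps({"branches": len(branches), "unique_urls": len(unique),
                  "official_urls": len(urls)}, indent=2))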

5. Verify weak rows.

In a second pass, bind each version’s claims to an official source URL and flag any row that lacks one. Keep this separate from the fan-out so a failed extraction does not poison the whole run.
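A minimal sketch of that check, assuming rows have already been normalized. The field names (`version`, `source_url`, `breaking_changes`) and the sample rows are hypothetical; adapt them to whatever schema you defined in step 1 of the skill.

```python
from urllib.parse import urlparse

OFFICIAL_HOST = "docs.example.com"  # placeholder domain from the running example

# Hypothetical normalized rows for illustration.
rows = [
    {"version": "v3", "source_url": "https://docs.example.com/migrate/v3",
     "breaking_changes": ["renamed init()"]},
    {"version": "v2", "source_url": "https://blog.example.net/v2-notes",
     "breaking_changes": ["dropped a legacy flag"]},
    {"version": "v1", "source_url": None, "breaking_changes": []},
]

def weak_reasons(row):
    """List every reason a row cannot be trusted yet; an empty list means it passes."""
    reasons = []
    url = row.get("source_url")
    if not url:
        reasons.append("missing source URL")
    elif urlparse(url).hostname != OFFICIAL_HOST:
        reasons.append("source is not the official domain")
    if not row.get("breaking_changes"):
        reasons.append("no breaking changes extracted")
    return reasons

flagged = {r["version"]: weak_reasons(r) for r in rows if weak_reasons(r)}
print(flagged)
```

Flagged rows go back for a targeted second search; clean rows are left alone.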

6. Run, measure, and compare.

Trigger the skill on the bounded task, then capture: parent-visible tool turns, total execute_code calls, search and extract calls, tokens if exposed, rows needing manual verification, and correctness against the official docs. Then run the same task with serial parent-level search calls and compare.
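One way to keep the comparison honest is to record both runs in the same structure. The sketch below is an assumption about bookkeeping, not a Hermes feature, and the numbers in it are placeholders rather than measurements.

```python
from dataclasses import dataclass, asdict

@dataclass
class RunMetrics:
    parent_tool_turns: int
    execute_code_calls: int
    search_calls: int
    extract_calls: int
    rows_needing_review: int
    correct_rows: int
    total_rows: int

def compare(serial: RunMetrics, coded: RunMetrics) -> dict:
    """Return coded-minus-serial deltas for every counter, plus accuracy for each run."""
    coded_counts = asdict(coded)
    report = {k: coded_counts[k] - v for k, v in asdict(serial).items()}
    report["serial_accuracy"] = serial.correct_rows / serial.total_rows
    report["coded_accuracy"] = coded.correct_rows / coded.total_rows
    return report

# Placeholder values only: replace with counts from your own two runs.
serial = RunMetrics(12, 0, 9, 3, 2, 2, 3)
coded = RunMetrics(2, 2, 9, 3, 1, 3, 3)
report = compare(serial, coded)
print(report)
```

A negative `parent_tool_turns` delta with equal or better accuracy is the outcome the pattern predicts; anything else is worth investigating before you adopt it.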

The point is to test whether moving the work into code lowers parent-context load, not to reproduce Perplexity’s benchmark scores.

To deploy, leave the skill in ~/.hermes/skills/, where any Hermes session loads it on demand. The same skeleton ports to other stacks: a Claude Code skill folder, a Codex agent file, or a generic system-prompt slot.

Execution rules.

Keep each run under the documented 50 tool-call default. Do not expect asyncio to parallelize Hermes calls: the RPC stub serializes its exchange behind a call lock.
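A small guard can enforce the tool-call budget inside your own script instead of relying on the runtime to cut you off. This is a hypothetical helper, not part of Hermes: `Budget`, `ToolBudgetExceeded`, and `fake_search` are names invented for the sketch, and the limit is set to 3 so the demo stays short.

```python
class ToolBudgetExceeded(RuntimeError):
    """Raised when a wrapped tool is called after the budget is spent."""

class Budget:
    """Count calls across wrapped tools and refuse any call past the limit."""
    def __init__(self, limit=50):
        self.limit = limit
        self.used = 0

    def wrap(self, tool):
        def guarded(*args, **kwargs):
            if self.used >= self.limit:
                raise ToolBudgetExceeded(f"{self.used}/{self.limit} tool calls used")
            self.used += 1
            return tool(*args, **kwargs)
        return guarded

def fake_search(query):
    # Stand-in for a real tool; returns an empty result set.
    return {"query": query, "results": []}

budget = Budget(limit=3)
search = budget.wrap(fake_search)

for q in ["a", "b", "c"]:
    search(q)

try:
    search("d")
    blocked = False
except ToolBudgetExceeded:
    blocked = True

print(budget.used, blocked)
```

Wrapping every tool with the same `Budget` instance gives one shared counter, which matches a per-run limit better than per-tool counters would.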

Write JSON to disk before a long extraction pass. Call delegate_task from the parent agent, never from inside execute_code: code running there cannot recurse into execute_code, delegate_task, or MCP tools.

Strict mode adds isolation but is not a full secure container, and leaf delegated subagents cannot call execute_code.

A second path: OpenClaw.

OpenClaw (also MIT) offers a second route through its experimental code mode, which is off by default. Enable tools.codeMode.enabled, and guest JavaScript or TypeScript runs in a constrained QuickJS-WASI worker with no imports and no direct network or file access.

It calls already-enabled web tools through the executor and reduces nested results to one compact object, with maxPendingToolCalls defaulting to 16.

Durable artifacts need a write-capable tool. It is a looser fit than Hermes because OpenClaw is built as a multi-channel assistant gateway, so treat it as the secondary option.