-
Stop Looping Tool Calls: Search as Code Cut Tokens 85% on a 200-CVE Task
In Perplexity’s design, search becomes Python the model writes on the fly. The SDK stays private, but the pattern doesn’t.
-
As AI agents evolve, we need to look past the RAG pipeline
This article is adapted from Ben Dickson’s AlphaSignal Sunday Deep Dive on Direct Corpus Interaction and GrepSeek.
-
Turn a Departing Engineer's Judgment Into an Editable, Versioned Skill File
COLLEAGUE.SKILL reframes the viral dot-skill repo: expertise as editable, versioned files, not a clone.
-
How Claude Code Harness turns agent coding into a contract-first delivery loop
A workflow plugin, not a magic safety layer.
-
The Model Isn't the Agent Anymore - by AlphaSignal AI
A UC Berkeley paper argues that long-horizon agent performance now turns on six system components around the model, not just the model itself.
-
Perplexity's Bumblebee: a read-only supply-chain check for the developer laptop
A read-only inventory check for the developer laptop supply chain
-
The Third Way to Adapt a Frontier Agent
Microsoft just trained an agent’s skill file like neural-network weights, with bounded edits, a held-out gate, and 52-of-52 wins across 6 benchmarks and 3 harnesses.
-
The 5 Principles Every AI Research Stack Now Has to Solve
The first survey covering all 4 phases of AI in academic research, the 5 principles it lands on, and a stage-by-stage map of what’s safe to automate.
-
Spec-Driven Development is the New Default for AI Coding
The 5 repos defining it, the academic case for why, and the practitioner who says the whole movement is wrong.
-
The Three Harness Layers and How to Audit Your Stack
A 100-page survey by UIUC, Meta, and Stanford maps the harness layer that runs Claude Code, Codex, and SWE-agent.
-
How OpenHuman Works, And How to Set It Up in 5 Minutes
The open-source desktop agent that crossed 20k GitHub stars in days, what’s inside, and the full walkthrough.
-
RAG and Long Context Aren't Enough for Agent Memory. δ-mem Is a Third Option
An 8×8 online state lifted Qwen3-4B from 46.79% to 51.66%, with the backbone untouched.
-
11 Open-Source Repos Every AI Infra Engineer Should Bookmark
You built an AI agent this weekend. Have you thought about its infrastructure?
-
Hermes Just Made Codex the Engine and Itself the Shell.
Opt-in beta in Hermes 2026.5. One slash command, three tool sources, four tools left behind.
-
How LLMs Compute the Right Answer, Then Match the Swarm’s Wrong One, and How to Wire Around It
A single peer auditor dropped GPT-5.4 from 98% to 10% across 22,500 Waterloo trajectories.
-
Researchers Just Counted 146,932 Hallucinated Citations. This Repo Is the First Installable Fix
Academic Research Skills: 4 Claude Code skills, 25 modes, two integrity gates, CC BY-NC 4.0
-
How agentmemory works, and how to actually use it with your agent
Trending, 12 hooks, 51 MCP tools, and a triple-stream retrieval pipeline that scores 95.2% R@5 on LongMemEval-S
-
How AI Agents Follow Senior-Engineer Production Workflows, and How to Wire Them Into Your Stack
22 Markdown skills, 7 slash commands, and the author's bet that the harness matters more than the model.
-
When AI agents learn to engineer themselves
A primer on self-improving agents: Moving beyond the human-coded harness
-
You Should Install Hermes Agent This Weekend
Cheap 1M-context models changed the model layer. Claude Code and Codex changed the coding layer. Hermes is starting to look like the runtime layer.