AI News · alphasignal

AlphaSignal · 21 hours ago

Stop Looping Tool Calls: Search as Code Cut Tokens 85% on a 200-CVE Task

From Perplexity: search becomes Python the model writes on the fly. The SDK stays private; the pattern doesn’t.

AlphaSignal · 21 hours ago

As AI agents evolve, we need to look past the RAG pipeline

This article is adapted from Ben Dickson’s AlphaSignal Sunday Deep Dive on Direct Corpus Interaction and GrepSeek.

AlphaSignal · 1 day ago

Turn a Departing Engineer's Judgment Into an Editable, Versioned Skill File

COLLEAGUE.SKILL reframes the viral dot-skill repo: expertise as editable, versioned files, not a clone.

AlphaSignal · 4 days ago

How Claude Code Harness turns agent coding into a contract-first delivery loop

A workflow plugin, not a magic safety layer.

AlphaSignal · 5 days ago

The Model Isn't the Agent Anymore

A UC Berkeley paper argues that long-horizon agent performance now turns on six system components around the model, not just the model itself.

AlphaSignal · 6 days ago

Perplexity's Bumblebee: a read-only supply-chain check for the developer laptop

A read-only inventory check for the developer laptop supply chain

AlphaSignal · 7 days ago

The Third Way to Adapt a Frontier Agent

Microsoft just trained an agent’s skill file like neural-network weights, with bounded edits, a held-out gate, and 52-of-52 wins across 6 benchmarks and 3 harnesses.

AlphaSignal · 8 days ago

The 5 Principles Every AI Research Stack Now Has to Solve

The first survey covering all 4 phases of AI in academic research, the 5 principles it lands on, and a stage-by-stage map of what’s safe to automate.

AlphaSignal · 11 days ago

Spec-Driven Development is the New Default for AI Coding

The 5 repos defining it, the academic case for why, and the practitioner who says the whole movement is wrong.

AlphaSignal · 12 days ago

The Three Harness Layers and How to Audit Your Stack

A 100-page survey by UIUC, Meta, and Stanford maps the harness layer that runs Claude Code, Codex, and SWE-agent.

AlphaSignal · 13 days ago

How OpenHuman Works, And How to Set It Up in 5 Minutes

The open-source desktop agent that crossed 20k+ GitHub stars in days, what’s inside, and the full walkthrough.

AlphaSignal · 13 days ago

RAG and Long Context Aren't Enough for Agent Memory. δ-mem Is a Third Option

An 8×8 online state lifted Qwen3-4B from 46.79% to 51.66%, with the backbone untouched.

AlphaSignal · 2026-05-19

11 Open-Source Repos Every AI Infra Engineer Should Bookmark

You built an AI agent this weekend. Have you thought about its infrastructure?

AlphaSignal · 2026-05-18

Hermes Just Made Codex the Engine and Itself the Shell.

Opt-in beta in Hermes 2026.5. One slash command, three tool sources, four tools left behind.

AlphaSignal · 2026-05-15

How LLMs Compute the Right Answer, Then Match the Swarm’s Wrong One, and How to Wire Around It

A single peer auditor dropped GPT-5.4 from 98% to 10% across 22,500 Waterloo trajectories.

AlphaSignal · 2026-05-14

Researchers Just Counted 146,932 Hallucinated Citations. This Repo Is the First Installable Fix

Academic Research Skills: 4 Claude Code skills, 25 modes, two integrity gates, CC BY-NC 4.0

AlphaSignal · 2026-05-13

How agentmemory works, and how to actually use it with your agent

Trending, 12 hooks, 51 MCP tools, and a triple-stream retrieval pipeline that scores 95.2% R@5 on LongMemEval-S

AlphaSignal · 2026-05-12

How AI Agents Follow Senior-Engineer Production Workflows, How to Wire It Into Your Stack

22 Markdown skills, 7 slash commands, and the author's bet that the harness matters more than the model.

AlphaSignal · 2026-05-12

When AI agents learn to engineer themselves

A primer on self-improving agents: Moving beyond the human-coded harness

AlphaSignal · 2026-05-11

You Should Install Hermes Agent This Weekend

Cheap 1M-context models changed the model layer. Claude Code and Codex changed the coding layer. Hermes is starting to look like the runtime layer.