Researchers Just Counted 146,932 Hallucinated Citations. This Repo Is the First Installable Fix

After about 10 minutes of reading, you will know whether to install it and how to start using every skill.

TLDR? Check this HTML interactive guide (beta), inspired by @trq212

Zhao et al. just counted 146,932 hallucinated citations in 2025’s preprint record (arXiv:2605.07723, 2026-05).

Academic Research Skills is the first installable Claude Code workflow that wires a fix into the paper pipeline itself.

Cheng-I Wu shipped v3.7.0 with a two-command plugin install on May 5, 2026.

The license is CC BY-NC 4.0: source-available, not OSI open source.

Jeremy Nguyen ✍🏼 🚢 @JeremyNguyenPhD

Claude Code skills for Academic Research: Edward Wu shares a suite of skills, complete with a 12-agent paper writing workflow, and a 13-agent research team. Github link in the reply below: (also: join me at the online "reading club" to work through implementing different

10:23 AM · Mar 10, 2026 · 107K Views


The repo is authored by Cheng-I Wu (GitHub Imbad0202). It was created on February 26, 2026 and now sits at more than 6.7k stars.

The intellectual ancestry is named in the README. Methodology is borrowed from PaperOrchestra (Song, Song, Pfister, Yoon, 2026, Google, arXiv:2604.05018). The failure-mode taxonomy comes from Lu et al. (2026, Nature 651:914-919, “The AI Scientist”).

The problem it solves is concrete. Most academic AI workflows live as one-off prompts in private chats. The pipeline from literature search to draft to peer review to citation check to disclosure is rebuilt every time. Academic Research Skills packages that pipeline as four Claude Code skills with mandatory human checkpoints at every stage.

You can skip ahead to the “How to get started” section below.

The suite is four skills with declared data-access tiers, 25 registered modes, and a 10-stage orchestrated pipeline. Each skill owns part of the workflow.

deep-research ships 13 agents and 7 modes. It runs the upstream investigation: literature review, fact-check, systematic review, Socratic question framing. Data access level is raw. Modes include full, quick, socratic, lit-review, fact-check, systematic-review, and review.

academic-paper ships 12 agents and 10 modes. It handles drafting, revision, citation checks, format conversion, and the AI-disclosure statement. Data access is redacted. Modes include full, plan, outline-only, revision, revision-coach, abstract-only, lit-review, format-convert, citation-check, and disclosure.

academic-paper-reviewer ships 7 agents and 6 modes. It runs multi-perspective peer review with an Editor-in-Chief, three dynamic reviewers, and a Devil’s Advocate. Data access is verified_only. The calibration mode measures the reviewer’s own false-negative and false-positive rates (FNR/FPR) against a user-supplied gold set.
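The calibration idea can be illustrated with a minimal Python sketch. It assumes the gold set is a list of booleans (True means the item really has a problem) paired with the reviewer's verdicts. The function name `calibration_rates` and this data shape are hypothetical, not taken from the repo.

```python
from typing import List, Tuple


def calibration_rates(gold: List[bool], flagged: List[bool]) -> Tuple[float, float]:
    """Return (FNR, FPR) for reviewer verdicts against a gold set.

    gold[i] is True when item i truly has a problem; flagged[i] is True
    when the reviewer flagged it. Both shapes are illustrative assumptions.
    """
    if len(gold) != len(flagged):
        raise ValueError("gold and flagged must have the same length")
    pairs = list(zip(gold, flagged))
    tp = sum(1 for g, f in pairs if g and f)          # real problem, caught
    fn = sum(1 for g, f in pairs if g and not f)      # real problem, missed
    fp = sum(1 for g, f in pairs if not g and f)      # clean item, flagged
    tn = sum(1 for g, f in pairs if not g and not f)  # clean item, passed
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return fnr, fpr
```

A high FNR means the reviewer misses real problems; a high FPR means it raises too many false alarms.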

academic-pipeline ships 4 agents and orchestrates everything above. It runs a 10-stage flow: research, write, Stage 2.5 integrity check, peer review, revision, re-review (max 2 loops), Stage 4.5 final integrity check, format conversion, final output, and process summary.
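The stage order and the two-loop cap on re-review can be sketched as a small driver. This is an illustration only: the stage identifiers and the `review_passes` callback are hypothetical, and the article does not say what happens after two failed re-reviews, so the sketch simply moves on to the next stage.

```python
from typing import Callable, List

# Stage order follows the article; the identifier spellings are hypothetical.
STAGES = [
    "research", "write", "integrity_check_2_5", "peer_review", "revision",
    "re_review", "integrity_check_4_5", "format_conversion", "final_output",
    "process_summary",
]
MAX_REREVIEW_LOOPS = 2


def run_pipeline(review_passes: Callable[[int], bool]) -> List[str]:
    """Return the ordered trace of stages visited in one run.

    review_passes(attempt) stands in for the re-review verdict.
    """
    trace: List[str] = []
    for stage in STAGES:
        if stage != "re_review":
            trace.append(stage)
            continue
        for attempt in range(1, MAX_REREVIEW_LOOPS + 1):
            trace.append(f"re_review#{attempt}")
            if review_passes(attempt):
                break
            if attempt < MAX_REREVIEW_LOOPS:
                # A failed re-review sends the draft back for another revision.
                trace.append(f"revision#{attempt + 1}")
    return trace
```

With a reviewer that accepts on the first pass the trace has exactly ten entries; a reviewer that never accepts adds one extra revision and one extra re-review before the cap stops the loop.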

Stage 2.5 and Stage 4.5 integrity gates are the load-bearing piece. They run a 7-mode failure-mode checklist grounded in Lu et al.’s enumerated failures: implementation bugs, hallucinated results, shortcut reliance, bug-as-insight reframing, methodology fabrication, frame-lock, and citation hallucinations. The gates block pipeline progression on suspected failures rather than silently flagging them.
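The difference between blocking and flagging is easy to show in code. The sketch below is a hypothetical gate, not the repo's implementation: the mode identifiers are my spellings of the seven failure modes listed above, and `GateBlocked` and `integrity_gate` are invented names.

```python
from typing import Dict, List

# The seven failure modes named in the article, as hypothetical identifiers.
FAILURE_MODES = (
    "implementation_bug", "hallucinated_result", "shortcut_reliance",
    "bug_as_insight", "methodology_fabrication", "frame_lock",
    "citation_hallucination",
)


class GateBlocked(Exception):
    """Raised when a gate refuses to let the pipeline continue."""


def integrity_gate(stage: str, findings: Dict[str, List[str]]) -> None:
    """Raise GateBlocked if any checklist mode has suspected failures.

    findings maps a failure mode to a list of notes; an empty list means
    the mode was checked and nothing was found.
    """
    unknown = set(findings) - set(FAILURE_MODES)
    if unknown:
        raise ValueError(f"unknown failure modes: {sorted(unknown)}")
    suspected = sorted(mode for mode, notes in findings.items() if notes)
    if suspected:
        # Blocking: the caller cannot proceed without handling this.
        raise GateBlocked(f"Stage {stage} blocked on: {', '.join(suspected)}")
```

Raising an exception, instead of returning a warning list, forces the caller to stop or handle the failure explicitly, which is the behavior the article describes.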

Material Passport is the handoff schema. It carries literature_corpus[] between skills with CSL-JSON authors, year, title, and source pointers back to the user’s own knowledge base. Since v3.6.5, consumers run a corpus-first, search-fills-gap flow: pre-screen the user’s corpus, then search external databases only for the remaining gaps.
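A corpus-first, search-fills-gap flow might look like the following sketch. The entry uses CSL-JSON style fields (`author`, `issued`, `title`); the `source` pointer format, the title-substring matching rule, and the `external_search` callback are all assumptions made for illustration, and the corpus entry is placeholder data.

```python
from typing import Callable, Dict, List, Tuple

Entry = Dict[str, object]


def corpus_first(
    topics: List[str],
    literature_corpus: List[Entry],
    external_search: Callable[[str], List[Entry]],
) -> Tuple[Dict[str, List[Entry]], List[str]]:
    """Pre-screen the user's corpus, then search only for uncovered topics."""
    results: Dict[str, List[Entry]] = {}
    gaps: List[str] = []
    for topic in topics:
        # Naive matching rule (assumption): topic appears in the title.
        hits = [
            e for e in literature_corpus
            if topic.lower() in str(e.get("title", "")).lower()
        ]
        if hits:
            results[topic] = hits
        else:
            gaps.append(topic)
    for topic in gaps:
        results[topic] = external_search(topic)
    return results, gaps


# Placeholder entry, not a real reference.
corpus: List[Entry] = [{
    "author": [{"family": "Placeholder", "given": "A."}],
    "issued": {"date-parts": [[2025]]},
    "title": "Placeholder study of citation integrity",
    "source": "kb://notes/placeholder-1",
}]
```

The external search is called only for topics the user's own corpus does not cover, which keeps the user's knowledge base as the first source of truth.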

v3.7.3 (in progress on main, not yet released) is the direct response to the Zhao et al. audit. That audit covered 111 million references across 2.5 million papers in arXiv, bioRxiv, SSRN, and PMC, found 146,932 hallucinated citations for 2025 alone, and reported that 85.3% of preprint hallucinations survive into the published record. v3.7.3 closes the locator-channel half of the “claim faithfulness” gap the paper named.

The concrete addition is Three-Layer Citation Emission. Every visible citation is followed by a hidden anchor marker whose <kind> is quote, page, section, paragraph, or none. Quote anchors are capped at 25 words. Emitting none triggers a finalizer hard-gate refusal. The L3 full claim-faithfulness audit is scheduled for v3.8.
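The anchor rules described above can be written as a short validator. This is a sketch under stated assumptions: the actual marker syntax is not shown here, so the function takes the kind and value as plain arguments, and `check_anchor` and `FinalizerRefusal` are hypothetical names.

```python
ALLOWED_KINDS = {"quote", "page", "section", "paragraph", "none"}
MAX_QUOTE_WORDS = 25


class FinalizerRefusal(Exception):
    """Raised when the finalizer hard gate refuses a citation."""


def check_anchor(kind: str, value: str = "") -> None:
    """Validate one citation anchor against the rules in the article."""
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unknown anchor kind: {kind!r}")
    if kind == "none":
        # A citation with no locator is refused, not merely flagged.
        raise FinalizerRefusal("citation has no locator anchor")
    if kind == "quote" and len(value.split()) > MAX_QUOTE_WORDS:
        raise ValueError(f"quote anchor exceeds {MAX_QUOTE_WORDS} words")
```

Word counting by whitespace split is the simplest possible rule and may differ from how the repo counts words.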

Contamination signals are the second v3.7.3 addition. preprint_post_llm_inflection fires when a citation has year >= 2024 and venue is in a closed list of ten preprint servers (arXiv, bioRxiv, medRxiv, SSRN, Research Square, Preprints.org, ChemRxiv, EarthArXiv, OSF Preprints, TechRxiv). semantic_scholar_unmatched fires when the existing Semantic Scholar API protocol returns no match. Both are advisory annotations, not blocking gates.
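Both signals reduce to simple predicates. In the sketch below the two signal names and the ten-server list come from the text; the case-insensitive venue matching and the function signature are assumptions, and the Semantic Scholar result is passed in as a boolean so that no real API call is implied.

```python
from typing import List

# The closed list of ten preprint servers, lowercased for matching.
PREPRINT_SERVERS = frozenset({
    "arxiv", "biorxiv", "medrxiv", "ssrn", "research square",
    "preprints.org", "chemrxiv", "eartharxiv", "osf preprints", "techrxiv",
})


def contamination_signals(
    year: int, venue: str, semantic_scholar_matched: bool
) -> List[str]:
    """Return advisory annotations for one citation; never blocks."""
    signals: List[str] = []
    if year >= 2024 and venue.strip().lower() in PREPRINT_SERVERS:
        signals.append("preprint_post_llm_inflection")
    if not semantic_scholar_matched:
        signals.append("semantic_scholar_unmatched")
    return signals
```

The function returns a list instead of raising, which mirrors the advisory (non-blocking) status of these signals.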

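Plugin install (Claude Code CLI, VS Code, JetBrains, v3.7.0+) takes two commands: /plugin marketplace add Imbad0202/academic-research-skills, then /plugin install academic-research-skills.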

That sets up four skills, three plugin agents, ten /ars-* slash commands, and a SessionStart announce hook. Verify by typing /ars-plan and describing a paper. The skill should open a Socratic dialogue to map chapter structure.

The traditional install path (git clone plus symlinks) still works for users on older Claude Code versions or anyone wanting per-project skill control.

Minimum runtime is Claude Code plus an Anthropic API key, supplied as ANTHROPIC_API_KEY.

Optional document tooling enables DOCX and APA 7.0 PDF output.

For Codex CLI users, the sibling distribution is Imbad0202/academic-research-skills-codex. Same workflow content, Codex-native packaging.

Cost from docs/PERFORMANCE.md: roughly $4 to $6 for a 15,000-word paper with 60 references on Opus 4.7. Cross-model verification (ARS_CROSS_MODEL) adds $0.60 to $1.10. A full run exceeds 200K input and 100K output tokens, so long sessions can lose prompt-cache benefits and need Material Passport resume.
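The cost ranges add up as follows. The sketch only combines the two ranges quoted above; it works in integer cents to avoid floating-point drift, and `estimate_cost` is a hypothetical helper, not part of the repo.

```python
from typing import Tuple

# Ranges quoted in the article, in US cents.
BASE_RANGE_CENTS = (400, 600)         # 15,000-word paper, 60 references
CROSS_MODEL_RANGE_CENTS = (60, 110)   # ARS_CROSS_MODEL add-on


def estimate_cost(cross_model: bool = False) -> Tuple[float, float]:
    """Return the (low, high) dollar estimate for one full run."""
    low, high = BASE_RANGE_CENTS
    if cross_model:
        low += CROSS_MODEL_RANGE_CENTS[0]
        high += CROSS_MODEL_RANGE_CENTS[1]
    return low / 100, high / 100
```

With cross-model verification enabled, the combined estimate is roughly $4.60 to $7.10 per run.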

Three entry points cover most real usage.

Full pipeline. Type /ars-full or describe the goal in natural language (“I want to write a research paper on AI’s impact on higher education QA”). The orchestrator starts at Stage 1 and walks all ten stages with user confirmation at every FULL checkpoint. Output is a finished APA 7.0 paper, an Editorial Decision Letter, a Revision Roadmap, two integrity reports, and an AI Self-Reflection Report.

Guided planning. Type /ars-plan when the research question is not yet clear. The Socratic Mentor agent classifies user intent as exploratory or goal-oriented. Exploratory mode disables auto-convergence and runs a chapter-by-chapter dialogue: define the question, choose the method, map the argument. Output is a Chapter Plan plus an INSIGHT collection.

Targeted single-skill calls. Skip the orchestrator when only one function is needed, for example /ars-lit-review to build the corpus on its own.

A typical first session looks like this. Run /ars-plan to get a chapter map. Then run /ars-lit-review to fill the corpus. Then run /ars-full with the corpus already populated in the Material Passport.

The Material Passport is the handoff between sessions. It carries the literature corpus, the chapter plan, and the integrity reports. To resume a prior run in a fresh Claude Code session, set ARS_PASSPORT_RESET=1 and use the resume_from_passport=<hash> mode.

For an existing draft, type “I already have a paper, review it” to enter the pipeline at Stage 2.5 with the integrity check running first. For reviewer-comment response, type “I received reviewer comments” to enter at Stage 4 with the revision-coach flow.

The pipeline ends with a Process Summary stage: a Collaboration Quality Evaluation across six dimensions scored 1 to 100. That score feeds the AI Self-Reflection Report, which surfaces concession rate, health alerts, and sycophancy risk for the run.

Academic Research Skills is the only candidate in the current academic-Claude-Code-skills cluster with multi-stage integrity gates wired into the pipeline itself. The comparison is architectural, not empirical. No formal benchmarks exist for academic-pipeline tooling.

License friction. CC BY-NC 4.0 blocks commercial use. Source-available, not OSI open source.

Claude Code lock-in. The reference distribution is Claude Code-first. A Codex sibling exists. Cursor, OpenCode, and Gemini are not addressed.

Integrity gates leak. The maintainer’s own post-publication audit of the showcase paper found 21 issues across 68 references that survived three rounds of automated integrity checks. v3.7.3 closes the locator-channel half. The full claim-faithfulness audit is deferred to v3.8.

Tagged release vs main drift. v3.7.0 is the latest tagged release. Three-Layer Citation Emission lives on main as [Unreleased] v3.7.3 work.

Metadata inconsistency. .claude-plugin/plugin.json claims “35+ modes, 32-agent ensemble.” MODE_REGISTRY.md says 25 modes. Direct file count finds 36 agents.
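The agent mismatch can be checked against the per-skill counts given earlier in this article. The snippet below is a simple tally, not a scan of the repo; the variable names are hypothetical.

```python
# Per-skill agent counts as stated in the skill descriptions above.
AGENTS_PER_SKILL = {
    "deep-research": 13,
    "academic-paper": 12,
    "academic-paper-reviewer": 7,
    "academic-pipeline": 4,
}
CLAIMED_AGENTS = 32  # figure quoted from .claude-plugin/plugin.json

total_agents = sum(AGENTS_PER_SKILL.values())
agent_gap = total_agents - CLAIMED_AGENTS
print(f"counted {total_agents} agents, plugin.json claims {CLAIMED_AGENTS}")
```

The per-skill counts sum to 36, which matches the direct file count and leaves the plugin.json figure four agents short.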

Worth Watching.

Academic Research Skills ships what its README claims. v3.7.0 is the signed release, the plugin assets are in place (.claude-plugin/, 10 /ars-* commands, 3 plugin agents, SessionStart hook), and the static lints pass across spec consistency, schema, and pattern-protection checks. The workflow architecture is the strongest installable response yet to a citation-hallucination paper that just put a corpus-scale number on the problem.

The non-obvious finding is that the maintainer documents his own failure. 21 out of 68 references slipped through three rounds of integrity checks in the showcase audit. That honesty is the strongest evidence the gates do something. It is also why the verdict is not Production Ready.

What changes the verdict: a permissive license (or a commercial tier), the L3 full claim-faithfulness audit shipped in v3.8, and a non-Claude Code reference distribution. Until then, the workflow design is more valuable than the workflow runtime.

A good fit: PhD students, academic researchers, and lab teams already on Claude Code under noncommercial settings, agent-tooling builders studying how integrity gates wire into multi-stage workflows, and journals or workshops evaluating AI-disclosure schemas.

A poor fit: commercial SaaS or paid-consulting teams (CC BY-NC 4.0 blocks the build), Cursor-only or OpenCode-only stacks (the reference distribution is Claude Code-first), and anyone needing byte-reproducible citation guarantees (v3.7.3 anchors are advisory in places, and the L3 full audit is unshipped).

Researchers using Claude Code can install a 10-stage academic workflow with mandatory integrity gates as four skills, now that v3.7.0 ships a two-command plugin path.


Q: How do you install Academic Research Skills in Claude Code?

A: Two plugin commands: /plugin marketplace add Imbad0202/academic-research-skills, then /plugin install academic-research-skills. It requires the latest Claude Code and ANTHROPIC_API_KEY. First run: /ars-plan.

Q: Does Academic Research Skills write papers automatically?

A: No. The repo’s POSITIONING.md explicitly states that ARS is assistive, not autonomous. Mandatory human checkpoints at every FULL stage, plus the integrity gates at Stage 2.5 and Stage 4.5, block silent progression.

Q: How does ARS reduce citation hallucinations?

A: Two integrity gates (Stage 2.5 pre-review, Stage 4.5 pre-finalization) run a 7-mode failure-mode checklist plus Semantic Scholar API verification. v3.7.3 adds Three-Layer Citation Emission: a hidden anchor marker after every citation specifies a quote, page, section, or paragraph locator.

Q: What does ARS cost to run for a full paper pipeline?

A: Roughly $4 to $6 for a 15,000-word, 60-reference paper on Opus 4.7 per docs/PERFORMANCE.md. Cross-model verification adds $0.60 to $1.10. A full run exceeds 200K input and 100K output tokens.

Q: Is Academic Research Skills open source?

A: Source-available, not OSI open source. License is CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial 4.0). Commercial SaaS, hosted services, paid consulting, and enterprise deployments require separate licensing.