How to Build a Karpathy-Style LLM Wiki
A recipe-style guide to letting Claude Code compile your reading list into a personal knowledge base that maintains itself — and the two ways it goes wrong.
Ingredients
- An agentic LLM: Claude Code here, but Codex, OpenCode, or Cursor work the same way
- Obsidian: free, the reading UI for the wiki
- The Obsidian Web Clipper browser extension: for getting articles into the wiki in one click
- git: your undo button when the agent rewrites a page badly
- A topic you actually want to think about for more than a weekend
In April 2026, Andrej Karpathy published a 1,200-word gist called llm-wiki.md describing a pattern, not a tool: drop a markdown schema into a folder, let an LLM agent compile your sources into an interlinked wiki, and never write the wiki yourself. The tweet hit 16M views. The gist crossed 5,000 stars in days. Dozens of implementations followed. This guide is the shortest path from zero to a working wiki on your own machine — plus the two failure modes the quickstarts skip over.
Karpathy’s framing of the workflow is worth keeping in mind:
1 Lay down the three folders
The pattern has three layers, no more: immutable sources, an LLM-owned wiki, and a schema file that tells the LLM how the other two relate. Create an Obsidian vault for it and drop in this skeleton:
~/Obsidian/wiki/
├── CLAUDE.md # the schema — step 2
├── raw/ # immutable inputs: articles, PDFs, transcripts
│ └── assets/ # images from Web Clipper land here
├── wiki/ # LLM-owned: every file here is written by Claude
│ ├── index.md # catalog of every wiki page
│ └── log.md # append-only chronological record
└── .git/ # git init this on day one
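One way to lay down that skeleton from a terminal is the small shell function below. It is a sketch rather than anything from Karpathy's gist: the name `make_wiki_skeleton` is made up for illustration, and the vault path is whatever you pass in. Files that already exist are left untouched.

```shell
# Hypothetical helper: create the three-layer layout under a given directory.
make_wiki_skeleton() {
  local root="${1:?usage: make_wiki_skeleton <vault-dir>}"
  local f
  # Immutable inputs (with the image folder) and the LLM-owned wiki
  mkdir -p "$root/raw/assets" "$root/wiki"
  # Create the schema, index, and log as empty files only if they are missing
  for f in CLAUDE.md wiki/index.md wiki/log.md; do
    [ -e "$root/$f" ] || : > "$root/$f"
  done
}
```

Run it once before the git step, for example `make_wiki_skeleton ~/Obsidian/wiki`. Git does not track empty directories, so `raw/assets/` only appears in history once the first image lands there.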
Open the folder in Obsidian (File → Open Vault → this directory) and open a terminal alongside it. Then:
cd ~/Obsidian/wiki
git init
echo ".obsidian/workspace*" > .gitignore
git add . && git commit -m "empty wiki"
Keep raw/ and wiki/ as siblings, not nested. The agent reads from
one and writes to the other; the symmetry matters when you write
rules about which it can touch.
2 Write the CLAUDE.md schema
This is the most important file in the system. It is what makes
Claude a disciplined wiki maintainer instead of a generic
chatbot. Save the following as CLAUDE.md at the root
of the vault. It is opinionated and deliberately short — you
will evolve it with the agent as you figure out what works for
your domain.
# Wiki Schema
You maintain a personal wiki at `wiki/` compiled from immutable
sources at `raw/`. The human curates `raw/` and asks questions.
You write and maintain everything in `wiki/`.
## Folder layout
raw/ immutable source documents (articles, PDFs, transcripts)
raw/assets/ images downloaded by Obsidian Web Clipper
wiki/ you own this — every file here was written by you
wiki/index.md catalog of all wiki pages, organised by type
wiki/log.md append-only chronological record of every operation
## Page types
Every page in wiki/ starts with YAML frontmatter:
---
title:
type: source | entity | concept | comparison | synthesis
sources: [list of paths under raw/ this page draws from]
related: [list of wiki/ pages this connects to]
created: YYYY-MM-DD
updated: YYYY-MM-DD
---
- source a summary of one document in raw/
- entity a person, model, organisation, product, dataset
- concept an idea, technique, term, ongoing debate
- comparison a side-by-side of two or more entities or concepts
- synthesis an answer or analysis filed back from a query
Use [[wikilinks]] for cross-references so Obsidian's graph view renders.
## Operations
### /ingest <path-or-url>
1. Put the source under raw/ (fetch it if a URL was given).
2. Discuss the key takeaways with me in chat first.
3. Write a source-type page summarising the document.
4. Update wiki/index.md with the new entry.
5. For every entity, concept, or claim in the source that overlaps
with an existing wiki page, update that page. Add new claims,
flag contradictions, strengthen or revise the synthesis.
6. Create new entity or concept pages for important things mentioned
but not yet covered. Cross-link.
7. A single ingest typically touches 10–15 wiki pages.
8. Append one line to wiki/log.md:
## [YYYY-MM-DD] ingest | <source title>
### /query <question>
1. Read wiki/index.md first to locate candidate pages.
2. Read the relevant pages.
3. Synthesise an answer with [[wikilinks]] back to the wiki.
4. Every factual claim must trace back to a source-type page or
directly to a file in raw/. Never cite a synthesis page as if
it were a source.
5. If the answer is non-trivial, file it as a synthesis page and
update the index.
6. Append to wiki/log.md:
## [YYYY-MM-DD] query | <question>
### /lint
1. Contradictions between pages.
2. Stale claims newer sources have superseded.
3. Orphan pages with no inbound links.
4. Concepts mentioned across pages but lacking their own page.
5. Missing cross-references.
6. Data gaps worth filling with a web search — propose them.
7. Report findings. Do not auto-fix without my confirmation.
8. Append to wiki/log.md:
## [YYYY-MM-DD] lint | N issues
## Hard rules
- You never modify anything under raw/.
- You never invent a source. Unsourced claims must be marked [unsourced].
- A synthesis page is not a source. Never list a synthesis page in
another page's `sources:` field.
- Prefer updating an existing page to creating a new one.
- Commit to git after every ingest, every filed query, and every lint.
That is the whole system. Karpathy’s gist is intentionally abstract about exactly which page types and which commands to use — the version above is one concrete instantiation that works. Edit it freely as you learn what your domain needs.
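The frontmatter contract in the schema is easy to spot-check from the shell without involving the agent. The sketch below reports pages whose `type:` field is missing or not one of the five types listed above. The function name `check_page_types` is hypothetical, and it assumes that `index.md` and `log.md` are bookkeeping files that carry no frontmatter; adjust that if your schema says otherwise.

```shell
# Hypothetical helper: report wiki pages whose frontmatter `type:` is
# missing or not one of the five types defined in the schema.
# Assumes index.md and log.md are bookkeeping files without frontmatter.
check_page_types() {
  local dir="${1:-wiki}" file type
  find "$dir" -name '*.md' ! -name index.md ! -name log.md | sort |
  while IFS= read -r file; do
    type="$(awk '
      NR == 1 && $0 != "---" { exit }   # page has no frontmatter
      NR > 1 && $0 == "---" { exit }    # end of frontmatter reached
      /^type:/ { sub(/^type:[ \t]*/, ""); sub(/[ \t\r]+$/, ""); print; exit }
    ' "$file")"
    case "$type" in
      source|entity|concept|comparison|synthesis) ;;
      *) printf '%s: type=%s\n' "$file" "${type:-MISSING}" ;;
    esac
  done
}
```

Calling `check_page_types wiki` from the vault root prints one line per offending page and nothing when every page conforms.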
3 Set up Obsidian as the reading side
Open Obsidian on one half of the screen and Claude Code on the other. Claude edits the markdown; Obsidian renders it, shows the graph, follows the wikilinks. Two settings are worth turning on up front:
- Install the Web Clipper browser extension. It converts any article to markdown and saves it into your vault in one click. Configure it to drop clipped articles into raw/.
- In Settings → Files and links, set Attachment folder path to raw/assets/. Then in Settings → Hotkeys, search for “Download attachments for current file” and bind it to Ctrl+Shift+D. After clipping an article, hit the hotkey and every referenced image gets pulled to local disk so the LLM can read them later.
Open Obsidian’s graph view once you have a handful of pages — it is the fastest way to spot orphan pages and hub concepts. The graph is what makes a wiki feel like a wiki instead of a folder of notes.
4 Ingest your first source
From inside the vault, start Claude Code:
cd ~/Obsidian/wiki
claude
Claude reads CLAUDE.md automatically. Clip an article
into raw/ with the Web Clipper, then in the Claude
session type:
/ingest raw/articles/some-article-you-just-clipped.md
The agent will read it, discuss takeaways with you, write a
source page, update index.md, create or
update entity and concept pages, and append a line to
log.md. Watch the Obsidian pane refresh as files
land. Commit when you are happy with the result:
git add . && git commit -m "ingest: some-article"
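Because the schema has the agent append one dated heading per operation to wiki/log.md, the log can be inspected with plain grep after an ingest. A minimal sketch, assuming the `## [YYYY-MM-DD] <operation> | <title>` line format from the schema above; the function name `recent_log` and its argument order are made up for illustration.

```shell
# Hypothetical helper: print the last N log entries, optionally
# filtered by operation (ingest, query, or lint).
# Arguments: [count] [operation] [path to log file]
recent_log() {
  local n="${1:-5}" op="${2:-}" log="${3:-wiki/log.md}"
  grep -E "^## \[[0-9]{4}-[0-9]{2}-[0-9]{2}\] ${op}" "$log" | tail -n "$n"
}
```

`recent_log` shows the last five operations of any kind, and `recent_log 10 ingest` narrows that to the last ten ingests. If the newest line is not the ingest you just ran, the agent skipped the logging step.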
5 Query the wiki
The wiki only pays off when you start asking questions against it. Inside the same Claude session:
/query Where do my sources disagree about <X>,
and what is the strongest single piece of evidence
on either side?
The agent reads index.md first, pulls the relevant
pages, and answers with [[wikilinks]] back to the
wiki and citations down to raw/. If the answer is
non-trivial — a comparison you will want again, a thread
you will pull on later — the schema tells it to save the
answer as a synthesis page. That is how your
explorations compound instead of disappearing into chat history.
Karpathy on why this matters:
6 Lint every week or so
As the wiki grows past 30 or 40 pages, drift sets in: contradictions slip through, pages go orphaned, a concept ends up mentioned in five places but never has its own page. Schedule a lint pass roughly weekly:
/lint
The agent walks the wiki and reports problems — it does not auto-fix, because that is exactly how poisoning starts (see below). Read the report, pick what is worth fixing, and ask the agent to do those specific repairs. Commit after.
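One of the lint checks, orphan pages, is mechanical enough to run yourself between agent passes. The sketch below lists pages that no other page links to. It rests on a few assumptions: wikilinks use the bare file name (as in `[[page-name]]`, `[[page-name|alias]]`, or `[[page-name#heading]]`), links from `index.md` and `log.md` do not count as inbound because the index lists every page anyway, and your grep supports `-r` with `--include` (GNU and BSD grep both do). The name `find_orphans` is hypothetical.

```shell
# Hypothetical helper: list wiki pages that no other page links to.
# Links from index.md and log.md are ignored, and a page linking to
# itself does not count as an inbound link.
find_orphans() {
  local dir="${1:-wiki}" file name
  find "$dir" -name '*.md' ! -name index.md ! -name log.md | sort |
  while IFS= read -r file; do
    name="$(basename "$file" .md)"
    # Files containing a wikilink to this page, minus the page itself,
    # the index, and the log; if nothing is left, the page is an orphan
    if ! grep -rlF --include='*.md' \
           -e "[[${name}]]" -e "[[${name}|" -e "[[${name}#" "$dir" |
         grep -vxF -e "$file" |
         grep -v -e '/index\.md$' -e '/log\.md$' |
         grep -q .; then
      printf '%s\n' "$file"
    fi
  done
}
```

Run `find_orphans wiki` from the vault root and compare the result with what the agent reports in its own lint pass; a mismatch is a hint that the agent is not reading every page.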
7 Add real search when the index breaks
Karpathy’s observation is that index.md alone scales surprisingly well, up to roughly 100 sources and a few hundred pages. Past that, the agent starts loading too much context per query and answers get sloppy. When you hit the wall, drop in qmd, a local hybrid BM25 + vector search engine for markdown with both a CLI and an MCP server. The agent can shell out to it or use it as a native tool. Until then, the index file is enough.
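To see how close you are to that wall, it helps to count what the vault actually holds. A rough sketch, assuming the folder layout from step 1 and treating everything under raw/ except raw/assets/ as a source; the function name `wiki_size` is made up for illustration.

```shell
# Hypothetical helper: count source documents and wiki pages in a vault.
# Images under raw/assets/ are not counted as sources.
wiki_size() {
  local root="${1:-.}" sources pages
  sources="$(find "$root/raw" -type f ! -path '*/assets/*' | wc -l | tr -d ' ')"
  pages="$(find "$root/wiki" -type f -name '*.md' | wc -l | tr -d ' ')"
  printf 'sources=%s pages=%s\n' "$sources" "$pages"
}
```

Running `wiki_size` from the vault root prints both counts on one line. When sources approach 100 or pages reach a few hundred, that is the point at which the text above suggests adding search.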
The two ways this goes wrong
Most write-ups about Karpathy’s pattern stop at the happy path. The community spent April and May 2026 finding the failure modes. Two are worth knowing before you put real reading hours into a wiki.
1. Wiki poisoning
The danger, named most clearly by Anand Lahoti in The Hidden Flaw in Karpathy’s LLM Wiki: an LLM writes a synthesis page, the synthesis page gets referenced by the next query, the agent reasons on top of its own prior summaries instead of the originals, and the chain of custody to the source quietly breaks. Every individual answer still looks fine. The drift only shows up when you go back to check.
The mitigation is the rule in the CLAUDE.md above:
a synthesis page is not a source. Synthesis pages can link to
each other through the related: field, but they can never
appear in another page’s sources: field. Every
sources: entry must resolve to something under
raw/. Grep your wiki periodically to make sure the
agent has not cheated:
grep -r "sources:" wiki/ | grep -v "raw/"
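That one-liner is a quick smoke test, but it lets a mixed line such as `sources: [raw/a.md, wiki/some-synthesis.md]` through, because the line still contains `raw/`. A stricter sketch checks each entry separately. It assumes the inline list form `sources: [a, b]` inside the frontmatter and does not handle multi-line YAML lists; the function name `bad_sources` is hypothetical.

```shell
# Hypothetical stricter check: print every frontmatter `sources:` entry
# that does not point under raw/. Handles only the inline list form.
bad_sources() {
  local dir="${1:-wiki}" file
  find "$dir" -name '*.md' | sort |
  while IFS= read -r file; do
    awk '
      NR == 1 && $0 != "---" { exit }    # no frontmatter at all
      NR > 1 && $0 == "---" { exit }     # end of frontmatter
      /^sources:/ {
        line = $0
        sub(/^sources:[ \t]*/, "", line)
        gsub(/\[|\]/, "", line)          # drop the list brackets
        n = split(line, entries, ",")
        for (i = 1; i <= n; i++) {
          entry = entries[i]
          gsub(/^[ \t"]+|[ \t"\r]+$/, "", entry)   # trim spaces and quotes
          if (entry != "" && entry !~ /^raw\//) print FILENAME ": " entry
        }
      }
    ' "$file"
  done
}
```

`bad_sources wiki` prints one `file: entry` line for each offending reference and stays silent when every entry resolves under raw/.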
2. Cognitive outsourcing
The wiki looks organised. You never internalised any of it. This is the most common HN critique and it is fair. The wiki is a search index over your reading, not a replacement for actually having read the thing. The tell: you cannot summarise what you ingested last week without opening the wiki. If that happens for more than a couple of weeks running, slow down ingest, write synthesis pages yourself for the things you care most about, and use the agent strictly to maintain the bookkeeping around what you wrote.
One HN commenter put the smaller version of the same worry bluntly:
“[…] claude.md let alone a whole wiki.”
That is overstated — the pattern clearly works at the scale Karpathy describes (~100 sources, ~400k words in his own wiki). But it is a useful sanity check: if the agent cannot reliably follow your schema on small examples, scaling up will not save you.
What you end up with
A folder of 50–200 interlinked markdown pages your agent maintains, versioned in git, rendered in Obsidian, grep-able from the terminal, model-agnostic, and free of vendor lock-in. The compounding effect is real: by the fortieth source the agent stops asking you basic questions because it has read the previous thirty-nine. And you have a concrete safeguard against the failure mode nobody warned you about.
Further reading
- Karpathy: llm-wiki.md (the original gist)
- Karpathy: the launch tweet
- Karpathy: endorsing Farzapedia, an early implementation
- Anand Lahoti: The Hidden Flaw in Karpathy’s LLM Wiki
- Joi Ito: What I learned from Karpathy’s LLM Wiki
- WenHao Yu: a Zettelkasten user’s honest review
- Astro-Han/karpathy-llm-wiki: an Agent Skills implementation
- tobi/qmd: local search for the wiki when the index file is no longer enough
- Hacker News: Beyond Karpathy’s LLM Wiki: Cognitive Governance