How to Build a Karpathy-Style LLM Wiki

A recipe-style guide to letting Claude Code compile your reading list into a personal knowledge base that maintains itself — and the two ways it goes wrong.

Prep time: About 30 minutes

One-time cost: Free

Ongoing cost: Your Claude / Codex subscription

Ingredients

- An agentic LLM (Claude Code here; Codex, OpenCode, or Cursor work the same way)
- Obsidian (free), the reading UI for the wiki
- The Obsidian Web Clipper browser extension, for getting articles into the wiki in one click
- git, your undo button when the agent rewrites a page badly
- A topic you actually want to think about for more than a weekend

In April 2026, Andrej Karpathy published a 1,200-word gist called llm-wiki.md describing a pattern, not a tool: drop a markdown schema into a folder, let an LLM agent compile your sources into an interlinked wiki, and never write the wiki yourself. The accompanying tweet hit 16M views. The gist crossed 5,000 stars in days. Dozens of implementations followed. This guide is the shortest path from zero to a working wiki on your own machine, plus the two failure modes the quickstarts skip over.

Karpathy’s framing of the workflow is worth keeping in mind:

Karpathy: “Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase.”

1 Lay down the three folders

The pattern has three layers, no more: immutable sources, an LLM-owned wiki, and a schema file that tells the LLM how the other two relate. Create an Obsidian vault for it and drop in this skeleton:

~/Obsidian/wiki/
├── CLAUDE.md         # the schema — step 2
├── raw/              # immutable inputs: articles, PDFs, transcripts
│   └── assets/       # images from Web Clipper land here
├── wiki/             # LLM-owned: every file here is written by Claude
│   ├── index.md      # catalog of every wiki page
│   └── log.md        # append-only chronological record
└── .git/             # git init this on day one
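If you would rather create the skeleton from the terminal, here is a minimal sketch. `make_skeleton` is a hypothetical helper, not part of any tool, and the one-line starter contents of index.md and log.md are assumptions that the agent can rewrite later. It also drops a `.gitkeep` into raw/assets/, because git does not track empty directories and the folders would otherwise be missing from the first commit.

```shell
# Sketch: create the three-layer skeleton in one go.
# make_skeleton is a hypothetical helper; the vault path is its only argument.
make_skeleton() {
  local vault="${1:?usage: make_skeleton VAULT_DIR}"
  mkdir -p "$vault/raw/assets" "$vault/wiki" || return 1
  # git ignores empty directories; this placeholder keeps raw/ and raw/assets/ in history
  touch "$vault/raw/assets/.gitkeep"
  # Seed the two files the schema expects, without clobbering existing ones
  [ -e "$vault/wiki/index.md" ] || printf '# Index\n' > "$vault/wiki/index.md"
  [ -e "$vault/wiki/log.md" ] || printf '# Log\n' > "$vault/wiki/log.md"
}
```

Running `make_skeleton ~/Obsidian/wiki` before the git commands gives the first commit some real content; running it twice is harmless.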

Open the folder as a vault in Obsidian (File → Open Vault → this directory), and open a terminal in the same folder alongside it. Then:

cd ~/Obsidian/wiki
git init
echo ".obsidian/workspace*" > .gitignore
git add . && git commit -m "empty wiki"

Tip: Keep raw/ and wiki/ as siblings, not nested. The agent reads from one and writes to the other; the symmetry matters when you write rules about which it can touch.

2 Write the `CLAUDE.md` schema

This is the most important file in the system. It is what makes Claude a disciplined wiki maintainer instead of a generic chatbot. Save the following as CLAUDE.md at the root of the vault. It is opinionated and deliberately short — you will evolve it with the agent as you figure out what works for your domain.

# Wiki Schema

You maintain a personal wiki at `wiki/` compiled from immutable
sources at `raw/`. The human curates `raw/` and asks questions.
You write and maintain everything in `wiki/`.

## Folder layout

raw/          immutable source documents (articles, PDFs, transcripts)
raw/assets/   images downloaded by Obsidian Web Clipper
wiki/         you own this — every file here was written by you
wiki/index.md catalog of all wiki pages, organised by type
wiki/log.md   append-only chronological record of every operation

## Page types

Every page in wiki/ starts with YAML frontmatter:

---
title:
type:    source | entity | concept | comparison | synthesis
sources: [list of paths under raw/ this page draws from]
related: [list of wiki/ pages this connects to]
created: YYYY-MM-DD
updated: YYYY-MM-DD
---

- source     a summary of one document in raw/
- entity     a person, model, organisation, product, dataset
- concept    an idea, technique, term, ongoing debate
- comparison a side-by-side of two or more entities or concepts
- synthesis  an answer or analysis filed back from a query

Use [[wikilinks]] for cross-references so Obsidian's graph view renders.

## Operations

### /ingest <path-or-url>

1. Put the source under raw/ (fetch it if a URL was given).
2. Discuss the key takeaways with me in chat first.
3. Write a source-type page summarising the document.
4. Update wiki/index.md with the new entry.
5. For every entity, concept, or claim in the source that overlaps
   with an existing wiki page, update that page. Add new claims,
   flag contradictions, strengthen or revise the synthesis.
6. Create new entity or concept pages for important things mentioned
   but not yet covered. Cross-link.
7. A single ingest typically touches 10–15 wiki pages.
8. Append one line to wiki/log.md:
   ## [YYYY-MM-DD] ingest | <source title>

### /query <question>

1. Read wiki/index.md first to locate candidate pages.
2. Read the relevant pages.
3. Synthesise an answer with [[wikilinks]] back to the wiki.
4. Every factual claim must trace back to a source-type page or
   directly to a file in raw/. Never cite a synthesis page as if
   it were a source.
5. If the answer is non-trivial, file it as a synthesis page and
   update the index.
6. Append to wiki/log.md:
   ## [YYYY-MM-DD] query | <question>

### /lint

1. Find contradictions between pages.
2. Find stale claims that newer sources have superseded.
3. Find orphan pages with no inbound links.
4. Find concepts mentioned across pages that lack their own page.
5. Find missing cross-references.
6. Propose data gaps worth filling with a web search.
7. Report findings. Do not auto-fix without my confirmation.
8. Append to wiki/log.md:
   ## [YYYY-MM-DD] lint | N issues

## Hard rules

- You never modify anything under raw/.
- You never invent a source. Unsourced claims must be marked [unsourced].
- A synthesis page is not a source. Never list a synthesis page in
  another page's `sources:` field.
- Prefer updating an existing page to creating a new one.
- Commit to git after every ingest, every filed query, and every lint.

That is the whole system. Karpathy’s gist is intentionally abstract about exactly which page types and which commands to use — the version above is one concrete instantiation that works. Edit it freely as you learn what your domain needs.
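The /ingest, /query, and /lint names above are conventions that the schema defines. Nothing in the schema registers them with Claude Code. If you want them to appear as real slash commands, one option is Claude Code's project commands: Markdown files under .claude/commands/ whose file name becomes the command name, with $ARGUMENTS replaced by whatever you type after the command. The sketch below assumes that mechanism, so check the Claude Code docs for your version. `make_commands` and the prompt wording are hypothetical.

```shell
# Sketch: expose the schema's operations as Claude Code project commands.
# Assumes Claude Code loads .claude/commands/<name>.md as /<name> and
# substitutes $ARGUMENTS; make_commands is a hypothetical helper.
make_commands() {
  local vault="${1:?usage: make_commands VAULT_DIR}"
  local dir="$vault/.claude/commands" op
  mkdir -p "$dir" || return 1
  for op in ingest query lint; do
    # Unquoted heredoc so $op expands; \$ARGUMENTS stays literal for Claude Code
    cat > "$dir/$op.md" <<EOF
Carry out the /$op operation exactly as CLAUDE.md defines it.
Arguments: \$ARGUMENTS
EOF
  done
}
```

After running `make_commands ~/Obsidian/wiki`, a new claude session in the vault should list the three commands. Each one just points the model back at the matching section of CLAUDE.md, so the schema remains the single place where the operations are defined.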

3 Set up Obsidian as the reading side

Open Obsidian on one half of the screen and Claude Code on the other. Claude edits the markdown; Obsidian renders it, shows the graph, and follows the wikilinks. Two things are worth setting up front:

- Install the Web Clipper browser extension. It converts any article to markdown and saves it into your vault in one click. Configure it to drop clipped articles into raw/.
- In Settings → Files and links, set Attachment folder path to raw/assets/. Then, in Settings → Hotkeys, search for “Download attachments for current file” and bind it to Ctrl+Shift+D. After clipping an article, press the hotkey and every referenced image is downloaded to local disk so the LLM can read it later.

Open Obsidian’s graph view once you have a handful of pages — it is the fastest way to spot orphan pages and hub concepts. The graph is what makes a wiki feel like a wiki instead of a folder of notes.

4 Ingest your first source

From inside the vault, start Claude Code:

cd ~/Obsidian/wiki
claude

Claude reads CLAUDE.md automatically. Clip an article into raw/ with the Web Clipper, then in the Claude session type:

/ingest raw/some-article-you-just-clipped.md

The agent will read it, discuss takeaways with you, write a source page, update index.md, create or update entity and concept pages, and append a line to log.md. Watch the Obsidian pane refresh as files land. Commit when you are happy with the result:

git add . && git commit -m "ingest: some-article"

Tip: Ingest one source at a time for the first ten or twenty. You are training the agent on your conventions — what level of detail you want, which entities matter, how aggressive to be with cross-links. Once it has the rhythm you can batch.
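Every operation appends a heading in the fixed `## [YYYY-MM-DD] op | title` format, so log.md doubles as a quick audit trail while you are training the agent. Here is a minimal sketch, with `recent_ops` as a hypothetical helper name. It prints the last N well-formed entries. An ingest that does not show up here either never ran or was logged in the wrong format.

```shell
# Sketch: show the last N operations recorded in wiki/log.md.
# Relies on the "## [YYYY-MM-DD] op | title" heading format from CLAUDE.md;
# recent_ops is a hypothetical helper name.
recent_ops() {
  local log="${1:?usage: recent_ops LOG_FILE [N]}" n="${2:-5}"
  # Keep only headings that match the schema exactly, then take the tail
  grep -E '^## \[[0-9]{4}-[0-9]{2}-[0-9]{2}\] (ingest|query|lint) \| ' "$log" | tail -n "$n"
}
```

For example, `recent_ops wiki/log.md 10` lists the last ten operations.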

5 Query the wiki

The wiki only pays off when you start asking questions against it. Inside the same Claude session:

/query Where do my sources disagree about <X>,
        and what is the strongest single piece of evidence
        on either side?

The agent reads index.md first, pulls the relevant pages, and answers with [[wikilinks]] back to the wiki and citations down to raw/. If the answer is non-trivial — a comparison you will want again, a thread you will pull on later — the schema tells it to save the answer as a synthesis page. That is how your explorations compound instead of disappearing into chat history.

Karpathy on why this matters:

Karpathy: “Good answers can be filed back into the wiki as new pages. A comparison you asked for, an analysis, a connection you discovered — these are valuable and shouldn’t disappear into chat history.”

6 Lint every week or so

As the wiki grows past 30 or 40 pages, drift sets in: contradictions slip through, pages go orphaned, a concept ends up mentioned in five places but never has its own page. Schedule a lint pass roughly weekly:

/lint

The agent walks the wiki and reports problems — it does not auto-fix, because that is exactly how poisoning starts (see below). Read the report, pick what is worth fixing, and ask the agent to do those specific repairs. Commit after.
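Some lint checks are mechanical enough to verify without the model. Orphan detection is one of them. Below is a sketch of a hypothetical `find_orphans` helper that lists pages no other content page links to. It assumes a flat wiki/ folder and links written with the exact page name. It excludes index.md and log.md, because the index links to every page and would otherwise hide all orphans.

```shell
# Sketch: list wiki pages that no other content page links to.
# Assumes a flat wiki/ folder and exact-case [[page]] links; index.md and
# log.md are skipped both as candidates and as link sources.
find_orphans() {
  local root="${1:?usage: find_orphans WIKI_DIR}"
  local page name linked
  for page in "$root"/*.md; do
    [ -e "$page" ] || continue
    name="$(basename "$page" .md)"
    case "$name" in index|log) continue ;; esac
    # Files linking here as [[name]], [[name|alias]] or [[name#heading]],
    # minus the page itself, the index and the log
    linked="$(grep -lF -e "[[$name]]" -e "[[$name|" -e "[[$name#" "$root"/*.md \
      | grep -vxF -e "$page" -e "$root/index.md" -e "$root/log.md")"
    [ -n "$linked" ] || echo "$name"
  done
}
```

If `find_orphans wiki` and the agent's lint report disagree, check why before you approve any fixes.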

7 Add real search when the index breaks

Karpathy’s observation is that index.md alone scales surprisingly well — up to roughly 100 sources and a few hundred pages. Past that, the agent starts loading too much context per query and answers get sloppy. When you hit the wall, drop in qmd — a local hybrid BM25 + vector search engine for markdown, with both a CLI and an MCP server. The agent can shell out to it, or use it as a native tool. Until then, the index file is enough.
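Counting files is enough to tell when you are approaching that point. Here is a sketch of a hypothetical `wiki_size` helper. The 100-source threshold only mirrors the rough figure above, and the page count includes index.md and log.md.

```shell
# Sketch: count sources and wiki pages; wiki_size is a hypothetical helper.
# The 100-source hint mirrors the rough figure in the text, not a hard limit.
wiki_size() {
  local vault="${1:?usage: wiki_size VAULT_DIR}"
  local sources pages
  # Source files under raw/, ignoring raw/assets/ and dotfiles such as .gitkeep
  sources="$(find "$vault/raw" -type f ! -path "$vault/raw/assets/*" ! -name '.*' | wc -l | tr -d ' ')"
  pages="$(find "$vault/wiki" -type f -name '*.md' | wc -l | tr -d ' ')"
  echo "sources=$sources pages=$pages"
  if [ "$sources" -ge 100 ]; then
    echo "past ~100 sources: consider adding search such as qmd"
  fi
}
```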

The two ways this goes wrong

Most write-ups about Karpathy’s pattern stop at the happy path. The community spent April and May 2026 finding the failure modes. Two are worth knowing before you put real reading hours into a wiki.

1. Wiki poisoning

The danger, named most clearly by Anand Lahoti in The Hidden Flaw in Karpathy’s LLM Wiki: an LLM writes a synthesis page, the synthesis page gets referenced by the next query, the agent reasons on top of its own prior summaries instead of the originals, and the chain of custody to the source quietly breaks. Every individual answer still looks fine. The drift only shows up when you go back to check.

The mitigation is the rule in the CLAUDE.md above: a synthesis page is not a source. Synthesis pages may link to each other through related:, but they can never appear in another page’s sources: field. Every sources: entry must resolve to a file under raw/. Grep your wiki periodically to make sure the agent has not cheated:

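# flag any sources: line with at least one entry outside raw/, even in mixed lists
grep -rn '^sources:' wiki/ | grep -vE 'sources: *\[( *raw/[^],]*,?)* *\]'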
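That grep only checks where each sources: entry points. It does not check whether the file exists, so it will not catch a broken "never invent a source" rule hidden behind a plausible raw/ path. Below is a slightly stricter sketch. `check_sources` is hypothetical. It assumes the one-line `sources: [raw/a.md, raw/b.md]` form from the schema template, with unquoted paths relative to the vault root and no commas in file names. It does not parse block-style YAML lists.

```shell
# Sketch: report sources: entries that are outside raw/ or do not exist.
# Assumes one-line flow lists as in the CLAUDE.md template;
# check_sources is a hypothetical helper name.
check_sources() {
  local vault="${1:?usage: check_sources VAULT_DIR}" problems
  problems="$(
    grep -rH --include='*.md' '^sources:' "$vault/wiki" |
    while IFS= read -r match; do
      file="${match%%:*}"          # grep -H prints file:line
      list="${match#*sources:}"    # drop everything up to the key
      list="${list#*\[}"           # drop the opening bracket
      list="${list%\]*}"           # drop the closing bracket
      IFS=,                        # split on commas (this subshell only)
      set -f                       # no globbing while splitting
      for entry in $list; do
        entry="$(printf '%s' "$entry" | sed 's/^ *//; s/ *$//')"
        [ -n "$entry" ] || continue
        case "$entry" in
          raw/*) [ -f "$vault/$entry" ] || echo "$file: missing $entry" ;;
          *) echo "$file: not under raw/: $entry" ;;
        esac
      done
    done
  )"
  [ -z "$problems" ] && return 0
  printf '%s\n' "$problems"
  return 1
}
```

It returns non-zero when it finds a problem, so `check_sources ~/Obsidian/wiki` can run before each weekly lint or from a git pre-commit hook.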

2. Cognitive outsourcing

The wiki looks organised. You never internalised any of it. This is the most common HN critique and it is fair. The wiki is a search index over your reading, not a replacement for actually having read the thing. The tell: you cannot summarise what you ingested last week without opening the wiki. If that happens for more than a couple of weeks running, slow down ingest, write synthesis pages yourself for the things you care most about, and use the agent strictly to maintain the bookkeeping around what you wrote.

One HN commenter put the smaller version of the same worry bluntly:

“It can’t even keep up with a simple claude.md let alone a whole wiki.”

That is overstated — the pattern clearly works at the scale Karpathy describes (~100 sources, ~400k words in his own wiki). But it is a useful sanity check: if the agent cannot reliably follow your schema on small examples, scaling up will not save you.

What you end up with

A folder of 50–200 interlinked markdown pages your agent maintains, versioned in git, rendered in Obsidian, grep-able from the terminal, model-agnostic, and free of vendor lock-in. The compounding effect is real: by the fortieth source the agent stops asking you basic questions because it has read the previous thirty-nine. And you have a concrete safeguard against the failure mode nobody warned you about.
