How to Self-Host RAGFlow for Deep-Document RAG
When your corporate knowledge lives in gnarly PDFs, scanned contracts and dense spreadsheets, retrieval quality is everything — and that is RAGFlow’s whole point. This recipe stands it up on a VM you own, with Microsoft Entra login, Office/PDF/Markdown ingestion, and multiple per-team and personal knowledge bases.
Ingredients
- A Linux VM — Ubuntu 24.04. RAGFlow is hungrier than a chat UI: budget 4+ vCPU and 16 GB RAM minimum, 32 GB recommended (it runs Elasticsearch, MySQL, MinIO and Redis alongside the app), plus 100 GB+ disk.
- Docker ≥ 24 and the Docker Compose v2 plugin.
- The kernel setting vm.max_map_count ≥ 262144 (Elasticsearch refuses to start otherwise).
- A DNS name and TLS (Caddy gives you one-line HTTPS).
- Microsoft Entra ID admin rights to register an app for OIDC login.
- An LLM + embedding backend — local Ollama, or any OpenAI-compatible / Azure OpenAI endpoint. RAGFlow’s full image also ships embedding models so you can start with nothing external.
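Before going further, it can help to compare the VM against the minimums listed above. The following is a minimal preflight sketch, not part of RAGFlow: the helper name check_min is hypothetical, and the thresholds (4 vCPU, 16 GB RAM, 100 GB disk) are simply the figures from the ingredient list.

```shell
# Hypothetical preflight helper; thresholds come from the ingredient list above.
# check_min LABEL ACTUAL REQUIRED: print OK/FAIL and return non-zero on FAIL.
check_min() {
  if [ "$2" -ge "$3" ]; then
    echo "OK   $1: $2 (need >= $3)"
  else
    echo "FAIL $1: $2 (need >= $3)"
    return 1
  fi
}

# Gather values; each falls back to 0 if the tool or file is unavailable.
cpus=$(nproc 2>/dev/null || echo 0)
# Rounded, because a "16 GB" VM usually reports slightly less than 16 GiB.
mem_gb=$(awk '/^MemTotal:/ {printf "%.0f", $2/1024/1024}' /proc/meminfo 2>/dev/null || true)
disk_gb=$(df -Pk / 2>/dev/null | awk 'NR==2 {printf "%.0f", $4/1024/1024}' || true)

status=0
check_min "vCPU" "${cpus:-0}" 4 || status=1
check_min "RAM (GB)" "${mem_gb:-0}" 16 || status=1
check_min "free disk on / (GB)" "${disk_gb:-0}" 100 || status=1
echo "preflight status: $status (0 means all minimums met)"
```

The script only reports; it never exits early, so you can read all three results in one run.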
1 Prepare the VM
Install Docker, then raise the memory-map limit Elasticsearch needs and make it survive reboots.
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker "$USER" # log out/in afterwards
# Elasticsearch requirement — apply now and persist
sudo sysctl -w vm.max_map_count=262144
echo 'vm.max_map_count=262144' | sudo tee /etc/sysctl.d/99-ragflow.conf
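To confirm the limit is in effect, a small check along these lines may be useful. It is a sketch only: map_count_ok is a hypothetical helper name, and 262144 is the minimum stated above.

```shell
# map_count_ok VALUE: succeed if VALUE meets the Elasticsearch minimum.
map_count_ok() {
  [ "$1" -ge 262144 ]
}

# Read the live kernel value; fall back to 0 if sysctl is not available.
current=$(sysctl -n vm.max_map_count 2>/dev/null || echo 0)

if map_count_ok "${current:-0}"; then
  echo "vm.max_map_count=$current is high enough"
else
  echo "vm.max_map_count=$current is too low (need >= 262144)"
fi
```

Running it again after a reboot shows whether the file under /etc/sysctl.d was picked up.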
Open only ports 80 and 443 on the firewall. Every backend service stays on the internal Docker network.
2 Get RAGFlow and choose an image
RAGFlow ships its own Compose stack. Clone the repo and drop into the
docker folder.
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/docker
Open the .env file and pick your image. Two flavours:
Full image
- Tag: :v0.21.1 (no -slim suffix)
- Size: ~9 GB
- Includes: built-in embedding models
- Use when: you want it to just work offline
Slim image
- Tag: :v0.21.1-slim
- Size: ~2 GB
- Includes: no embedding models
- Use when: you supply embeddings via API/Ollama
Set RAGFLOW_IMAGE in .env accordingly. The default doc engine is Elasticsearch; you can switch to the lighter infinity via DOC_ENGINE if RAM is tight.
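If you prefer to script the .env change rather than edit by hand, something like the following could work. It is a sketch under assumptions: set_env is a hypothetical helper, the image repository name infiniflow/ragflow is assumed from the project's published images, and the demo writes to a scratch file unless you point ENV_FILE at ragflow/docker/.env.

```shell
# set_env KEY VALUE FILE: replace an existing KEY=... line, or append one.
# Leaves a FILE.bak backup when it replaces a line.
set_env() {
  key=$1 value=$2 file=$3
  if grep -q "^${key}=" "$file"; then
    sed -i.bak "s|^${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Defaults to a scratch file so nothing real is touched by accident.
ENV_FILE=${ENV_FILE:-$(mktemp)}

set_env RAGFLOW_IMAGE "infiniflow/ragflow:v0.21.1-slim" "$ENV_FILE"   # assumed repository name
set_env DOC_ENGINE "elasticsearch" "$ENV_FILE"                        # or: infinity

# Show the resulting lines.
grep -E '^(RAGFLOW_IMAGE|DOC_ENGINE)=' "$ENV_FILE"
```

Check the result against the comments in the shipped .env before starting the stack, since that file documents the tags valid for your checkout.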
3 Register an app in Microsoft Entra
In Azure portal → Microsoft Entra ID → App registrations → New registration:
- Name it RAGFlow; single-tenant is fine.
- Redirect URI, platform Web: https://rag.example.com/v1/user/oauth/callback/microsoft (the trailing microsoft is the channel key you will use in config).
- Copy the client ID and tenant ID; create a client secret and copy its value.
- Add Microsoft Graph delegated permissions openid, email, profile, User.Read.
Your OIDC issuer for a single tenant is:
https://login.microsoftonline.com/<tenant-id>/v2.0.
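The issuer also determines where the OIDC discovery document lives, which is a handy thing to fetch before touching RAGFlow. A minimal sketch follows; issuer_url and discovery_url are hypothetical helper names, and the all-zero tenant ID is a placeholder.

```shell
# issuer_url TENANT_ID: single-tenant Entra issuer, as given above.
issuer_url() {
  printf 'https://login.microsoftonline.com/%s/v2.0' "$1"
}

# discovery_url TENANT_ID: standard OIDC discovery path under the issuer.
discovery_url() {
  printf '%s/.well-known/openid-configuration' "$(issuer_url "$1")"
}

TENANT_ID=${TENANT_ID:-00000000-0000-0000-0000-000000000000}   # placeholder
echo "issuer:    $(issuer_url "$TENANT_ID")"
echo "discovery: $(discovery_url "$TENANT_ID")"

# With network access and a real tenant ID, the "issuer" field of the
# returned JSON should equal the issuer printed above:
#   curl -fsS "$(discovery_url "$TENANT_ID")"
```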
4 Wire up OIDC login
RAGFlow reads an oauth block from its service config.
Edit ragflow/docker/service_conf.yaml.template (it is
rendered into the running config on startup) and add a Microsoft
channel:
# service_conf.yaml.template
oauth:
  microsoft:
    type: oidc
    display_name: "Microsoft"
    client_id: "<application-client-id>"
    client_secret: "<client-secret-value>"
    issuer: "https://login.microsoftonline.com/<tenant-id>/v2.0"
    scope: "openid email profile"
    redirect_uri: "https://rag.example.com/v1/user/oauth/callback/microsoft"
Check these keys against the service_conf.yaml.template shipped in your checked-out version; that file is the source of truth.
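Because the shipped template may carry its OAuth examples commented out, it is easy to leave the block inactive by mistake. The sketch below is one way to check for an uncommented channel; has_channel is a hypothetical helper that does a simple line-based scan rather than real YAML parsing, and the demo runs against a scratch file.

```shell
# has_channel FILE NAME: succeed if an uncommented top-level "oauth:" block
# contains an indented "NAME:" key. Line-based scan, not a YAML parser.
has_channel() {
  awk -v name="$2" '
    /^oauth:[ \t]*$/           { in_oauth = 1; next }
    in_oauth && /^[^ \t#]/     { in_oauth = 0 }   # next top-level key ends the block
    in_oauth && $0 ~ ("^[ \t]+" name ":[ \t]*$") { found = 1 }
    END { exit !found }
  ' "$1"
}

# Demo on a scratch file; point it at service_conf.yaml.template for real use.
conf=$(mktemp)
cat > "$conf" <<'EOF'
oauth:
  microsoft:
    type: oidc
EOF

if has_channel "$conf" microsoft; then
  echo "oauth channel 'microsoft' is present and uncommented"
else
  echo "oauth channel 'microsoft' not found"
fi
```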
5 Put it behind TLS and start
RAGFlow’s own nginx listens on port 80 inside
the stack. Front it with Caddy for automatic HTTPS. Add a
caddy service to docker-compose.yml (or run
Caddy separately) pointing at the RAGFlow web container:
# Caddyfile
rag.example.com {
    reverse_proxy ragflow-server:80
}
Bring the whole stack up. First boot pulls several gigabytes and initialises Elasticsearch, so give it a few minutes.
docker compose -f docker-compose.yml up -d
# follow startup — wait for the RAGFlow banner
docker compose logs -f ragflow-server
When you see the ASCII banner and
* Running on all addresses, browse to
https://rag.example.com. You should see a
Sign in with Microsoft button alongside the form.
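Instead of watching the logs by hand, you could poll the public URL until it answers. This is a generic retry sketch; wait_until is a hypothetical helper, and the curl line is shown as a comment because it needs your real domain.

```shell
# wait_until TRIES DELAY CMD...: rerun CMD until it succeeds or TRIES run out.
# Returns 0 on the first success, 1 if every attempt failed.
wait_until() {
  tries=$1 delay=$2
  shift 2
  i=1
  while [ "$i" -le "$tries" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example: up to 30 attempts, 10 seconds apart (about five minutes in total).
#   wait_until 30 10 curl -fsS https://rag.example.com && echo "RAGFlow is answering"
```

A success here only shows that Caddy and the web container respond over HTTPS; the login button still needs a look in the browser.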
6 Register the models
Log in (the first user can be made the owner), then open Avatar → Model providers. Add:
- A chat model, e.g. an Ollama endpoint (http://ollama:11434) or your Azure OpenAI / OpenAI key.
- An embedding model: the built-in one (full image), an Ollama embedding model, or an API embedding model.
Then set them as the system default models so every new knowledge base inherits them.
7 Create knowledge bases
In RAGFlow a “database” is a Knowledge base (dataset). This is where the multi-RAG requirement and the deep parsing both pay off.
- Click Create knowledge base, name it e.g. Contracts 2026.
- Pick a chunking method / template that matches the content: General, Paper, Manual, Laws, Q&A, Table, Presentation, and more. The right template is the single biggest lever on answer quality.
- Upload .pdf, .docx, .xlsx, .pptx, .md and friends, then click Parse. DeepDoc shows you the detected layout and the resulting chunks; you can inspect and correct them before they are embedded.
Corporate vs personal — the multi-tenant model
RAGFlow is multi-tenant. Each user owns their knowledge bases, and you share the corporate ones with a team:
- Create a team and invite the relevant staff.
- Share curated knowledge bases (e.g. HR Handbook, Contracts 2026) with that team — members query but the owner controls content.
- Individual users freely create their own private knowledge bases for experiments, sitting right next to the shared corporate ones. That is exactly the “users try their own small RAGs” pattern.
8 Build a chat assistant
Knowledge bases are the data; a Chat assistant is how people use them. Under Chat → Create an assistant:
- Attach one or more knowledge bases (a single assistant can span several, e.g. HR + IT policies).
- Set the system prompt and enable citations so every answer links back to the source chunk — with DeepDoc that citation even highlights the region of the original PDF.
- Tune similarity threshold and top N if answers are too narrow or too noisy.
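As a mental model for those two knobs, the toy filter below drops chunks scoring under a threshold and then keeps the best N. It is a conceptual sketch only, not RAGFlow's retrieval code: filter_chunks is a hypothetical helper and the scores and chunk names are made up.

```shell
# filter_chunks THRESHOLD TOP_N: read "score<TAB>chunk" lines on stdin,
# keep rows with score >= THRESHOLD, and print the TOP_N highest-scoring rows.
filter_chunks() {
  LC_ALL=C awk -F '\t' -v t="$1" '$1 + 0 >= t + 0' |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1nr |
    head -n "$2"
}

# Made-up scores: threshold 0.5 removes the cover page, top N = 2 keeps
# the two best of the remaining three (the 0.82 and 0.64 rows).
printf '0.82\tclause 14\n0.31\tcover page\n0.64\tappendix B\n0.58\tclause 2\n' |
  filter_chunks 0.5 2
```

Raising the threshold trims noise at the risk of empty answers; raising top N widens context at the risk of diluting it.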
Power users can instead use the visual Agent canvas to chain retrieval across knowledge bases — but a plain chat assistant covers the common company case.
Troubleshooting
Elasticsearch container exits immediately
Almost always vm.max_map_count is too low or the VM is
out of RAM. Re-check sysctl vm.max_map_count (must be
≥ 262144) and give the box more memory — ES alone wants a
couple of gigabytes. Switching DOC_ENGINE=infinity
lowers the footprint.
Microsoft button missing or callback fails
- Confirm the oauth block actually loaded (docker compose logs ragflow-server on startup).
- The Entra redirect URI must match .../v1/user/oauth/callback/microsoft character for character, including the channel key.
- Make sure RAGFlow knows its public URL so it builds the right redirect: access it via the HTTPS domain, not the raw IP.
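A quick way to catch a mismatch is to build the expected callback from its parts and compare it with the value pasted from the Entra portal. The sketch below uses hypothetical helpers redirect_uri and uri_matches; the registered value contains a deliberate stray slash to show a failure.

```shell
# redirect_uri HOST CHANNEL: build the callback URL in the shape this recipe uses.
redirect_uri() {
  printf 'https://%s/v1/user/oauth/callback/%s' "$1" "$2"
}

# uri_matches A B: exact string comparison, so a trailing slash or a
# change of case counts as a mismatch.
uri_matches() {
  [ "$1" = "$2" ]
}

expected=$(redirect_uri "rag.example.com" "microsoft")
registered="https://rag.example.com/v1/user/oauth/callback/microsoft/"   # stray trailing slash

if uri_matches "$expected" "$registered"; then
  echo "match"
else
  echo "mismatch: expected $expected"
  echo "          got      $registered"
fi
```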
Parsing is slow or stuck
DeepDoc’s layout analysis is CPU-heavy. Large scanned PDFs can
take minutes each; watch the task executor logs. For big bulk
imports, give the VM more cores or scale the
ragflow task workers.
Answers ignore an uploaded file
A file only becomes searchable after it shows parsed/success and has been embedded. If it is stuck at pending, the embedding model is likely unset or unreachable — re-check Model providers and the system default embedding model.
What you end up with
A self-hosted RAG at https://rag.example.com that
actually understands your messy documents — tables stay
tables, contracts keep their clauses — with staff signing in
via Microsoft, querying team-shared corporate knowledge bases,
and spinning up their own private ones beside them. Every answer
is grounded and cited, and nothing leaves the VM you control.