How to Self-Host a Company RAG with Open WebUI
Stand up a private, self-hosted RAG on a VM you control — staff sign in with their Microsoft work account, upload Word, Excel, PowerPoint, PDF and Markdown, and query both shared corporate knowledge bases and their own personal ones. No SaaS, no per-token bill, no data leaving your tenant.
Ingredients
- A Linux VM — Ubuntu 24.04, at least 4 vCPU / 16 GB RAM / 100 GB disk. Add a GPU only if you intend to run the LLM and embeddings locally.
- Docker Engine and the Docker Compose plugin.
- A DNS name for the box (e.g. chat.example.com) and a way to get TLS; we use Caddy for one-line HTTPS.
- Microsoft Entra ID (Azure AD) admin rights, enough to register an application.
- An LLM backend. Two clean options:
- Fully local — Ollama on the same box (private, free, needs RAM/GPU), or
- Azure OpenAI / any OpenAI-compatible endpoint (keeps data in your Azure tenant, no GPU to babysit).
- Apache Tika (or Docling) for turning Office files into clean text — we ship it as a container.
By the end you will have https://chat.example.com behind
Microsoft login, with a “Workspace → Knowledge” area
where admins publish corporate knowledge bases and users spin up
their own private ones.
1 Provision the VM and install Docker
Spin up the VM with your cloud or hypervisor of choice, point your DNS A record at it, then install Docker from the official convenience script.
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker "$USER" # log out/in afterwards
# verify
docker --version
docker compose version
Open the firewall for ports 80 and 443 only. Everything else stays on the Docker network — the app itself, Ollama, Tika and the database should never be exposed directly.
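One way to confirm the firewall rule is a small port probe run from a machine outside the VM. The sketch below is an illustration only: the function names are hypothetical, and the internal port list simply reuses the service ports that appear later in this recipe (8080 for Open WebUI, 11434 for Ollama, 9998 for Tika).

```python
import socket

# Ports that should answer from the internet, and ports that must not.
PUBLIC_PORTS = (80, 443)
INTERNAL_PORTS = (8080, 11434, 9998)


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def exposure_report(host: str, probe=port_open) -> dict:
    """Map each port to True when its state matches the intended policy."""
    report = {}
    for port in PUBLIC_PORTS:
        report[port] = probe(host, port)       # should be reachable
    for port in INTERNAL_PORTS:
        report[port] = not probe(host, port)   # should be closed
    return report
```

Calling exposure_report("chat.example.com") from your laptop returns a dictionary keyed by port; any False value marks a port whose state does not match the policy above.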
2 Pick your LLM and embedding backend
RAG needs two models: a chat model to write answers and an embedding model to turn documents into vectors. Decide now — it changes a few env vars later.
Ollama on the box
- Chat: e.g. qwen3 or llama3.3
- Embeddings: nomic-embed-text or bge-m3
- Pro: Nothing leaves the VM at all
- Con: Wants a GPU for usable speed
Azure OpenAI
- Chat: your deployed GPT model
- Embeddings: text-embedding-3-large
- Pro: No GPU; data stays in your Azure region
- Con: Per-token cost; needs an Azure subscription
The rest of this recipe shows the local Ollama path and notes the Azure swap where it matters.
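To make the split between the two models concrete, here is a toy sketch of the embedding side: text becomes a vector, and documents are ranked by cosine similarity to the question. The bag-of-words embed function is a stand-in assumption; a real model such as nomic-embed-text produces dense learned vectors, and the chat model then writes the answer from whatever was retrieved.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, docs: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]
```

The retrieved text is what gets pasted into the chat model's prompt, which is why a weak embedding model hurts answer quality even when the chat model is strong.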
3 Register an app in Microsoft Entra
This is what makes “Sign in with Microsoft” work. In the Azure portal → Microsoft Entra ID → App registrations → New registration:
- Name it something like Open WebUI.
- Supported account types: Accounts in this organizational directory only (single tenant).
- Redirect URI, platform Web: https://chat.example.com/oauth/microsoft/callback
- Create it, then copy the Application (client) ID and Directory (tenant) ID.
- Under Certificates & secrets, create a client secret and copy its value now (you cannot see it again).
- Under API permissions, add Microsoft Graph delegated permissions: openid, email, profile, User.Read.
- (Optional, for group-based access) Under Token configuration → Add groups claim, include Security groups in the ID token. This lets Open WebUI map Entra groups to its own groups.
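The values from the registration steps above fit together as in the following sketch. It derives the redirect URI to paste into Entra and the tenant's OpenID Connect discovery URL from your base URL and tenant ID. The helper name entra_endpoints is hypothetical, and the https-only check is an assumption that suits this recipe rather than an Entra rule.

```python
from urllib.parse import urlsplit


def entra_endpoints(base_url: str, tenant_id: str) -> dict:
    """Derive the URLs tied to a single-tenant Entra app registration."""
    parts = urlsplit(base_url)
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError("base_url must be an https:// URL with a host")
    origin = f"https://{parts.netloc}"
    authority = f"https://login.microsoftonline.com/{tenant_id}"
    return {
        # Paste this into the app registration as the Web redirect URI.
        "redirect_uri": f"{origin}/oauth/microsoft/callback",
        # OpenID Connect discovery document for the tenant (v2.0 endpoint).
        "discovery": f"{authority}/v2.0/.well-known/openid-configuration",
    }
```

Opening the discovery URL in a browser is a quick way to confirm that the tenant ID you copied is valid before touching the .env file.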
4 Write the Compose stack
Create a project directory and a docker-compose.yml. This
wires together four services: Open WebUI, Ollama (LLM + embeddings),
Apache Tika (document extraction), and Caddy (TLS).
mkdir -p ~/openwebui && cd ~/openwebui
# docker-compose.yml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    restart: unless-stopped
    env_file: .env
    depends_on: [ollama, tika]
    volumes:
      - openwebui-data:/app/backend/data
    # no host port; Caddy talks to it over the internal network
  ollama:
    image: ollama/ollama:latest
    restart: unless-stopped
    volumes:
      - ollama-data:/root/.ollama
    # uncomment to use an NVIDIA GPU:
    # deploy:
    #   resources:
    #     reservations:
    #       devices: [{ driver: nvidia, count: all, capabilities: [gpu] }]
  tika:
    image: apache/tika:latest-full
    restart: unless-stopped
  caddy:
    image: caddy:2
    restart: unless-stopped
    ports: ["80:80", "443:443"]
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
      - caddy-data:/data
      - caddy-config:/config
volumes:
  openwebui-data:
  ollama-data:
  caddy-data:
  caddy-config:
And a one-line Caddyfile — Caddy fetches and
renews a Let’s Encrypt certificate automatically:
# Caddyfile
chat.example.com {
    reverse_proxy open-webui:8080
}
5 Configure the environment
This .env file is where the magic happens: Microsoft
login, Office extraction via Tika, local embeddings, and hybrid
search all switched on. Replace the three Entra values and the
secret key.
# .env
# ── Base ────────────────────────────────────────────────
WEBUI_URL=https://chat.example.com
WEBUI_SECRET_KEY=change-me-to-a-long-random-string
# ── Microsoft Entra (Azure AD) login ────────────────────
ENABLE_OAUTH_SIGNUP=true
# force everyone through Microsoft (comment kept on its own line so no env-file parser reads it as part of the value)
ENABLE_LOGIN_FORM=false
OAUTH_PROVIDER_NAME=Microsoft
MICROSOFT_CLIENT_ID=<application-client-id>
MICROSOFT_CLIENT_SECRET=<client-secret-value>
MICROSOFT_CLIENT_TENANT_ID=<directory-tenant-id>
# Map Entra security groups -> Open WebUI groups (optional)
ENABLE_OAUTH_GROUP_MANAGEMENT=true
OAUTH_GROUP_CLAIM=groups
# ── Document extraction (Office, PDF, etc.) ─────────────
CONTENT_EXTRACTION_ENGINE=tika
TIKA_SERVER_URL=http://tika:9998
# ── Embeddings (local via Ollama) ───────────────────────
RAG_EMBEDDING_ENGINE=ollama
RAG_OLLAMA_BASE_URL=http://ollama:11434
RAG_EMBEDDING_MODEL=nomic-embed-text
# ── Retrieval quality ───────────────────────────────────
ENABLE_RAG_HYBRID_SEARCH=true
RAG_RERANKING_MODEL=BAAI/bge-reranker-v2-m3
RAG_TOP_K=5
# ── Chat model source (local Ollama) ────────────────────
OLLAMA_BASE_URL=http://ollama:11434
Azure swap: remove the RAG_EMBEDDING_ENGINE=ollama block and set
RAG_EMBEDDING_ENGINE=openai with
RAG_OPENAI_API_BASE_URL,
RAG_OPENAI_API_KEY and
RAG_EMBEDDING_MODEL=text-embedding-3-large. Add your
chat deployment as an OpenAI connection in the admin UI later.
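Before launching, it helps to confirm that no template placeholder survived in .env. The sketch below parses the file, flags values that still look like placeholders, and generates a random secret key. All function names are hypothetical, and the placeholder patterns (a leading "<" or "change-me") are assumptions taken from the template in this step.

```python
import re
import secrets


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping comments and blank lines."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop a trailing inline comment such as "false  # note".
        value = re.split(r"\s+#", value, maxsplit=1)[0].strip()
        env[key.strip()] = value
    return env


def unfinished(env: dict) -> list:
    """Return keys whose values still look like template placeholders."""
    return sorted(
        k for k, v in env.items()
        if not v or v.startswith("<") or v.startswith("change-me")
    )


def new_secret_key() -> str:
    """Generate a random value suitable for WEBUI_SECRET_KEY."""
    return secrets.token_urlsafe(48)
```

Running unfinished(parse_env(open(".env").read())) on the untouched template should list the three Entra values and the secret key; an empty list means every placeholder has been replaced.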
6 Launch and pull the models
docker compose up -d
# pull the local models into Ollama (skip if using Azure)
docker compose exec ollama ollama pull qwen3
docker compose exec ollama ollama pull nomic-embed-text
# watch it come up
docker compose logs -f open-webui
Browse to https://chat.example.com. You should be
bounced straight to the Microsoft sign-in page. The
first account to log in becomes the
admin — make sure that is you.
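If you want to check the model pull programmatically, Ollama's GET /api/tags endpoint returns the list of local models as JSON. The sketch below only parses such a response; the helper name missing_models is hypothetical, and how you fetch the JSON from inside the Docker network is left to you, since Ollama is deliberately not published on the host.

```python
import json


def missing_models(tags_json: str, required: list[str]) -> list[str]:
    """Return required models absent from an Ollama /api/tags response."""
    names = {m["name"] for m in json.loads(tags_json).get("models", [])}
    # Ollama lists a pull without an explicit tag as "<model>:latest".
    have = names | {n.split(":", 1)[0] for n in names if n.endswith(":latest")}
    return [m for m in required if m not in have]
```

An empty result for ["qwen3", "nomic-embed-text"] means both models from this step are available to Open WebUI.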
7 Set roles, groups and signup policy
By default new Microsoft sign-ins land in a pending state so strangers cannot self-enroll. As admin, open Admin Panel → Users and:
- Set the default role for new users to user (or pending if you want to approve each one).
- Create Groups like finance, legal, everyone. If you enabled the Entra groups claim, members are mapped automatically on login; otherwise assign them by hand.
Groups are the hinge for the next step — they decide who sees which corporate knowledge base.
8 Create the knowledge bases
This is your “more than one database” requirement, and Open WebUI handles it natively. Go to Workspace → Knowledge.
Corporate knowledge bases (curated, shared)
- Click + Create Knowledge, name it e.g. HR Handbook.
- Set Access to a group (e.g. everyone) with read permission so staff can query but not edit it.
- Upload the source files: .docx, .xlsx, .pptx, .pdf, .md. Tika extracts the text, each file is chunked automatically, and the embedding model vectorises the chunks.
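The chunking mentioned in the upload step can be pictured as a sliding window over the extracted text. The sketch below is a simplified fixed-size character splitter with overlap, not Open WebUI's actual splitter, and the default size and overlap are illustrative values.

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.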
Personal knowledge bases (user sandboxes)
Any user with the user role can create their own
Knowledge collection from the same screen. It is private to
them by default — perfect for someone testing a small
RAG over their own project files without touching the corporate set.
Nothing extra to configure; the permission model already isolates
them.
To restrict who may create personal knowledge bases, set
USER_PERMISSIONS_WORKSPACE_KNOWLEDGE_ACCESS=false in
.env and grant it per-group instead.
9 Query it — two ways
Ad-hoc: in any chat, type # and pick a
knowledge base (or a single file) to ground that conversation.
Permanent: build a reusable assistant under Workspace → Models — choose a base chat model, attach one or more knowledge bases, give it a system prompt (“You are the HR assistant; answer only from the handbook and cite the section”), and share it with a group. Staff then pick HR Assistant from the model dropdown and just ask questions.
Either way, retrieval runs hybrid search across the selected base, reranks the hits, and the answer comes back with inline citations users can click to see the source chunk.
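Hybrid search means merging a keyword ranking with a vector ranking. One common way to do that is reciprocal rank fusion, sketched below. This is a swapped-in illustration of the general idea and not necessarily how Open WebUI combines or reranks its results; the chunk IDs and the constant k = 60 are illustrative assumptions.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk IDs into one ranking."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Higher fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))
```

A chunk that ranks well in both lists rises to the top, while one that only matches on keywords or only on meaning still stays in the candidate set for the reranker.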
Troubleshooting
Microsoft login loops or returns redirect_uri mismatch
The redirect URI in Entra must be exactly
https://chat.example.com/oauth/microsoft/callback —
same scheme, host and path, no trailing slash. Confirm
WEBUI_URL matches your real HTTPS domain.
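A quick way to spot the difference is to compare the two URLs component by component. The helper below is a hypothetical diagnostic sketch: it takes your WEBUI_URL and the redirect URI as registered in Entra and lists every part that does not line up.

```python
from urllib.parse import urlsplit

EXPECTED_PATH = "/oauth/microsoft/callback"


def redirect_problems(webui_url: str, registered_uri: str) -> list[str]:
    """List the differences that would cause a redirect_uri mismatch."""
    want, got = urlsplit(webui_url), urlsplit(registered_uri)
    problems = []
    if got.scheme != want.scheme:
        problems.append(f"scheme is {got.scheme!r}, expected {want.scheme!r}")
    if got.netloc.lower() != want.netloc.lower():
        problems.append(f"host is {got.netloc!r}, expected {want.netloc!r}")
    if got.path != EXPECTED_PATH:
        problems.append(f"path is {got.path!r}, expected {EXPECTED_PATH!r}")
    return problems
```

An empty list means the registered URI is consistent with WEBUI_URL; a trailing slash shows up as a path problem.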
Office files import as gibberish or empty
That means extraction fell back to the naive parser. Check Tika is
healthy (docker compose logs tika) and that
CONTENT_EXTRACTION_ENGINE=tika with the right
TIKA_SERVER_URL. Use the latest-full Tika
image — the slim one lacks the Office parsers.
Retrieval returns nothing relevant
- Make sure the embedding model actually pulled (docker compose exec ollama ollama list); if embeddings fail silently, the vector store is empty.
- Re-index after changing the embedding model: the old vectors are incompatible with the new model.
- Raise RAG_TOP_K and confirm ENABLE_RAG_HYBRID_SEARCH=true.
Groups from Entra are not mapping
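The ID token only carries the groups claim if you added it under Token configuration. By default Entra emits group object IDs in that claim rather than display names, so check which values Open WebUI actually receives. When a user belongs to more than 200 groups, the token carries a link to Microsoft Graph instead of the list (the “groups overage” problem). For big tenants, assign groups inside Open WebUI manually, or front it with an OIDC proxy that resolves groups via Graph.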
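To see which of these cases applies, you can inspect a captured ID token. The sketch below decodes the token payload and classifies the groups claim. It deliberately skips signature verification, so treat it as a debugging aid only and never as an authentication check; the function names are hypothetical.

```python
import base64
import json


def id_token_claims(token: str) -> dict:
    """Decode the payload of a JWT WITHOUT verifying its signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def groups_status(claims: dict) -> str:
    """Classify how (or whether) the token carries group membership."""
    if "groups" in claims:
        return f"inline: {len(claims['groups'])} group(s)"
    if "groups" in claims.get("_claim_names", {}):
        return "overage: token points to Graph instead of listing groups"
    return "absent: no groups claim configured"
```

An "inline" result with GUID-looking values tells you the claim is arriving but carries object IDs, which is worth knowing before you name groups in Open WebUI.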
What you end up with
A private RAG at https://chat.example.com where the
whole company logs in with their Microsoft account, queries
curated corporate knowledge bases scoped by Entra group, and
experiments in their own private collections — all running
on a single VM you own, with documents and vectors that never
leave your infrastructure.