How to Self-Host a Company RAG with Open WebUI

Stand up a private, self-hosted RAG on a VM you control: staff sign in with their Microsoft work account, upload Word, Excel, PowerPoint, PDF and Markdown files, and query both shared corporate knowledge bases and their own personal ones. No SaaS subscription and no data leaving your tenant; on the fully local path there is no per-token bill either.

Prep time: half a day for a working pilot
One-time cost: €0 software (Open WebUI is free to self-host under a permissive BSD-style license; check the current terms)
Ongoing cost: one VM (~€30–120/mo) plus your LLM tokens or GPU power

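Ingredients

A Linux VM you control, reachable on ports 80 and 443
A DNS name for it (this recipe uses chat.example.com)
A Microsoft Entra tenant where you are allowed to register an app
A chat model and an embedding model, either local via Ollama or hosted in Azure OpenAI

By the end you will have https://chat.example.com behind Microsoft login, with a “Workspace → Knowledge” area where admins publish corporate knowledge bases and users spin up their own private ones.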

Why Open WebUI for this? It is the best all-rounder for your four requirements: native OIDC that talks to Entra, first-class “Knowledge” collections with group-based access control, hybrid retrieval (BM25 + vector + reranking) built in, and a polished chat UI non-technical staff actually enjoy.

1 Provision the VM and install Docker

Spin up the VM with your cloud or hypervisor of choice, point your DNS A record at it, then install Docker from the official convenience script.

curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker "$USER"   # log out/in afterwards

# verify
docker --version
docker compose version

Open the firewall for ports 80 and 443 only. Everything else stays on the Docker network — the app itself, Ollama, Tika and the database should never be exposed directly.

2 Pick your LLM and embedding backend

RAG needs two models: a chat model to write answers and an embedding model to turn documents into vectors. Decide now — it changes a few env vars later.

Fully local: Ollama on the box
  Chat: e.g. qwen3 or llama3.3
  Embeddings: nomic-embed-text or bge-m3
  Pro: Nothing leaves the VM at all
  Con: Wants a GPU for usable speed

Tenant-hosted API: Azure OpenAI
  Chat: your deployed GPT model
  Embeddings: text-embedding-3-large
  Pro: No GPU; data stays in your Azure region
  Con: Per-token cost; needs an Azure subscription

The rest of this recipe shows the local Ollama path and notes the Azure swap where it matters.
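
To make the split between the two models concrete, here is a minimal Python sketch of what the embedding side contributes at query time. The three-dimensional vectors and the chunk texts are hand-written placeholders (a real embedding model returns hundreds of dimensions), and the function names are illustrative, not part of Open WebUI.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_chunks(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k chunk texts whose vectors are closest to the query vector."""
    ranked = sorted(
        chunk_vecs,
        key=lambda text: cosine_similarity(query_vec, chunk_vecs[text]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical vectors standing in for embedding-model output.
CHUNKS = {
    "Annual leave is 25 days.": [0.9, 0.1, 0.0],
    "Expenses are reimbursed monthly.": [0.1, 0.9, 0.0],
    "The office closes at 18:00.": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of "How much holiday do I get?"
best = top_chunks(query, CHUNKS, k=1)
print(best)
```

The chat model never sees the vectors. It only receives the text of the best-matching chunks in its prompt and writes the answer from them.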

3 Register an app in Microsoft Entra

This is what makes “Sign in with Microsoft” work. In the Azure portal → Microsoft Entra ID → App registrations → New registration:

  1. Name it something like Open WebUI.
  2. Supported account types: Accounts in this organizational directory only (single tenant).
  3. Redirect URI — platform Web: https://chat.example.com/oauth/microsoft/callback
  4. Create it, then copy the Application (client) ID and Directory (tenant) ID.
  5. Under Certificates & secrets, create a client secret and copy its value now (you cannot see it again).
  6. Under API permissions, add Microsoft Graph delegated permissions: openid, email, profile, User.Read.
  7. (Optional, for group-based access) Under Token configuration → Add groups claim, include Security groups in the ID token. This lets Open WebUI map Entra groups to its own groups.

Keep three values handy: the tenant ID, the client ID, and the client secret. Only the client secret is sensitive, but all three go into the environment file in step 5.
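
As a quick sanity check on the values you just copied, the sketch below derives the URLs that follow from them. The GUIDs are placeholders and the helper name entra_settings is made up for this example. The endpoint layout is the standard Microsoft identity platform v2.0 one; opening the printed discovery URL in a browser should return a JSON document if the tenant ID is right.

```python
import re

GUID_RE = re.compile(r"^[0-9a-fA-F]{8}-([0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}$")

def entra_settings(tenant_id: str, client_id: str, public_url: str) -> dict[str, str]:
    """Derive the URLs that follow from the values collected in Entra."""
    for name, value in (("tenant_id", tenant_id), ("client_id", client_id)):
        if not GUID_RE.match(value):
            raise ValueError(f"{name} does not look like a GUID: {value!r}")
    base = public_url.rstrip("/")
    authority = f"https://login.microsoftonline.com/{tenant_id}/v2.0"
    return {
        "redirect_uri": f"{base}/oauth/microsoft/callback",
        "authority": authority,
        "discovery_url": f"{authority}/.well-known/openid-configuration",
    }

# Placeholder GUIDs for illustration only; substitute your own values.
settings = entra_settings(
    tenant_id="11111111-2222-3333-4444-555555555555",
    client_id="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    public_url="https://chat.example.com/",
)
for key, value in settings.items():
    print(f"{key}: {value}")
```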

4 Write the Compose stack

Create a project directory and a docker-compose.yml. This wires together four services: Open WebUI, Ollama (LLM + embeddings), Apache Tika (document extraction), and Caddy (TLS).

mkdir -p ~/openwebui && cd ~/openwebui

# docker-compose.yml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    restart: unless-stopped
    env_file: .env
    depends_on: [ollama, tika]
    volumes:
      - openwebui-data:/app/backend/data
    # no host port — Caddy talks to it over the internal network

  ollama:
    image: ollama/ollama:latest
    restart: unless-stopped
    volumes:
      - ollama-data:/root/.ollama
    # uncomment to use an NVIDIA GPU:
    # deploy:
    #   resources:
    #     reservations:
    #       devices: [{ driver: nvidia, count: all, capabilities: [gpu] }]

  tika:
    image: apache/tika:latest-full
    restart: unless-stopped

  caddy:
    image: caddy:2
    restart: unless-stopped
    ports: ["80:80", "443:443"]
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
      - caddy-data:/data
      - caddy-config:/config

volumes:
  openwebui-data:
  ollama-data:
  caddy-data:
  caddy-config:

And a three-line Caddyfile. Caddy fetches and renews a Let’s Encrypt certificate automatically:

# Caddyfile
chat.example.com {
    reverse_proxy open-webui:8080
}

5 Configure the environment

This .env file is where the magic happens: Microsoft login, Office extraction via Tika, local embeddings, and hybrid search all switched on. Replace the three Entra values and the secret key.

# .env

# ── Base ────────────────────────────────────────────────
WEBUI_URL=https://chat.example.com
WEBUI_SECRET_KEY=change-me-to-a-long-random-string

# ── Microsoft Entra (Azure AD) login ────────────────────
ENABLE_OAUTH_SIGNUP=true
ENABLE_LOGIN_FORM=false                 # force everyone through Microsoft
OAUTH_PROVIDER_NAME=Microsoft
MICROSOFT_CLIENT_ID=<application-client-id>
MICROSOFT_CLIENT_SECRET=<client-secret-value>
MICROSOFT_CLIENT_TENANT_ID=<directory-tenant-id>

# Map Entra security groups -> Open WebUI groups (optional)
ENABLE_OAUTH_GROUP_MANAGEMENT=true
OAUTH_GROUP_CLAIM=groups

# ── Document extraction (Office, PDF, etc.) ─────────────
CONTENT_EXTRACTION_ENGINE=tika
TIKA_SERVER_URL=http://tika:9998

# ── Embeddings (local via Ollama) ───────────────────────
RAG_EMBEDDING_ENGINE=ollama
RAG_OLLAMA_BASE_URL=http://ollama:11434
RAG_EMBEDDING_MODEL=nomic-embed-text

# ── Retrieval quality ───────────────────────────────────
ENABLE_RAG_HYBRID_SEARCH=true
RAG_RERANKING_MODEL=BAAI/bge-reranker-v2-m3
RAG_TOP_K=5

# ── Chat model source (local Ollama) ────────────────────
OLLAMA_BASE_URL=http://ollama:11434

Azure OpenAI instead? Drop the RAG_EMBEDDING_ENGINE=ollama block and set RAG_EMBEDDING_ENGINE=openai with RAG_OPENAI_API_BASE_URL, RAG_OPENAI_API_KEY and RAG_EMBEDDING_MODEL=text-embedding-3-large. Add your chat deployment as an OpenAI connection in the admin UI later.
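
Before launching, it is worth checking that no template placeholder survived in .env. The following is a rough sketch, not an official tool: it parses simple KEY=VALUE lines, flags values that still look like <...> or change-me, and suggests a random WEBUI_SECRET_KEY. The helper names and the 48-byte key length are choices made for this example.

```python
import re
import secrets

PLACEHOLDER_RE = re.compile(r"^<.*>$|^change-me")

def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment that is preceded by whitespace.
        value = re.split(r"\s+#", value, maxsplit=1)[0].strip()
        values[key.strip()] = value
    return values

def unreplaced(values: dict[str, str]) -> list[str]:
    """Names of variables that still hold a template placeholder."""
    return sorted(k for k, v in values.items() if PLACEHOLDER_RE.search(v))

def new_secret_key() -> str:
    """Random URL-safe string suitable for WEBUI_SECRET_KEY."""
    return secrets.token_urlsafe(48)

SAMPLE = """\
WEBUI_SECRET_KEY=change-me-to-a-long-random-string
ENABLE_LOGIN_FORM=false                 # force everyone through Microsoft
MICROSOFT_CLIENT_ID=<application-client-id>
TIKA_SERVER_URL=http://tika:9998
"""
env = parse_env(SAMPLE)
print("still to fill in:", unreplaced(env))
print("suggested WEBUI_SECRET_KEY:", new_secret_key())
```

To run it against the real file, replace SAMPLE with the contents of .env, for example open(".env").read().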

6 Launch and pull the models

docker compose up -d

# pull the local models into Ollama (skip if using Azure)
docker compose exec ollama ollama pull qwen3
docker compose exec ollama ollama pull nomic-embed-text

# watch it come up
docker compose logs -f open-webui

Browse to https://chat.example.com. With the login form disabled, the sign-in page should offer only the Microsoft option, which takes you to the Microsoft login. The first account to log in becomes the admin, so make sure that is you.
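
If you would rather script the first check than click through a browser, a small probe along these lines can help. It assumes the Open WebUI image answers GET /health with the JSON body {"status": true}; treat that path and response shape as an assumption to verify against your version. The function names are made up for this example.

```python
import json
import urllib.request

def is_healthy(body: bytes) -> bool:
    """True if the body is JSON of the form {"status": true}."""
    try:
        return json.loads(body).get("status") is True
    except (ValueError, AttributeError):
        # Not JSON at all, or JSON that is not an object.
        return False

def check(base_url: str, timeout: float = 10.0) -> bool:
    """Fetch <base_url>/health and interpret the response."""
    url = base_url.rstrip("/") + "/health"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status == 200 and is_healthy(response.read())

# Example call (needs the stack running and DNS pointing at the VM):
#   check("https://chat.example.com")
```

A certificate error raised by urlopen here usually means Caddy has not obtained its certificate yet, which in turn points at DNS or the firewall.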

7 Set roles, groups and signup policy

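By default new Microsoft sign-ins land in a pending state so strangers cannot self-enroll. As admin, open Admin Panel → Users to approve pending accounts by switching their role to user, and create the groups (for example everyone) that your knowledge bases will be shared with.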

Groups are the hinge for the next step — they decide who sees which corporate knowledge base.

8 Create the knowledge bases

This is your “more than one database” requirement, and Open WebUI handles it natively. Go to Workspace → Knowledge.

Corporate knowledge bases (curated, shared)

  1. Click + Create Knowledge, name it e.g. HR Handbook.
  2. Set Access to a group (e.g. everyone) with read permission so staff can query but not edit it.
  3. Upload the source files: .docx, .xlsx, .pptx, .pdf, .md. Tika extracts the text, Open WebUI splits it into chunks automatically, and the embedding model vectorises each chunk.
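
The chunking in step 3 can be pictured as a sliding window over the extracted text. The sketch below is a simplified fixed-size splitter with overlap, not Open WebUI's own splitter. The function name and the default sizes are assumptions for illustration; the real chunk size and overlap are settings you can tune in the admin panel.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
pieces = chunk_text(sample, chunk_size=10, overlap=3)
print(pieces)
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing a little duplicated text.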

Personal knowledge bases (user sandboxes)

Any user with the user role can create their own Knowledge collection from the same screen. It is private to them by default — perfect for someone testing a small RAG over their own project files without touching the corporate set. Nothing extra to configure; the permission model already isolates them.

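Permission knob: whether regular users may create knowledge bases is governed by USER_PERMISSIONS_WORKSPACE_KNOWLEDGE_ACCESS in .env. Set it to false to stop every user from creating them and grant the permission per group instead. If the Knowledge area is missing for regular users, check that it is set to true.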
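
The sharing rule described in this step (private to the owner unless shared with a group) can be modelled in a few lines. This is a toy model only: the class and function names are hypothetical, and Open WebUI's real permission checks, including admin overrides and write access, are more involved.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeBase:
    name: str
    owner: str
    read_groups: set[str] = field(default_factory=set)  # empty = private

@dataclass
class User:
    name: str
    groups: set[str] = field(default_factory=set)

def can_read(user: User, kb: KnowledgeBase) -> bool:
    """Owner always reads; others need at least one group in common."""
    return user.name == kb.owner or bool(user.groups & kb.read_groups)

handbook = KnowledgeBase("HR Handbook", owner="admin", read_groups={"everyone"})
notes = KnowledgeBase("Project notes", owner="alice")  # personal, private
alice = User("alice", groups={"everyone"})
bob = User("bob", groups={"everyone"})

for kb in (handbook, notes):
    readers = [u.name for u in (alice, bob) if can_read(u, kb)]
    print(f"{kb.name}: {readers}")
```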

9 Query it — two ways

Ad-hoc: in any chat, type # and pick a knowledge base (or a single file) to ground that conversation.

Permanent: build a reusable assistant under Workspace → Models — choose a base chat model, attach one or more knowledge bases, give it a system prompt (“You are the HR assistant; answer only from the handbook and cite the section”), and share it with a group. Staff then pick HR Assistant from the model dropdown and just ask questions.

Either way, retrieval runs hybrid search across the selected base, reranks the hits, and the answer comes back with inline citations users can click to see the source chunk.
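
To give a feel for what "hybrid" means here, the sketch below merges a keyword ranking and a vector ranking with reciprocal rank fusion (RRF). RRF is one common fusion method and is used here purely as an illustration; Open WebUI's own fusion and its cross-encoder reranking step may work differently. The chunk ids are invented, and k=60 is simply the constant commonly used with RRF.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge ranked lists; each hit scores 1 / (k + rank), with rank starting at 1."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by score descending, then by id so ties are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical chunk ids as returned by a keyword search and a vector search.
bm25_hits = ["leave-policy", "expenses", "office-hours"]
vector_hits = ["parental-leave", "leave-policy", "expenses"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
top_k = [doc_id for doc_id, _ in fused[:3]]
print(top_k)
```

Chunks that both searches agree on rise to the top, which is why hybrid search tends to be more robust than either keyword or vector search alone.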

Troubleshooting

Microsoft login loops or returns redirect_uri mismatch

The redirect URI in Entra must be exactly https://chat.example.com/oauth/microsoft/callback — same scheme, host and path, no trailing slash. Confirm WEBUI_URL matches your real HTTPS domain.
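
Because the comparison is exact, small differences are easy to miss by eye. A helper like the following (a sketch, with a made-up function name) spells out which part of the registered URI differs from the one derived from WEBUI_URL.

```python
from urllib.parse import urlsplit

CALLBACK_PATH = "/oauth/microsoft/callback"

def redirect_uri_problems(registered: str, webui_url: str) -> list[str]:
    """Compare the URI registered in Entra with the one derived from WEBUI_URL."""
    expected = webui_url.rstrip("/") + CALLBACK_PATH
    if registered == expected:
        return []
    problems = []
    reg, exp = urlsplit(registered), urlsplit(expected)
    if reg.scheme != exp.scheme:
        problems.append(f"scheme is {reg.scheme!r}, expected {exp.scheme!r}")
    if reg.netloc.lower() != exp.netloc.lower():
        problems.append(f"host is {reg.netloc!r}, expected {exp.netloc!r}")
    if reg.path != exp.path:
        problems.append(f"path is {reg.path!r}, expected {exp.path!r}")
    return problems or [f"differs from {expected!r}"]

found = redirect_uri_problems(
    "http://chat.example.com/oauth/microsoft/callback/",
    "https://chat.example.com",
)
print(found)
```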

Office files import as gibberish or empty

That means extraction fell back to the built-in default parser. Check Tika is healthy (docker compose logs tika) and that CONTENT_EXTRACTION_ENGINE=tika is set with the right TIKA_SERVER_URL. Use the latest-full Tika image: the minimal one omits the Tesseract OCR and GDAL dependencies, so scanned PDFs and images come through empty.

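Retrieval returns nothing relevant

Check that the embedding model was actually pulled in step 6 and that ENABLE_RAG_HYBRID_SEARCH and RAG_TOP_K are set as in step 5. If you change the embedding model after uploading documents, re-upload them: vectors produced by different models are not comparable.

Groups from Entra are not mapping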

The ID token only carries the groups claim if you added it under Token configuration. Entra also emits group object IDs rather than display names by default, and for users who belong to more than 200 groups it replaces the list with a link to Microsoft Graph (the “groups overage” problem). For affected tenants, assign groups inside Open WebUI manually, or front it with an OIDC proxy that resolves groups via Graph.
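
To see which of these cases applies to a given user, you can inspect the claims of an ID token captured during login. The sketch below decodes the payload without verifying the signature, so it is for debugging only. The helper names and the fake_token builder are inventions for this example; the _claim_names and hasgroups markers are the ones Entra uses to signal overage.

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    """Decode the payload of a JWT WITHOUT verifying its signature (debugging only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

def groups_status(claims: dict) -> str:
    """Classify how group information arrived in the token."""
    if "groups" in claims:
        return f"inline: {len(claims['groups'])} group(s)"
    if "groups" in claims.get("_claim_names", {}) or claims.get("hasgroups"):
        return "overage: groups must be fetched from Microsoft Graph"
    return "missing: no groups claim configured"

def fake_token(claims: dict) -> str:
    """Build an unsigned token for the demo below; real tokens come from Entra."""
    body = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode().rstrip("=")
    return f"header.{body}.signature"

inline = fake_token({"groups": ["id-1", "id-2"]})
overage = fake_token({"_claim_names": {"groups": "src1"}})
print(groups_status(jwt_claims(inline)))
print(groups_status(jwt_claims(overage)))
```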

What you end up with

A private RAG at https://chat.example.com where the whole company logs in with their Microsoft account, queries curated corporate knowledge bases scoped by Entra group, and experiments in their own private collections — all running on a single VM you own, with documents and vectors that never leave your infrastructure.
