How to Self-Host a Company RAG with LibreChat

LibreChat ships SAML, OIDC, LDAP and RBAC for free, and its Entra integration even syncs your Microsoft groups on login. This recipe stands it up on a VM you own — staff sign in with their work account, upload Office/PDF/Markdown, and use shared corporate assistants alongside their own personal ones.

Prep time: Half a day for a working pilot
One-time cost: €0 software (LibreChat is MIT open source)
Ongoing cost: One VM (~€40–120/mo) + your LLM tokens or GPU power

Ingredients

Why LibreChat for this? It has the best-documented Entra integration of the self-hostable tools — including Entra group sync via Microsoft Graph — plus the strongest permission model: every agent has its own access-control list and can be shared with specific users, groups, roles or made public. (LibreChat was acquired by ClickHouse in late 2025; the open-source project remains free to self-host.)

1 Provision the VM and install Docker

curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker "$USER"   # log out/in afterwards

docker --version
docker compose version

Point your DNS A record at the box and open only ports 80 and 443. Every backend service stays on the internal Docker network.

2 Get LibreChat

LibreChat ships a complete Compose stack — app, MongoDB, Meilisearch, a vectordb (pgvector) and the rag_api service are all wired up already. Clone it and create your config files.

git clone https://github.com/danny-avila/LibreChat.git
cd LibreChat

cp .env.example .env
cp librechat.example.yaml librechat.yaml

.env holds secrets and toggles; librechat.yaml configures endpoints, agents and file limits. We will touch both.

3 Register an app in Microsoft Entra

In Azure portal → Microsoft Entra ID → App registrations → New registration:

  1. Name it LibreChat; single-tenant is fine.
  2. Redirect URI — platform Web: https://chat.example.com/oauth/openid/callback
  3. Copy the client ID and tenant ID; create a client secret and copy its value.
  4. Add Microsoft Graph delegated permissions openid, email, profile, User.Read.
  5. (For group sync) Under Token configuration → Add groups claim, include Security groups. For richer group lookups, also grant GroupMember.Read.All — LibreChat can resolve Entra groups via Graph when token reuse is on.

Your OIDC issuer for a single tenant is: https://login.microsoftonline.com/<tenant-id>/v2.0.
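
Before wiring the issuer into .env, it can help to confirm that the tenant ID is right. OpenID Connect providers publish a discovery document under the issuer at /.well-known/openid-configuration; the sketch below only builds that URL. The helper name entra_discovery_url is hypothetical, and the tenant ID is a placeholder you supply.

```shell
# Build the OIDC discovery URL for a single-tenant Entra issuer.
# entra_discovery_url is an illustrative helper, not part of LibreChat.
entra_discovery_url() {
    tenant_id="$1"
    printf 'https://login.microsoftonline.com/%s/v2.0/.well-known/openid-configuration\n' "$tenant_id"
}

# Usage (needs curl and network access):
#   curl -fsS "$(entra_discovery_url "<tenant-id>")"
```

If the request succeeds, the issuer field in the returned JSON should match the value you put in OPENID_ISSUER.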

4 Configure Microsoft login (and group sync)

Add these to .env. The first block points LibreChat at your real domain; the second is the Entra OIDC wiring.

# .env

# ── Public URLs ─────────────────────────────────────────
DOMAIN_CLIENT=https://chat.example.com
DOMAIN_SERVER=https://chat.example.com

# ── Login policy ────────────────────────────────────────
ALLOW_EMAIL_LOGIN=false          # force everyone through Microsoft
ALLOW_SOCIAL_LOGIN=true
ALLOW_SOCIAL_REGISTRATION=true
ALLOW_REGISTRATION=false

# ── Microsoft Entra (Azure AD) OIDC ─────────────────────
OPENID_ISSUER=https://login.microsoftonline.com/<tenant-id>/v2.0
OPENID_CLIENT_ID=<application-client-id>
OPENID_CLIENT_SECRET=<client-secret-value>
OPENID_SCOPE=openid profile email
OPENID_CALLBACK_URL=/oauth/openid/callback
OPENID_SESSION_SECRET=change-me-to-a-long-random-string
OPENID_BUTTON_LABEL=Sign in with Microsoft

# Reuse the ID/access token so LibreChat can read Entra groups
OPENID_REUSE_TOKENS=true

Group sync. With token reuse on, the Entra security groups in the user’s token are mapped into LibreChat on each login, and membership changes propagate the next time they sign in. Those groups become the audience you share corporate agents with in Step 8.
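
The OPENID_SESSION_SECRET placeholder in the .env block above should be replaced with a real random value. A minimal sketch, assuming openssl is installed on the VM: set_env_value is a hypothetical helper (not a LibreChat command) that replaces a key in an env file, or appends it if missing. Note that it moves the key to the end of the file.

```shell
# Replace KEY=... in an env file, or append the line if KEY is absent.
# set_env_value is an illustrative helper, not part of LibreChat.
set_env_value() {
    file="$1"; key="$2"; value="$3"
    tmp="$(mktemp)"
    # Copy every line except an existing assignment for the key.
    grep -v "^${key}=" "$file" > "$tmp" || true
    # Append the new assignment at the end of the file.
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    mv "$tmp" "$file"
}

# Usage: write a 64-character hex secret into .env
#   set_env_value .env OPENID_SESSION_SECRET "$(openssl rand -hex 32)"
```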

5 Turn on RAG and pick an embedding model

File chat and agent knowledge run through the bundled rag_api service, which parses documents (PDF, DOCX, PPTX, XLSX, Markdown, TXT and more) and stores vectors in pgvector. You only need to tell it which embedding model to use. Still in .env:

# ── RAG API ─────────────────────────────────────────────
RAG_API_URL=http://rag_api:8000

# Option A — local embeddings via Ollama
EMBEDDINGS_PROVIDER=ollama
OLLAMA_BASE_URL=http://host.docker.internal:11434
EMBEDDINGS_MODEL=nomic-embed-text

# Option B — Azure / OpenAI embeddings (comment out A)
# EMBEDDINGS_PROVIDER=openai
# RAG_OPENAI_API_KEY=sk-...
# EMBEDDINGS_MODEL=text-embedding-3-large

Add your chat model the same way — e.g. OPENAI_API_KEY for OpenAI/Azure, or an Ollama endpoint declared under custom endpoints in librechat.yaml. With at least one chat model and one embedding model set, RAG is fully functional.

6 Put it behind TLS and launch

Front the stack with Caddy for automatic HTTPS. Create a Caddyfile next to the repo:

# Caddyfile
chat.example.com {
    reverse_proxy api:3080
}
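
The recipe does not spell out the Caddy service itself. One possible sketch, to be placed in docker-compose.override.yml, is written by the hypothetical helper below. The image tag, the caddy_data volume name and the assumption that the Caddyfile sits in the repo root are all illustrative choices, not taken from LibreChat's documentation. The helper overwrites any existing override file in the target directory.

```shell
# Write a minimal docker-compose.override.yml that adds a Caddy service.
# Image tag, volume name and mount paths are assumptions for illustration.
write_caddy_override() {
    target_dir="${1:-.}"
    cat > "$target_dir/docker-compose.override.yml" <<'EOF'
services:
  caddy:
    image: caddy:2
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data
    depends_on:
      - api

volumes:
  caddy_data:
EOF
}

# Usage, from the LibreChat repo root:
#   write_caddy_override .
```

Because Caddy joins the same Compose network as the app, the api:3080 upstream in the Caddyfile resolves by service name. The caddy_data volume keeps issued certificates across restarts.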

Add a caddy service to docker-compose.override.yml (the supported way to extend LibreChat’s stack without editing the base file), then start everything:

docker compose up -d

# pull local models if using Ollama
ollama pull qwen3
ollama pull nomic-embed-text

docker compose logs -f api

Browse to https://chat.example.com — you should see a Sign in with Microsoft button.

7 Set roles and registration policy

With ALLOW_REGISTRATION=false and email login off, only people in your tenant who pass through Microsoft can get in. The first user is yours to promote to admin:

# list accounts, then make one an admin (both run inside the api container)
docker compose exec api npm run user-stats
docker compose exec api npm run set-role <email> ADMIN

Admins manage the access control system — roles and groups that decide who can see which agent. Synced Entra groups show up here as shareable audiences.

8 Build the knowledge: agents

In LibreChat a “RAG database” is expressed as an agent with attached knowledge. This is how you get both corporate and personal RAGs — the per-agent ACL does the separation.

Corporate agents (shared, curated)

  1. Open the Agents builder, create e.g. HR Assistant.
  2. Under Upload for File Search, add the source files — .docx, .xlsx, .pptx, .pdf, .md. The RAG API parses, chunks and embeds them into the agent’s knowledge.
  3. Write a system prompt (“Answer only from the HR handbook and cite the section”).
  4. Set the agent’s sharing/ACL to a group (e.g. everyone or finance) with view access. Members pick it from the agent list; the owner controls its content.

Personal agents (user sandboxes)

Any user can build their own agent, upload their own files for File Search, and keep it private — sitting right beside the shared corporate ones. That is exactly the “users try their own small RAGs” pattern, with no extra setup; the ACL isolates them by default.

9 Query it

Users pick an agent from the dropdown and ask away — retrieval runs over that agent’s knowledge and answers come back with source citations. For one-off questions, a user can also attach a file directly to a message and chat with it without building an agent at all.

One agent, several sources. A single agent can hold multiple uploaded document sets, so an “IT & HR Helpdesk” agent can answer across both policy libraries at once.

Troubleshooting

Microsoft button missing or callback fails

The Entra redirect URI must be exactly https://chat.example.com/oauth/openid/callback, and DOMAIN_SERVER / DOMAIN_CLIENT must be your real HTTPS domain. If the button does not appear, confirm ALLOW_SOCIAL_LOGIN=true and that all four OPENID_* values are set.

Groups are not syncing

Confirm that the groups claim from Step 3 is configured in the app registration and that OPENID_REUSE_TOKENS=true is set in .env. Membership changes only propagate at the next sign-in, so have the user sign out and back in before testing again.

File upload succeeds but answers ignore it

The rag_api or vectordb container is probably unhealthy, or no embedding model is set. Check docker compose logs rag_api, confirm RAG_API_URL and the EMBEDDINGS_* values, and verify the embedding model actually pulled if you use Ollama.
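
The "confirm the values" part of that check can be scripted. A minimal sketch, assuming the settings live in .env as in Step 5: missing_env_keys is a hypothetical helper that prints each key lacking a non-empty, uncommented assignment.

```shell
# Print every KEY from the argument list that has no non-empty,
# uncommented assignment in the given env file.
# missing_env_keys is an illustrative helper, not a LibreChat command.
missing_env_keys() {
    file="$1"; shift
    for key in "$@"; do
        # Match KEY=<at least one character> at the start of a line.
        grep -q "^${key}=." "$file" || printf '%s\n' "$key"
    done
}

# Usage:
#   missing_env_keys .env RAG_API_URL EMBEDDINGS_PROVIDER EMBEDDINGS_MODEL
```

No output means all listed keys are set. It does not validate the values themselves.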

Ollama unreachable from containers

Inside Docker, localhost is the container, not the host. Use http://host.docker.internal:11434 (Linux may need extra_hosts: ["host.docker.internal:host-gateway"]), or run Ollama as a service on the same Compose network.
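
To see whether a given base URL actually reaches Ollama and has the embedding model, you can query Ollama's /api/tags endpoint, which lists locally pulled models. The sketch below assumes curl is available wherever you run it; on the host that is usually true, inside a container it may not be. The helper name ollama_has_model is hypothetical.

```shell
# Succeed only if the Ollama server at BASE_URL answers and its model
# list mentions MODEL. ollama_has_model is an illustrative helper.
ollama_has_model() {
    base_url="$1"; model="$2"
    # An unreachable server yields no output, so grep fails as well.
    curl -fsS --max-time 5 "$base_url/api/tags" 2>/dev/null | grep -q -- "$model"
}

# Usage:
#   ollama_has_model http://host.docker.internal:11434 nomic-embed-text \
#       && echo "model available" || echo "unreachable or model missing"
```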

What you end up with

A private RAG at https://chat.example.com where staff log in with their Microsoft account, their Entra groups flow straight into the permission model, and they use group-scoped corporate assistants alongside their own private ones — all on a single VM you control, with documents and vectors that never leave your infrastructure.
