How to Self-Host a Company RAG with LibreChat
LibreChat ships SAML, OIDC, LDAP and RBAC for free, and its Entra integration even syncs your Microsoft groups on login. This recipe stands it up on a VM you own — staff sign in with their work account, upload Office/PDF/Markdown, and use shared corporate assistants alongside their own personal ones.
Ingredients
- A Linux VM — Ubuntu 24.04. The stack runs several services (app, MongoDB, Meilisearch, a vector DB and the RAG API), so budget 4 vCPU / 8–16 GB RAM / 100 GB disk. Add a GPU only if you run the LLM/embeddings locally.
- Docker Engine and the Docker Compose plugin.
- A DNS name (e.g. chat.example.com) and TLS; we use Caddy for one-line HTTPS.
- Microsoft Entra ID admin rights to register an app.
- An LLM + embedding backend — local Ollama, or any OpenAI-compatible / Azure OpenAI endpoint.
1 Provision the VM and install Docker
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker "$USER" # log out/in afterwards
docker --version
docker compose version
Point your DNS A record at the box and open only ports 80 and 443. Every backend service stays on the internal Docker network.
2 Get LibreChat
LibreChat ships a complete Compose stack — app, MongoDB,
Meilisearch, a vectordb (pgvector) and the
rag_api service are all wired up already. Clone it and
create your config files.
git clone https://github.com/danny-avila/LibreChat.git
cd LibreChat
cp .env.example .env
cp librechat.example.yaml librechat.yaml
.env holds secrets and toggles; librechat.yaml
configures endpoints, agents and file limits. We will touch both.
3 Register an app in Microsoft Entra
In Azure portal → Microsoft Entra ID → App registrations → New registration:
- Name it LibreChat; single-tenant is fine.
- Redirect URI (platform Web): https://chat.example.com/oauth/openid/callback
- Copy the client ID and tenant ID; create a client secret and copy its value.
- Add Microsoft Graph delegated permissions openid, email, profile, User.Read.
- (For group sync) Under Token configuration → Add groups claim, include Security groups. For richer group lookups, also grant GroupMember.Read.All; LibreChat can resolve Entra groups via Graph when token reuse is on.
Your OIDC issuer for a single tenant is:
https://login.microsoftonline.com/<tenant-id>/v2.0.
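A quick sanity check on the issuer can save a failed login later. The sketch below derives the standard OIDC discovery URL from the issuer. The tenant ID is a placeholder, and discovery_url is a helper name of our own, not something LibreChat ships.

```shell
# Build the OIDC discovery URL for a single-tenant Entra issuer.
# TENANT_ID is a placeholder; substitute your own tenant ID.
TENANT_ID="${TENANT_ID:-00000000-0000-0000-0000-000000000000}"
ISSUER="https://login.microsoftonline.com/${TENANT_ID}/v2.0"

# discovery_url is a hypothetical helper, not part of LibreChat.
discovery_url() {
  # Strip one trailing slash so the path is never doubled.
  printf '%s/.well-known/openid-configuration\n' "${1%/}"
}

URL="$(discovery_url "$ISSUER")"
echo "$URL"

# To fetch it for real (needs network access):
#   curl -fsS "$URL"
```

If the issuer field in the fetched JSON differs from the value you plan to put in OPENID_ISSUER, the tenant ID is the first thing to recheck.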
4 Configure Microsoft login (and group sync)
Add these to .env. The first block points LibreChat at
your real domain, the second sets the login policy, and the
third is the Entra OIDC wiring.
# .env
# ── Public URLs ─────────────────────────────────────────
DOMAIN_CLIENT=https://chat.example.com
DOMAIN_SERVER=https://chat.example.com
# ── Login policy ────────────────────────────────────────
ALLOW_EMAIL_LOGIN=false # force everyone through Microsoft
ALLOW_SOCIAL_LOGIN=true
ALLOW_SOCIAL_REGISTRATION=true
ALLOW_REGISTRATION=false
# ── Microsoft Entra (Azure AD) OIDC ─────────────────────
OPENID_ISSUER=https://login.microsoftonline.com/<tenant-id>/v2.0
OPENID_CLIENT_ID=<application-client-id>
OPENID_CLIENT_SECRET=<client-secret-value>
OPENID_SCOPE=openid profile email
OPENID_CALLBACK_URL=/oauth/openid/callback
OPENID_SESSION_SECRET=change-me-to-a-long-random-string
OPENID_BUTTON_LABEL=Sign in with Microsoft
# Reuse the ID/access token so LibreChat can read Entra groups
OPENID_REUSE_TOKENS=true
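OPENID_SESSION_SECRET should be a long random value rather than a phrase. One way to produce it, sketched with nothing but /dev/urandom and coreutils (gen_secret is our own helper name):

```shell
# gen_secret is a hypothetical helper: it reads 32 random bytes
# and prints them as 64 lowercase hex characters.
gen_secret() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

SECRET="$(gen_secret)"
# Paste the printed line into .env in place of the placeholder value.
echo "OPENID_SESSION_SECRET=${SECRET}"
```

If OpenSSL is installed, openssl rand -hex 32 gives an equivalent result.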
5 Turn on RAG and pick an embedding model
File chat and agent knowledge run through the bundled
rag_api service, which parses documents (PDF, DOCX,
PPTX, XLSX, Markdown, TXT and more) and stores vectors in pgvector.
You only need to tell it which embedding model to
use. Still in .env:
# ── RAG API ─────────────────────────────────────────────
RAG_API_URL=http://rag_api:8000
# Option A — local embeddings via Ollama
EMBEDDINGS_PROVIDER=ollama
OLLAMA_BASE_URL=http://host.docker.internal:11434
EMBEDDINGS_MODEL=nomic-embed-text
# Option B — Azure / OpenAI embeddings (comment out A)
# EMBEDDINGS_PROVIDER=openai
# RAG_OPENAI_API_KEY=sk-...
# EMBEDDINGS_MODEL=text-embedding-3-large
Add your chat model the same way — e.g.
OPENAI_API_KEY for OpenAI/Azure, or an Ollama endpoint
declared under custom endpoints in
librechat.yaml. With at least one chat model and one
embedding model set, RAG is fully functional.
6 Put it behind TLS and launch
Front the stack with Caddy for automatic HTTPS. Create a
Caddyfile in the repo root, beside the Compose files:
# Caddyfile
chat.example.com {
    reverse_proxy api:3080
}
Add a caddy service to
docker-compose.override.yml (the supported way to extend
LibreChat’s stack without editing the base file), then start
everything:
docker compose up -d
# pull local models if using Ollama
ollama pull qwen3
ollama pull nomic-embed-text
docker compose logs -f api
Browse to https://chat.example.com — you should see
a Sign in with Microsoft button.
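For reference, the caddy service mentioned in this step could look roughly like the sketch below. It assumes the Caddyfile sits in the same directory as the Compose files and uses the official caddy:2 image with its /data and /config volumes. The service and volume names are our choices, so adapt them to your setup.

```shell
# Write a minimal docker-compose.override.yml that adds Caddy.
# Assumption: run from the LibreChat repo root. An existing
# override file is left untouched so nothing gets clobbered.
OVERRIDE_FILE="docker-compose.override.yml"

if [ -e "$OVERRIDE_FILE" ]; then
  echo "$OVERRIDE_FILE already exists; merge the caddy service by hand" >&2
else
  cat > "$OVERRIDE_FILE" <<'EOF'
services:
  caddy:
    image: caddy:2
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data      # certificates persist across restarts
      - caddy_config:/config
    depends_on:
      - api

volumes:
  caddy_data:
  caddy_config:
EOF
fi
```

Running docker compose config afterwards prints the merged stack, which is a convenient way to confirm the override was picked up.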
7 Set roles and registration policy
With ALLOW_REGISTRATION=false and email login off, only
people in your tenant who pass through Microsoft can get in. The
first user is yours to promote to admin:
# make a user an admin (run from the repo; the scripts run inside the api container)
docker compose exec api npm run user-stats   # or: npm run list-users
docker compose exec api npm run set-role <email> ADMIN
Admins manage the access control system — roles and groups that decide who can see which agent. Synced Entra groups show up here as shareable audiences.
8 Build the knowledge: agents
In LibreChat a “RAG database” is expressed as an agent with attached knowledge. This is how you get both corporate and personal RAGs — the per-agent ACL does the separation.
Corporate agents (shared, curated)
- Open the Agents builder, create e.g. HR Assistant.
- Under Upload for File Search, add the source files: .docx, .xlsx, .pptx, .pdf, .md. The RAG API parses, chunks and embeds them into the agent’s knowledge.
- Write a system prompt (“Answer only from the HR handbook and cite the section”).
- Set the agent’s sharing/ACL to a group (e.g. everyone or finance) with view access. Members pick it from the agent list; the owner controls its content.
Personal agents (user sandboxes)
Any user can build their own agent, upload their own files for File Search, and keep it private — sitting right beside the shared corporate ones. That is exactly the “users try their own small RAGs” pattern, with no extra setup; the ACL isolates them by default.
9 Query it
Users pick an agent from the dropdown and ask away — retrieval runs over that agent’s knowledge and answers come back with source citations. For one-off questions, a user can also attach a file directly to a message and chat with it without building an agent at all.
Troubleshooting
Microsoft button missing or callback fails
The Entra redirect URI must be exactly
https://chat.example.com/oauth/openid/callback, and
DOMAIN_SERVER / DOMAIN_CLIENT must be your
real HTTPS domain. If the button does not appear, confirm
ALLOW_SOCIAL_LOGIN=true and that the OPENID_* values from step 4
are all set, in particular the issuer, client ID, client secret,
scope and session secret.
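A small script can take the guesswork out of that check. The check_env helper below is hypothetical (our own name). It flags variables that are missing or empty in a dotenv-style file, and the variable list in the usage comment is our reading of the step 4 settings.

```shell
# check_env FILE VAR... : report variables that are missing or empty
# in a dotenv-style file. Hypothetical helper, not part of LibreChat.
check_env() {
  ce_file="$1"; shift
  ce_missing=0
  for ce_var in "$@"; do
    # Require "VAR=" at line start followed by a real value character,
    # so commented-out lines and empty values both count as missing.
    if ! grep -Eq "^${ce_var}=[^[:space:]#]" "$ce_file"; then
      echo "missing or empty: ${ce_var}"
      ce_missing=1
    fi
  done
  return "$ce_missing"
}

# Usage, from the LibreChat repo root:
#   check_env .env DOMAIN_CLIENT DOMAIN_SERVER ALLOW_SOCIAL_LOGIN \
#     OPENID_ISSUER OPENID_CLIENT_ID OPENID_CLIENT_SECRET \
#     OPENID_SCOPE OPENID_SESSION_SECRET
```

The function returns non-zero when anything is missing, so it can also gate a deploy script.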
Groups are not syncing
- Set OPENID_REUSE_TOKENS=true and add the groups claim in Entra Token configuration.
- Users who belong to more than 200 groups hit the “groups overage” limit: Entra leaves the list out of the token and sends a Graph link instead. Grant GroupMember.Read.All so LibreChat resolves them via Graph.
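To see which case you are in, you can inspect the claims of an ID token captured for a test user. The sketch below decodes the payload segment of a JWT with standard tools. jwt_payload is our own helper name, and how you capture the token (browser dev tools, for example) is left to you. A normal token carries a groups array; in the overage case Entra replaces it with _claim_names and _claim_sources entries.

```shell
# jwt_payload TOKEN : print the decoded JSON payload of a JWT.
# Hypothetical helper. It does NOT verify the signature and is
# meant for inspection only.
jwt_payload() {
  # Take the middle segment and map base64url to standard base64.
  jp_seg="$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')"
  # Restore the '=' padding that JWTs strip.
  case $(( ${#jp_seg} % 4 )) in
    2) jp_seg="${jp_seg}==" ;;
    3) jp_seg="${jp_seg}=" ;;
  esac
  printf '%s' "$jp_seg" | base64 -d
}

# Usage (ID_TOKEN is a token you captured yourself):
#   jwt_payload "$ID_TOKEN" | grep -o '"groups"\|"_claim_names"'
```

Treat captured tokens as secrets and discard them once the check is done.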
File upload succeeds but answers ignore it
The rag_api or vectordb container is
probably unhealthy, or no embedding model is set. Check
docker compose logs rag_api, confirm
RAG_API_URL and the EMBEDDINGS_* values, and
verify the embedding model actually pulled if you use Ollama.
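If you embed through Ollama, one way to verify the pull is to look the model up in the list Ollama serves at /api/tags. The has_model helper below is hypothetical (our own name) and only greps the JSON on stdin, so treat it as a rough check rather than a proper JSON parser.

```shell
# has_model NAME : succeed if the Ollama model list on stdin contains
# a model called NAME, with or without a ":tag" suffix.
# Hypothetical helper; NAME is used as a grep pattern.
has_model() {
  grep -q "\"name\":[[:space:]]*\"$1[\":]"
}

# Usage on the Docker host (needs Ollama running on its default port):
#   curl -fsS http://localhost:11434/api/tags | has_model nomic-embed-text \
#     && echo "embedding model present" \
#     || echo "run: ollama pull nomic-embed-text"
```

The same check works for the chat model, for example has_model qwen3.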
Ollama unreachable from containers
Inside Docker, localhost is the container, not the host.
Use http://host.docker.internal:11434 (Linux may need
extra_hosts: ["host.docker.internal:host-gateway"]), or
run Ollama as a service on the same Compose network.
What you end up with
A private RAG at https://chat.example.com where staff
log in with their Microsoft account, their Entra groups flow
straight into the permission model, and they use group-scoped
corporate assistants alongside their own private ones — all
on a single VM you control, with documents and vectors that never
leave your infrastructure.