Configuration & LLM providers

On this page

When you self-host dembrane you bring your own language models. dembrane doesn’t hard-code a provider: it routes every model call through a LiteLLM router that you configure entirely with environment variables. This page covers that scheme, the transcription and embedding configuration, choosing EU regions, and the feature toggles you’ll most often reach for.

Note

The canonical, code-derived reference for these variables is echo/docs/litellm_config.md in the repository. This page explains the shape and the decisions; keep that file open alongside it for the exhaustive list and the startup-log examples.

Model groups

Rather than naming a model per feature, dembrane groups model calls by capability. You configure each group, and the code asks the router for the right group at the right time.

Group Used for Audio
MULTI_MODAL_PRO Chat, reports, replies, artifact generation (Gemini 2.5 Pro by default) Yes
MULTI_MODAL_FAST Realtime verification (Gemini 2.5 Flash) Yes
TEXT_FAST Summaries, chat streaming, auto-select (text only) No

MULTI_MODAL_* groups must be backed by models that accept audio input (Gemini does), as they’re used in the processing pipeline. TEXT_FAST is text-only.

The LLM__<GROUP>[_n]__* scheme

Each group has a primary deployment and any number of fallback deployments. The variable name encodes the group, the (optional) deployment index, and the parameter:

LLM__<GROUP>__<PARAM>        # primary deployment
LLM__<GROUP>_<n>__<PARAM>    # fallback deployment n (1, 2, 3, …)

A minimal single-deployment group:

LLM__TEXT_FAST__MODEL=azure/gpt-4o-mini
LLM__TEXT_FAST__API_BASE=https://westeurope.openai.azure.com
LLM__TEXT_FAST__API_KEY=sk-azure-west-...
LLM__TEXT_FAST__API_VERSION=2024-02-15-preview

The exact set of <PARAM> keys depends on the provider - MODEL, API_KEY, API_BASE, API_VERSION for OpenAI/Azure; VERTEX_LOCATION, VERTEX_PROJECT, VERTEX_CREDENTIALS for Google Vertex AI, and so on. They map onto the parameters LiteLLM expects for that provider.

Weighting and failover

Add numbered deployments to the same group and the router load-balances and fails over across them automatically:

# Primary
LLM__TEXT_FAST__MODEL=azure/gpt-4o-mini
LLM__TEXT_FAST__API_BASE=https://westeurope.openai.azure.com
LLM__TEXT_FAST__API_KEY=sk-azure-west-...
LLM__TEXT_FAST__API_VERSION=2024-02-15-preview

# Fallback 1
LLM__TEXT_FAST_1__MODEL=azure/gpt-4o-mini
LLM__TEXT_FAST_1__API_BASE=https://swedencentral.openai.azure.com
LLM__TEXT_FAST_1__API_KEY=sk-azure-sweden-...
LLM__TEXT_FAST_1__API_VERSION=2024-02-15-preview

# Fallback 2
LLM__TEXT_FAST_2__MODEL=vertex_ai/gemini-2.0-flash
LLM__TEXT_FAST_2__VERTEX_LOCATION=europe-west1
LLM__TEXT_FAST_2__VERTEX_PROJECT=dembrane-prod
LLM__TEXT_FAST_2__VERTEX_CREDENTIALS=${GCP_SA_JSON}

At startup the backend logs a deployment summary - how many deployments each group has, their models, regions, roles (primary/fallback), and weights. Read those lines first when a group isn’t behaving as expected; they tell you exactly what the router parsed from your environment.

Transcription

Transcription is the first step of the pipeline. You choose the provider:

After raw transcription, a Gemini correction pass applies your project’s key terms (hotwords) and PII redaction, producing a diarised transcript. That correction pass uses the MULTI_MODAL_* groups, which is why those must support audio. See transcription for the host-facing view.

Embeddings

Chat and library analysis search over embeddings stored in pgvector. Configure an embedding model (for example an Azure or Vertex embedding deployment) in server/.env - the LIGHTRAG_LITELLM_EMBEDDING_* variables documented in echo/docs/litellm_config.md. Without a working embedding model, retrieval-augmented chat & Ask has nothing to retrieve over.

EU regions

Because every deployment names its own region, residency is a configuration choice. A worked EU-only example, mixing Azure and Vertex across European regions:

# TEXT_FAST: Azure West Europe (primary) + Sweden Central + Gemini EU
LLM__TEXT_FAST__MODEL=azure/gpt-4o-mini
LLM__TEXT_FAST__API_BASE=https://westeurope.openai.azure.com
LLM__TEXT_FAST__API_KEY=sk-azure-west-...
LLM__TEXT_FAST__API_VERSION=2024-02-15-preview
LLM__TEXT_FAST_1__MODEL=azure/gpt-4o-mini
LLM__TEXT_FAST_1__API_BASE=https://swedencentral.openai.azure.com
LLM__TEXT_FAST_1__API_KEY=sk-azure-sweden-...
LLM__TEXT_FAST_1__API_VERSION=2024-02-15-preview

# MULTI_MODAL_PRO: Gemini in EU regions (audio required)
LLM__MULTI_MODAL_PRO__MODEL=vertex_ai/gemini-2.5-pro
LLM__MULTI_MODAL_PRO__VERTEX_LOCATION=europe-west1
LLM__MULTI_MODAL_PRO__VERTEX_PROJECT=dembrane-prod
LLM__MULTI_MODAL_PRO__VERTEX_CREDENTIALS=${GCP_SA_JSON}
LLM__MULTI_MODAL_PRO_1__MODEL=vertex_ai/gemini-2.5-pro
LLM__MULTI_MODAL_PRO_1__VERTEX_LOCATION=europe-west4
LLM__MULTI_MODAL_PRO_1__VERTEX_PROJECT=dembrane-prod
LLM__MULTI_MODAL_PRO_1__VERTEX_CREDENTIALS=${GCP_SA_JSON}

Pair EU model regions with EU object storage and email for an end-to-end European footprint.

Key feature toggles

A few environment toggles change behaviour you’ll care about while integrating or operating:

The audio-LightRAG knobs (AUDIO_LIGHTRAG_*), the LightRAG model variables (LIGHTRAG_LITELLM_*), and the full toggle list are enumerated in echo/docs/litellm_config.md. When a toggle here and the code disagree, trust the code and that file.

Warning

DISABLE_REDACTION removes a privacy safeguard from transcripts. If you handle personal data, leave redaction on and review your obligations - see data ownership & compliance.

Related

Comments