Chat & the agent service


dembrane has two ways to “talk to your data”: standard chat (retrieval-augmented generation served by the main backend) and agentic chat (a tool-using agent that runs in a separate service). This page explains the split, the agent service in echo/agent/, the tools it exposes, and how turns are coordinated with Redis leases. For the conceptual feature view see chat & ask; for retrieval and model groups see the processing pipeline.

Standard chat vs agentic chat

	Standard chat	Agentic chat
Where it runs	Main FastAPI backend (`:8000`)	Standalone agent service (`:8001`)
How it works	RAG over a selected set of conversations - retrieve, then answer	A LangGraph agent that calls tools to decide what to read, iterating until it can answer
Retrieval mode	`overview` (summaries) or `deep_dive` (transcripts)	Tool-driven: lists conversations, keyword-searches, pulls transcripts on demand
Toggle	Default chat mode	“Agentic mode” in the chat UI
State	`project_chat` + `project_chat_message`	`project_agentic_run` + `project_agentic_run_event` (plus the chat tables)

Both are scoped to a project and a selection of conversations as context. The “sources” a message cites are recorded through the project_chat_message_conversation join. Standard chat’s two retrieval modes - overview over summaries, deep_dive over full transcripts - are covered in the processing pipeline.
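Standard chat's retrieve-then-answer step can be pictured as below. This is a minimal sketch, assuming each selected conversation already has a summary and a transcript loaded and that the context is capped by a simple character budget. `Conversation`, `build_context` and `max_chars` are hypothetical names for illustration, not the backend's API. The real retrieval is described in the processing pipeline and may rank or trim differently.

```python
from dataclasses import dataclass
from typing import List, Literal, Tuple

Mode = Literal["overview", "deep_dive"]


@dataclass
class Conversation:
    id: str
    summary: str      # what overview mode reads
    transcript: str   # what deep_dive mode reads


def build_context(
    conversations: List[Conversation], mode: Mode, max_chars: int = 20_000
) -> Tuple[str, List[str]]:
    """Hypothetical helper: assemble the prompt context for one standard-chat turn.

    Returns the context string and the ids of the conversations that made it in,
    i.e. the rows a project_chat_message_conversation-style join would record.
    """
    if mode not in ("overview", "deep_dive"):
        raise ValueError(f"unknown retrieval mode: {mode!r}")
    blocks: List[str] = []
    sources: List[str] = []
    used = 0
    for convo in conversations:
        text = convo.summary if mode == "overview" else convo.transcript
        block = f"[conversation {convo.id}]\n{text}\n"
        if used + len(block) > max_chars:
            break  # stop before exceeding the (assumed) character budget
        blocks.append(block)
        sources.append(convo.id)
        used += len(block)
    return "\n".join(blocks), sources
```

Returning the source ids alongside the context keeps citation bookkeeping next to the retrieval that produced it.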

Why the agent is a separate service

echo/agent/ is intentionally isolated (see its README.md):

It keeps agent execution out of the frontend runtime.
It avoids dependency conflicts with echo/server (CopilotKit/LangGraph pull in their own stack).
It supports long-running execution with a backend-owned run lifecycle - the agent does the thinking, but auth, persistence and notifications stay with the echo/server gateway.

It exposes a tiny surface:

GET /health
POST /copilotkit/{project_id} - the CopilotKit endpoint the run flows through.

It builds its graph in echo/agent/agent.py (a LangGraph StateGraph over CopilotKitState) and reads project data via echo/agent/echo_client.py, which calls back into the backend. Auth is handled in echo/agent/auth.py.

The tools

The agent’s graph (create_agent_graph in agent.py) binds a focused tool set. The model is nudged to get an overview first, then narrow:

listProjectConversations(limit) - the inventory of conversations in the project. Start here.
findConvosByKeywords(keywords, limit) - keyword search across the project; the prompt steers the model toward 2–4 focused keywords rather than sentence-style queries, and a guardrail rejects low-signal queries and stops the agent from repeating the same search.
getConversationTranscript / listConvoFullTranscript(conversation_id) - pull a full transcript on demand.
listConvoSummary(conversation_id) - the summary for one conversation.
grepConvoSnippets(conversation_id, query, limit) - grep snippets out of a transcript.
get_project_scope() - the project context the run is bound to.
sendProgressUpdate(update, next_steps) - emit a progress event so the UI can show what the agent is doing mid-run.
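The keyword guardrail described for findConvosByKeywords could look roughly like this. It is a sketch under stated assumptions: the stopword list, the "longer than two characters" rule, the cap of four terms and the choice to raise `ValueError` (a tool wrapper might instead return the message to the model) are all hypothetical, as are the names `KeywordSearchGuard` and `check`.

```python
from typing import FrozenSet, List, Set

# Hypothetical stopword list; the real guardrail's notion of "low signal" may differ.
STOPWORDS = frozenset(
    {"the", "a", "an", "and", "or", "of", "to", "in", "on", "what", "about",
     "did", "do", "does", "is", "are", "people", "say", "said"}
)


class KeywordSearchGuard:
    """Sketch of a guardrail in front of a findConvosByKeywords-style tool."""

    def __init__(self, min_signal_terms: int = 1, max_terms: int = 4) -> None:
        self.min_signal_terms = min_signal_terms
        self.max_terms = max_terms
        self._seen: Set[FrozenSet[str]] = set()

    def check(self, keywords: List[str]) -> List[str]:
        # Split sentence-style input into words and lowercase everything.
        terms = [word for k in keywords for word in k.lower().split()]
        # Keep only terms that carry signal (not stopwords, longer than 2 chars).
        signal = [t for t in terms if t not in STOPWORDS and len(t) > 2]
        if len(signal) < self.min_signal_terms:
            raise ValueError("low-signal query: use 2-4 focused keywords")
        signal = signal[: self.max_terms]
        # Order-insensitive fingerprint, so ["rent", "housing"] repeats ["housing", "rent"].
        fingerprint = frozenset(signal)
        if fingerprint in self._seen:
            raise ValueError("repeated search: answer from the evidence already gathered")
        self._seen.add(fingerprint)
        return signal
```

One guard instance would live for the duration of a run, so repeats are detected across tool calls rather than within a single call.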

The graph guards against runaway loops: it counts tool calls since the last assistant update and nudges the model to answer from the evidence it has already gathered rather than searching indefinitely.
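A minimal sketch of that loop guard, assuming "assistant update" means the most recent assistant message that carried text and that the nudge is appended as a system message once a threshold is reached. `Message`, `maybe_nudge`, the limit of 8 and the nudge wording are hypothetical; in a LangGraph graph a check like this could sit in the node that decides whether to call the model again.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical nudge text; the real prompt wording lives in agent.py.
NUDGE = (
    "You have already gathered substantial evidence. Answer from it now "
    "instead of running more searches."
)


@dataclass
class Message:
    role: str          # "user", "assistant", "tool" or "system"
    content: str = ""  # empty for an assistant turn that only issues tool calls


def tool_calls_since_last_update(history: List[Message]) -> int:
    """Count tool results after the most recent assistant message that had text."""
    count = 0
    for msg in reversed(history):
        if msg.role == "assistant" and msg.content:
            break
        if msg.role == "tool":
            count += 1
    return count


def maybe_nudge(history: List[Message], limit: int = 8) -> List[Message]:
    """Append a system nudge once the tool-call count reaches `limit` (assumed threshold)."""
    if tool_calls_since_last_update(history) >= limit:
        return history + [Message(role="system", content=NUDGE)]
    return history
```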

The lease-based turn runtime

Agentic runs are long and resumable, so the backend owns the lifecycle and uses Redis leases to make sure a turn is processed exactly once. The runtime primitives are in echo/server/dembrane/agentic_runtime.py; the worker that drives a run is echo/server/dembrane/agentic_worker.py (process_agentic_run).

Keys are namespaced under agentic:run:{run_id}:turn:{turn_seq}:…:

Helper	Key	Purpose
`acquire_turn_lease` / `refresh_turn_lease` / `release_turn_lease`	`…:lease`	A TTL’d lease an owner holds while processing a turn. Acquire is atomic; refresh/release use a Lua check-and-act so only the owner can extend or drop it.
`request_cancel` / `is_cancel_requested` / `clear_cancel`	`…:cancel`	Cooperative cancellation - the worker checks `_raise_if_cancelled` between steps and bails cleanly.
`publish_live_event` / `subscribe_live_events` / `read_live_event`	`agentic:run:{run_id}` channel	The live event stream over Redis pub/sub, relayed to the client over SSE.
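The lease helpers follow a well-known Redis pattern: an atomic `SET key owner NX PX ttl` to acquire, and small Lua scripts that only extend or delete the key if it still holds the caller's owner token. The sketch below shows those scripts as plain strings and models the same semantics in memory so they can be exercised without a Redis server. Whether agentic_runtime.py uses exactly these scripts, and the `turn_key` / `InMemoryLeaseStore` names, are assumptions.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Standard Redis idiom for owner-checked leases (assumed, not copied from agentic_runtime.py):
#   acquire:  SET key owner NX PX ttl_ms
#   refresh:  EVAL REFRESH_LUA 1 key owner ttl_ms
#   release:  EVAL RELEASE_LUA 1 key owner
REFRESH_LUA = """
if redis.call('GET', KEYS[1]) == ARGV[1] then
  return redis.call('PEXPIRE', KEYS[1], ARGV[2])
end
return 0
"""
RELEASE_LUA = """
if redis.call('GET', KEYS[1]) == ARGV[1] then
  return redis.call('DEL', KEYS[1])
end
return 0
"""


def turn_key(run_id: str, turn_seq: int, suffix: str) -> str:
    # Assumption: suffixes such as "lease" and "cancel" follow the turn prefix directly.
    return f"agentic:run:{run_id}:turn:{turn_seq}:{suffix}"


class InMemoryLeaseStore:
    """In-process model of the same semantics, with an injectable clock for testing."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}  # key -> (owner, expires_at)

    def _live(self, key: str) -> Optional[Tuple[str, float]]:
        entry = self._data.get(key)
        if entry is not None and entry[1] <= self._clock():
            del self._data[key]  # expired, as a Redis TTL would
            return None
        return entry

    def acquire(self, key: str, owner: str, ttl: float) -> bool:
        if self._live(key) is not None:
            return False  # NX: someone already holds the turn
        self._data[key] = (owner, self._clock() + ttl)
        return True

    def refresh(self, key: str, owner: str, ttl: float) -> bool:
        entry = self._live(key)
        if entry is None or entry[0] != owner:
            return False  # only the current owner may extend
        self._data[key] = (owner, self._clock() + ttl)
        return True

    def release(self, key: str, owner: str) -> bool:
        entry = self._live(key)
        if entry is None or entry[0] != owner:
            return False  # never drop a lease someone else now holds
        del self._data[key]
        return True
```

The owner check is what makes release safe: a slow worker whose lease has already expired and been taken over cannot delete the new owner's key.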

So a turn’s life is: acquire the lease → run the LangGraph step, appending events to project_agentic_run_event and publishing them live → periodically refresh the lease and check for cancellation → on completion persist the assistant message to the chat and release the lease. Because the lease has a TTL, a worker that dies stops refreshing it, and the turn is freed for another worker once the TTL expires, without double-processing.
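The turn lifecycle above can be sketched as a single driver function. This is an illustration under assumptions, not `process_agentic_run`: `SimpleLease` is a single-process stand-in without a TTL, each "step" is a callable returning one event dict, and `raise_if_cancelled` plays the role the page gives `_raise_if_cancelled`. All names here are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Event = Dict[str, str]


class SimpleLease:
    """Single-process stand-in for the Redis turn lease (owner-checked, no TTL)."""

    def __init__(self) -> None:
        self.owner: Optional[str] = None

    def acquire(self, owner: str) -> bool:
        if self.owner is not None:
            return False
        self.owner = owner
        return True

    def refresh(self, owner: str) -> bool:
        return self.owner == owner

    def release(self, owner: str) -> None:
        if self.owner == owner:
            self.owner = None


class TurnCancelled(Exception):
    pass


def raise_if_cancelled(cancel_requested: Callable[[], bool]) -> None:
    if cancel_requested():
        raise TurnCancelled()


def process_turn(
    owner: str,
    lease: SimpleLease,
    steps: List[Callable[[], Event]],
    cancel_requested: Callable[[], bool],
    append_event: Callable[[Event], None],
    publish: Callable[[Event], None],
    persist_message: Callable[[str], None],
) -> str:
    """Returns "skipped", "cancelled", "lost_lease" or "completed"."""
    if not lease.acquire(owner):
        return "skipped"  # another worker owns this turn
    try:
        answer = ""
        for step in steps:
            raise_if_cancelled(cancel_requested)  # cooperative check between steps
            event = step()
            append_event(event)  # durable row first...
            publish(event)       # ...then the low-latency live copy
            if not lease.refresh(owner):
                return "lost_lease"  # someone else may own the turn; do not persist
            answer = event.get("answer", answer)
        persist_message(answer)
        return "completed"
    except TurnCancelled:
        return "cancelled"
    finally:
        lease.release(owner)  # owner-checked, so this never drops another worker's lease
```

Writing the durable row before publishing means a client that sees a live event can always find it again in the rows.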

Note

The durable record (project_agentic_run_event) and the live pub/sub stream carry the same events. The DB rows let a client that reconnects rebuild history; the pub/sub stream gives a connected client low-latency updates. Don’t rely on pub/sub alone - a missed message must be recoverable from the event rows.
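One way a reconnecting client (or the SSE relay acting for it) could combine the two sources: replay the durable rows, then follow the live stream, skipping events it has already delivered and backfilling from the rows when a pub/sub message was missed. This sketch assumes each event carries a gap-free per-run sequence number; `RunEvent`, `seq` and `fetch_since` are hypothetical names, with `fetch_since` standing in for a query against project_agentic_run_event.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass(frozen=True)
class RunEvent:
    seq: int      # assumed per-run, gap-free sequence number stored on each event row
    kind: str
    payload: str


def merged_stream(
    fetch_since: Callable[[int], List[RunEvent]],
    live: Iterable[RunEvent],
) -> Iterator[RunEvent]:
    """Replay durable rows, then follow the live feed without duplicates or holes.

    `fetch_since(seq)` returns rows with a greater seq, in ascending order.
    """
    last_seq = -1
    for event in fetch_since(last_seq):  # 1. rebuild history from the rows
        last_seq = event.seq
        yield event
    for event in live:                    # 2. tail the pub/sub stream
        if event.seq <= last_seq:
            continue  # already delivered from the rows
        if event.seq > last_seq + 1:
            # A pub/sub message was missed: backfill the hole from the rows.
            for missed in fetch_since(last_seq):
                if missed.seq >= event.seq:
                    break
                last_seq = missed.seq
                yield missed
        last_seq = event.seq
        yield event
```

In practice the subscription should be opened before the rows are read, so that no event can fall between the history query and the start of the live feed; the sequence check then discards the overlap.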

How a run flows

The dashboard opens an agentic chat; the backend creates a project_agentic_run.
A turn is enqueued; the agentic worker acquires the turn lease and starts driving the LangGraph.
The agent calls tools (listProjectConversations, findConvosByKeywords, transcript pulls); each tool result and progress update is appended as a project_agentic_run_event and published live.
The worker refreshes the lease while it works and checks for cancellation between steps.
On finish, the assistant message is persisted to project_chat_message and the lease released.
The client renders the stream over SSE, backed by the Redis channel.
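A sketch of how the relay might frame each run event for SSE, assuming events carry an integer id. Sending an `id:` field is what lets a browser `EventSource` include a `Last-Event-ID` header when it reconnects, which the server could then use as a cursor into the durable rows. `sse_frame` and `resume_after` are hypothetical names.

```python
import json
from typing import Any, Dict, Optional


def sse_frame(event_id: int, event_type: str, data: Dict[str, Any]) -> str:
    """Encode one run event as a Server-Sent Events frame."""
    # Compact JSON escapes newlines inside strings, so one "data:" line is enough.
    payload = json.dumps(data, separators=(",", ":"))
    return f"id: {event_id}\nevent: {event_type}\ndata: {payload}\n\n"


def resume_after(last_event_id: Optional[str]) -> int:
    """Map the Last-Event-ID header of a reconnecting EventSource to a replay cursor."""
    if not last_event_id:
        return -1  # fresh connection: replay everything
    return int(last_event_id)
```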

Models

The agent runs on Gemini (its _build_llm constructs a ChatGoogleGenerativeAI; set GEMINI_API_KEY). The main backend’s chat goes through the LiteLLM Router groups - TEXT_FAST for streaming, MULTI_MODAL_PRO for richer turns. See the processing pipeline.

Tiers

Built-in chat-with-analysis (the Gemini path) is Changemaker+. Innovator replaces the built-in analysis with bring-your-own-LLM via MCP, letting you connect ChatGPT or Claude; this is coming soon, pending the MCP release. Free-tier chat is gated. See tiers & billing and MCP & bring-your-own-LLM.
