Chat & the agent service
dembrane has two ways to “talk to your data”: standard chat (retrieval-augmented
generation served by the main backend) and agentic chat (a tool-using agent that runs in a
separate service). This page explains the split, the agent service in echo/agent/, the tools
it exposes, and how turns are coordinated with Redis leases. For the conceptual feature view
see chat & ask; for retrieval and model groups see
the processing pipeline.
Standard chat vs agentic chat
| | Standard chat | Agentic chat |
|---|---|---|
| Where it runs | Main FastAPI backend (:8000) | Standalone agent service (:8001) |
| How it works | RAG over a selected set of conversations - retrieve, then answer | A LangGraph agent that calls tools to decide what to read, iterating until it can answer |
| Retrieval mode | overview (summaries) or deep_dive (transcripts) | Tool-driven: lists conversations, keyword-searches, pulls transcripts on demand |
| Toggle | Default chat mode | “Agentic mode” in the chat UI |
| State | project_chat + project_chat_message | project_agentic_run + project_agentic_run_event (plus the chat tables) |
Both are scoped to a project and a selection of conversations as context. The “sources” a
message cites are recorded through the project_chat_message_conversation join. Standard chat’s
two retrieval modes - overview over summaries, deep_dive over full transcripts - are
covered in the processing pipeline.
Why the agent is a separate service
echo/agent/ is intentionally isolated (see its README.md):
- It keeps agent execution out of the frontend runtime.
- It avoids dependency conflicts with echo/server (CopilotKit/LangGraph pull in their own stack).
- It supports long-running execution with a backend-owned run lifecycle - the agent does the thinking, but auth, persistence and notifications stay with the echo/server gateway.
It exposes a tiny surface:
- GET /health
- POST /copilotkit/{project_id} - the CopilotKit endpoint the run flows through.
It builds its graph in echo/agent/agent.py (a LangGraph StateGraph over CopilotKitState)
and reads project data via echo/agent/echo_client.py, which calls back into the backend. Auth
is handled in echo/agent/auth.py.
The tools
The agent’s graph (create_agent_graph in agent.py) binds a focused tool set. The model is
nudged to get an overview first, then narrow:
- listProjectConversations(limit) - the inventory of conversations in the project. Start here.
- findConvosByKeywords(keywords, limit) - keyword search across the project; the prompt steers toward 2–4 focused keywords over sentence-style queries, with a guardrail that rejects low-signal queries and stops the agent repeating the same search.
- getConversationTranscript / listConvoFullTranscript(conversation_id) - pull a full transcript on demand.
- listConvoSummary(conversation_id) - the summary for one conversation.
- grepConvoSnippets(conversation_id, query, limit) - grep snippets out of a transcript.
- get_project_scope() - the project context the run is bound to.
- sendProgressUpdate(update, next_steps) - emit a progress event so the UI can show what the agent is doing mid-run.
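The keyword guardrail described for findConvosByKeywords can be pictured roughly as follows. This is a minimal sketch, not the code in agent.py: the stopword list, the four-keyword cap, and the names normalize_keywords and check_query are all hypothetical.

```python
import re

# Hypothetical stopword list; the real guardrail's notion of "low-signal" is not documented here.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
    "what", "did", "people", "say", "about",
}


def normalize_keywords(raw: str) -> list[str]:
    """Lowercase, split on non-alphanumerics, drop stopwords and tokens of 1-2 characters."""
    tokens = re.findall(r"[a-z0-9]+", raw.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]


def check_query(raw: str, seen: set[frozenset[str]], max_keywords: int = 4) -> tuple[bool, str]:
    """Return (allowed, reason). `seen` holds keyword sets already searched in this run."""
    keywords = normalize_keywords(raw)
    if not keywords:
        return False, "low-signal query: no usable keywords"
    if len(keywords) > max_keywords:
        return False, f"too many keywords ({len(keywords)}); use 2-4 focused terms"
    key = frozenset(keywords)  # order-insensitive, so "a b" and "b a" count as one search
    if key in seen:
        return False, "duplicate search: these keywords were already tried"
    seen.add(key)
    return True, "ok"
```

Returning a reason string alongside the verdict lets the rejection be fed back to the model as a tool result, so it can rephrase instead of retrying blindly.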
The graph guards against runaway loops: it counts tool calls since the last assistant update and nudges the model to answer from the evidence it has gathered rather than search forever.
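One way to picture the loop guard, as a sketch under assumptions: messages are modelled with a plain dataclass rather than LangGraph's message types, an "assistant update" is taken to mean an assistant message with visible text, and the budget of 6 is an arbitrary illustrative value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """Simplified stand-in for a graph message (hypothetical, not the CopilotKitState type)."""
    role: str  # "user", "assistant", or "tool"
    content: str = ""


def tool_calls_since_last_update(messages: list[Message]) -> int:
    """Count tool results that follow the most recent assistant message with visible text."""
    count = 0
    for msg in reversed(messages):
        if msg.role == "assistant" and msg.content.strip():
            break  # reached the last user-visible assistant update
        if msg.role == "tool":
            count += 1
    return count


def next_instruction(messages: list[Message], budget: int = 6) -> Optional[str]:
    """Return a nudge for the model once the tool budget is spent, else None."""
    if tool_calls_since_last_update(messages) >= budget:
        return "Answer now from the evidence gathered so far."
    return None
```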
The lease-based turn runtime
Agentic runs are long and resumable, so the backend owns the lifecycle and uses Redis leases
to make sure a turn is processed exactly once. The runtime primitives are in
echo/server/dembrane/agentic_runtime.py; the worker that drives a run is
echo/server/dembrane/agentic_worker.py (process_agentic_run).
Keys are namespaced under agentic:run:{run_id}:turn:{turn_seq}:…:
| Helper | Key | Purpose |
|---|---|---|
| acquire_turn_lease / refresh_turn_lease / release_turn_lease | …:lease | A TTL’d lease an owner holds while processing a turn. Acquire is atomic; refresh/release use a Lua check-and-act so only the owner can extend or drop it. |
| request_cancel / is_cancel_requested / clear_cancel | …:cancel | Cooperative cancellation - the worker checks _raise_if_cancelled between steps and bails cleanly. |
| publish_live_event / subscribe_live_events / read_live_event | agentic:run:{run_id} channel | The live event stream over Redis pub/sub, relayed to the client over SSE. |
So a turn’s life is: acquire the lease → run the LangGraph step, appending events to
project_agentic_run_event and publishing them live → periodically refresh the lease and check
for cancellation → on completion persist the assistant message to the chat and release the
lease. Because the lease has a TTL, a worker that dies frees the turn for another to pick up,
without double-processing.
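The lease semantics can be modelled without Redis to make the ownership rules concrete. The class below is an in-memory stand-in, not the code in agentic_runtime.py; its name and method signatures are hypothetical. In Redis, an atomic acquire is commonly a single SET with the NX and EX options, and the owner check in refresh/release is what the Lua check-and-act provides.

```python
import time
from typing import Callable, Optional


class InMemoryLeaseStore:
    """Models lease ownership and expiry only; a teaching aid, not a Redis replacement."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._leases: dict[str, tuple[str, float]] = {}  # key -> (owner, expires_at)

    def _live_owner(self, key: str) -> Optional[str]:
        entry = self._leases.get(key)
        if entry is None:
            return None
        owner, expires_at = entry
        if self._clock() >= expires_at:
            del self._leases[key]  # expired: behaves like a key whose TTL ran out
            return None
        return owner

    def acquire(self, key: str, owner: str, ttl: float) -> bool:
        """Succeed only if nobody holds a live lease (Redis: SET key owner NX EX ttl)."""
        if self._live_owner(key) is not None:
            return False
        self._leases[key] = (owner, self._clock() + ttl)
        return True

    def refresh(self, key: str, owner: str, ttl: float) -> bool:
        """Extend the TTL only if `owner` still holds the lease (check-and-act)."""
        if self._live_owner(key) != owner:
            return False
        self._leases[key] = (owner, self._clock() + ttl)
        return True

    def release(self, key: str, owner: str) -> bool:
        """Drop the lease only if `owner` still holds it."""
        if self._live_owner(key) != owner:
            return False
        del self._leases[key]
        return True
```

The owner check matters most after an expiry: a slow worker whose lease lapsed must not be able to extend or delete the lease a second worker has since acquired.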
Note
The durable record (project_agentic_run_event) and the live pub/sub stream carry the same
events. The DB rows let a client that reconnects rebuild history; the pub/sub stream gives a
connected client low-latency updates. Don’t rely on pub/sub alone - a missed message must be
recoverable from the event rows.
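A reconnecting client could reconcile the two sources along these lines. This is a sketch that assumes each event carries a monotonically increasing integer seq field; that field name, and both helper names, are hypothetical and not confirmed by the schema.

```python
def merge_events(history: list[dict], live: list[dict]) -> list[dict]:
    """Combine durable rows with live events, de-duplicating by sequence number."""
    by_seq: dict[int, dict] = {}
    for event in history:
        by_seq[event["seq"]] = event
    for event in live:
        by_seq.setdefault(event["seq"], event)  # the durable row wins on overlap
    return [by_seq[seq] for seq in sorted(by_seq)]


def missing_seqs(events: list[dict]) -> list[int]:
    """Sequence numbers absent between the first and last event of an ordered list."""
    if not events:
        return []
    present = {event["seq"] for event in events}
    first, last = events[0]["seq"], events[-1]["seq"]
    return [seq for seq in range(first, last + 1) if seq not in present]
```

A non-empty result from missing_seqs signals that a pub/sub message was missed and the gap should be refetched from the event rows.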
How a run flows
- The dashboard opens an agentic chat; the backend creates a project_agentic_run.
- A turn is enqueued; the agentic worker acquires the turn lease and starts driving the LangGraph.
- The agent calls tools (listProjectConversations, findConvosByKeywords, transcript pulls), each tool result and progress update appended as a project_agentic_run_event and published live.
- The worker refreshes the lease while it works and checks for cancellation between steps.
- On finish, the assistant message is persisted to project_chat_message and the lease released.
- The client renders the stream over SSE, backed by the Redis channel.
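The worker's part of this flow can be sketched as a loop over steps. This is an illustration only, not process_agentic_run: the function name, its callable parameters, and the TurnCancelled exception are hypothetical, and it assumes the lease was acquired before the loop starts. Persisting each event before publishing it is this sketch's choice, so that a reconnecting client can always recover from the event rows.

```python
from typing import Callable, Iterable


class TurnCancelled(Exception):
    """Raised when a cancel request is observed between steps (hypothetical name)."""


def run_turn(
    steps: Iterable[Callable[[], dict]],
    *,
    is_cancel_requested: Callable[[], bool],
    refresh_lease: Callable[[], bool],
    release_lease: Callable[[], None],
    persist_event: Callable[[dict], None],
    publish_event: Callable[[dict], None],
) -> int:
    """Drive the steps of one turn; returns how many events were produced."""
    processed = 0
    try:
        for step in steps:
            if is_cancel_requested():  # cooperative: checked only between steps
                raise TurnCancelled("cancel requested")
            if not refresh_lease():  # False means this worker no longer owns the turn
                raise RuntimeError("lease lost; another worker may own this turn")
            event = step()
            persist_event(event)  # durable record first
            publish_event(event)  # then the low-latency live stream
            processed += 1
    finally:
        release_lease()  # always give the turn back, even on cancel or error
    return processed
```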
Models
The agent runs on Gemini (its _build_llm constructs a ChatGoogleGenerativeAI; set
GEMINI_API_KEY). The main backend’s chat goes through the LiteLLM Router groups - TEXT_FAST
for streaming, MULTI_MODAL_PRO for richer turns. See
the processing pipeline.
Tiers
Built-in chat-with-analysis (the Gemini path) is Changemaker+. Innovator replaces the built-in analysis with bring-your-own-LLM via MCP (connect ChatGPT or Claude); this is coming soon, gated on MCP shipping. Free-tier chat is gated. See tiers & billing and MCP & bring-your-own-LLM.