Architecture
A layered system with strict dependency direction, multiple message entry points, and a 6-step runtime pipeline at its core.
System Architecture
NarraNexus is organized into seven layers. Upper layers call downward; lower layers never reference upper layers. This keeps each layer replaceable without cascading changes.
backend/routes/
HTTP and WebSocket endpoints. Request validation, response serialization. No business logic lives here.
agent_runtime/
The AgentRuntime sequences every interaction through a 6-step pipeline. This is the brain of the system.
*_service.py
Stable public interfaces for Narrative selection and Module management. A bridge pattern delegates calls to the private implementation packages.
_*_impl/
Concrete business logic behind each service. Underscore-prefixed packages, never imported directly.
services/
Long-running polling processes: JobTrigger (scheduled tasks), MessageBusTrigger (agent messaging), ModulePoller (instance state).
repository/
Generic BaseRepository with typed CRUD operations. Handles pagination, batch queries, and N+1 prevention.
schema/, utils/
Pydantic models and a singleton AsyncDatabaseClient. Supports both SQLite (local) and MySQL (production).
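The service/impl split above can be sketched as a minimal bridge: callers depend only on the public service class, which forwards to a private implementation. This is an illustrative sketch, not the real API; the names `NarrativeService` and `_NarrativeImpl` and the `select_narrative` method are assumptions.

```python
class _NarrativeImpl:
    """Private implementation; lives in an underscore-prefixed package
    and is never imported directly by callers."""

    def select(self, query: str) -> str:
        # The real logic (LLM continuity check, vector search) would go here.
        return f"narrative-for:{query}"


class NarrativeService:
    """Stable public interface; the only symbol callers import."""

    def __init__(self) -> None:
        self._impl = _NarrativeImpl()

    def select_narrative(self, query: str) -> str:
        # Delegation keeps the public signature stable while the
        # implementation behind it can be swapped without touching callers.
        return self._impl.select(query)
```

Because the implementation package is underscore-prefixed and reached only through the service, replacing it is invisible to every layer above.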
The Orchestration layer (AgentRuntime) is the only layer that touches both services and the API boundary. Everything below it is reusable across different entry points — the same pipeline runs whether a message comes from a WebSocket, a scheduled job, or another agent.
How Messages Enter the System
The runtime pipeline always runs the same steps regardless of where the message came from. What changes is the trigger — the mechanism that initiates the run. Each trigger attaches a WorkingSource tag so the agent can adapt its behavior to different message sources.
CHAT
WebSocket connection from the frontend. User sends a message, the pipeline runs, tokens stream back in real time. This is the default path.
JOB
The JobTrigger polls the database for due tasks (recurring reminders, monitoring jobs, one-off tasks). When a job fires, the pipeline runs autonomously and delivers results to the user’s inbox.
MESSAGE_BUS
The MessageBusTrigger polls for unread messages in agent channels. When an agent is @mentioned or receives a DM, the pipeline runs with the message as input.
MATRIX
External messaging platforms (Matrix is implemented; Telegram and Lark are planned). Messages arrive via platform-specific triggers and are normalized into the standard pipeline input.
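The trigger taxonomy above can be sketched as an enum plus a normalized pipeline input. The `WorkingSource` values come from this page; the `PipelineInput` dataclass, its fields, and the Matrix event shape are illustrative assumptions, not the real schema.

```python
from dataclasses import dataclass
from enum import Enum


class WorkingSource(Enum):
    CHAT = "chat"
    JOB = "job"
    MESSAGE_BUS = "message_bus"
    MATRIX = "matrix"


@dataclass
class PipelineInput:
    source: WorkingSource  # lets the agent adapt behavior per source
    agent_id: str
    text: str


def normalize_matrix_event(agent_id: str, event: dict) -> PipelineInput:
    # Platform-specific payloads are flattened into the standard pipeline
    # input before the (source-agnostic) pipeline runs. The event shape
    # here is a hypothetical Matrix-style payload.
    return PipelineInput(WorkingSource.MATRIX, agent_id, event["content"]["body"])
```

Whatever the trigger, the pipeline only ever sees a `PipelineInput`-shaped value, which is what keeps the runtime identical across entry points.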
A core design principle: for N agents, never create N listeners. Background triggers (JobTrigger, MessageBusTrigger) use a single shared poller that routes work to the correct agent. This keeps resource usage constant regardless of how many agents are deployed.
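The single-shared-poller principle can be sketched as one asyncio loop that fetches due work for all agents and routes each item, rather than spawning a listener per agent. The `fetch_due`/`dispatch` callables are assumptions standing in for the real database poll and agent routing.

```python
import asyncio


async def shared_poller(fetch_due, dispatch, interval: float = 1.0) -> None:
    # One loop serves every agent: each poll returns (agent_id, payload)
    # pairs, and work is routed to the right agent. Resource usage stays
    # constant no matter how many agents are deployed.
    while True:
        for agent_id, payload in await fetch_due():
            await dispatch(agent_id, payload)
        await asyncio.sleep(interval)
```

A per-agent-listener design would instead hold N open poll loops for N agents; routing inside one loop is what keeps the cost O(1) in agent count.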
Runtime Pipeline
Every message — regardless of source — flows through this pipeline. Implementation files live in agent_runtime/_agent_runtime_steps/. The pipeline has three phases.
Phase 1. Load agent state, find the right storyline, decide which modules are needed, and gather all context. The user sees progress indicators during this phase.
Load agent config, create the Event record, start or resume a Session, load the agent’s Awareness profile. If EverMemOS is enabled, episode search is launched in parallel here — it runs concurrently with the next steps and is awaited just before execution.
Determine which storyline this message belongs to. An LLM continuity check compares the query against the current Narrative. If the topic has changed, a vector search looks for a matching existing Narrative; if none matches, a new one is created.
Read the conversation’s markdown history file. Parse previous interactions and instance state to inform module decisions in the next step.
An LLM decision determines which module instances should be active for this interaction (add, keep, or remove). Module objects are instantiated and MCP servers started. The execution path is chosen: Agent Loop (99% — full LLM reasoning) or Direct Trigger (1% — skip LLM, call a tool directly).
Establish instance-to-Narrative links in the database. Create new records for added instances, archive completed ones. If a JobModule instance was created, the corresponding Job record is initialized.
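The concurrency pattern described above (episode search launched early, awaited just before execution) can be sketched with `asyncio.create_task`. The functions `search_episodes` and `select_narrative` here are hypothetical stand-ins for the EverMemOS search and the Narrative-selection step.

```python
import asyncio


# Hypothetical stand-ins for EverMemOS episode search and Narrative selection.
async def search_episodes(query: str) -> list[str]:
    await asyncio.sleep(0.01)  # simulated I/O latency
    return [f"episode about {query}"]


async def select_narrative(query: str) -> str:
    await asyncio.sleep(0.01)
    return f"narrative:{query}"


async def prepare(query: str):
    # Launch episode search immediately; it runs concurrently with the
    # remaining preparation steps instead of serializing behind them.
    episode_task = asyncio.create_task(search_episodes(query))
    narrative = await select_narrative(query)  # runs while search is in flight
    episodes = await episode_task              # awaited just before execution
    return narrative, episodes
```

The search latency is thereby overlapped with the other preparation steps, so it adds no wall-clock time unless it outlasts all of them.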
Phase 2. The agent thinks and responds. This is where the user sees tokens streaming in real time.
The ContextRuntime assembles the full context: module instructions, conversation history (dual-track memory), EverMemOS episodes (if available), social network data, awareness profile, and MCP tool URLs. This context is passed to the LLM as a system prompt + message history.
The agent runs in a loop — it can reason, call tools via MCP, observe results, and continue reasoning until it produces a final response. Each token is streamed to the user in real time via WebSocket.
For the rare Direct Trigger path (1%), the LLM is skipped entirely — the system calls the target MCP tool directly and returns the result.
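The reason/call/observe loop above can be sketched as a bounded iteration: the LLM either emits a final answer or requests a tool call, and the tool result is fed back as the next observation. The `llm_step`/`call_tool` callables and the tuple protocol are illustrative assumptions, not the real interfaces.

```python
def agent_loop(llm_step, call_tool, max_iters: int = 8) -> str:
    # llm_step(observation) returns either ("final", text) or
    # ("tool", name, args); call_tool stands in for an MCP tool invocation.
    observation = None
    for _ in range(max_iters):
        action = llm_step(observation)
        if action[0] == "final":
            return action[1]
        _, name, args = action
        observation = call_tool(name, args)  # result fed back for more reasoning
    raise RuntimeError("agent did not produce a final response")
```

The Direct Trigger path corresponds to skipping this loop entirely and invoking `call_tool` once with a known target.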
Phase 3. Save everything that happened, then run module hooks in the background. The user has already seen the response — this phase is mostly invisible.
Record the execution trajectory, update the Event with the final output, refresh the Narrative summary and embedding (LLM call for the main Narrative), update the Session, and log token costs. This step blocks briefly because the Event data is needed by hooks.
Dispatched to a background task — the user’s connection closes immediately. Each active module’s hook_after_event_execution runs in parallel: ChatModule saves messages, SocialNetworkModule extracts entity info (1–3 LLM calls), JobModule evaluates completion conditions, MemoryModule writes to EverMemOS. Callback results trigger any dependent instances.
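The background dispatch above can be sketched as a detached task that runs every module's hook concurrently. The `hook_after_event_execution` name is from this page; the module objects and error handling are assumptions.

```python
import asyncio


async def run_hooks_in_background(modules, event):
    # The user's response has already streamed, so hooks run in a detached
    # task: the caller can return immediately without awaiting them.
    async def _run_all():
        await asyncio.gather(
            *(m.hook_after_event_execution(event) for m in modules),
            return_exceptions=True,  # one failing hook must not block others
        )

    return asyncio.create_task(_run_all())
```

`return_exceptions=True` matters here: with hooks like entity extraction making their own LLM calls, a single failure should be logged, not allowed to cancel its siblings.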