
EverMemOS

An episodic memory service with auto-segmentation, hybrid search, and deep cross-session recall. Optional infrastructure for production deployments that need memory at scale.


What It Provides

EverMemOS adds a dedicated memory layer on top of builtin memory. While builtin memory handles recent conversations, EverMemOS manages the long tail — hundreds or thousands of past interactions organized into searchable episodes.

Episodic Memory

Conversations are automatically segmented into episodes — coherent units of interaction with summaries, participants, and timestamps. Episodes are the primary retrieval unit.
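An episode can be pictured as a small structured record. The sketch below is illustrative only; the field names (`episode_id`, `summary`, `participants`, `start_time`, `end_time`) are assumptions, not the actual EverMemOS schema.

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    # Hypothetical fields; the real EverMemOS episode schema may differ.
    episode_id: str
    summary: str              # auto-generated episode summary
    participants: list[str]   # who took part in the interaction
    start_time: str           # ISO 8601 timestamps bounding the episode
    end_time: str
    messages: list[str] = field(default_factory=list)

ep = Episode(
    episode_id="ep-001",
    summary="User asked about deployment options",
    participants=["user", "agent"],
    start_time="2024-05-01T10:00:00Z",
    end_time="2024-05-01T10:12:00Z",
)
```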

Hybrid RRF Search

Reciprocal Rank Fusion (RRF) combines BM25 keyword search (via Elasticsearch) with vector similarity search (via Milvus), producing more accurate results than either method alone.
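RRF itself is a simple rank-based formula: each document scores the sum of 1/(k + rank) over the ranked lists it appears in, with k a smoothing constant (60 is the value from the original RRF paper). A minimal sketch, independent of EverMemOS internals:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score each doc as sum of 1/(k + rank)
    over every ranked list it appears in, then sort by total score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["ep3", "ep1", "ep7"]    # keyword ranking (Elasticsearch)
vector_hits = ["ep3", "ep9", "ep1"]  # semantic ranking (Milvus)
fused = rrf_fuse([bm25_hits, vector_hits])
# → ['ep3', 'ep1', 'ep9', 'ep7']
```

Documents ranked highly by both methods (here `ep3`) dominate the fused list, which is why hybrid retrieval beats either ranking alone.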

Auto-Segmentation

The system automatically determines episode boundaries from conversation flow. No manual tagging or annotation required.

Parallel Retrieval

Episode search launches at Step 0 and runs in parallel with narrative selection, without blocking the main pipeline. Results are awaited just before LLM execution.

Architecture

EverMemOS runs as a separate HTTP service (default port 1995) backed by three storage engines:

  • Milvus: Vector database storing 1024-dim episode embeddings for semantic similarity search
  • Elasticsearch: Full-text search engine providing BM25 keyword retrieval with CJK support
  • MongoDB: Document store for raw memory objects, metadata, and episode content

All three are deployed via Docker. The NarraNexus backend communicates with EverMemOS through its HTTP API — no direct database connections.

Pipeline Integration

EverMemOS integrates at four points in the runtime pipeline:

  • Step 0: Episode search launched as a parallel async task; runs concurrently with Steps 1-2
  • Step 1: Narrative retrieval can optionally use EverMemOS for narrative-level search, with fallback to native vector search
  • Step 1.5: Retrieved episodes injected into the system prompt as a "Relevant Memory" section
  • Step 4: New Events written to EverMemOS asynchronously (non-blocking) via the MemoryModule's post-execution hook
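The fire-at-Step-0, await-before-the-LLM pattern maps naturally onto `asyncio`. A minimal sketch, with hypothetical function names standing in for the real pipeline hooks and the EverMemOS HTTP call:

```python
import asyncio

async def search_episodes(query: str) -> list[str]:
    # Stand-in for the EverMemOS /api/v1/memories/search HTTP call.
    await asyncio.sleep(0.01)
    return [f"episode about {query!r}"]

async def run_pipeline(query: str) -> str:
    # Step 0: launch episode search as a task, without awaiting it.
    episode_task = asyncio.create_task(search_episodes(query))

    # Steps 1-2 proceed concurrently (narrative selection/retrieval);
    # modeled here as plain synchronous work.
    narrative = f"narrative selected for {query!r}"

    # Step 1.5: await the results just before LLM execution and
    # inject them into the prompt as a "Relevant Memory" section.
    episodes = await episode_task
    return f"{narrative}\nRelevant Memory:\n" + "\n".join(episodes)

prompt = asyncio.run(run_pipeline("deployment"))
```

The search task costs nothing on the critical path unless it is still running when the pipeline reaches the await point.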

Graceful Degradation

If the EverMemOS service is unavailable (Docker not running, network timeout, service error), the system falls back silently to native vector retrieval. No user-visible errors, no pipeline failures. The agent continues to function with builtin memory only. This means you can enable EverMemOS in configuration and deploy the infrastructure later — the code is ready whenever the services are.
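The fallback amounts to wrapping the HTTP call in a broad try/except. A sketch under assumptions: the request/response shapes and the `episodes` key are invented for illustration; only the port and search path come from this page.

```python
import json
import urllib.error
import urllib.request

# Default EverMemOS port from the docs; payload shape is a guess.
EVERMEM_URL = "http://localhost:1995/api/v1/memories/search"

def native_vector_search(query: str) -> list[str]:
    # Stand-in for the builtin vector retrieval path.
    return [f"native result for {query!r}"]

def search_memory(query: str, timeout: float = 2.0) -> list[str]:
    """Try EverMemOS first; on any service failure, fall back
    silently to native retrieval so the pipeline never errors."""
    try:
        req = urllib.request.Request(
            EVERMEM_URL,
            data=json.dumps({"query": query}).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp).get("episodes", [])
    except (urllib.error.URLError, TimeoutError, OSError):
        # Docker not running, network timeout, service error:
        # degrade to builtin memory with no user-visible failure.
        return native_vector_search(query)
```

Because the except clause swallows connection and timeout errors, enabling EverMemOS in configuration before the services exist is safe.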

Data Flow

Memory is organized by agent. Each agent has its own conversation group in EverMemOS, isolated from other agents.

  • Write: After each pipeline run, user input and agent output are converted to messages and posted to /api/v1/memories. EverMemOS handles segmentation into episodes automatically.
  • Read: On each new query, the client calls /api/v1/memories/search with the user's input. Returns ranked episodes with text, summaries, scores, and participant info.
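The write and read bodies can be sketched as plain dicts. The endpoints and per-agent grouping come from this page; the field names (`group_id`, `messages`, `role`, `query`) are assumptions about the request schema, not the documented API.

```python
API_BASE = "http://localhost:1995/api/v1"  # default EverMemOS port

def build_write_payload(agent_id: str, user_input: str,
                        agent_output: str) -> dict:
    """Body for POST {API_BASE}/memories. One conversation group per
    agent keeps memories isolated; field names are hypothetical."""
    return {
        "group_id": agent_id,
        "messages": [
            {"role": "user", "content": user_input},
            {"role": "assistant", "content": agent_output},
        ],
    }

def build_search_payload(agent_id: str, query: str) -> dict:
    """Body for POST {API_BASE}/memories/search."""
    return {"group_id": agent_id, "query": query}

write_body = build_write_payload(
    "agent-42", "How do I deploy?", "Use Docker Compose.")
search_body = build_search_payload("agent-42", "deployment")
```

Segmentation happens server-side, so the client posts raw message pairs and never decides episode boundaries itself.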

When To Use

EverMemOS is worth deploying when:

  • Your agents serve many users with long conversation histories
  • Cross-session recall quality is critical to your use case
  • You need keyword + semantic hybrid search for precise memory retrieval
  • You're running in production with Docker infrastructure already in place

For local development or small deployments, builtin memory alone is sufficient.