EverMemOS
An episodic memory service with auto-segmentation, hybrid search, and deep cross-session recall. Optional infrastructure for production deployments that need memory at scale.
What It Provides
EverMemOS adds a dedicated memory layer on top of builtin memory. While builtin memory handles recent conversations, EverMemOS manages the long tail — hundreds or thousands of past interactions organized into searchable episodes.
Conversations are automatically segmented into episodes — coherent units of interaction with summaries, participants, and timestamps. Episodes are the primary retrieval unit.
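As a rough mental model, an episode record might look like the following. The field names here are illustrative assumptions based on the description above (summary, participants, timestamps), not the actual EverMemOS schema:

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    """Hypothetical sketch of an episode record, the primary retrieval unit."""
    episode_id: str
    summary: str              # auto-generated summary of the segment
    participants: list[str]
    start_time: float         # Unix timestamps bounding the segment
    end_time: float
    messages: list[dict] = field(default_factory=list)

ep = Episode(
    episode_id="ep-001",
    summary="User asked about deployment options; agent recommended Docker.",
    participants=["user", "agent"],
    start_time=1700000000.0,
    end_time=1700000420.0,
)
```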
Hybrid search uses Reciprocal Rank Fusion (RRF) to combine BM25 keyword search (via Elasticsearch) with vector similarity search (via Milvus), producing more accurate results than either method alone.
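RRF itself is a simple rank-based formula: each result list contributes 1/(k + rank) to a document's score, so items ranked highly by both retrievers rise to the top. A minimal sketch (the retriever outputs shown are made-up examples, not real EverMemOS data):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank).

    `rankings` holds ranked doc-id lists from each retriever; k = 60 is
    the commonly used smoothing constant from the original RRF paper.
    """
    scores: dict[str, float] = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["ep3", "ep1", "ep7"]   # keyword hits from Elasticsearch
vec = ["ep1", "ep9", "ep3"]    # nearest neighbours from Milvus
fused = rrf_fuse([bm25, vec])
# ep1 and ep3 appear in both lists, so they outrank single-list hits
```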
The system automatically determines episode boundaries from conversation flow. No manual tagging or annotation required.
Episode search runs in parallel with narrative selection (Step 0), not blocking the main pipeline. Results are awaited just before LLM execution.
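The non-blocking pattern can be sketched with asyncio: episode search is launched as a task alongside Step 0 and only awaited right before the LLM call. Function names here are illustrative stand-ins, not the actual NarraNexus API:

```python
import asyncio

async def search_episodes(query: str) -> list[str]:
    await asyncio.sleep(0.01)   # stands in for the EverMemOS HTTP round trip
    return [f"episode matching {query!r}"]

async def select_narrative() -> str:
    await asyncio.sleep(0.01)   # stands in for Step 0 of the main pipeline
    return "narrative"

async def run_pipeline(query: str) -> tuple[str, list[str]]:
    # Launch episode search without blocking the main pipeline
    episode_task = asyncio.create_task(search_episodes(query))
    narrative = await select_narrative()    # Step 0 proceeds concurrently
    episodes = await episode_task           # awaited just before LLM execution
    return narrative, episodes

narrative, episodes = asyncio.run(run_pipeline("deployment"))
```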
Architecture
EverMemOS runs as a separate HTTP service (default port 1995) backed by three storage engines:
All three are deployed via Docker. The NarraNexus backend communicates with EverMemOS through its HTTP API — no direct database connections.
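Since all traffic goes through the HTTP API, a client needs nothing beyond an HTTP library. The sketch below builds a search request against the default port; the `/search` endpoint path and payload fields are assumptions for illustration, so consult the actual EverMemOS API for the real contract:

```python
import json
from urllib import request

EVERMEM_URL = "http://localhost:1995"   # default port

def build_search_request(agent_id: str, query: str, top_k: int = 5) -> request.Request:
    """Build an HTTP search request for the EverMemOS service.

    Endpoint path and payload fields are illustrative assumptions,
    not the documented EverMemOS API.
    """
    payload = json.dumps({"agent_id": agent_id, "query": query, "top_k": top_k})
    return request.Request(
        f"{EVERMEM_URL}/search",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_search_request("agent-42", "past conversations about pricing")
```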
Pipeline Integration
EverMemOS integrates at four points in the runtime pipeline:
Graceful Degradation
If the EverMemOS service is unavailable (Docker not running, network timeout, service error), the system falls back silently to native vector retrieval. No user-visible errors, no pipeline failures. The agent continues to function with builtin memory only. This means you can enable EverMemOS in configuration and deploy the infrastructure later — the code is ready whenever the services are.
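The fallback amounts to a try/except around the service call. A minimal sketch, assuming the default port from above and a hypothetical `/search` endpoint; `native_retrieve` stands in for the builtin vector path:

```python
import urllib.error
import urllib.request

def native_retrieve(query: str) -> list[str]:
    """Stand-in for the builtin vector retrieval path."""
    return [f"builtin result for {query!r}"]

def retrieve_memories(query: str, timeout: float = 2.0) -> list[str]:
    """Try EverMemOS first; degrade silently to builtin retrieval.

    Any service failure (Docker not running, network timeout, HTTP
    error) falls through to the native path with no user-visible error.
    """
    try:
        with urllib.request.urlopen(
            "http://localhost:1995/search", timeout=timeout
        ) as resp:
            return resp.read().decode().splitlines()
    except (urllib.error.URLError, TimeoutError, OSError):
        return native_retrieve(query)   # silent fallback; pipeline continues

results = retrieve_memories("pricing discussion")
```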
Data Flow
Memory is organized by agent. Each agent has its own conversation group in EverMemOS, isolated from other agents.
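The isolation property can be illustrated with a toy store keyed by agent id; this is a teaching sketch, not the EverMemOS storage model:

```python
from collections import defaultdict

class MemoryStore:
    """Toy illustration of per-agent isolation: one conversation group
    per agent, and searches only ever consult that agent's group."""

    def __init__(self) -> None:
        self._groups: dict[str, list[str]] = defaultdict(list)

    def add(self, agent_id: str, memory: str) -> None:
        self._groups[agent_id].append(memory)

    def search(self, agent_id: str, query: str) -> list[str]:
        # Only this agent's group is consulted; other agents are invisible.
        return [m for m in self._groups[agent_id] if query in m]

store = MemoryStore()
store.add("agent-a", "user prefers dark mode")
store.add("agent-b", "user prefers light mode")
```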
When To Use
EverMemOS is worth deploying when:
- Your agents serve many users with long conversation histories
- Cross-session recall quality is critical to your use case
- You need keyword + semantic hybrid search for precise memory retrieval
- You're running in production with Docker infrastructure already in place
For local development or small deployments, builtin memory alone is sufficient.