Shared Memory for Agent Fleets: hermes-memory-pgvector

When you run more than one AI agent — a marketing minion, a trading minion, an incident-response minion — they each need memory. The built-in memory tool gives each agent its own. That is fine until you want them to share, or until you want to recall what the agent learned six weeks ago without paying for an LLM round-trip just to look it up.

hermes-memory-pgvector is a small Postgres + pgvector plugin that mirrors agent memory writes into a shared store where every entry is embedded as a vector and can be searched by similarity. The current release on PyPI is v0.4.0.

New in v0.4.0

The latest release keeps the "no LLM in the hot path" rule and adds four storage-layer capabilities:

Identity governance. Direct-message session keys collapse into a single privacy-safe bucket (no per-contact theme sprawl, no PII), benchmark traffic is quarantined, and an optional allow-list routes mistyped theme names to a safe default instead of silently minting a new theme.
Agent attribution & delegation. A new agent registry and provenance edges record which agent delegated what to whom, and a database view makes that history queryable. It is pure who/when provenance, never a fact store.
Embedding backfill. Rows written text-only during an embedding-endpoint outage are no longer stranded: one idempotent command re-embeds them so they become searchable again.
Conversation TTL & cost controls. An operator-run prune trims old chat turns (durable memories are never touched), and an embed policy dials embedding cost up or down.
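
The identity-governance rules can be pictured as a small resolver that maps an incoming session key to a theme. The sketch below is a minimal illustration, not the plugin's code: the kind:name key format, the dm and bench prefixes, the bucket names, and resolve_theme are all hypothetical.

```python
from typing import Optional, Set

# Hypothetical bucket names; the plugin's real values may differ.
DM_BUCKET = "direct-messages"
QUARANTINE_BUCKET = "benchmark-quarantine"
DEFAULT_THEME = "default"


def resolve_theme(session_key: str, allow_list: Optional[Set[str]] = None) -> str:
    """Map a session key to a storage theme (hypothetical 'kind:name' key format)."""
    kind, _, name = session_key.partition(":")
    kind = kind.strip().lower()
    name = name.strip().lower()

    # Every direct-message key lands in one shared bucket, so no per-contact
    # theme is created and the contact identifier is never stored as a theme.
    if kind == "dm":
        return DM_BUCKET
    # Benchmark traffic is kept away from real agent memory.
    if kind == "bench":
        return QUARANTINE_BUCKET
    # With an allow-list configured, unknown (e.g. mistyped) names fall back
    # to the default theme instead of minting a new one.
    if allow_list is not None and name not in allow_list:
        return DEFAULT_THEME
    return name or DEFAULT_THEME
```

With allowed = {"marketing", "trading"}, resolve_theme("agent:tradnig", allowed) falls back to "default" rather than creating a tradnig theme, and any dm: key resolves to the same shared bucket regardless of the contact.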

All of it ships behind a maintenance CLI (hermes-pgvector) whose destructive commands default to dry-run. v0.4.0 is a drop-in upgrade from v0.3.x — apply one additive migration and the new hooks light up; skip it and everything else runs unchanged.
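
The dry-run default is easiest to see in a prune routine. Here is a minimal sketch, assuming each conversation turn carries a creation timestamp; Turn and prune_conversations are hypothetical names, and the real hermes-pgvector commands and flags are not reproduced here. The function only ever receives conversation turns, which mirrors the rule that durable memories are never pruned.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class Turn:
    """One stored chat turn (hypothetical shape of a conversations row)."""
    turn_id: int
    created_at: datetime
    text: str


def prune_conversations(turns: List[Turn], ttl: timedelta, now: datetime,
                        dry_run: bool = True) -> List[int]:
    """Report turns older than ttl, and delete them only when dry_run is False.

    The function takes conversation turns only, so durable memory entries
    are out of its reach by construction.
    """
    cutoff = now - ttl
    expired = [t.turn_id for t in turns if t.created_at < cutoff]
    if not dry_run:
        # Destructive path: rewrite the list in place to drop expired turns.
        turns[:] = [t for t in turns if t.created_at >= cutoff]
    return expired


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
turns = [
    Turn(1, now - timedelta(days=120), "Q3 budget discussion"),
    Turn(2, now - timedelta(days=3), "Draft of the launch email"),
]
print(prune_conversations(turns, timedelta(days=90), now))  # dry run: [1], nothing deleted
```

A CLI wrapper built this way passes dry_run=False only when the operator asks for it explicitly, so running the command with no flags just reports what would be deleted.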

What it actually does

In one line: "a storage layer that gives the built-in memory model durable, multi-tenant, semantically-searchable backing, with no LLM in the hot path."

Two tables, both with HNSW vector indexes. The memory_entries table mirrors writes to the agent's MEMORY.md and USER.md files. The conversations table stores substantive chat turns (at least 40 characters, with boilerplate filtered out). Embeddings are 768-dimensional and computed by an external endpoint: Ollama or any OpenAI-compatible API, your choice. The agent never blocks on that endpoint. Writes return in microseconds, and an embedding worker handles the rest from a background queue.
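
The write path described above can be sketched in plain Python, with a dict standing in for the conversations table and a caller-supplied function in place of the embedding endpoint. This is an illustration under assumptions, not the plugin's implementation: ConversationStore, is_substantive, and the boilerplate phrases are hypothetical, while the 40-character threshold and 768-dimensional vectors come from the text. A failed embedding call leaves the row text-only, and backfill re-queues such rows, which is the situation the v0.4.0 backfill command addresses.

```python
import queue
import threading
from typing import Callable, Dict, List, Optional

EMBED_DIM = 768   # embedding size stated in the text
MIN_CHARS = 40    # substantive-turn threshold stated in the text
# Illustrative boilerplate phrases (hypothetical); the real filter list is not shown here.
BOILERPLATE_PREFIXES = (
    "is there anything else i can help",
    "let me know if you have any other questions",
)


def is_substantive(text: str) -> bool:
    """Keep turns that are long enough and not stock filler."""
    stripped = text.strip()
    if len(stripped) < MIN_CHARS:
        return False
    return not stripped.lower().startswith(BOILERPLATE_PREFIXES)


class ConversationStore:
    """In-memory stand-in for the conversations table (hypothetical class)."""

    def __init__(self, embed: Callable[[str], List[float]]) -> None:
        self.rows: Dict[int, dict] = {}
        self._embed = embed
        self._queue: "queue.Queue[Optional[int]]" = queue.Queue()
        self._lock = threading.Lock()
        self._next_id = 0
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def write(self, text: str) -> Optional[int]:
        """Store the text and return at once; embedding happens in the background."""
        if not is_substantive(text):
            return None
        with self._lock:
            row_id = self._next_id
            self._next_id += 1
            self.rows[row_id] = {"text": text, "embedding": None}
        self._queue.put(row_id)
        return row_id

    def backfill(self) -> int:
        """Re-queue every text-only row; rows that already have a vector are skipped."""
        with self._lock:
            pending = [rid for rid, row in self.rows.items() if row["embedding"] is None]
        for rid in pending:
            self._queue.put(rid)
        return len(pending)

    def flush(self) -> None:
        """Block until every queued job has been processed."""
        self._queue.join()

    def close(self) -> None:
        self._queue.put(None)  # sentinel: the worker exits after earlier jobs
        self._worker.join()

    def _run(self) -> None:
        while True:
            row_id = self._queue.get()
            try:
                if row_id is None:
                    return
                with self._lock:
                    text = self.rows[row_id]["text"]
                vector = self._embed(text)  # slow network call, off the write path
                if len(vector) == EMBED_DIM:
                    with self._lock:
                        self.rows[row_id]["embedding"] = vector
            except Exception:
                # Endpoint failure: the row stays text-only until a backfill.
                pass
            finally:
                self._queue.task_done()
```

Because write only touches a dict and a queue, its latency does not depend on the embedding endpoint. The trade-off is that a freshly written turn is not searchable until the worker, or a later backfill, has produced its vector.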

Per-agent themes by default

Every request carries an X-Hermes-Session-Key header that scopes data by agent_identity. Marketing's notes do not pollute trading's recall. When you actually want cross-theme search, pass scope='all' explicitly. The default is the safe one.
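
Here is a minimal sketch of the scoping rule, assuming each row carries the agent_identity resolved from the session key. In the plugin the filtering and ranking happen in Postgres with pgvector; this pure-Python stand-in uses brute-force cosine similarity over toy 2-dimensional vectors instead of an HNSW index, and the search function and its "own" default are hypothetical. Only scope='all' comes from the text.

```python
import math
from typing import List, Sequence, Tuple

# (agent_identity, text, embedding): a stand-in for rows in the real tables.
Row = Tuple[str, str, List[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(rows: List[Row], query: Sequence[float], agent_identity: str,
           scope: str = "own", k: int = 3) -> List[Tuple[str, str]]:
    """Top-k rows by similarity, limited to the caller's theme unless scope='all'."""
    if scope == "all":
        candidates = rows  # explicit opt-in to cross-theme recall
    else:
        candidates = [r for r in rows if r[0] == agent_identity]
    ranked = sorted(candidates, key=lambda r: cosine(r[2], query), reverse=True)
    return [(identity, text) for identity, text, _ in ranked[:k]]


rows: List[Row] = [
    ("marketing", "Launch post goes out Tuesday", [1.0, 0.0]),
    ("trading", "Hedge the EUR exposure", [0.9, 0.1]),
    ("marketing", "Brand voice: plain and direct", [0.0, 1.0]),
]
print(search(rows, [1.0, 0.0], "marketing"))
print(search(rows, [1.0, 0.0], "marketing", scope="all", k=2))
```

Searching as marketing returns only the two marketing rows; the trading note, though it is the second-closest match, appears only when the caller passes scope='all'.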

Why standalone, not a fork

hermes-agent closed its list of built-in memory providers as a matter of policy, so this plugin is picked up by the agent's /plugins directory scan instead of living in an upstream fork. Drop it next to the agent, set a few config keys, restart. Rollback is symmetric: disable the provider and, optionally, drop the tables. No long-lived state to migrate, no kernel patches to maintain.

What it is not

Not a Honcho replacement. Not a knowledge graph. Not a RAG framework. It is intentionally a thin layer that does one thing — turn the built-in memory tool into a shared, durable, vector-searchable store — and gets out of the way. No LLM deriver, no dialectic loop, no opinion about how you should chunk or rerank. Just vector math.

Get it

pip install hermes-memory-pgvector

Source, migration, and config on GitHub:

👉 github.com/andreab67/hermes-memory-pgvector