AgentCoreMemorySessionManager silently drops conversation history when the metadata-filtered ListEvents (read_agent/read_session) is not yet consistent #564

Description

@WorldWriter

Summary

AgentCoreMemorySessionManager (in bedrock_agentcore.memory.integrations.strands) restores a session's conversation history only if its read_agent() / read_session() calls find the prior AGENT / SESSION marker events. Those two reads use a metadata-filtered ListEvents query, which appears to be eventually consistent on the service side. When the filter transiently returns nothing (even though the matching marker event exists and the raw conversation events are fully persisted), the session manager silently treats the turn as a brand-new agent, creates a fresh agent record, and replays no history. The agent then runs with an empty agent.messages, so the model loses all prior context for that turn.

The conversation data itself is never lost — the unfiltered ListEvents (used by list_messages()) is strongly consistent and always returns the full history. Only the gate (the metadata-filtered read) is flaky, and the failure is intermittent and silent.

Environment

  • bedrock-agentcore 1.14.1 (also reproduced/confirmed on 1.15.1 — same logic)
  • strands-agents 1.42.0
  • Runtime: Amazon Bedrock AgentCore Runtime; Memory resource with USER_PREFERENCE + SEMANTIC long-term strategies
  • Region: ap-southeast-1

Expected vs. Actual

  • Expected: within a stable session_id + actor_id, every follow-up turn is given the prior conversation as context (the documented "short-term memory" behavior).
  • Actual: intermittently, a follow-up turn runs with no history; the model behaves as if the conversation just started. The problem can recur on any later turn of the same session (each turn's restore is independent). No error is raised or logged.

Root cause

Message restoration in RepositorySessionManager.initialize() is gated on read_agent() returning a non-None SessionAgent (and the session is resolved via read_session() in __init__). Both reads are metadata-filtered ListEvents calls with max_results=1:

  • read_agent() — filters stateType == AGENT AND agentId == <id> (session_manager.py, ~L461-467)
  • read_session() — filters stateType == SESSION (session_manager.py, ~L318-324)
  • both go through MemoryClient.list_events(..., event_metadata=[...]), which sends filter={"eventMetadata":[...]} (memory/client.py, ~L861-885)

The service-side metadata filter appears to be eventually consistent: after the marker events for a turn are written, a metadata-filtered query may not return them yet, and we observed misses with gaps from ~150s up to ~2h after the write. When read_agent() returns None, initialize() takes the "new agent" branch → create_agent() (the Created agent: default in session: ... log line) → no call to list_messages() → history is not replayed.

Crucially, list_messages() itself does not use the metadata filter — it reads raw events with a plain list_events() and is strongly consistent. So the data needed to restore is available; the manager just never reads it because the gate failed.

There is no retry and no fallback from the metadata-filtered read to the strongly-consistent unfiltered read, and the miss is silent.
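To make the gating concrete, here is a toy model of the control flow described above. It is a sketch only: ToySessionStore, index_caught_up and this initialize() are hypothetical stand-ins, not the SDK's classes, and the "index" flag simply simulates the assumed eventual consistency of the filtered read.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToySessionStore:
    """Hypothetical in-memory stand-in for the Memory service (not the real SDK)."""
    messages: List[str] = field(default_factory=list)
    agent_marker_written: bool = False
    index_caught_up: bool = False  # simulates the eventually-consistent metadata filter

    def read_agent(self) -> Optional[str]:
        # Filtered read: only sees the marker once the index has caught up.
        if self.agent_marker_written and self.index_caught_up:
            return "default"
        return None

    def list_messages(self) -> List[str]:
        # Unfiltered read: always returns everything that was written.
        return list(self.messages)

    def create_agent(self) -> None:
        self.agent_marker_written = True


def initialize(store: ToySessionStore) -> List[str]:
    """History is replayed only if read_agent() finds the marker."""
    if store.read_agent() is None:
        store.create_agent()      # "new agent" branch: list_messages() is never called
        return []
    return store.list_messages()  # "existing agent" branch: history is replayed


store = ToySessionStore(
    messages=["user: my number is 73", "assistant: Noted."],
    agent_marker_written=True,
    index_caught_up=False,
)
stale = initialize(store)   # filtered read misses, so nothing is replayed
store.index_caught_up = True
fresh = initialize(store)   # filtered read hits, so the full history is replayed
print(len(stale), len(fresh), len(store.list_messages()))  # 0 2 2
```

The stored messages are identical in both calls; only the outcome of the filtered read changes what the agent sees.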

Reproduction

  1. Use an AgentCoreMemorySessionManager with a real Memory resource. Send turn 1 in a fresh session_id (e.g. "remember my number is 73").
  2. Send turn 2 in the same session_id after a delay (we saw the failure with gaps from ~150s up to ~2h).
  3. Intermittently, turn 2's agent.messages is empty and the model has no memory of turn 1, while the runtime logs Created agent: default in session: <same id>.
  4. Query the session afterwards: the raw events (both turns) are all present, and constructing a fresh Agent(session_manager=sm) against the same session does restore the full history — confirming the data was always there and the failure was a transient read at invoke time.

Experiments we ran (to isolate it)

  1. Dumped the session via unfiltered list_events → all conversation events for both turns present under the same (memory, actor, session). (Rules out "data not written".)
  2. Ran the installed SDK's read_agent() / read_session() / list_messages() against the live session → all returned the events (after the index had caught up). (Shows the read path is correct when consistent.)
  3. Constructed a real Agent(session_manager=sm) exactly as our runtime does → agent.messages restored the full history. (Rules out "restore is broken".)
  4. Live 2-turn rapid test against the deployed runtime (seconds apart) → turn 2 recalled the fact. (Works.)
  5. Live 2-turn test with a 150s gap → turn 2 had no history (model: "I have no cross-conversation memory"). (Reproduced the failure.)
  6. Live 3rd turn (cold, ~15 min later) on the rapid session → recalled the fact. (Works.)
  7. Post-hoc restore of the failed (gap) session via the SDK → read_session/read_agent FOUND, list_messages returned all turns. (Proves the failure was a transient read at invoke time, not data/version.)
  8. Compared bedrock-agentcore 1.14.1 vs 1.15.1 → identical list_events and session_manager restore logic. (Rules out "fixed by upgrade".)

Net: 2 reproduced failures, multiple successes, same code/data/version → the only variable is the consistency of the metadata-filtered read at invoke time.

Impact

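Silent, intermittent loss of conversation context in production multi-turn agents using the documented short-term-memory integration. It is hard to detect (no error or warning is logged; the only trace is the routine Created agent: default in session: ... line) and is not fixed by upgrading to 1.15.1.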
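Until the SDK surfaces the condition itself, callers can detect it by comparing the number of messages the manager replayed with the number returned by the unfiltered read, the same check the repro script in this issue uses. A minimal sketch; check_restore and the logger name are hypothetical, and the two counts are assumed to come from len(agent.messages) and len(sm.list_messages(...)):

```python
import logging

logger = logging.getLogger("memory_restore_check")  # hypothetical logger name


def check_restore(session_id: str, restored: int, unfiltered: int) -> bool:
    """Return True (and log a warning) when history exists but none was replayed."""
    suspicious = restored == 0 and unfiltered > 0
    if suspicious:
        logger.warning(
            "session %s: restore replayed 0 messages but %d exist in memory",
            session_id, unfiltered,
        )
    return suspicious


# A brand-new session (nothing stored yet) is not flagged; a missed restore is.
print(check_restore("s-new", restored=0, unfiltered=0))     # False
print(check_restore("s-missed", restored=0, unfiltered=4))  # True
```

The flag is raised only when the unfiltered count is non-zero, so a genuinely fresh session does not trigger a false alarm.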

Recommended fix

Restoration should not depend on an eventually-consistent metadata-filtered read. Options, in order of preference:

  1. Restore messages from the strongly-consistent unfiltered read. In initialize(), after read_agent(), if it returns None but list_messages(session_id, agent_id) (unfiltered) returns a non-empty history, treat the agent as existing and replay that history instead of creating a new agent. (Decouples message restore from the flaky agent/session marker lookup.)
  2. Add bounded retry with backoff to read_agent() / read_session() for the read-after-write window when a marker event is expected.
  3. At minimum, make it observable: log a warning when initialize() takes the "new agent" branch while unfiltered events for the session already exist (i.e. a likely false "new session").

A minimal, behavior-preserving version of (1): when the metadata-filtered read_agent misses, fall back to the unfiltered list_messages to decide existence + restore.
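Option 2 could be sketched as a generic wrapper around any read that returns None on a miss. This is an illustration rather than SDK code: read_with_retry and its parameters are hypothetical, and because we saw misses minutes after the write, a short retry budget like this one may narrow the window without closing it.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def read_with_retry(
    read: Callable[[], Optional[T]],
    attempts: int = 4,
    base_delay: float = 0.2,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call read() until it returns non-None or the attempts are exhausted."""
    delay = base_delay
    for attempt in range(attempts):
        result = read()
        if result is not None:
            return result
        if attempt < attempts - 1:          # no sleep after the final attempt
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
    return None


# Demo with a fake reader that misses twice, then finds the marker.
calls = {"n": 0}


def flaky_read() -> Optional[str]:
    calls["n"] += 1
    return "agent-record" if calls["n"] >= 3 else None


slept: List[float] = []
found = read_with_retry(flaky_read, sleep=slept.append)  # record delays instead of sleeping
print(found, slept)  # agent-record [0.2, 0.4]
```

Injecting sleep keeps the helper testable without real delays. In the manager, the read argument would be a closure over the existing filtered lookup, and a miss after the last attempt would still need the fallback from option 1.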

Minimal reproduction script

Self-contained; exercises the exact SDK code path (a fresh AgentCoreMemorySessionManager + Agent per turn, as a stateless runtime invoke would). The failure is consistency/timing dependent, so the script loops over fresh sessions until it catches one; on failure it immediately proves the data exists via the strongly-consistent unfiltered read.

#!/usr/bin/env python3
"""Repro: AgentCoreMemorySessionManager intermittently skips history restore.

Requires: pip install "bedrock-agentcore==1.14.1" "strands-agents==1.42.0"
          AWS credentials for an account with an AgentCore Memory resource.
Usage:    python repro.py <MEMORY_ID> [region] [gap_seconds]
"""
import sys, time, uuid

from bedrock_agentcore.memory.integrations.strands.config import AgentCoreMemoryConfig
from bedrock_agentcore.memory.integrations.strands.session_manager import AgentCoreMemorySessionManager
from strands import Agent
from strands.models import BedrockModel

MEMORY_ID = sys.argv[1]
REGION    = sys.argv[2] if len(sys.argv) > 2 else "ap-southeast-1"
GAP       = int(sys.argv[3]) if len(sys.argv) > 3 else 150
ACTOR     = "repro-actor"
MODEL     = "apac.amazon.nova-lite-v1:0"   # any cheap in-region model

def new_sm(session_id: str) -> AgentCoreMemorySessionManager:
    cfg = AgentCoreMemoryConfig(memory_id=MEMORY_ID, session_id=session_id, actor_id=ACTOR)
    return AgentCoreMemorySessionManager(cfg, REGION)

for attempt in range(1, 21):
    sid = f"repro-{uuid.uuid4()}"                      # >= 33 chars, fresh session

    # ---- turn 1: real model call so user+assistant events are persisted
    agent1 = Agent(model=BedrockModel(model_id=MODEL, region_name=REGION),
                   system_prompt="Reply in five words or fewer.",
                   session_manager=new_sm(sid), callback_handler=None)
    agent1("Remember: my lucky number is 73.")

    time.sleep(GAP)                                     # simulate the next user turn arriving later

    # ---- turn 2: brand-new manager + agent on the SAME session
    #      (exactly what a stateless runtime does on the next invoke)
    sm2 = new_sm(sid)
    agent2 = Agent(model=BedrockModel(model_id=MODEL, region_name=REGION),
                   system_prompt="Reply in five words or fewer.",
                   session_manager=sm2, callback_handler=None)

    restored   = len(agent2.messages)                          # what initialize() replayed
    unfiltered = len(sm2.list_messages(sid, agent2.agent_id))  # strongly-consistent ground truth
    print(f"[{attempt}] session={sid} restored={restored} unfiltered={unfiltered}")

    if restored == 0 and unfiltered > 0:
        print(">>> REPRODUCED: initialize() replayed nothing, yet the unfiltered "
              f"list_messages returns {unfiltered} messages for the same session — "
              "the metadata-filtered read_agent/read_session missed the marker events.")
        break
else:
    print("Not reproduced in 20 attempts — the miss window depends on service-side "
          "index consistency; retry, vary the gap, or run at higher write rates.")

Observed signal when it hits: restored=0 unfiltered=4 (turn 1's user+assistant are on disk, but nothing was replayed), matching the production Created agent: default in session: <same id> log line on the second turn.

Suggested patch (sketch)

Untested sketch of recommendation (1) — decouple restore from the eventually-consistent marker lookup by falling back to the strongly-consistent unfiltered read inside read_agent():

# bedrock_agentcore/memory/integrations/strands/session_manager.py
def read_agent(self, session_id: str, agent_id: str, **kwargs: Any) -> Optional[SessionAgent]:
    agent = self._read_agent_filtered(session_id, agent_id)   # existing metadata-filtered lookup
    if agent is not None:
        return agent

    # Fallback: the metadata-filtered ListEvents is eventually consistent and can
    # miss a just-written AGENT marker. The unfiltered read is strongly consistent —
    # if conversational events exist for this session, the agent DOES exist.
    if self.list_messages(session_id, agent_id, limit=1):
        logger.warning(
            "read_agent: metadata filter returned nothing but session %s has events; "
            "treating agent %s as existing to avoid dropping history", session_id, agent_id)
        return SessionAgent(
            agent_id=agent_id,
            state={},
            conversation_manager_state=NullConversationManager().get_state(),  # or the configured default
        )
    return None

Notes: the reconstructed SessionAgent loses any persisted state / conversation_manager_state for that one turn (they re-sync on the next write), which is strictly better than silently dropping the entire conversation. The same fallback shape applies to read_session(). Alternatively (recommendation 2), a bounded retry with backoff on the filtered read also closes most of the window, at the cost of latency.

Current workaround (in our runtime)

After constructing the agent, if session_manager is attached but agent.messages is empty, we re-load history via the strongly-consistent session_manager.list_messages(session_id, agent.agent_id) and assign it to agent.messages:

agent = Agent(**agent_kwargs)   # agent_kwargs includes session_manager=sm
if sm is not None and not agent.messages:
    restored = sm.list_messages(session_id, agent.agent_id)
    if restored:
        agent.messages = [m.to_message() for m in restored]
        log.warning("Memory restore fallback hit: managed restore was empty, "
                    "re-loaded %d messages via list_messages", len(agent.messages))

Direct assignment does not enqueue writes (verified pending_message_count() unchanged), so it does not re-persist or duplicate. This fully and reliably eliminates the symptom.
