codex-agent-mem

v1.0.0 verification results

These are reproducible, sanitized results generated from synthetic fixtures.

Execution context:

Scope: these results validate Codex Desktop only, not a ChatGPT web/app connector. Support for Claude web / claude.ai should not be inferred from them either.

Snapshot

| Scenario | Source tokens | Pack tokens | Saved | not_modified | Tools | Lazy init | Read-only |
|---|---|---|---|---|---|---|---|
| Small project continuity | 1,841 | 216 | 88.27% | true | 4 | false->true | true |
| Medium agent workflow | 4,855 | 233 | 95.20% | true | 4 | false->true | true |
| Large repeated audit | 9,731 | 232 | 97.62% | true | 4 | false->true | true |
| Sub-agent handoff example | 6,523 | 239 | 96.34% | true | 4 | false->true | true |

Across these fixtures: ~22,950 source tokens -> ~920 pack tokens. Approximate repeated-context reduction: 95.99% (~22,030 tokens not resent).
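The percentages above follow directly from the token counts. A minimal sketch of that arithmetic, assuming "saved" means one minus the ratio of pack tokens to source tokens (the names `FIXTURES` and `saved_percent` are illustrative and not part of the project):

```python
# Token counts copied from the snapshot table: (source_tokens, pack_tokens).
FIXTURES = {
    "Small project continuity": (1841, 216),
    "Medium agent workflow": (4855, 233),
    "Large repeated audit": (9731, 232),
    "Sub-agent handoff example": (6523, 239),
}


def saved_percent(source_tokens: int, pack_tokens: int) -> float:
    """Share of source tokens that did not need to be resent, in percent."""
    return round(100.0 * (1 - pack_tokens / source_tokens), 2)


# Per-scenario savings, then the aggregate over all four fixtures.
per_scenario = {name: saved_percent(s, p) for name, (s, p) in FIXTURES.items()}
total_source = sum(s for s, _ in FIXTURES.values())
total_pack = sum(p for _, p in FIXTURES.values())
overall = saved_percent(total_source, total_pack)

for name, pct in per_scenario.items():
    print(f"{name}: {pct:.2f}%")
print(f"Overall: {total_source} -> {total_pack} tokens, {overall:.2f}% saved")
```

The aggregate is computed from the summed token counts, not by averaging the four percentages, which is why it sits closer to the larger fixtures.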

Token savings by scenario

Small project continuity

A short project where the agent should remember the objective, constraints, and open work without repeating the whole discussion.

source [############################] 100.00%
pack   [###.........................]  11.73%
saved  [#########################...]  88.27%

Medium agent workflow

A realistic multi-step implementation with repeated decisions, pending work, and definition-of-done (DoD) requirements.

source [############################] 100.00%
pack   [#...........................]   4.80%
saved  [###########################.]  95.20%

Large repeated audit

A long audit where the same constraints and decisions would normally be re-sent many times.

source [############################] 100.00%
pack   [#...........................]   2.38%
saved  [###########################.]  97.62%

Sub-agent handoff example

A project where an explorer sub-agent audits context and a worker sub-agent implements a bounded change.

source [############################] 100.00%
pack   [#...........................]   3.66%
saved  [###########################.]  96.34%
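The bars in this section can be reproduced from the percentages alone. A small sketch, assuming a 28-character bar and round-to-nearest fill (both inferred from the bars shown here, not taken from the project's code; `render_bar` is a hypothetical helper):

```python
def render_bar(percent: float, width: int = 28) -> str:
    """Render a fixed-width text bar: '#' for the filled share, '.' for the rest."""
    filled = round(width * percent / 100)
    filled = max(0, min(width, filled))  # clamp so out-of-range input cannot break the layout
    return "[" + "#" * filled + "." * (width - filled) + "]"


# Rows for the "Small project continuity" scenario.
for label, pct in (("source", 100.00), ("pack", 11.73), ("saved", 88.27)):
    print(f"{label:<6} {render_bar(pct)} {pct:6.2f}%")
```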

Sub-agent example:

Repeated context avoided

known_pack_hash lets the agent ask whether a pack changed before re-sending it.

| Scenario | Result |
|---|---|
| Small project continuity | not_modified=true |
| Medium agent workflow | not_modified=true |
| Large repeated audit | not_modified=true |
| Sub-agent handoff example | not_modified=true |
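The `known_pack_hash` check resembles a conditional request: the caller presents the hash it already holds, and the pack body is only returned when that hash is stale. The sketch below illustrates the idea only. Everything apart from the names `known_pack_hash` and `not_modified` is an assumption, including the SHA-256 hashing scheme and the response shape; it is not the project's implementation.

```python
import hashlib
import json
from typing import Optional


def pack_hash(pack: dict) -> str:
    """Hash a canonical JSON encoding so equal packs always hash equally (assumed scheme)."""
    canonical = json.dumps(pack, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def get_context_pack(pack: dict, known_pack_hash: Optional[str] = None) -> dict:
    """Return the pack only if the caller's known hash does not match the current one."""
    current = pack_hash(pack)
    if known_pack_hash == current:
        # The caller already holds this exact pack, so the body is omitted.
        return {"not_modified": True, "pack_hash": current}
    return {"not_modified": False, "pack_hash": current, "pack": pack}


# Hypothetical pack contents, for illustration only.
pack = {"objective": "ship v1.0.0", "open_work": ["write docs"]}
first = get_context_pack(pack)
second = get_context_pack(pack, known_pack_hash=first["pack_hash"])
print(first["not_modified"], second["not_modified"])
```

In this sketch the first call returns the full pack and the second returns only the hash, which is the repeated context the table reports as avoided.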

Runtime safety

| Metric | Result |
|---|---|
| Minimal profile tools | mem_open_work, mem_completion_check, mem_context_pack, mem_health_runtime |
| Tool count in minimal profile | 4 |
| Lazy initialization before DB-backed tool | false |
| Lazy initialization after context pack | true |
| Mutating tool tested in read-only mode | mem_snapshot_create |
| Mutating tool blocked | true |
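The two behaviors measured above, deferred initialization and read-only blocking, can be pictured with a toy runtime. This is a sketch under stated assumptions: the class `MemoryRuntime`, its `call` method, the response fields, and the idea that `mem_health_runtime` needs no database are all hypothetical. Only the tool names come from the table.

```python
class MemoryRuntime:
    """Toy runtime: opens its store on first DB-backed call, blocks writes when read-only."""

    MUTATING_TOOLS = {"mem_snapshot_create"}

    def __init__(self, read_only: bool = True) -> None:
        self.read_only = read_only
        self.initialized = False
        self._store = None

    def _ensure_store(self) -> None:
        # Lazy initialization: nothing is opened until a tool actually needs it.
        if not self.initialized:
            self._store = {}  # stand-in for opening a real database
            self.initialized = True

    def call(self, tool: str) -> dict:
        # Mutating tools are rejected before any state is touched.
        if self.read_only and tool in self.MUTATING_TOOLS:
            return {"ok": False, "blocked": True, "tool": tool}
        # Assumption: the health check reports state without needing the store.
        if tool == "mem_health_runtime":
            return {"ok": True, "initialized": self.initialized}
        self._ensure_store()
        return {"ok": True, "initialized": self.initialized, "tool": tool}


runtime = MemoryRuntime(read_only=True)
before = runtime.call("mem_health_runtime")["initialized"]
after = runtime.call("mem_context_pack")["initialized"]
blocked = runtime.call("mem_snapshot_create")["blocked"]
print(before, after, blocked)
```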

Response diet

Text shown to the model can be kept compact while the structured payload remains available to MCP clients.

| Scenario | Compact text chars | Balanced text chars | Verbose text chars |
|---|---|---|---|
| Small project continuity | 160 | 209 | 20,565 |
| Medium agent workflow | 160 | 209 | 42,793 |
| Large repeated audit | 160 | 209 | 75,965 |
| Sub-agent handoff example | 160 | 209 | 54,476 |
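One way to picture the split between model-visible text and structured payload: the text summary varies with a verbosity mode, while the structured data is returned unchanged. The sketch below is illustrative only. The function `build_response`, the mode names as parameters, and the payload fields are hypothetical and do not describe the project's actual response format.

```python
import json


def build_response(payload: dict, mode: str = "compact") -> dict:
    """Return a short text summary for the model plus the untouched structured payload."""
    short_hash = payload["pack_hash"][:12]
    item_count = len(payload["items"])
    if mode == "verbose":
        # Verbose mode serializes everything into the text, so it grows with the pack.
        text = json.dumps(payload, indent=2)
    elif mode == "balanced":
        text = f"pack {short_hash}: {item_count} items, {payload['pack_tokens']} tokens"
    else:
        # Compact mode stays roughly constant in size regardless of pack contents.
        text = f"pack {short_hash}: {item_count} items"
    return {"text": text, "structured": payload}


# Hypothetical payload, for illustration only.
payload = {
    "pack_hash": "0123456789abcdef0123456789abcdef",
    "pack_tokens": 216,
    "items": [{"kind": "decision", "body": "x" * 200} for _ in range(20)],
}
sizes = {m: len(build_response(payload, m)["text"]) for m in ("compact", "balanced", "verbose")}
print(sizes)
```

The compact and balanced texts here have a size that does not depend on the pack body, which matches the constant 160 and 209 character counts in the table, while the verbose text scales with the content.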

Telemetry smoke

Interpretation

These numbers are not a universal guarantee. They show reproducible behavior on public synthetic fixtures. The benefit is expected to be greatest when an agent would otherwise resend the same project context across sessions.