Research Summary: ngn-agent v1.1 — Session Lifecycle, Memory & Reporting

Project: ngn-agent v1.1
Domain: Platform engineering agent (Hermes Agent-based configuration/adoption)
Researched: 2026-06-14
Overall confidence: HIGH

Executive Summary

ngn-agent v1.1 adds three features atop the existing Hermes Agent installation: DEFAULT_REPOS auto-clone into session workspaces, Hindsight long-term memory provider, and daily cron reporting with stale session lifecycle management (30d archive) plus Jira integration. All three integrate cleanly with existing v1.0 infrastructure using documented Hermes extension points — no core source changes, no greenfield work, no new infrastructure.

The recommended approach is an additive configuration layer. Every feature maps to an existing Hermes mechanism: shell_init_files for repo cloning, memory.provider: hindsight for memory, and hermes cron create for reporting and archiving. The only new code is two shell scripts (session-init, stale-cleanup) and one skill markdown file (daily-report). Estimated installation time is under 2 hours in total.

Key risks and mitigations:

  1. Docker container restart loses cloned repos — Mitigate by cloning to a host-mounted volume (~/Projects:/workspace/repos:rw)
  2. Hindsight Cloud API reliability — Monitor logs for sync_turn failed; have local embedded mode as fallback
  3. SSH credential exposure inside Docker — Use read-only deploy keys scoped per repo; never mount full ~/.ssh/
  4. Memory provider conflict — Set memory.provider: hindsight only; never add a second external provider

Key Findings

The stack is almost entirely existing Hermes infrastructure plus three small pieces of new code (two shell scripts and one skill file). See STACK.md for full details.

Core additions:

  • hindsight-client>=0.4.22: Python client for Hindsight Cloud API (already bundled as Hermes MemoryProvider plugin; just needs uv pip install)
  • SSH key mount (existing): Git clone auth inside Docker — ~/.ssh:/root/.ssh:ro or deploy key per repo
  • session-init.sh: Shell script executed at terminal start via terminal.shell_init_files — clones DEFAULT_REPOS into /workspace/repos/
  • daily-report.md skill: Hermes skill-backed cron job — agent composes daily session summary and sends via Telegram
  • stale-cleanup.sh: no_agent cron script — exports sessions inactive >30d to JSON archive, deletes from live DB

Config changes required:

| Config | Value |
| --- | --- |
| memory.provider | hindsight |
| terminal.shell_init_files | ["/usr/local/bin/session-init.sh"] |
| terminal.docker_volumes | Add ~/.ssh:/root/.ssh:ro and ~/Projects:/workspace/repos:rw |
| HINDSIGHT_API_KEY | Set in ~/.hermes/.env |
| DEFAULT_REPOS | Space-separated org/repo list in ~/.hermes/.env |
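
To make the DEFAULT_REPOS entry concrete, the sketch below parses a space-separated org/repo list and maps each entry to an SSH clone URL and a target directory under /workspace/repos. The `parse_default_repos` helper, the github.com host, and the example repo names are assumptions for illustration only; the source does not say which Git host is used, and session-init.sh itself is a shell script.

```python
import re
from pathlib import PurePosixPath

# Hypothetical helper, not part of Hermes. Host and directory layout are assumptions.
REPO_PATTERN = re.compile(r"^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$")


def parse_default_repos(value, host="github.com", root="/workspace/repos"):
    """Turn 'org/a org/b' into a list of (clone_url, target_dir) pairs."""
    plan = []
    for entry in value.split():  # split() tolerates repeated spaces and newlines
        if not REPO_PATTERN.match(entry):
            raise ValueError(f"not an org/repo entry: {entry!r}")
        name = entry.split("/", 1)[1]
        target = str(PurePosixPath(root) / name)
        plan.append((f"git@{host}:{entry}.git", target))
    return plan
```

Rejecting malformed entries up front means a typo in ~/.hermes/.env surfaces as one clear error instead of a failed clone later in session start.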

Alternatives considered:

| Decision | Recommended | Alternative Rejected |
| --- | --- | --- |
| Hindsight mode | Cloud (zero infra) | Local embedded (~200MB download, 2-4GB RAM overhead) |
| Git auth method | SSH key mount | SSH agent forwarding (needs host socket, less reliable) |
| Session init hook | shell_init_files | Plugin on_session_start hook (runs after agent starts, not guaranteed before first prompt) |
| Cron mechanism | Hermes skill + cron | Custom Python script (wastes existing delivery infrastructure) |

Expected Features

See FEATURES.md for complete landscape, dependencies, and prioritization.

Must have (table stakes — P1 for v1.1):

  • DEFAULT_REPOS auto-cloned in every new session — Manual clone per session is the #1 UX complaint. shell_init_files runs before agent starts, guaranteeing repos are present.
  • Cross-session persistent memory — Built-in MEMORY.md is 2.2k chars frozen at session start. Hindsight provides entity-aware KG with semantic recall across all sessions.
  • Daily operational report — Invisible work erodes trust. Daily Telegram report shows what the agent did, what sessions were active.
  • Stale session cleanup — Sessions pile up indefinitely. 30d inactivity → archive to JSON → delete from live DB.

Should have (differentiators — P2 for v1.1):

  • Knowledge graph memory (Hindsight) — Entity-aware cross-session recall with LLM synthesis (hindsight_reflect), not just FTS5 text search
  • On-demand repo cloning — User says "clone rai-pipeline" mid-session, agent does it without leaving the conversation
  • Jira-integrated daily report — Report includes Jira ticket status and session→ticket correlations using existing ngn-jira skill
  • Zero-cost stale cleanup: a no_agent: true cron job runs a deterministic script at zero LLM token cost

Defer (v1.2+):

  • On-demand repo cloning skill (trivial once default cloning works; user can already ask manually)
  • Archive restore script (JSON files are text-searchable; low urgency)
  • Custom ngn-agent plugin package (only valuable if shared across a team)

Anti-features (avoid):

  • Custom scheduler (Hermes cron already handles this)
  • Custom memory provider implementation (Hindsight is production-ready and bundled)
  • Persistent Docker image with pre-cloned repos (image would be large, stale quickly)
  • Cloud-only Hindsight mode with no fallback (Cloud adds an external dependency and cost; keep the Hermes-managed local embedded mode available as a fallback)

Architecture Approach

See ARCHITECTURE.md for full component boundaries, data flows, and patterns.

All v1.1 features are an additive plugin + script + configuration layer around Hermes' built-in extension points. No Hermes core code is modified.

Major components:

  1. Hindsight Memory Provider — Cross-session memory with knowledge graph, entity resolution, semantic recall. Communicates with Hermes agent loop (pre-turn recall, post-turn retain), local PostgreSQL, OpenRouter (LLM extraction).
  2. Repo Clone Hook (session-init.sh) — On session start, clones DEFAULT_REPOS from ~/.hermes/.env into host-mounted /workspace/repos/. Uses shell_init_files mechanism (not plugin hooks) for guaranteed execution before agent starts.
  3. Daily Report Skill (daily-report.md) — Skill-backed cron job. Instructs agent to query SessionDB for recent sessions, Hindsight for cross-session facts, Jira for ticket updates. Format as Telegram-friendly summary.
  4. Session Archive Script (stale-cleanup.sh) — No-agent cron script. Queries SessionDB for sessions inactive >30d, exports to JSON, deletes from live DB. Deterministic, zero LLM cost.
  5. Built-in Memory (fallback) — Always-active fallback for critical facts via MEMORY.md/USER.md, frozen at session start.

Four architectural patterns to follow:

  1. Plugin Hook for Session Init: ctx.register_hook("on_session_start", handler) for custom initialization per session (or shell_init_files when initialization must finish before the agent starts)
  2. Skill-Backed Cron Jobs — Cron jobs that load a skill with structured instructions; agent produces report guided by skill context
  3. No-Agent Script for Deterministic Automation: no_agent: true cron jobs for data gathering, archiving, threshold checks
  4. Export-Before-Delete for Data Safety — Before removing any data, export to archive file first; verify integrity before deleting
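
The export-before-delete pattern can be sketched as follows. This is a minimal illustration under stated assumptions: `store` is a plain dict standing in for the live session database, and `archive_then_delete` is a hypothetical name. It does not use the real SessionDB API, whose signatures are not shown in this summary.

```python
import hashlib
import json
import os
import tempfile


def archive_then_delete(store, session_id, archive_dir):
    """Export one session to JSON, verify the file, and only then delete it.

    `store` is a plain dict standing in for the live session database
    (an assumption for this sketch, not the Hermes SessionDB).
    """
    session = store[session_id]
    payload = json.dumps(session, sort_keys=True, indent=2)
    expected = hashlib.sha256(payload.encode("utf-8")).hexdigest()

    os.makedirs(archive_dir, exist_ok=True)
    final_path = os.path.join(archive_dir, f"{session_id}.json")

    # Write to a temp file first so a crash never leaves a half-written
    # file at the final archive path.
    fd, tmp_path = tempfile.mkstemp(dir=archive_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(payload)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, final_path)

    # Verify integrity by re-reading the archive from disk.
    with open(final_path, "r", encoding="utf-8") as handle:
        on_disk = handle.read()
    if hashlib.sha256(on_disk.encode("utf-8")).hexdigest() != expected:
        raise IOError(f"archive verification failed for {session_id}")

    del store[session_id]  # the destructive step happens last
    return final_path
```

If verification raises, the session is still present in `store`, so a failed run can simply be retried.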

Anti-patterns to avoid:

  • Monkey-patching Hermes core (overwritten by auto-updates)
  • Direct state.db SQL queries (schema changes between releases; use SessionDB API)
  • Storing credentials in workspace files (prompt injection exfiltration risk)

Critical Pitfalls

See PITFALLS.md for all 10 pitfalls with prevention and detection.

Top 5 critical:

  1. Docker container restart loses cloned repos — Container destroyed after lifetime_seconds: 300 of inactivity. Repos cloned to ephemeral container filesystem disappear. Prevention: Always clone to host-mounted volume (~/Projects:/workspace/repos:rw). Script must check for existing .git directory before cloning.

  2. Memory provider conflict: MemoryManager.add_provider() rejects a second external provider (memory_manager.py:342-354). Configuring two external providers fails silently, and only the first is registered. Prevention: Set memory.provider: hindsight and nothing else.

  3. Cron job prompt injection via skill content — Cron jobs load skill content at runtime. Scanning detects patterns but false negatives are possible (cron/scheduler.py:1249-1303). Prevention: Keep cron skills simple and vetted. Use no_agent scripts for deterministic operations.

  4. SSH key exposure inside Docker — Agent with file-read tools inside Docker has read access to mounted ~/.ssh/. Prompt injection could exfiltrate keys. Prevention: Mount ~/.ssh:ro (read-only), use deploy keys per repo, consider HTTPS + scoped token instead of SSH.

  5. Shell init script blocking container start: shell_init_files runs synchronously before the shell prompt appears, so a hanging git clone blocks agent startup. Prevention: Add timeout 30 to clone operations, or wrap them in (sleep 5; ...) & for async init.
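
Pitfalls 1 and 5 both concern the clone step, and their preventions combine naturally: skip the clone when a .git directory already exists on the host-mounted volume, and bound each clone with a timeout. The sketch below shows that logic in Python for illustration; the actual session-init.sh is a shell script, and `ensure_clone` and its `runner` parameter (injected so the logic can be exercised without network access) are hypothetical names.

```python
import subprocess
from pathlib import Path


def ensure_clone(url, target, timeout_seconds=30, runner=subprocess.run):
    """Clone `url` into `target` unless a checkout already exists there.

    Returns "skipped", "cloned", "timeout", or "failed" instead of raising,
    so one bad repo cannot block the rest of session start.
    """
    target = Path(target)
    if (target / ".git").is_dir():
        return "skipped"  # checkout survived on the host-mounted volume
    target.parent.mkdir(parents=True, exist_ok=True)
    try:
        result = runner(
            ["git", "clone", "--quiet", url, str(target)],
            timeout=timeout_seconds,  # subprocess.run kills the child on expiry
            capture_output=True,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    except FileNotFoundError:
        return "failed"  # git is not installed in the container
    return "cloned" if result.returncode == 0 else "failed"
```

Returning a status string rather than raising keeps the init path non-fatal: the caller can log timeouts and carry on with whichever repos did clone.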

Implications for Roadmap

Based on research, four phases in dependency order:

Phase 1: Hindsight Memory Provider

Rationale: Independent, zero-risk, enhances every other feature. Pure configuration: no scripts, no volumes, no cron changes. Quickest win (~25 min).
Delivers: Cross-session persistent memory with knowledge graph, entity resolution, semantic recall via Hindsight Cloud API.
Addresses: Cross-session persistent memory (table stakes) + Knowledge graph memory (differentiator)
Uses: hindsight-client>=0.4.22, memory.provider: hindsight config, HINDSIGHT_API_KEY env var
Implements: Hindsight Memory Provider component
Avoids: Pitfall 2, memory provider conflict (set only hindsight, never add a second external provider)
Research flag: LOW. Well-documented Hermes configuration step. Verify Hindsight Cloud API availability and free tier limits during setup.

Phase 2: Default Repos Auto-Clone + Credential Mount

Rationale: Second priority: it fills the biggest UX gap (repos missing every session). Requires security-sensitive credential mounting, so needs careful implementation.
Delivers: DEFAULT_REPOS auto-cloned into every new session workspace via shell_init_files script. On-demand cloning capability (basic: user asks, agent clones).
Addresses: Default repos auto-cloned (table stakes) + On-demand repo cloning (differentiator)
Uses: terminal.shell_init_files, terminal.docker_volumes (SSH mount + workspace volume), session-init.sh script
Implements: Repo Clone Hook component
Avoids:

  • Pitfall 1: Lost clones on container restart (mitigated by host volume mount ~/Projects:/workspace/repos:rw)
  • Pitfall 5: Blocking init script (add timeout 30 to git clone, consider async wrapping)
  • Pitfall 4: SSH key exposure (use deploy keys, read-only mount)

Research flag: MEDIUM. SSH credential mount security approach (deploy key vs token vs agent forwarding) needs final decision during planning. Test both ~/.ssh:ro and HTTPS+token approaches.

Phase 3: Daily Cron Report

Rationale: Third priority: it needs active sessions to report on. Phases 1 and 2 ensure sessions have memory and repos, making sessions productive. Now we can report on them.
Delivers: Daily Telegram report at 09:00 listing active sessions, session titles, last message previews, token counts. Skill-backed agent composes the summary.
Addresses: Daily operational report (table stakes) + Jira integration (differentiator, stretch goal)
Uses: daily-report.md skill, hermes cron create, existing Telegram delivery channel, existing ngn-jira skill
Implements: Daily Report Skill component
Avoids:

  • Pitfall 3: Cron prompt injection (keep skill simple, vetted)
  • Minor Pitfall 3: Wrong chat delivery (set deliver: telegram:474440517 explicitly)

Research flag: MEDIUM. Daily report skill prompt quality needs iteration. The skill instructs the agent what to query and how to format. Plan for at least 2-3 prompt refinements after initial deploy. Jira integration depends on ngn-jira skill stability.
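
To give the prompt iteration a concrete target, here is one possible shape for the report body: session titles, token counts, and last-message previews, capped at Telegram's 4096-character limit for a single text message. In the plan the agent composes the summary from the skill, so this deterministic formatter is only an illustration; `format_daily_report` and the record keys `title`, `tokens`, and `last_message` are assumptions, not Hermes field names.

```python
TELEGRAM_MAX_CHARS = 4096  # Telegram Bot API limit for one text message


def format_daily_report(report_date, sessions, preview_chars=80):
    """Build a plain-text summary, busiest sessions first.

    Each session is a dict with hypothetical keys: title, tokens, last_message.
    """
    lines = [
        f"Daily report {report_date}",
        f"Active sessions: {len(sessions)}",
        "",
    ]
    for session in sorted(sessions, key=lambda s: s["tokens"], reverse=True):
        # Collapse newlines and repeated spaces so each preview stays on one line.
        preview = " ".join(session["last_message"].split())
        if len(preview) > preview_chars:
            preview = preview[: preview_chars - 3] + "..."
        lines.append(f"- {session['title']} ({session['tokens']} tokens)")
        lines.append(f"  {preview}")
    text = "\n".join(lines)
    if len(text) > TELEGRAM_MAX_CHARS:
        text = text[: TELEGRAM_MAX_CHARS - 3] + "..."
    return text
```

Sorting by token count puts the most expensive sessions at the top, so they are the last to be lost if the message is truncated.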

Phase 4: Stale Session Archive (30d)

Rationale: Last priority because it's destructive. Should only run after reporting is working so user can see in daily reports what sessions will be affected before archiving runs.
Delivers: Weekly (Sunday 06:00) archival of sessions inactive >30d. Export to JSON in ~/.hermes/archive/sessions/, delete from live DB. Summary delivered to Telegram.
Addresses: Stale session cleanup (table stakes)
Uses: stale-cleanup.sh script, hermes cron create --no-agent, SessionDB.export_session() / delete_session()
Implements: Session Archive Script component
Avoids:

  • Pitfall pattern: Export-before-delete for data safety (write JSON, verify, then delete)
  • Moderate Pitfall: Deleting active sessions (check last_updated carefully, use dry-run mode first)

Research flag: LOW. Deterministic script using documented SessionDB API. Add dry-run mode flag for initial testing. Consider archive verification step.
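
The selection rule and the dry-run flag can be sketched as below. This assumes sessions are available as a mapping from session ID to a last_updated timestamp. `select_stale` and `run_cleanup` are hypothetical names, and the real script would go through the SessionDB API rather than a dict.

```python
from datetime import datetime, timedelta


def select_stale(sessions: dict[str, datetime], now: datetime, max_idle_days: int = 30):
    """Return IDs of sessions idle for strictly more than `max_idle_days`."""
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(sid for sid, last_updated in sessions.items() if last_updated < cutoff)


def run_cleanup(sessions: dict[str, datetime], now: datetime, dry_run: bool = True,
                max_idle_days: int = 30):
    """List stale sessions; delete them only when dry_run is False.

    dry_run defaults to True so an accidental invocation is non-destructive.
    """
    stale = select_stale(sessions, now, max_idle_days)
    if not dry_run:
        for sid in stale:
            # A real script would export and verify the archive before this line.
            del sessions[sid]
    return stale
```

A session last updated exactly 30 days ago is kept, which matches the ">30d" wording. Passing `now` in explicitly, rather than reading the clock inside the function, makes that boundary easy to test.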

Phase Ordering Rationale

  • Hindsight first (Phase 1) — Zero-risk configuration change. Enhances every subsequent phase by providing cross-session context. No code, no scripts, no volumes.
  • Default Repos second (Phase 2) — Independent from Hindsight (no dependency), but has the security-sensitive credential mount. Early implementation allows maximum testing of credential isolation.
  • Daily Report third (Phase 3) — Needs active sessions producing data to report on. Both Phase 1 and 2 contribute to session quality. Report can also surface Hindsight memory patterns.
  • Stale Archive fourth (Phase 4) — Destructive operation. User should see via daily reports what will be archived before the archive runs. Install archive cron after report cron so there's visible feedback first.

Research Flags

Phases needing deeper research during planning:

  • Phase 2 (Default Repos): SSH credential mount strategy — deploy key vs fine-grained token vs agent forwarding vs full ~/.ssh:ro. Tradeoffs between security and simplicity need a final decision. Also verify shell_init_files execution ordering guarantees.
  • Phase 3 (Daily Report): Skill prompt design for useful LLM-generated summaries. Jira API scoping — what ticket data to include, how to correlate sessions to tickets. The Jira integration scope (basic ticket status query vs full session→ticket mapping) needs definition.

Phases with standard patterns (skip research-phase):

  • Phase 1 (Hindsight): Pure configuration — hermes memory setup, pick hindsight, set env vars. Hermes docs cover this completely.
  • Phase 4 (Stale Archive): Deterministic script using SessionDB.export_session() / delete_session() — documented API, straightforward implementation, export-before-delete pattern.

Confidence Assessment

| Area | Confidence | Notes |
| --- | --- | --- |
| Stack | HIGH | All dependencies verified against Hermes v0.16.0 source code and docs. hindsight-client is bundled. SSH mount is standard Docker. |
| Features | HIGH | All features map to documented Hermes extension points. No speculative functionality. Prioritization derived from actual usage patterns. |
| Architecture | HIGH | Additive layer design avoids modifying Hermes core. Every component boundary matches a documented Hermes mechanism (hooks, cron, skills, config). |
| Pitfalls | HIGH | Each pitfall is sourced from specific Hermes source lines (memory_manager.py:342, cron/scheduler.py:1249-1303, etc.). Prevention strategies are concrete and testable. |

Overall confidence: HIGH

Gaps to Address

| Gap | How to Address |
| --- | --- |
| SSH credential mount: deploy key vs token vs agent forwarding | Test all approaches during Phase 2 planning. Start with deploy keys (most secure). Document security tradeoffs. |
| Hindsight Cloud API free tier limits | Create Hindsight account, verify free tier, test with actual agent usage. Fall back to local embedded mode if Cloud is unreliable. |
| Daily report quality iteration | Ship basic report in Phase 3, then iterate prompt based on actual output. Plan 2-3 refinement cycles. |
| Jira integration scope | Define in Phase 3 planning: basic ticket status query or full session→ticket correlation? Start with basic, iterate to full. |
| Archive dry-run mode | Add --dry-run flag to stale-cleanup.sh for initial testing. Run manually before activating cron. |

Sources

Primary (HIGH confidence — Hermes v0.16.0 source code + official docs)

  • agent/memory_manager.py lines 342-354 — Memory provider conflict logic (PITFALLS.md)
  • agent/memory_provider.py lines 115-131 — Async sync_turn silent failure (PITFALLS.md)
  • cron/scheduler.py lines 1249-1303 — Cron prompt injection scanning (PITFALLS.md)
  • cron/scheduler.py line 444 — Delivery origin fallback (PITFALLS.md)
  • plugins/memory/hindsight/__init__.py — Hindsight MemoryProvider plugin (STACK.md)
  • hermes_state.py — SessionDB API for export/delete (ARCHITECTURE.md, FEATURES.md)
  • agent/curator.py — Skills-only execution (FEATURES.md)
  • Hermes docs: hooks.md, cron.md, session-storage.md, memory.md, memory-providers.md (ARCHITECTURE.md, FEATURES.md)
  • ngn-agent config.yaml and initial-plan.md (existing v1.0 baseline)

Secondary (MEDIUM confidence)

  • Hindsight documentation at https://hindsight.vectorize.io — Cloud API details and limits (STACK.md)
  • Current ~/.hermes/config.yaml — Existing Docker volumes and cron job configuration

Tertiary (LOW confidence — needs validation)

  • SSH credential mount behavior in Docker — needs testing with actual ~/.ssh:ro mount and git clone inside container
  • Hindsight Cloud API free tier reliability at scale — needs account creation to verify

Research completed: 2026-06-14
Ready for roadmap: yes