# Research Summary: ngn-agent v1.1 — Session Lifecycle, Memory & Reporting
**Project:** ngn-agent v1.1
**Domain:** Platform engineering agent (Hermes Agent-based configuration/adoption)
**Researched:** 2026-06-14
**Overall confidence:** HIGH
## Executive Summary
ngn-agent v1.1 adds three features atop the existing Hermes Agent installation: DEFAULT_REPOS auto-clone into session workspaces, Hindsight long-term memory provider, and daily cron reporting with stale session lifecycle management (30d archive) plus Jira integration. All three integrate cleanly with existing v1.0 infrastructure using documented Hermes extension points — no core source changes, no greenfield work, no new infrastructure.
**The recommended approach: additive configuration layer.** Every feature maps to an existing Hermes mechanism: `shell_init_files` for repo cloning, `memory.provider: hindsight` for memory, `hermes cron create` for reporting and archiving. The only new code is two shell scripts (session-init, stale-cleanup) and one skill markdown file (daily-report). Installation time is under 2 hours total.
**Key risks and mitigations:**
1. **Docker container restart loses cloned repos** — Mitigate by cloning to a host-mounted volume (`~/Projects:/workspace/repos:rw`)
2. **Hindsight Cloud API reliability** — Monitor logs for `sync_turn failed`; have local embedded mode as fallback
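3. **SSH credential exposure inside Docker**: Prefer read-only deploy keys scoped per repo over mounting the full `~/.ssh/`. A read-only mount blocks writes, not reads, so every key it exposes is readable from inside the container.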
4. **Memory provider conflict** — Set `memory.provider: hindsight` only; never add a second external provider
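
Risk 2 depends on spotting `sync_turn failed` in the logs. Below is a minimal Python sketch of that check. It assumes the Hermes log is a plain-text file whose path you pass in. The helper names and the threshold of 5 failures are hypothetical choices, not Hermes settings.

```python
from pathlib import Path
from typing import Iterable

# Marker text from the risk note above. Where Hermes writes its log is
# deployment-specific, so the path is passed in rather than hard-coded.
FAILURE_MARKER = "sync_turn failed"


def count_sync_failures(lines: Iterable[str]) -> int:
    """Count log lines that report a failed Hindsight sync."""
    return sum(1 for line in lines if FAILURE_MARKER in line)


def should_alert(log_path: Path, threshold: int = 5) -> bool:
    """True once failures reach `threshold` (a hypothetical alert level)."""
    if not log_path.exists():
        return False
    with log_path.open(encoding="utf-8", errors="replace") as fh:
        return count_sync_failures(fh) >= threshold
```

Wrapped in a `no_agent` cron script, this check would cost no LLM tokens. Switching to local embedded mode would remain a manual decision.
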
## Key Findings
### Recommended Stack
The stack is almost entirely existing Hermes infrastructure plus three small additions. See [STACK.md](./STACK.md) for full details.
**Core additions:**
- `hindsight-client>=0.4.22`: Python client for the Hindsight Cloud API. The Hermes Hindsight MemoryProvider plugin is already bundled, so only the client library needs a `uv pip install`.
- SSH key mount (existing): Git clone auth inside Docker — `~/.ssh:/root/.ssh:ro` or deploy key per repo
- `session-init.sh`: Shell script executed at terminal start via `terminal.shell_init_files` — clones DEFAULT_REPOS into `/workspace/repos/`
- `daily-report.md` skill: Hermes skill-backed cron job — agent composes daily session summary and sends via Telegram
- `stale-cleanup.sh`: `no_agent` cron script — exports sessions inactive >30d to JSON archive, deletes from live DB
**Config changes required:**
| Config | Value |
|--------|-------|
| `memory.provider` | `hindsight` |
| `terminal.shell_init_files` | `["/usr/local/bin/session-init.sh"]` |
| `terminal.docker_volumes` | Add `~/.ssh:/root/.ssh:ro` and `~/Projects:/workspace/repos:rw` |
| `HINDSIGHT_API_KEY` | Set in `~/.hermes/.env` |
| `DEFAULT_REPOS` | Space-separated `org/repo` list in `~/.hermes/.env` |
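
`DEFAULT_REPOS` holds bare `org/repo` slugs, so the init script has to expand each one into a clone URL and a target directory. Here is a Python sketch of that mapping. It assumes GitHub over SSH and one directory per repo under `/workspace/repos/`; both are assumptions. `GIT_HOST`, `REPO_ROOT`, and `parse_default_repos` are illustrative names.

```python
import re
from pathlib import PurePosixPath

# Assumptions: repos are hosted on GitHub and cloned over SSH, and each one
# lands in /workspace/repos/<repo>. Adjust both constants for your setup.
GIT_HOST = "git@github.com"
REPO_ROOT = PurePosixPath("/workspace/repos")
_SLUG = re.compile(r"^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$")


def parse_default_repos(value: str) -> list:
    """Turn a space-separated 'org/repo' list into (clone_url, dest) pairs."""
    pairs = []
    for slug in value.split():  # split() also tolerates repeated spaces
        if not _SLUG.match(slug):
            raise ValueError(f"expected org/repo, got {slug!r}")
        org, repo = slug.split("/")
        pairs.append((f"{GIT_HOST}:{org}/{repo}.git", str(REPO_ROOT / repo)))
    return pairs
```

Validating the slugs up front means a typo in `.env` fails loudly instead of producing a half-cloned workspace. Two orgs with the same repo name would collide under this layout. Using `/workspace/repos/<org>/<repo>` avoids that, at the cost of longer paths.
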
**Alternatives considered:**
| Decision | Recommended | Alternative Rejected |
|----------|-------------|---------------------|
| Hindsight mode | **Cloud** (zero infra) | Local embedded (~200MB download, 2-4GB RAM overhead) |
| Git auth method | **SSH key mount** | SSH agent forwarding (needs host socket, less reliable) |
| Session init hook | **`shell_init_files`** | Plugin `on_session_start` hook (runs after agent starts, not guaranteed before first prompt) |
| Cron mechanism | **Hermes skill + cron** | Custom Python script (wastes existing delivery infrastructure) |
### Expected Features
See [FEATURES.md](./FEATURES.md) for complete landscape, dependencies, and prioritization.
**Must have (table stakes — P1 for v1.1):**
- **DEFAULT_REPOS auto-cloned** in every new session — Manual clone per session is the #1 UX complaint. `shell_init_files` runs before agent starts, guaranteeing repos are present.
- **Cross-session persistent memory**: Built-in MEMORY.md holds about 2.2k characters and is frozen at session start. Hindsight provides an entity-aware knowledge graph with semantic recall across all sessions.
- **Daily operational report**: Invisible work erodes trust. A daily Telegram report shows what the agent did and which sessions were active.
- **Stale session cleanup** — Sessions pile up indefinitely. 30d inactivity → archive to JSON → delete from live DB.
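
The 30-day rule reduces to one comparison per session. Here is a Python sketch. It assumes each session exposes an `id` and a timezone-aware `last_updated` timestamp; the real SessionDB record shape may differ, and `find_stale` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=30)


def find_stale(sessions, now=None, stale_after=STALE_AFTER) -> list:
    """Return ids of sessions inactive for longer than `stale_after`.

    Each session is assumed to be a dict with an 'id' and a timezone-aware
    'last_updated' datetime (hypothetical shape, not the SessionDB schema).
    """
    now = now or datetime.now(timezone.utc)
    return [s["id"] for s in sessions if now - s["last_updated"] > stale_after]
```

Because the comparison is a strict `>`, a session untouched for exactly 30 days survives one more run. This errs on the side of keeping data.
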
**Should have (differentiators — P2 for v1.1):**
- **Knowledge graph memory (Hindsight)** — Entity-aware cross-session recall with LLM synthesis (`hindsight_reflect`), not just FTS5 text search
- **On-demand repo cloning** — User says "clone rai-pipeline" mid-session, agent does it without leaving the conversation
- **Jira-integrated daily report** — Report includes Jira ticket status and session→ticket correlations using existing `ngn-jira` skill
- **Zero-cost stale cleanup** — `no_agent: true` cron = deterministic script, zero LLM token cost
**Defer (v1.2+):**
- On-demand repo cloning skill (trivial once default cloning works; user can already ask manually)
- Archive restore script (JSON files are text-searchable; low urgency)
- Custom ngn-agent plugin package (only valuable if shared across a team)
**Anti-features (avoid):**
- Custom scheduler (Hermes cron already handles this)
- Custom memory provider implementation (Hindsight is production-ready and bundled)
- Persistent Docker image with pre-cloned repos (the image would be large and would go stale quickly)
- Relying on Hindsight Cloud with no fallback (Cloud adds an external dependency and cost; keep the Hermes-managed local embedded mode available as a fallback)
### Architecture Approach
See [ARCHITECTURE.md](./ARCHITECTURE.md) for full component boundaries, data flows, and patterns.
All v1.1 features are an **additive plugin + script + configuration layer** around Hermes' built-in extension points. No Hermes core code is modified.
**Major components:**
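1. **Hindsight Memory Provider**: Cross-session memory with a knowledge graph, entity resolution, and semantic recall. It hooks into the Hermes agent loop (pre-turn recall, post-turn retain). In local embedded mode it also depends on a local PostgreSQL store and on OpenRouter for LLM extraction. In the recommended Cloud mode, it talks to the Hindsight Cloud API instead.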
2. **Repo Clone Hook** (`session-init.sh`) — On session start, clones DEFAULT_REPOS from `~/.hermes/.env` into host-mounted `/workspace/repos/`. Uses `shell_init_files` mechanism (not plugin hooks) for guaranteed execution before agent starts.
3. **Daily Report Skill** (`daily-report.md`): Skill-backed cron job. It instructs the agent to query SessionDB for recent sessions, Hindsight for cross-session facts, and Jira for ticket updates, then format the result as a Telegram-friendly summary.
4. **Session Archive Script** (`stale-cleanup.sh`) — No-agent cron script. Queries SessionDB for sessions inactive >30d, exports to JSON, deletes from live DB. Deterministic, zero LLM cost.
5. **Built-in Memory (fallback)** — Always-active fallback for critical facts via MEMORY.md/USER.md, frozen at session start.
**Four architectural patterns to follow:**
1. **Plugin Hook for Session Init**: `ctx.register_hook("on_session_start", handler)` for custom per-session initialization, or `shell_init_files` when initialization must finish before the agent starts (the option chosen here for repo cloning)
2. **Skill-Backed Cron Jobs** — Cron jobs that load a skill with structured instructions; agent produces report guided by skill context
3. **No-Agent Script for Deterministic Automation**: `no_agent: true` cron jobs for data gathering, archiving, and threshold checks
4. **Export-Before-Delete for Data Safety** — Before removing any data, export to archive file first; verify integrity before deleting
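
Pattern 4 is easiest to get right when the delete call can only be reached after a successful read-back. Here is a Python sketch. It assumes sessions arrive as JSON-native dicts, and that `delete` is whatever removes a session from the live store. That could be, for example, a wrapper around the SessionDB delete call named in Phase 4; the wrapper and `archive_then_delete` are hypothetical.

```python
import json
from pathlib import Path


def archive_then_delete(session: dict, archive_dir: Path, delete, dry_run: bool = True) -> str:
    """Export a session to JSON, verify the file, and only then delete it.

    `session` must be built from JSON-native types so the read-back compares
    equal. `delete` is a caller-supplied callable (hypothetical wrapper around
    the live store). Nothing is written or deleted when dry_run is True.
    """
    target = archive_dir / f"{session['id']}.json"
    if dry_run:
        return f"would archive {session['id']} -> {target}"
    archive_dir.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(session, sort_keys=True), encoding="utf-8")
    # Verify the archive by reading it back before touching the live copy.
    if json.loads(target.read_text(encoding="utf-8")) != session:
        raise RuntimeError(f"archive verification failed for {session['id']}")
    delete(session["id"])
    return f"archived {session['id']}"
```

With `dry_run` defaulting to `True`, a misconfigured first run only reports what it would do.
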
**Anti-patterns to avoid:**
- Monkey-patching Hermes core (overwritten by auto-updates)
- Direct `state.db` SQL queries (schema changes between releases; use SessionDB API)
- Storing credentials in workspace files (prompt injection exfiltration risk)
### Critical Pitfalls
See [PITFALLS.md](./PITFALLS.md) for all 10 pitfalls with prevention and detection.
**Top 5 critical:**
1. **Docker container restart loses cloned repos** — Container destroyed after `lifetime_seconds: 300` of inactivity. Repos cloned to ephemeral container filesystem disappear. **Prevention:** Always clone to host-mounted volume (`~/Projects:/workspace/repos:rw`). Script must check for existing `.git` directory before cloning.
2. **Memory provider conflict**: `MemoryManager.add_provider()` rejects a second external provider (memory_manager.py:342-354). Setting two external providers fails silently; only the first is registered. **Prevention:** Set `memory.provider: hindsight` and nothing else.
3. **Cron job prompt injection via skill content** — Cron jobs load skill content at runtime. Scanning detects patterns but false negatives are possible (cron/scheduler.py:1249-1303). **Prevention:** Keep cron skills simple and vetted. Use `no_agent` scripts for deterministic operations.
4. **SSH key exposure inside Docker**: An agent with file-read tools inside Docker can read anything in a mounted `~/.ssh/`, so a prompt injection could exfiltrate the keys. **Prevention:** Mount credentials read-only (`:ro`), but note that this blocks writes, not reads. Limit the mount to per-repo deploy keys, or consider HTTPS with a scoped token instead of SSH.
5. **Shell init script blocking container start**: `shell_init_files` runs synchronously before the shell prompt, so a hanging `git clone` blocks agent startup. **Prevention:** Add `timeout 30` to each clone operation. Running init in the background (e.g. `(sleep 5; ...) &`) avoids blocking entirely, but then repos are no longer guaranteed to exist before the first prompt.
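
Pitfalls 1 and 5 both shape the clone step. The logic that `session-init.sh` would implement in shell is sketched here in Python for illustration. The function name and return codes are hypothetical. `run` is injectable so the logic can be tested without network access.

```python
import shutil
import subprocess
from pathlib import Path


def clone_if_missing(url: str, dest: str, timeout: int = 30, run=subprocess.run) -> str:
    """Clone `url` into `dest` unless a checkout is already there.

    Returns "present", "cloned", "timeout" or "failed" (hypothetical codes
    for the init script to log).
    """
    target = Path(dest)
    # Pitfall 1: repos live on the host volume, so after a container restart
    # the checkout is usually still there and must not be cloned again.
    if (target / ".git").is_dir():
        return "present"
    existed = target.exists()
    try:
        # Pitfall 5: bound the clone so a hung network call cannot block init.
        run(["git", "clone", "--quiet", url, dest], check=True, timeout=timeout)
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError) as exc:
        # A killed clone can leave a partial .git behind; remove the directory
        # only if this call created it, never pre-existing user data.
        if not existed:
            shutil.rmtree(target, ignore_errors=True)
        return "timeout" if isinstance(exc, subprocess.TimeoutExpired) else "failed"
    return "cloned"
```

The cleanup guard matters. A clone killed by the timeout can leave a partial `.git` directory behind, and without cleanup every later run would report that repo as present.
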
## Implications for Roadmap
Based on research, four phases in dependency order:
### Phase 1: Hindsight Memory Provider
**Rationale:** Independent and low-risk, and it enhances every other feature. Pure configuration: no scripts, no volumes, no cron changes. Quickest win (~25 min).
**Delivers:** Cross-session persistent memory with knowledge graph, entity resolution, semantic recall via Hindsight Cloud API.
**Addresses:** Cross-session persistent memory (table stakes) + Knowledge graph memory (differentiator)
**Uses:** `hindsight-client>=0.4.22`, `memory.provider: hindsight` config, `HINDSIGHT_API_KEY` env var
**Implements:** Hindsight Memory Provider component
**Avoids:** Pitfall 2 — Memory provider conflict (set only `hindsight`, never add second external)
**Research flag:** LOW — Well-documented Hermes configuration step. Verify Hindsight Cloud API availability and free tier limits during setup.
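
Pitfall 2 and the Cloud API key can both be checked before the first session. Here is a Python sketch. It assumes the config has already been parsed into a plain dict; YAML parsing is omitted to stay dependency-free. It also assumes `HINDSIGHT_API_KEY` from `~/.hermes/.env` is visible in the environment passed in. `check_memory_config` is a hypothetical helper, not a Hermes command.

```python
import os


def check_memory_config(config: dict, env=None) -> list:
    """Return problems with the Phase 1 memory setup (an empty list means OK).

    `config` is the parsed ~/.hermes/config.yaml as a plain dict; `env`
    defaults to os.environ. The API key check applies to Cloud mode.
    """
    env = os.environ if env is None else env
    problems = []
    provider = (config.get("memory") or {}).get("provider")
    if provider != "hindsight":
        problems.append(f"memory.provider is {provider!r}, expected 'hindsight'")
    if not env.get("HINDSIGHT_API_KEY"):
        problems.append("HINDSIGHT_API_KEY is not set")
    return problems
```

`memory.provider` takes a single value, so a list of providers is reported as a problem rather than silently accepted.
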
### Phase 2: Default Repos Auto-Clone + Credential Mount
**Rationale:** Second priority — fills the biggest UX gap (repos missing every session). Requires security-sensitive credential mounting, so needs careful implementation.
**Delivers:** DEFAULT_REPOS auto-cloned into every new session workspace via `shell_init_files` script. On-demand cloning capability (basic — user asks, agent clones).
**Addresses:** Default repos auto-cloned (table stakes) + On-demand repo cloning (differentiator)
**Uses:** `terminal.shell_init_files`, `terminal.docker_volumes` (SSH mount + workspace volume), `session-init.sh` script
**Implements:** Repo Clone Hook component
**Avoids:**
- Pitfall 1 — Lost clones on container restart (mitigated by host volume mount `~/Projects:/workspace/repos:rw`)
- Pitfall 5 — Blocking init script (add `timeout 30` to git clone, consider async wrapping)
- Pitfall 4 — SSH key exposure (use deploy keys, read-only mount)
**Research flag:** MEDIUM — SSH credential mount security approach (deploy key vs token vs agent forwarding) needs final decision during planning. Test both `~/.ssh:ro` and HTTPS+token approaches.
### Phase 3: Daily Cron Report
**Rationale:** Third priority — needs active sessions to report on. Phase 1+2 ensure sessions have memory and repos, making sessions productive. Now we can report on them.
**Delivers:** Daily Telegram report at 09:00 listing active sessions, session titles, last message previews, token counts. Skill-backed agent composes the summary.
**Addresses:** Daily operational report (table stakes) + Jira integration (differentiator, stretch goal)
**Uses:** `daily-report.md` skill, `hermes cron create`, existing Telegram delivery channel, existing `ngn-jira` skill
**Implements:** Daily Report Skill component
**Avoids:**
- Pitfall 3 — Cron prompt injection (keep skill simple, vetted)
- Minor Pitfall 3 — Wrong chat delivery (set `deliver: telegram:474440517` explicitly)
**Research flag:** MEDIUM — Daily report skill prompt quality needs iteration. The skill instructs the agent what to query and how to format. Plan for at least 2-3 prompt refinements after initial deploy. Jira integration depends on `ngn-jira` skill stability.
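
The skill leaves composition to the agent. Even so, pinning the target format down in code makes prompt iterations easier to judge and provides a deterministic fallback. Here is a Python sketch. It assumes hypothetical `title`, `last_message`, and `tokens` fields per session. The 4096-character cap is Telegram's per-message text limit.

```python
TELEGRAM_LIMIT = 4096  # Telegram's maximum text length per message


def format_report(day: str, sessions: list, preview_len: int = 80) -> list:
    """Render the daily report as one or more Telegram-sized messages.

    Each session is a dict with 'title', 'last_message' and 'tokens' keys
    (hypothetical field names, not the SessionDB schema).
    """
    lines = [f"Daily report {day}: {len(sessions)} active session(s)"]
    for s in sessions:
        preview = " ".join(s["last_message"].split())  # collapse whitespace
        if len(preview) > preview_len:
            preview = preview[: preview_len - 3] + "..."
        lines.append(f"- {s['title']} ({s['tokens']:,} tokens): {preview}")

    # Pack whole lines into messages so no entry is split across two.
    messages, current = [], ""
    for line in lines:
        line = line[:TELEGRAM_LIMIT]
        candidate = f"{current}\n{line}" if current else line
        if len(candidate) > TELEGRAM_LIMIT:
            messages.append(current)
            candidate = line
        current = candidate
    messages.append(current)
    return messages
```

Splitting on whole lines keeps each session entry intact when a busy day overflows a single message.
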
### Phase 4: Stale Session Archive (30d)
**Rationale:** Last priority because it's destructive. Should only run after reporting is working so user can see in daily reports what sessions will be affected before archiving runs.
**Delivers:** Weekly (Sunday 06:00) archival of sessions inactive >30d. Export to JSON in `~/.hermes/archive/sessions/`, delete from live DB. Summary delivered to Telegram.
**Addresses:** Stale session cleanup (table stakes)
**Uses:** `stale-cleanup.sh` script, `hermes cron create --no-agent`, `SessionDB.export_session()` / `delete_session()`
**Implements:** Session Archive Script component
**Avoids:**
- Pitfall pattern — Export-before-delete for data safety (write JSON, verify, then delete)
- Moderate Pitfall — Deleting active sessions (check `last_updated` carefully, use dry-run mode first)
**Research flag:** LOW — Deterministic script using documented SessionDB API. Add dry-run mode flag for initial testing. Consider archive verification step.
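
The two schedules (daily at 09:00 for Phase 3, weekly on Sunday at 06:00 for Phase 4) are easy to misread when testing by hand. Here is a Python sketch that computes the next firing time in local time. The matching five-field cron expressions are noted in comments. Whether `hermes cron create` accepts that syntax is an assumption to check against cron.md, and `next_run` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import Optional


def next_run(now: datetime, hour: int, weekday: Optional[int] = None) -> datetime:
    """Return the next time a job scheduled at `hour`:00 fires after `now`.

    weekday=None means daily; otherwise Python's convention applies
    (Monday=0 ... Sunday=6). Standard cron equivalents:
      daily 09:00         -> "0 9 * * *"
      weekly Sunday 06:00 -> "0 6 * * 0"  (cron numbers Sunday as 0)
    """
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if weekday is None:
        step = timedelta(days=1)
    else:
        candidate += timedelta(days=(weekday - now.weekday()) % 7)
        step = timedelta(days=7)
    return candidate if candidate > now else candidate + step
```

For example, 2026-06-14 is a Sunday. At 07:00 that day, the weekly job's next run is 2026-06-21 06:00, because that day's slot has already passed.
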
### Phase Ordering Rationale
- **Hindsight first** (Phase 1): Low-risk configuration change. Enhances every subsequent phase by providing cross-session context. No code, no scripts, no volumes.
- **Default Repos second** (Phase 2) — Independent from Hindsight (no dependency), but has the security-sensitive credential mount. Early implementation allows maximum testing of credential isolation.
- **Daily Report third** (Phase 3) — Needs active sessions producing data to report on. Both Phase 1 and 2 contribute to session quality. Report can also surface Hindsight memory patterns.
- **Stale Archive fourth** (Phase 4) — Destructive operation. User should see via daily reports what will be archived before the archive runs. Install archive cron after report cron so there's visible feedback first.
### Research Flags
Phases needing deeper research during planning:
- **Phase 2 (Default Repos):** SSH credential mount strategy — deploy key vs fine-grained token vs agent forwarding vs full `~/.ssh:ro`. Tradeoffs between security and simplicity need a final decision. Also verify `shell_init_files` execution ordering guarantees.
- **Phase 3 (Daily Report):** Skill prompt design for useful LLM-generated summaries. Jira API scoping — what ticket data to include, how to correlate sessions to tickets. The Jira integration scope (basic ticket status query vs full session→ticket mapping) needs definition.
Phases with standard patterns (skip research-phase):
- **Phase 1 (Hindsight):** Pure configuration — `hermes memory setup`, pick hindsight, set env vars. Hermes docs cover this completely.
- **Phase 4 (Stale Archive):** Deterministic script using `SessionDB.export_session()` / `delete_session()` — documented API, straightforward implementation, export-before-delete pattern.
## Confidence Assessment
| Area | Confidence | Notes |
|------|------------|-------|
| Stack | HIGH | All dependencies verified against Hermes v0.16.0 source code and docs. The Hindsight MemoryProvider plugin is bundled; `hindsight-client` only needs `uv pip install`. SSH mount is standard Docker. |
| Features | HIGH | All features map to documented Hermes extension points. No speculative functionality. Prioritization derived from actual usage patterns. |
| Architecture | HIGH | Additive layer design avoids modifying Hermes core. Every component boundary matches a documented Hermes mechanism (hooks, cron, skills, config). |
| Pitfalls | HIGH | Each pitfall is sourced from specific Hermes source lines (memory_manager.py:342, cron/scheduler.py:1249-1303, etc.). Prevention strategies are concrete and testable. |
**Overall confidence: HIGH**
### Gaps to Address
| Gap | How to Address |
|-----|----------------|
| SSH credential mount: deploy key vs token vs agent forwarding | Test all approaches during Phase 2 planning. Start with deploy keys (most secure). Document security tradeoffs. |
| Hindsight Cloud API free tier limits | Create Hindsight account, verify free tier, test with actual agent usage. Fall back to local embedded mode if Cloud is unreliable. |
| Daily report quality iteration | Ship basic report in Phase 3, then iterate prompt based on actual output. Plan 2-3 refinement cycles. |
| Jira integration scope | Define in Phase 3 planning: basic ticket status query or full session→ticket correlation? Start with basic, iterate to full. |
| Archive dry-run mode | Add `--dry-run` flag to stale-cleanup.sh for initial testing. Run manually before activating cron. |
## Sources
### Primary (HIGH confidence — Hermes v0.16.0 source code + official docs)
- `agent/memory_manager.py` lines 342-354 — Memory provider conflict logic (PITFALLS.md)
- `agent/memory_provider.py` lines 115-131 — Async sync_turn silent failure (PITFALLS.md)
- `cron/scheduler.py` lines 1249-1303 — Cron prompt injection scanning (PITFALLS.md)
- `cron/scheduler.py` line 444 — Delivery origin fallback (PITFALLS.md)
- `plugins/memory/hindsight/__init__.py` — Hindsight MemoryProvider plugin (STACK.md)
- `hermes_state.py` — SessionDB API for export/delete (ARCHITECTURE.md, FEATURES.md)
- `agent/curator.py` — Skills-only execution (FEATURES.md)
- Hermes docs: hooks.md, cron.md, session-storage.md, memory.md, memory-providers.md (ARCHITECTURE.md, FEATURES.md)
- ngn-agent `config.yaml` and `initial-plan.md` (existing v1.0 baseline)
### Secondary (MEDIUM confidence)
- Hindsight documentation at https://hindsight.vectorize.io — Cloud API details and limits (STACK.md)
- Current `~/.hermes/config.yaml` — Existing Docker volumes and cron job configuration
### Tertiary (LOW confidence — needs validation)
- SSH credential mount behavior in Docker — needs testing with actual `~/.ssh:ro` mount and git clone inside container
- Hindsight Cloud API free tier reliability at scale — needs account creation to verify
---
*Research completed: 2026-06-14*
*Ready for roadmap: yes*