# Domain Pitfalls: ngn-agent v1.1

**Domain:** Hermes Agent session workspace, memory & reporting integration

**Researched:** 2026-06-14

## Critical Pitfalls

Mistakes that cause rewrites or major issues.

### Pitfall 1: Docker Container Restart Loses Cloned Repos

**What goes wrong:** Docker container restarts (after idle timeout or gateway restart) lose all git repos cloned inside the container. Session-init clones are gone.

**Why it happens:** With `container_persistent: true`, a container stays alive through `lifetime_seconds` (300 s = 5 min) of inactivity. After that, the container is destroyed. Any `git clone` inside `/tmp` or `/root` is ephemeral.

**Consequences:** Repos disappear mid-session. The agent loses its workspace.

**Prevention:** **Always clone to a host-mounted volume.** Add `~/Projects:/workspace/repos:rw` to `docker_volumes`. The session-init script should check for existing clones on the mounted volume and only clone if missing.

**Detection:** The agent reports "directory not found" when accessing repos. The script re-clones on every start (slow) instead of checking for an existing `.git` directory.

### Pitfall 2: Memory Provider Conflict (Multiple External Providers)

**What goes wrong:** Configuring two external memory providers (e.g., Hindsight + Honcho) fails silently: only the first is registered.

**Why it happens:** `MemoryManager.add_provider()` explicitly rejects a second external provider with a warning (agent/memory_manager.py:342-354).

**Consequences:** The user thinks both are active, but only one works. No error message is visible outside the logs.

**Prevention:** Set `memory.provider: hindsight` and nothing else. Never add a second external provider.

**Detection:** `hermes memory list` shows only one provider. Check `~/.hermes/hermes-agent/logs/` for the "Rejected memory provider" warning.
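The prevention step for Pitfall 1 (clone only when the mounted volume has no existing clone) and the detection step for Pitfall 2 (look for the rejection warning in the logs) can be sketched as two small shell helpers. This is a minimal sketch, not Hermes code: the function names `clone_if_missing` and `find_provider_rejections` and the 30-second limit are illustrative, and it assumes the warning appears verbatim as "Rejected memory provider" in plain-text log files. `timeout` is the GNU coreutils command, so the sketch also assumes a Linux container image that ships it.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the 30s limit are assumptions, not Hermes APIs.

# Clone a repo onto the host-mounted volume unless a clone already exists.
# $1 = clone URL, $2 = target directory on the mounted volume
clone_if_missing() {
  local url="$1" dest="$2"
  if [ -d "$dest/.git" ]; then
    # An existing .git directory means the clone survived the container restart.
    echo "skip: $dest already cloned"
    return 0
  fi
  mkdir -p "$(dirname "$dest")"
  # Bound the clone so a hung network or passphrase prompt cannot block the shell.
  timeout 30 git clone --quiet "$url" "$dest" && echo "cloned: $dest"
}

# Print log lines that record a rejected second external memory provider.
# $1 = log directory (defaults to the path named in Pitfall 2)
find_provider_rejections() {
  local log_dir="${1:-$HOME/.hermes/hermes-agent/logs}"
  # Assumes the warning text appears verbatim in plain-text log files.
  grep -rh "Rejected memory provider" "$log_dir" 2>/dev/null
}
```

A session-init script might call `clone_if_missing` once per repo with a target under `/workspace/repos`, the container side of the mount described in Pitfall 1. `find_provider_rejections` prints nothing and returns non-zero when no rejection has been logged.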
### Pitfall 3: Cron Job Prompt Injection via Skill Content

**What goes wrong:** A skill loaded by a cron job contains a hidden prompt-injection payload that causes the cron LLM to take unintended actions.

**Why it happens:** Cron jobs load skill content at runtime via `_build_job_prompt()`. Skill content is scanned for injection patterns, but false negatives are possible (cron/scheduler.py:1249-1303).

**Consequences:** The cron job runs with auto-approved tools. Cron jobs have `approvals.cron_mode: deny`, but that setting governs tool approval, not LLM output.

**Prevention:** Keep cron job skills simple and vetted. Use `no_agent` scripts for deterministic operations.

**Detection:** Cron output contains unexpected content. Check `cron/output//` for anomalous responses.

## Moderate Pitfalls

### Pitfall 1: Hindsight Cloud API Rate Limits

**What goes wrong:** The Hindsight API rate-limits or throttles requests, causing memory writes to fail silently (async, non-blocking in MemoryManager).

**Why it happens:** `sync_turn()` is dispatched to a background thread. Failures are logged as warnings, not surfaced to the agent or user.

**Consequences:** Memory loss: the agent thinks it saved facts, but they never persisted.

**Prevention:** Monitor `~/.hermes/hermes-agent/logs/` for "sync_turn failed" warnings. Consider Hindsight local mode if Cloud proves unreliable.

**Detection:** `grep "sync_turn failed" ~/.hermes/hermes-agent/logs/*`

### Pitfall 2: SSH Key Exposure Inside Docker

**What goes wrong:** The Hermes agent running inside Docker has read access to `~/.ssh/` via a mounted volume.

**Why it happens:** The agent has file read tools. If an attacker compromises the agent (prompt injection), they could exfiltrate SSH keys.

**Consequences:** Private SSH keys are leaked, giving access to all repos the keys authorize.
**Prevention:**

- Mount `~/.ssh:ro` (read-only, so the agent can't modify keys)
- Use a **deploy key** (per-repo, read-only) instead of a personal SSH key
- Run `ssh-add -l` to check which keys are loaded in `ssh-agent`
- Consider HTTPS + personal access token (scoped, revocable) instead of SSH

**Detection:** Monitor Docker container network egress for unexpected outbound connections.

### Pitfall 3: Shell Init Script Blocking Container Start

**What goes wrong:** The session-init.sh script hangs (git clone needs SSH key passphrase, network timeout, etc.), blocking the Docker shell.

**Why it happens:** `shell_init_files` runs synchronously before the shell prompt appears. A hanging script prevents the agent from starting.

**Consequences:** The agent gets a timeout error from the terminal backend. The session is stuck.

**Prevention:** Add a timeout to clone operations: `timeout 30 git clone ...`. Wrap the script in `(sleep 5; ...) &` for async init. Add `set -euo pipefail` for early failure detection.

**Detection:** Docker exec test: `docker exec /bin/bash -c "echo test"` to verify shell responsiveness.

## Minor Pitfalls

### Pitfall 1: Hindsight API Key in Git History

**What goes wrong:** `.env` containing `HINDSIGHT_API_KEY` gets committed to a git repo.

**Why it happens:** A developer accidentally stages `.env` files.

**Prevention:** `/Users/bapung/.hermes/` is outside the ngn-agent repo. No risk unless `.env` is copied into a repo directory.

### Pitfall 2: DEFAULT_REPOS Superposition

**What goes wrong:** Two different session-init scripts or skills try to clone the same repo simultaneously.

**Why it happens:** Both `shell_init_files` and a session-start hook try to clone.

**Prevention:** Use only ONE mechanism. Prefer `shell_init_files` as it's guaranteed to run before the agent starts.

### Pitfall 3: Cron Report Delivers to Wrong Chat

**What goes wrong:** The daily report is delivered to the wrong Telegram chat.

**Why it happens:** `deliver: origin` routes to the chat where the cron job was created.
If created via CLI, `origin` is missing and cron falls back to the first available home channel (cron/scheduler.py:444).

**Prevention:** Explicitly set `deliver: telegram:474440517` (the ngn-agent DM) instead of `deliver: telegram` or `deliver: origin`.

**Detection:** Check cron delivery errors via `hermes cron list`.

## Phase-Specific Warnings

| Phase Topic | Likely Pitfall | Mitigation |
|-------------|---------------|------------|
| Hindsight activation | Provider conflict with other external provider | Verify `memory.provider` is set to only `hindsight` |
| Docker SSH volume | Key exposure via agent | Use deploy keys, read-only mount, monitor egress |
| Session init script | Blocking clone hangs container | Add timeouts, async background mode |
| Daily report skill | Poor quality LLM summaries | Iterate skill prompt; test with `hermes cron run ` |
| Stale cleanup script | Deleting active sessions | Add dry-run mode; check `last_updated` carefully |
| Docker volumes | Path mismatch between host/container | Use absolute paths in `docker_volumes` config |
| Git clone auth | SSH key passphrase prompt | Use key without passphrase or `ssh-agent` forwarding |

## Sources

- Hermes v0.16.0 source: `agent/memory_manager.py` line 342-354 (provider conflict)
- Hermes v0.16.0 source: `agent/memory_provider.py` line 115-131 (async sync_turn, silent failure)
- Hermes v0.16.0 source: `cron/scheduler.py` line 1249-1303 (prompt injection scanning)
- Hermes v0.16.0 source: `cron/scheduler.py` line 444 (origin fallback for delivery)
- Docker container lifecycle: `container_persistent: true` + `lifetime_seconds: 300` in config.yaml
- Existing shell init script pattern: `terminal.shell_init_files: []` (currently empty)