Domain Pitfalls: ngn-agent v1.1
Domain: Hermes Agent session workspace, memory & reporting integration
Researched: 2026-06-14
Critical Pitfalls
Mistakes that cause rewrites or major issues.
Pitfall 1: Docker Container Restart Loses Cloned Repos
What goes wrong: When the Docker container restarts (after an idle timeout or a gateway restart), every git repo cloned inside the container is lost, including the clones made by session-init.
Why it happens: With container_persistent: true, a container is kept alive only until it has been idle for lifetime_seconds (300s = 5min). After that, the container is destroyed. Any git clone inside /tmp or /root is ephemeral.
Consequences: Repos disappear mid-session. The agent loses its workspace.
Prevention: Always clone to a host-mounted volume. Add ~/Projects:/workspace/repos:rw to docker_volumes. The session-init script should check for existing clones on the mounted volume and only clone if missing.
Detection: Agent reports "directory not found" when accessing repos. Script always re-clones (slow) instead of checking .git directory existence.
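The check-before-clone step described under Prevention could look roughly like the sketch below. It assumes the session-init script is a bash script and that /workspace/repos is the container side of the mounted volume; REPO_ROOT and ensure_repo are hypothetical names for illustration, not part of Hermes.

```shell
#!/usr/bin/env bash
# Sketch only: clone a repo onto the host-mounted volume when no clone exists yet.
# REPO_ROOT and ensure_repo are hypothetical names, not Hermes settings.

REPO_ROOT="${REPO_ROOT:-/workspace/repos}"   # container side of the ~/Projects mount

ensure_repo() {
    local url="$1"
    local name
    name="$(basename "$url" .git)"           # "org/demo.git" -> "demo"
    local dest="$REPO_ROOT/$name"

    # A .git directory means an earlier session already cloned this repo.
    if [ -d "$dest/.git" ]; then
        echo "skip: $name already present"
        return 0
    fi

    echo "clone: $name"
    git clone --quiet "$url" "$dest"
}
```

Calling ensure_repo once per repository URL keeps restarts cheap, because an existing clone on the mounted volume is detected and left alone.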
Pitfall 2: Memory Provider Conflict (Multiple External Providers)
What goes wrong: Configuring two external memory providers (e.g., Hindsight + Honcho) fails silently: only the first is registered.
Why it happens: MemoryManager.add_provider() explicitly rejects a second external provider with a warning (agent/memory_manager.py:342-354).
Consequences: User thinks both are active but only one works. No error message visible outside logs.
Prevention: Set memory.provider: hindsight and nothing else. Never add a second external provider.
Detection: hermes memory list shows only one provider. Check ~/.hermes/hermes-agent/logs/ for "Rejected memory provider" warning.
Pitfall 3: Cron Job Prompt Injection via Skill Content
What goes wrong: A skill loaded by a cron job contains hidden prompt-injection payload that causes the cron LLM to take unintended actions.
Why it happens: Cron jobs load skill content at runtime via _build_job_prompt(). Skill content is scanned for injection patterns, but false negatives are possible (cron/scheduler.py:1249-1303).
Consequences: The injected payload runs with the cron job's tool access. approvals.cron_mode: deny covers tool approval only; it does not constrain the LLM's output.
Prevention: Keep cron job skills simple and vetted. Use no_agent scripts for deterministic operations.
Detection: Cron output contains unexpected content. Check cron/output/<job_id>/ for anomalous responses.
Moderate Pitfalls
Pitfall 1: Hindsight Cloud API Rate Limits
What goes wrong: Hindsight API rate-limits or throttles requests, causing memory writes to silently fail (async, non-blocking in MemoryManager).
Why it happens: sync_turn() is dispatched to a background thread. Failures are logged as warnings, not surfaced to the agent or user.
Consequences: Memory loss. The agent believes it saved facts that were never persisted.
Prevention: Monitor ~/.hermes/hermes-agent/logs/ for "sync_turn failed" warnings. Consider Hindsight local mode if Cloud proves unreliable.
Detection: grep "sync_turn failed" ~/.hermes/hermes-agent/logs/*
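For routine monitoring, the grep above can be wrapped in a small helper that returns a count, so a health check can compare it against a threshold. This is a minimal sketch; count_sync_failures is a hypothetical name, and the default directory is the log path quoted in this section.

```shell
#!/usr/bin/env bash
# Sketch only: count "sync_turn failed" warnings under a log directory.
# count_sync_failures is a hypothetical helper, not a Hermes command.

count_sync_failures() {
    local log_dir="${1:-$HOME/.hermes/hermes-agent/logs}"
    # grep -r searches recursively, -h drops file names; wc -l counts matching lines.
    # With no matches (or a missing directory) the pipeline prints 0.
    grep -rh "sync_turn failed" "$log_dir" 2>/dev/null | wc -l | tr -d ' '
}
```

A caller could run count_sync_failures with no arguments and flag the session whenever the result is greater than zero.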
Pitfall 2: SSH Key Exposure Inside Docker
What goes wrong: Hermes agent running inside Docker has read access to ~/.ssh/ via mounted volume.
Why it happens: The agent has file read tools. If an attacker compromises the agent (prompt injection), they could exfiltrate SSH keys.
Consequences: Private SSH keys leaked. Access to all repos the keys authorize.
Prevention:
- Mount ~/.ssh:ro (read-only, keys can't be modified by agent)
- Use a deploy key (per-repo, read-only) instead of personal SSH key
- Run ssh-add -l to verify key restrictions
- Consider HTTPS + personal access token (scoped, revocable) instead of SSH
Detection: Monitor Docker container network egress for unexpected outbound connections.
Pitfall 3: Shell Init Script Blocking Container Start
What goes wrong: The session-init.sh script hangs (git clone needs SSH key passphrase, network timeout, etc.), blocking the Docker shell.
Why it happens: shell_init_files runs synchronously before the shell prompt appears. A hanging script prevents the agent from starting.
Consequences: Agent gets a timeout error from the terminal backend. Session is stuck.
Prevention: Bound each clone with timeout, for example timeout 30 git clone .... Run the init work in the background with (sleep 5; ...) & so the shell prompt is not blocked. Add set -euo pipefail so failures surface early.
Detection: Docker exec test: docker exec <container> /bin/bash -c "echo test" to verify shell responsiveness.
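One way to keep session-init.sh from hanging is to run each step through a bounded wrapper, sketched below. It assumes GNU coreutils timeout is available in the container. bounded_step and INIT_LOG are hypothetical names. GIT_TERMINAL_PROMPT=0 and ssh's BatchMode=yes are standard git and OpenSSH settings that turn an interactive prompt into an immediate failure.

```shell
#!/usr/bin/env bash
# Sketch only: run one init step with a time limit and no interactive prompts.
# bounded_step and INIT_LOG are hypothetical names, not Hermes settings.

INIT_LOG="${INIT_LOG:-/tmp/session-init.log}"

bounded_step() {
    local limit="$1"; shift
    local rc=0

    # GIT_TERMINAL_PROMPT=0 stops git from asking for credentials;
    # BatchMode=yes stops ssh from asking for a passphrase or password.
    GIT_TERMINAL_PROMPT=0 GIT_SSH_COMMAND="ssh -o BatchMode=yes" \
        timeout "$limit" "$@" >>"$INIT_LOG" 2>&1 || rc=$?

    # GNU timeout exits with 124 when the time limit was reached.
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >>"$INIT_LOG"
    elif [ "$rc" -ne 0 ]; then
        echo "failed (exit $rc): $*" >>"$INIT_LOG"
    fi
    return "$rc"
}

# Example use inside the init script, backgrounded so the prompt appears at once:
#   ( bounded_step 30 git clone "$url" "$dest" ) &
```

Writing the outcome to a log file instead of the terminal keeps the shell start quiet while still leaving a trace of which step timed out.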
Minor Pitfalls
Pitfall 1: Hindsight API Key in Git History
What goes wrong: .env containing HINDSIGHT_API_KEY gets committed to a git repo.
Why it happens: Developer accidentally stages .env files.
Prevention: /Users/bapung/.hermes/ is outside the ngn-agent repo. No risk unless .env is copied into a repo directory.
Pitfall 2: DEFAULT_REPOS Superposition
What goes wrong: Two different session-init scripts or skills try to clone the same repo simultaneously.
Why it happens: Both shell_init_files and a session-start hook try to clone.
Prevention: Use only ONE mechanism. Prefer shell_init_files as it's guaranteed to run before the agent starts.
Pitfall 3: Cron Report Delivers to Wrong Chat
What goes wrong: Daily report delivers to the wrong Telegram chat.
Why it happens: deliver: origin routes to the chat where the cron job was created. If created via CLI, origin is missing and cron falls back to the first available home channel (cron/scheduler.py:444).
Prevention: Explicitly set deliver: telegram:474440517 (the ngn-agent DM) instead of deliver: telegram or deliver: origin.
Detection: Check cron delivery errors via hermes cron list.
Phase-Specific Warnings
| Phase Topic | Likely Pitfall | Mitigation |
|---|---|---|
| Hindsight activation | Provider conflict with other external provider | Verify memory.provider is set to only hindsight |
| Docker SSH volume | Key exposure via agent | Use deploy keys, read-only mount, monitor egress |
| Session init script | Blocking clone hangs container | Add timeouts, async background mode |
| Daily report skill | Poor quality LLM summaries | Iterate skill prompt; test with hermes cron run <id> |
| Stale cleanup script | Deleting active sessions | Add dry-run mode; check last_updated carefully |
| Docker volumes | Path mismatch between host/container | Use absolute paths in docker_volumes config |
| Git clone auth | SSH key passphrase prompt | Use key without passphrase or ssh-agent forwarding |
Sources
- Hermes v0.16.0 source: agent/memory_manager.py lines 342-354 (provider conflict)
- Hermes v0.16.0 source: agent/memory_provider.py lines 115-131 (async sync_turn, silent failure)
- Hermes v0.16.0 source: cron/scheduler.py lines 1249-1303 (prompt injection scanning)
- Hermes v0.16.0 source: cron/scheduler.py line 444 (origin fallback for delivery)
- Docker container lifecycle: container_persistent: true + lifetime_seconds: 300 in config.yaml
- Existing shell init script pattern: terminal.shell_init_files: [] (currently empty)