ngn-agent/.planning/research/ARCHITECTURE.md

# Architecture Patterns

**Domain:** Platform engineering agent (Hermes Agent configuration)
**Researched:** 2026-06-14

## Recommended Architecture

The v1.1 features are additive — they extend an existing Hermes Agent deployment without modifying Hermes core. The architecture is a **plugin + script + configuration** layer around Hermes' built-in extension points.

```
                       ┌─────────────────────────────────────────┐
                       │            Telegram Gateway              │
                       │  (already active, TELEGRAM_HOME_CHANNEL) │
                       └────────────┬────────────────────────────┘
                                    │
                       ┌────────────▼────────────────────────────┐
                       │         Hermes Agent Runtime             │
                       │  ┌────────────────────────────────────┐  │
                       │  │        Hindsight Memory Provider   │  │
                       │  │  (local embedded PostgreSQL daemon)│  │
                       │  │  auto_retain: true                 │  │
                       │  │  auto_recall: true                 │  │
                       │  │  memory_mode: hybrid               │  │
                       │  └────────────────────────────────────┘  │
                       │  ┌────────────────────────────────────┐  │
                       │  │       Built-in Memory (fallback)    │  │
                       │  │       MEMORY.md + USER.md          │  │
                       │  └────────────────────────────────────┘  │
                       │  ┌────────────────────────────────────┐  │
                       │  │       Plugin Hook System            │  │
                       │  │  on_session_start → repo cloning   │  │
                       │  │  pre_llm_call → context injection  │  │
                       │  └────────────────────────────────────┘  │
                       └──────────────────────────────────────────┘
                                    │
              ┌─────────────────────┼─────────────────────┐
              │                     │                     │
    ┌─────────▼─────────┐ ┌────────▼────────┐ ┌─────────▼─────────┐
    │  Docker Terminal   │ │  Cron Jobs      │ │  Session Storage  │
    │  (repo workspace)  │ │  (reporting,    │ │  (state.db)       │
    │  DEFAULT_REPOS     │ │   archiving)    │ │                   │
    │  cloned via hook   │ │                 │ │  export_session() │
    └───────────────────┘ └─────────────────┘ │  delete_session()  │
                                              └───────────────────┘
                                                       │
                                              ┌────────▼─────────┐
                                              │  Archive Storage  │
                                              │  ~/.hermes/      │
                                              │  archive/sessions/│
                                              └──────────────────┘
```

### Component Boundaries

| Component | Responsibility | Communicates With |
|-----------|---------------|-------------------|
| **Hindsight Provider** | Cross-session memory with knowledge graph, entity resolution, semantic recall | Hermes agent loop (pre-turn recall, post-turn retain), local PostgreSQL, OpenRouter (LLM extraction) |
| **Repo Clone Hook** | On session start, clones DEFAULT_REPOS into workspace | Docker terminal (via `terminal` tool or subprocess), filesystem |
| **Daily Report Skill** | Instructs agent what data to gather and how to format the daily summary | SessionDB (via `state.db` queries or `session_search`), Telegram (via `send_message`), Jira API (via ngn-jira skill) |
| **Session Archive Script** | Exports stale sessions to JSON, deletes from live DB | SessionDB API (`export_session`, `delete_session`), archive filesystem |
| **Built-in Memory** | Always-active fallback for critical facts | Agent system prompt (frozen at session start) |

### Data Flow

#### Session Start (Default Repos)
```
User sends first message
  → `on_session_start` hook fires
  → Repo clone plugin checks /workspace/
  → Missing repos cloned via git (needs credential mount)
  → `pre_llm_call` hook fires (is_first_turn=True)
  → Plugin injects "Cloned repos: rai-ops, rai-deployment, rai-devtools" as context
  → Agent sees repos available in workspace
```
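
The context-injection step in this flow can be sketched as a small pure function that reports which default repos are actually present. The name `build_repo_context` and the idea that a `pre_llm_call` handler would return this string are assumptions for illustration; the hook's return contract is not specified here.

```python
from pathlib import Path

# Repo names come from the flow above.
DEFAULT_REPOS = ["rai-ops", "rai-deployment", "rai-devtools"]


def build_repo_context(workspace: str, repos=DEFAULT_REPOS) -> str:
    """Return a one-line context note listing repos that exist in the workspace."""
    root = Path(workspace)
    present = [name for name in repos if (root / name).is_dir()]
    missing = [name for name in repos if name not in present]
    if not present:
        return "Cloned repos: none"
    note = "Cloned repos: " + ", ".join(present)
    if missing:
        # Surface failed clones so the agent does not assume a repo is available.
        note += " (missing: " + ", ".join(missing) + ")"
    return note
```

Checking the filesystem instead of echoing the configured list keeps the injected context honest when a clone fails, for example because the credential mount is absent.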

#### Memory Flow (Hindsight)
```
Agent turn completes
  → Built-in memory save (MEMORY.md / USER.md)
  → Hindsight auto_retain: conversation turn + entity extraction
  → Stored in local PostgreSQL with knowledge graph

Next turn (any session)
  → Hindsight auto_recall: semantic search for relevant memories
  → Results injected as context into the turn
  → Agent sees recalled facts from any past session
```
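
To make the retain/recall cycle concrete, here is a toy in-memory stand-in. It is not Hindsight's implementation: word overlap replaces semantic search, there is no knowledge graph or PostgreSQL, and `ToyMemory` is a hypothetical name. It only illustrates that facts retained in one session can be recalled in another.

```python
import re
from dataclasses import dataclass, field


def _tokens(text: str) -> set:
    """Lowercased word set; a crude stand-in for real semantic embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class ToyMemory:
    """In-memory illustration of retain-after-turn and recall-before-turn."""

    facts: list = field(default_factory=list)  # (session_id, text) pairs

    def retain(self, session_id: str, text: str) -> None:
        # Called after a turn completes: store the turn for later sessions.
        self.facts.append((session_id, text))

    def recall(self, query: str, limit: int = 3) -> list:
        # Called before a turn: rank stored facts by word overlap with the query,
        # regardless of which session they came from.
        q = _tokens(query)
        scored = [(len(q & _tokens(text)), text) for _, text in self.facts]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [text for _, text in ranked[:limit]]
```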

#### Daily Report Flow
```
Cron tick at 09:00
  → Scheduler loads daily-report skill
  → Creates fresh AIAgent session
  → Skill prompt instructs agent to:
      1. Query state.db for recent sessions
      2. Query hindsight for relevant cross-session facts
      3. Query Jira for open/updated tickets
      4. Format as Telegram-friendly summary
  → Agent produces report
  → Delivered to TELEGRAM_HOME_CHANNEL
```
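
Step 4 of the skill prompt (formatting) is deterministic and could be handled by a helper like the sketch below. The function name and the input shapes (session titles as strings, tickets as `(key, summary)` pairs) are assumptions; the real skill may leave formatting entirely to the agent.

```python
from datetime import date


def format_daily_report(day: date, sessions: list, facts: list, tickets: list) -> str:
    """Render gathered data as a short plain-text summary for Telegram."""
    lines = [f"Daily Platform Report ({day.isoformat()})", ""]
    lines.append(f"Sessions ({len(sessions)}):")
    lines += [f"- {title}" for title in sessions] or ["- none"]
    lines.append("")
    lines.append("Recalled context:")
    lines += [f"- {fact}" for fact in facts] or ["- none"]
    lines.append("")
    lines.append(f"Jira tickets ({len(tickets)}):")
    # Each ticket is a hypothetical (key, summary) pair.
    lines += [f"- {key}: {summary}" for key, summary in tickets] or ["- none"]
    return "\n".join(lines)
```

Keeping each section to short bullet lines avoids Markdown features that Telegram may render inconsistently.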

#### Session Archive Flow
```
Cron tick on Sunday 06:00
  → No-agent script runs
  → Queries state.db for sessions inactive >30d
  → For each: export_session() → write JSON → delete_session()
  → Summary of archived sessions delivered to Telegram
```
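
The staleness check in this flow reduces to a date comparison. The sketch below assumes the script has already obtained a mapping of session ID to last-activity timestamp (how it gets that from `state.db` is left open); `select_stale` and `summary_line` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=30)  # threshold from the flow above


def select_stale(sessions: dict, now: datetime) -> list:
    """Return IDs of sessions whose last activity is more than 30 days before `now`.

    `sessions` maps session ID to a timezone-aware last-activity timestamp.
    """
    cutoff = now - STALE_AFTER
    return sorted(sid for sid, last_active in sessions.items() if last_active < cutoff)


def summary_line(archived: list) -> str:
    """One-line result suitable for delivery to Telegram."""
    if not archived:
        return "Session archive: nothing to archive."
    return f"Session archive: archived {len(archived)} session(s): " + ", ".join(archived)
```

A session that was last active exactly 30 days ago is kept, matching the ">30d" wording.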

## Patterns to Follow

### Pattern 1: Plugin Hook for Session Initialization
**What:** Use Hermes' plugin hook system (`ctx.register_hook("on_session_start", handler)`) to run initialization logic when a new session begins.
**When:** Any setup that should happen exactly once per session, before the agent processes any user message.
**Example:**
```python
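import os
import subprocess

def clone_default_repos(session_id, model, platform, **kwargs):
    repos = ["rai-ops", "rai-deployment", "rai-devtools"]
    for repo in repos:
        path = f"/workspace/{repo}"
        if not os.path.exists(path):
            # git needs a full URL; the HTTPS scheme is an assumption, switch to
            # SSH if that is what the mounted credentials support.
            url = f"https://github.com/rai-apps/{repo}.git"
            subprocess.run(["git", "clone", url, path])

def register(ctx):
    ctx.register_hook("on_session_start", clone_default_repos)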
```
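
A hook that runs at session start should tolerate a failed clone without blocking the session. The variant below is a sketch under assumptions: `clone_missing`, the `base_url` parameter, and the injectable `run` argument are illustrative names and are not part of any Hermes API.

```python
import subprocess
from pathlib import Path


def clone_missing(repos, workspace, base_url, run=subprocess.run):
    """Clone each repo that is absent from the workspace and report per-repo status.

    `base_url` is the prefix before the repo name (HTTPS or SSH form).
    `run` defaults to subprocess.run and can be replaced in tests.
    """
    status = {}
    for repo in repos:
        target = Path(workspace) / repo
        if target.exists():
            status[repo] = "present"
            continue
        result = run(
            ["git", "clone", f"{base_url}/{repo}.git", str(target)],
            capture_output=True,
            text=True,
        )
        # A failed clone is recorded rather than raised, so one bad repo
        # does not block session start.
        status[repo] = "cloned" if result.returncode == 0 else "failed"
    return status
```

The returned status map gives a later context-injection step something accurate to report.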

### Pattern 2: Skill-Backed Cron Jobs
**What:** Cron jobs that load a skill before executing. The skill provides structured instructions; the cron prompt is the task.
**When:** Recurring tasks that benefit from agent reasoning but follow a repeatable structure.
**Example:**
```bash
hermes cron create "0 9 * * *" \
  --skill daily-report \
  --deliver telegram:-100474440517 \
  --name "Daily Platform Report"
```
The skill (`daily-report/SKILL.md`) contains the report template. The cron job's prompt is just "Generate today's report."

### Pattern 3: No-Agent Script for Deterministic Automation
**What:** Cron jobs with `no_agent=True` that run a script directly, skipping the LLM entirely.
**When:** Tasks where the output is fully determined by script logic — archiving, data gathering, threshold checks.
**Example:**
```bash
hermes cron create "0 6 * * 0" \
  --no-agent \
  --script archive-stale-sessions.py \
  --deliver telegram:-100474440517 \
  --name "Weekly Session Archive"
```

### Pattern 4: Export-Before-Delete for Data Safety
**What:** Before removing any data from the live system, export it to an archive file first.
**When:** Any destructive operation on session data, files, or state.
**Example:**
```python
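import json

# `db` is an open SessionDB; `archive_dir` is a pathlib.Path that already exists.
data = db.export_session(session_id)
archive_path = archive_dir / f"{session_id}.json"
archive_path.write_text(json.dumps(data, indent=2))
# Delete only after the export is on disk; a failed write raises before this line.
db.delete_session(session_id)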
```
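
A more defensive take on export-before-delete verifies the archive before touching the live database. This is a minimal sketch: `archive_session` is a hypothetical helper, and it assumes `export_session()` returns plain JSON types (dicts, lists, strings, numbers) so that a read-back comparison is meaningful.

```python
import json
import os
from pathlib import Path


def archive_session(db, session_id: str, archive_dir: Path) -> Path:
    """Export a session to JSON, verify the file, then delete it from the live DB.

    `db` is any object exposing export_session() and delete_session().
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    final_path = archive_dir / f"{session_id}.json"
    tmp_path = archive_dir / f"{session_id}.json.tmp"

    data = db.export_session(session_id)
    tmp_path.write_text(json.dumps(data, indent=2))

    # Read the file back before deleting anything; a truncated or corrupt
    # write fails here and leaves the live session untouched.
    if json.loads(tmp_path.read_text()) != data:
        tmp_path.unlink()
        raise ValueError(f"archive verification failed for {session_id}")

    os.replace(tmp_path, final_path)  # atomic rename on the same filesystem
    db.delete_session(session_id)
    return final_path
```

Writing to a temporary name first means a crash mid-write never leaves a partial file under the final archive name.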

## Anti-Patterns to Avoid

### Anti-Pattern 1: Monkey-Patching Hermes Core
**What:** Modifying `~/.hermes/hermes-agent/` source files to add custom behavior.
**Why bad:** The agent auto-updates, and each Hermes update overwrites local changes. Custom patches are lost or break silently, and there is no supported way to recover them.
**Instead:** Use documented extension points: plugin hooks, shell hooks, skills, cron jobs.

### Anti-Pattern 2: Direct `state.db` Schema Queries in Production Scripts
**What:** Writing SQL queries against `~/.hermes/state.db` that depend on internal schema details.
**Why bad:** The schema changes between releases without notice (it is currently at v11 and has gone through 11 migrations), so queries can break after `hermes update`.
**Instead:** Use `SessionDB` API methods (`export_session()`, `create_session()`, `get_messages()`). Fall back to direct SQL only in controlled scripts that are tested after each Hermes update.
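
For the controlled scripts that do use direct SQL, a version guard can fail fast when the schema has moved. The sketch below is built on an unverified assumption: that the schema version is readable through SQLite's `PRAGMA user_version`. Hermes may track it elsewhere (for example in a metadata table), in which case only the query needs to change. `check_schema_version` is a hypothetical helper.

```python
import sqlite3

EXPECTED_SCHEMA_VERSION = 11  # version named in this document


def check_schema_version(conn: sqlite3.Connection, expected: int = EXPECTED_SCHEMA_VERSION) -> None:
    """Raise if the database reports a different schema version than the script was tested on.

    ASSUMPTION: the version is stored in PRAGMA user_version; adjust if Hermes
    records it somewhere else.
    """
    (version,) = conn.execute("PRAGMA user_version").fetchone()
    if version != expected:
        raise RuntimeError(
            f"state.db schema is v{version}, script was tested against v{expected}; "
            "re-test before running direct SQL"
        )
```

Calling this before any query turns a silent breakage after `hermes update` into an explicit error.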

### Anti-Pattern 3: Storing Credentials in Workspace Files
**What:** Writing GitHub tokens or SSH keys into the Docker container's workspace.
**Why bad:** If the agent is compromised (prompt injection), credentials in workspace files can be exfiltrated via `read_file` or `terminal` output.
**Instead:** Mount credentials read-only at the Docker level (`docker_volumes: [path:path:ro]`). Use `docker_forward_env` for environment variable-based credentials.

## Scalability Considerations

| Concern | At 1 user | At 10 users (future team) | Notes |
|---------|-----------|---------------------------|-------|
| Hindsight DB | <1GB PostgreSQL | 5-50GB PostgreSQL | Local embedded mode is single-user. For teams, switch to cloud mode or self-hosted Hindsight. |
| Session archive | ~100 sessions/year | ~1,000 sessions/year | JSON files are tiny (~50KB each). Storage is negligible. |
| Cron report LLM cost | 1 report/day ~1K tokens | 10 reports/day ~10K tokens | Cost scales linearly with users. Consider no-agent mode for data sections. |
| Repo clones | 3 repos per session | Same (shared workspaces) | Container persistence means clones survive across sessions in the same container. |
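
The "storage is negligible" claim for the session archive follows from simple arithmetic, shown here as a rough estimate using the table's approximate figures (about 50 KB per session):

```python
SESSION_SIZE_KB = 50  # approximate per-session JSON size from the table


def archive_size_mb(sessions_per_year: int, years: int = 1) -> float:
    """Estimated archive size in megabytes (1 MB = 1000 KB)."""
    return sessions_per_year * years * SESSION_SIZE_KB / 1000


print(archive_size_mb(100))   # 1 user, one year: 5.0
print(archive_size_mb(1000))  # 10 users, one year: 50.0
```

Even at the 10-user rate, five years of archives come to about 250 MB.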

## Sources

- Hermes Agent docs: Hook system (`website/docs/user-guide/features/hooks.md`)
- Hermes Agent docs: Cron system (`website/docs/user-guide/features/cron.md`)
- Hermes Agent docs: Session storage (`website/docs/developer-guide/session-storage.md`)
- Hermes Agent source: `hermes_state.py`, `agent/curator.py`, `hermes_cli/hooks.py`
- ngn-agent `config.yaml` and `initial-plan.md`