docs: initialize ngn-agent project

2026-06-14 01:57:46 +08:00
commit 8500887c35
5 changed files with 318 additions and 0 deletions

99
.planning/PROJECT.md Normal file

@@ -0,0 +1,99 @@
# ngn-agent
## What This Is
ngn-agent is a platform engineering agent powered by Nous Research's Hermes Agent. It manages multi-project infrastructure work through isolated sessions, connects through a Telegram gateway, runs commands in Docker containers under a limited AWS IAM role, and persists knowledge across sessions via Hermes' memory system. It is designed for platform engineers who manage real infrastructure and need guardrails against accidental mutations.
## Core Value
The agent must NEVER mutate real infrastructure beyond what the limited IAM role permits, while being maximally useful for diagnostics, research, and automation.
## Requirements
### Validated
(None yet — ship to validate)
### Active
- [ ] **AUTH-01**: Agent authenticates via AWS Bedrock as primary LLM provider
- [ ] **AUTH-02**: Agent falls back to OpenRouter when Bedrock is unavailable
- [ ] **AUTH-03**: Agent uses limited SSO role via project-local `./.aws/` config
- [ ] **AUTH-04**: Agent runs commands inside Docker containers with host hardening
- [ ] **AUTH-05**: Hermes persistent memory stores infrastructure facts and user preferences
- [ ] **AUTH-06**: Hermes session search allows recalling past infrastructure context
- [ ] **GATE-01**: Telegram gateway allows multi-project session management
- [ ] **GATE-02**: Users interact with the agent via Telegram DMs
- [ ] **GATE-03**: Pairing-based authorization for new users
- [ ] **GATE-04**: Scheduled daily reports and stale session cleanup
- [ ] **SKIL-01**: Self-improving skills system with skills hub integration
- [ ] **SKIL-02**: Infrastructure diagnostic skills (read-only by default)
- [ ] **SKIL-03**: AWS cost/health/resource querying via read-only tools
- [ ] **SKIL-04**: Jira and Confluence integration for reporting
- [ ] **SKIL-05**: Git worktree isolation for parallel branch work
- [ ] **OPS-01**: Minimal dependencies, repeatable setup via single install
- [ ] **OPS-02**: `.env` file for credential management
- [ ] **OPS-03**: `~/.aws` is never mounted; `./.aws` with the limited role is mounted instead
- [ ] **OPS-04**: Dangerous command approval and hardline blocklist active
### Out of Scope
- Direct `~/.aws` mounting — use scoped `./.aws` instead
- Full `kubectl exec` / `terraform apply` access without explicit approval gates
- Non-AWS cloud providers (GCP/Azure) — defer to future
- Native mobile app — Telegram gateway is the mobile interface
- Self-hosted model serving — use Bedrock/OpenRouter
## Context
- **Base tool**: Nous Research Hermes Agent (Python/uv, MIT)
- **Runtime**: macOS (Orbstack for Docker), CLI-only install (no Desktop)
- **LLM**: AWS Bedrock (primary) via boto3 SSO auth → OpenRouter (fallback)
- **Terminal backend**: Docker with hardened security (`--cap-drop ALL`, `--security-opt no-new-privileges`)
- **Memory**: Hermes persistent memory (MEMORY.md + USER.md) + FTS5 session search
- **Credentials**: `./.aws/` with limited IAM role mounted read-only into Docker; `~/.hermes/.env` for OpenRouter key
- **AWS auth**: SSO role chaining with cached session (~7 day refresh), browser login on expiry
- **Project location**: `/Users/bapung/Razer/ngn-agent`
## Constraints
- **Security**: Agent must run inside Docker with limited capabilities
- **Credential**: Only scoped AWS role — never raw admin access
- **Auth**: Bedrock uses boto3 chain (SSO); no API key for primary provider
- **Provider**: OpenRouter key in `.env` for fallback only
- **Git**: Worktree-based isolation; pushes only to feature branches
- **Gateway**: Telegram as primary messaging channel
## Key Decisions
| Decision | Rationale | Outcome |
|----------|-----------|---------|
| Hermes Agent over NanoClaw | Superior memory system (auto-learning, session search, 8 external providers) | ✓ Good |
| Bedrock primary + OpenRouter fallback | Zero additional API cost for primary (uses existing AWS SSO); OpenRouter as reliability layer | — Pending |
| Docker backend | Container isolation is the security boundary; dangerous command checks skipped | — Pending |
| Project-local `./.aws` | Prevents privileged credentials from entering container | — Pending |
| CLI-only install | Desktop not needed; minimal surface area | — Pending |
| Git worktree isolation | Prevents branch contamination across sessions | — Pending |
## Evolution
This document evolves at phase transitions and milestone boundaries.
**After each phase transition** (via `/gsd-transition`):
1. Requirements invalidated? → Move to Out of Scope with reason
2. Requirements validated? → Move to Validated with phase reference
3. New requirements emerged? → Add to Active
4. Decisions to log? → Add to Key Decisions
5. "What This Is" still accurate? → Update if drifted
**After each milestone** (via `/gsd-complete-milestone`):
1. Full review of all sections
2. Core Value check — still the right priority?
3. Audit Out of Scope — reasons still valid?
4. Update Context with current state
---
*Last updated: 2026-06-14 after initialization*

96
.planning/REQUIREMENTS.md Normal file

@@ -0,0 +1,96 @@
# Requirements: ngn-agent
**Defined:** 2026-06-14
**Core Value:** The agent must NEVER mutate real infrastructure beyond what the limited IAM role permits, while being maximally useful for diagnostics, research, and automation.
## v1 Requirements
### Authentication & Provider
- [ ] **AUTH-01**: Agent authenticates via AWS Bedrock as primary LLM provider using boto3 SSO auth chain
- [ ] **AUTH-02**: Agent falls back to OpenRouter when Bedrock encounters errors (rate limits, 5xx, auth failures)
- [ ] **AUTH-03**: Project-local `./.aws/` config with limited SSO role mounted read-only into Docker container
- [ ] **AUTH-04**: SSO token refresh handled via AWS SDK cached registration (~7 day validity); browser login on expiry
- [ ] **AUTH-05**: OpenRouter API key stored in `~/.hermes/.env`
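
The fallback rule in AUTH-02 can be sketched as follows. This is a minimal illustration, not Hermes' own provider logic: `ProviderError`, `should_fall_back`, and `complete_with_fallback` are hypothetical names, and the status codes chosen for "rate limits, 5xx, auth failures" (429, 5xx, 401/403) are an assumption about how those errors surface.

```python
from typing import Callable, Optional, Tuple


class ProviderError(Exception):
    """Hypothetical error type; `status` mirrors an HTTP-like status code."""

    def __init__(self, message: str, status: Optional[int] = None):
        super().__init__(message)
        self.status = status


def should_fall_back(err: ProviderError) -> bool:
    # Assumed mapping of AUTH-02's triggers: auth failures, rate limits, 5xx.
    if err.status is None:
        return False
    return err.status in (401, 403, 429) or 500 <= err.status < 600


def complete_with_fallback(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> Tuple[str, str]:
    """Try the primary provider; use the fallback only for qualifying errors."""
    try:
        return "primary", primary(prompt)
    except ProviderError as err:
        if not should_fall_back(err):
            raise  # e.g. a malformed request should not be retried elsewhere
        return "fallback", fallback(prompt)
```

Returning the provider label alongside the text makes it easy to log how often the fallback path is actually taken.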
### Container & Security
- [ ] **CONT-01**: Hermes configured with Docker terminal backend
- [ ] **CONT-02**: Docker container runs with `--cap-drop ALL`, `--security-opt no-new-privileges`, PID limits
- [ ] **CONT-03**: `./.aws/` mounted into container as read-only volume
- [ ] **CONT-04**: `AWS_PROFILE=limited` environment variable set in container
- [ ] **CONT-05**: Hermes dangerous command approval enabled with manual or smart mode
- [ ] **CONT-06**: Hardline blocklist protects against catastrophic commands
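
For reference, the flags named in CONT-02 through CONT-04 could be assembled as below. This is a sketch of the equivalent `docker run` arguments, not Hermes' actual backend configuration. The helper name, the image, the `/root/.aws` mount target, and the PID limit of 256 are assumptions made for illustration.

```python
from pathlib import Path
from typing import List, Sequence


def hardened_docker_args(
    image: str,
    command: Sequence[str],
    aws_dir: Path = Path("./.aws"),
    pids_limit: int = 256,  # assumed value; the requirements only say "PID limits"
) -> List[str]:
    """Build a `docker run` argument list with the hardening flags applied."""
    # ":ro" makes the credentials mount read-only (CONT-03).
    aws_mount = f"{aws_dir.resolve()}:/root/.aws:ro"
    return [
        "docker", "run", "--rm",
        "--cap-drop", "ALL",                    # CONT-02: drop all capabilities
        "--security-opt", "no-new-privileges",  # CONT-02: block privilege escalation
        "--pids-limit", str(pids_limit),        # CONT-02: cap process count
        "-v", aws_mount,                        # CONT-03: project-local ./.aws only
        "-e", "AWS_PROFILE=limited",            # CONT-04: select the limited role
        image, *command,
    ]
```

Passing the result to `subprocess.run(args, check=True)` with a command such as `["aws", "sts", "get-caller-identity"]` is one way to confirm which role the container actually sees.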
### Memory & Knowledge
- [ ] **MEM-01**: Hermes persistent memory (MEMORY.md + USER.md) stores infrastructure facts
- [ ] **MEM-02**: Agent proactively saves environment facts and conventions
- [ ] **MEM-03**: Session search available for recalling past infrastructure context
- [ ] **MEM-04**: Git worktree isolation enabled for parallel branch work
### Gateway
- [ ] **GATE-01**: Telegram gateway configured and connected
- [ ] **GATE-02**: Pairing-based authorization for new users
- [ ] **GATE-03**: Scheduled daily reports and stale session cleanup
### Skills
- [ ] **SKIL-01**: Skills system operational with Hermes Skills Hub integration
- [ ] **SKIL-02**: Read-only infrastructure diagnostic skills operational
- [ ] **SKIL-03**: Jira and Confluence reporting via MCP tools
## v2 Requirements
### Enhanced
- **SKIL-04**: Self-improving auto-skills that detect and adapt to recurring patterns
- **SKIL-05**: Custom Hermes skills catalog for platform engineering workflows
- **GATE-04**: Microsoft Teams gateway
## Out of Scope
| Feature | Reason |
|---------|--------|
| Direct `~/.aws` mounting | Privileged credentials must never enter container |
| Non-AWS cloud providers | GCP/Azure deferred — focus on AWS first |
| Native mobile app | Telegram gateway covers mobile use case |
| Self-hosted model serving | Bedrock + OpenRouter sufficient |
| Kubernetes in-cluster deployment | Local agent with CLI access only |
## Traceability
| Requirement | Phase | Status |
|-------------|-------|--------|
| AUTH-01 | Phase 1 | Pending |
| AUTH-02 | Phase 1 | Pending |
| AUTH-03 | Phase 1 | Pending |
| AUTH-04 | Phase 1 | Pending |
| AUTH-05 | Phase 1 | Pending |
| CONT-01 | Phase 1 | Pending |
| CONT-02 | Phase 1 | Pending |
| CONT-03 | Phase 1 | Pending |
| CONT-04 | Phase 1 | Pending |
| CONT-05 | Phase 1 | Pending |
| CONT-06 | Phase 1 | Pending |
| MEM-01 | Phase 2 | Pending |
| MEM-02 | Phase 2 | Pending |
| MEM-03 | Phase 2 | Pending |
| MEM-04 | Phase 2 | Pending |
| GATE-01 | Phase 3 | Pending |
| GATE-02 | Phase 3 | Pending |
| GATE-03 | Phase 3 | Pending |
| SKIL-01 | Phase 4 | Pending |
| SKIL-02 | Phase 4 | Pending |
| SKIL-03 | Phase 4 | Pending |
**Coverage:**
- v1 requirements: 21 total
- Mapped to phases: 21
- Unmapped: 0 ✓
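
The coverage tally above can be recomputed rather than maintained by hand. The sketch below assumes the conventions visible in this file: v1 requirements are checkbox items (`- [ ] **ID**:`), v2 requirements have no checkbox, and traceability rows start with `| ID |`. The function name `coverage` is hypothetical.

```python
import re

# v1 requirements are written as checkbox list items.
CHECKBOX = re.compile(r"^- \[[ x]\] \*\*([A-Z]+-\d+)\*\*")
# Traceability rows start with a requirement ID in the first table cell.
ROW = re.compile(r"^\|\s*([A-Z]+-\d+)\s*\|")


def coverage(markdown: str) -> dict:
    """Compare v1 requirement IDs against the IDs in the traceability table."""
    v1, mapped = set(), set()
    for raw_line in markdown.splitlines():
        line = raw_line.strip()
        checkbox = CHECKBOX.match(line)
        row = ROW.match(line)
        if checkbox:
            v1.add(checkbox.group(1))
        elif row:
            mapped.add(row.group(1))
    return {
        "v1_total": len(v1),
        "mapped": len(v1 & mapped),
        "unmapped": sorted(v1 - mapped),  # v1 IDs with no phase
        "extra": sorted(mapped - v1),     # table rows that are not v1 IDs
    }
```

A non-empty `extra` list flags rows in the table that do not correspond to any v1 requirement, which is the kind of drift a manual count tends to miss.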
---
*Requirements defined: 2026-06-14*
*Last updated: 2026-06-14 after initial definition*

80
.planning/ROADMAP.md Normal file

@@ -0,0 +1,80 @@
# Roadmap: ngn-agent
**Current Phase:** None yet — start with Phase 1
**Total Phases:** 4
**v1 Requirements:** 21 mapped, all covered ✓
---
### Phase 1: Hermes Install & Provider Setup
**Goal:** Hermes Agent installed, Docker backend configured with security hardening, Bedrock + OpenRouter providers configured, limited AWS role mounted, dangerous command approval active.
**Mode:** mvp
**Requirements:** AUTH-01, AUTH-02, AUTH-03, AUTH-04, AUTH-05, CONT-01, CONT-02, CONT-03, CONT-04, CONT-05, CONT-06
**Success Criteria:**
1. Hermes CLI starts and responds to a chat
2. Bedrock provider authenticates via SSO and generates a response
3. OpenRouter fallback works when Bedrock is unavailable
4. Docker container runs terminal commands with hardened flags
5. `./.aws` limited role is mounted read-only and accessible inside container
6. Dangerous command approval triggers on destructive patterns
7. `hermes doctor` passes cleanly
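
Success criterion 6 could be exercised with a small classifier like the one below. This is a sketch only: the patterns are illustrative assumptions and do not reproduce Hermes' built-in approval rules or hardline blocklist, and `classify` is a hypothetical name.

```python
import re

# Assumed examples of commands that are never allowed.
HARDLINE = [re.compile(p) for p in (
    r"\brm\s+-rf\s+/(\s|$)",  # recursive delete of the filesystem root
    r"\bmkfs\b",              # reformatting a device
)]

# Assumed examples of commands that require explicit user approval.
NEEDS_APPROVAL = [re.compile(p) for p in (
    r"\bterraform\s+(apply|destroy)\b",
    r"\bkubectl\s+(delete|exec)\b",
    r"\baws\s+\S+\s+delete-\S+",
)]


def classify(command: str) -> str:
    """Return 'block', 'needs_approval', or 'allow' for a shell command."""
    if any(p.search(command) for p in HARDLINE):
        return "block"
    if any(p.search(command) for p in NEEDS_APPROVAL):
        return "needs_approval"
    return "allow"
```

Pattern matching of this kind is a guardrail rather than a guarantee; the limited IAM role remains the control that actually bounds what a command can change.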
---
### Phase 2: Memory, Git & Session Management
**Goal:** Hermes persistent memory operational, session search working, git worktree isolation enabled, infrastructure facts auto-saved.
**Requirements:** MEM-01, MEM-02, MEM-03, MEM-04
**Success Criteria:**
1. Agent saves a fact to MEMORY.md and it persists across sessions
2. Session search finds a past conversation by keyword
3. `hermes -w` creates an isolated git worktree on a feature branch
4. Agent auto-saves environment facts without being asked
---
### Phase 3: Telegram Gateway
**Goal:** Telegram gateway operational with pairing-based authorization, scheduled tasks working.
**Requirements:** GATE-01, GATE-02, GATE-03
**Success Criteria:**
1. Telegram bot responds to DMs via Hermes gateway
2. New users receive pairing codes and can be approved
3. Scheduled daily report command generates a summary
4. Gateway handles multiple concurrent sessions
---
### Phase 4: Skills & Integrations
**Goal:** Skills system operational, Jira/Confluence MCP integration, read-only infra diagnostic skills.
**Requirements:** SKIL-01, SKIL-02, SKIL-03
**Success Criteria:**
1. Skills Hub browsable and installable via slash commands
2. Custom platform-engineering skill loads correctly
3. Jira ticket query returns results via MCP
4. Confluence page fetcher returns documentation content
5. Read-only AWS diagnostic skill works without mutations
---
## Phase Dependency Graph
```
Phase 1 (Install & Providers)
└── Phase 2 (Memory & Git) — needs Hermes running
└── Phase 3 (Gateway) — needs stable agent
└── Phase 4 (Skills) — needs gateway for remote skill interaction
```
All phases are sequential. No parallelization.
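
The same dependency chain can be expressed as data, which makes the execution order checkable. This is a minimal sketch using Python's standard `graphlib`; the `DEPENDS_ON` mapping and `execution_order` helper are hypothetical and simply encode the graph above.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each phase maps to the set of phases it depends on.
DEPENDS_ON: Dict[int, Set[int]] = {
    1: set(),  # Install & Providers
    2: {1},    # Memory & Git needs Hermes running
    3: {2},    # Gateway needs a stable agent
    4: {3},    # Skills needs the gateway
}


def execution_order(depends_on: Dict[int, Set[int]]) -> List[int]:
    """Return phases in an order that respects every dependency."""
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(depends_on).static_order())


print(execution_order(DEPENDS_ON))  # [1, 2, 3, 4]
```

Because the graph is a single chain, there is exactly one valid order, which matches the statement that no phases run in parallel.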

23
.planning/STATE.md Normal file

@@ -0,0 +1,23 @@
# Project State
## Project Reference
See: .planning/PROJECT.md (updated 2026-06-14)
**Core value:** Agent must NEVER mutate real infrastructure beyond what the limited IAM role permits
**Current focus:** Phase 1 — Hermes Install & Provider Setup
## State
- **Status**: initialized
- **Current phase**: none (ready for Phase 1)
- **Last action**: Created PROJECT.md, REQUIREMENTS.md, ROADMAP.md
- **Next action**: Execute Phase 1 — install Hermes, configure providers, Docker, AWS
## Notes
- User picked Hermes Agent over NanoClaw after our research
- Docker terminal backend for isolation
- Limited AWS SSO role via project-local `./.aws/`
- Bedrock primary → OpenRouter fallback
- GSD config: yolo mode, coarse granularity, sequential execution

20
.planning/config.json Normal file

@@ -0,0 +1,20 @@
{
"mode": "yolo",
"granularity": "coarse",
"parallelization": false,
"commit_docs": true,
"model_profile": "balanced",
"workflow": {
"research": true,
"plan_check": true,
"verifier": true,
"nyquist_validation": false,
"auto_advance": true
},
"plan_review": {
"source_grounding": false
},
"ship": {
"pr_body_sections": []
}
}
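
A quick sanity check for this file might look like the sketch below. The expected types are inferred from the values shown in the config above, not taken from an official schema, and `check_config` is a hypothetical helper.

```python
import json
from typing import List

# Types inferred from the config shown above; this is an assumption,
# not a published schema.
EXPECTED_TYPES = {
    "mode": str,
    "granularity": str,
    "parallelization": bool,
    "commit_docs": bool,
    "model_profile": str,
    "workflow": dict,
    "plan_review": dict,
    "ship": dict,
}


def check_config(raw: str) -> List[str]:
    """Return a list of problems found in the JSON text (empty if none)."""
    config = json.loads(raw)
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in config:
            problems.append(f"missing key: {key}")
        elif not isinstance(config[key], expected):
            actual = type(config[key]).__name__
            problems.append(f"{key}: expected {expected.__name__}, got {actual}")
    return problems
```

Running it on the contents of `.planning/config.json` (for example via `Path(".planning/config.json").read_text()`) should return an empty list for the file as committed.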