# Phase 9: Tooling & Portable Setup — Research
**Researched:** 2026-06-15
**Domain:** Docker image build, portable bash setup script, Hermes config management
**Confidence:** HIGH (verified via official sources for all tool versions and installation methods)
## Summary
Phase 9 has two independent workstreams: (1) a custom Dockerfile extending `nikolaik/python-nodejs` with AWS CLI v2, Terraform, Helm, kubectl, and Datadog CLI (pup), and (2) a portable bash setup script that recreates all ngn-agent configuration on a fresh macOS machine. Both are new files living in the project repo — the Dockerfile at `docker/Dockerfile` with a build script at `docker/build.sh`, and the setup script at `setup-ngn-agent.sh`.
All five tools for the Docker image are installable on the Debian-based base image. AWS CLI v2 uses its official curl→unzip→install script (there is no apt repo for v2). Terraform and kubectl have official apt repos. Helm has a community-maintained apt repo hosted on Buildkite. Pup (Datadog CLI) is a Rust binary downloaded from GitHub releases. The base image tag `python3.11-nodejs20` **may no longer be available**: the Docker Hub tag table no longer lists Python 3.11 with Node.js 20, and the oldest Node.js version offered for Python 3.11 is now `python3.11-nodejs22`. The setup script should use `hermes config set` for individual key paths where possible, and fall back to a YAML-aware tool (Python `yaml` or `yq`) or, as a last resort, `sed` for complex YAML structures.
**Primary recommendation:** Two plans — (1) Dockerfile + build script with pinned tool versions, (2) portable setup script with interactive prompt flow for secrets.
## Phase Requirements
| ID | Description | Research Support |
|----|-------------|------------------|
| TOOL-01 | Custom Hermes Docker image with aws-cli, terraform, helm, kubectl, datadog CLI | All five tools have documented installation methods for Debian-based images. Versions verified from official sources. |
| SETUP-01 | Portable setup-ngn-agent.sh script recreating all config | Current config.yaml (565 lines), .env (484 lines), hindsight/config.json, 2 scripts, 5 skills, and 3 cron jobs all identified. |
## User Constraints (from CONTEXT.md)
<user_constraints>
### Locked Decisions
#### Custom Docker Image
- **D-01:** Dockerfile lives in this repo at `ngn-agent/docker/Dockerfile` — extends `nikolaik/python-nodejs:python3.11-nodejs20`
- **D-02:** Pin specific tool versions — Dockerfile should specify exact versions for reproducibility
- **D-03:** Tools to include:
  - **aws-cli**: v2 (latest stable)
  - **terraform**: latest stable
  - **helm**: latest stable
  - **kubectl**: latest stable matching cluster version
  - **datadog CLI** (`pup`): latest stable
- **D-04:** Build script at `ngn-agent/docker/build.sh` — single command to build the image
- **D-05:** Image tag: `ngn-agent:latest` (local only, no registry push)
#### Portable Setup Script
- **D-06:** Single script at `ngn-agent/setup-ngn-agent.sh` — recreates all configuration on a fresh machine
- **D-07:** Assumes Hermes v0.16+ is already installed and `hermes` CLI is on PATH
- **D-08:** Interactive prompts for all secrets:
  - `JIRA_API_TOKEN` (required for Atlassian integrations)
  - `JIRA_EMAIL` (required for Atlassian integrations)
  - `TELEGRAM_BOT_TOKEN` (required for gateway)
  - `OPENROUTER_API_KEY` (if not already set)
- **D-09:** Configurable parameters (supplied via args or prompts):
  - SSH key paths (default: `~/.ssh/id_ed25519razer`, `~/.ssh/id_rsa`)
  - SSH config path (default: `~/.ssh/config`)
  - SSH known_hosts path (default: `~/.ssh/known_hosts`)
  - Repo paths (default: `~/Razer/rai-ops`, `~/Razer/rai-deployment`, `~/Razer/rai-devtools`)
  - Timezone (default: `Asia/Singapore`)
- **D-10:** What the setup script creates/updates:
  - `~/.hermes/config.yaml`: docker_volumes (SSH + repo mounts), shell_init_files, docker_forward_env, cron config
  - `~/.hermes/.env`: secrets and DEFAULT_REPOS
  - `~/.hermes/hindsight/config.json`: Hindsight config
  - `~/.hermes/scripts/session-init.sh`: mount verification script
  - `~/.hermes/scripts/archive-stale-sessions.sh`: archive script
  - `~/.hermes/skills/ngn-agent/`: all 5 skill directories
  - `~/.hermes/archive/sessions/`: archive directory
  - Register 3 cron jobs (ngn-daily-report, ngn-weekly-stale-summary, ngn-weekly-archive)
  - Update Docker image reference in config.yaml
### Agent's Discretion
- **Dockerfile tool version selection**: Choose stable versions current at time of implementation
- **Setup script structure**: Interactive prompt flow, output formatting, error handling approach
- **Config file templates**: How to generate config.yaml sections, .env format, etc.
### Deferred Ideas (OUT OF SCOPE)
- Multi-architecture image builds (arm64 + amd64) — defer until needed
- Cloud-native deployment (Docker Compose, Fly.io, etc.) — out of scope
- CI/CD for image builds — out of scope
</user_constraints>
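
The "args or prompts" requirement in D-09 could be handled by seeding each parameter with its documented default and letting flags override it. The sketch below covers three of the parameters; the flag names (`--ssh-config`, `--known-hosts`, `--timezone`) and the function name are illustrative assumptions, not part of any locked decision.

```shell
# Defaults taken from D-09; flag names below are hypothetical.
SSH_CONFIG="$HOME/.ssh/config"
KNOWN_HOSTS="$HOME/.ssh/known_hosts"
TIMEZONE="Asia/Singapore"

# Override defaults from "--flag value" pairs; return 1 on bad input.
parse_args() {
    while [ "$#" -gt 0 ]; do
        if [ "$#" -lt 2 ]; then
            echo "Missing value for $1" >&2
            return 1
        fi
        case "$1" in
            --ssh-config)  SSH_CONFIG="$2" ;;
            --known-hosts) KNOWN_HOSTS="$2" ;;
            --timezone)    TIMEZONE="$2" ;;
            *) echo "Unknown option: $1" >&2; return 1 ;;
        esac
        shift 2
    done
}
```

Calling `parse_args "$@"` at the top of the script leaves untouched parameters at their defaults, so prompts only need to cover values the user has not already supplied.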
## Architectural Responsibility Map
| Capability | Primary Tier | Secondary Tier | Rationale |
|------------|-------------|----------------|-----------|
| Docker image build | Developer Machine (macOS) | — | Builds locally, `docker build` runs on host |
| Tool installation in Dockerfile | Docker Image Build (CI/macOS) | — | Each `RUN` layer installs tool; version-pinned for reproducibility |
| Interactive secret prompts | Setup Script (macOS CLI) | — | `read -s` reads secrets from stdin, writes to ~/.hermes/.env |
| Config file generation | Setup Script (macOS CLI) | — | Creates/modifies ~/.hermes/config.yaml, .env, config.json |
| Cron job registration | Setup Script → Hermes CLI | — | Uses `hermes cron create` CLI commands |
| Skill file copying | Setup Script (macOS CLI) | — | Copies SKILL.md files from embedded base64 or repo |
## Standard Stack
### Core
| Tool | Method | Purpose | Installation Source |
|------|--------|---------|-------------------|
| Base Image | `FROM nikolaik/python-nodejs:python3.11-nodejs22` | Hermes-compatible Python + Node.js runtime | Docker Hub (official nikolaik image) |
| AWS CLI v2 | curl → unzip → `./aws/install` | AWS diagnostics via CLI | [Official docs](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) |
| Terraform | HashiCorp apt repo → `apt install terraform` | Infrastructure-as-code management | [HashiCorp Developer](https://developer.hashicorp.com/terraform/install#linux) |
| Helm | Buildkite apt repo → `apt install helm` | Kubernetes package management | [Helm docs](https://helm.sh/docs/intro/install/#from-apt-debianubuntu) |
| kubectl | Kubernetes community apt repo (`pkgs.k8s.io`) → `apt install kubectl` | Kubernetes cluster management | [Kubernetes docs](https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/) |
| Datadog CLI (pup) | curl binary from GitHub Releases | Datadog observability via CLI | [DataDog/pup releases](https://github.com/DataDog/pup/releases) |
### Pinned Versions (current as of 2026-06-15)
| Tool | Version | Source | Confidence |
|------|---------|--------|------------|
| AWS CLI v2 | 2.27.41 | [CITED: docs.aws.amazon.com/cli/](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) — version shown in output example | HIGH |
| Terraform | 1.15.6 | [CITED: developer.hashicorp.com/terraform/install](https://developer.hashicorp.com/terraform/install) | HIGH |
| Helm | 4.2.1 | [CITED: helm.sh/docs/intro/install](https://helm.sh/docs/intro/install/) + [github.com/helm/helm/releases/tag/v4.2.1](https://github.com/helm/helm/releases/tag/v4.2.1) | HIGH |
| kubectl | 1.36.1 | [CITED: kubernetes.io/releases](https://kubernetes.io/releases/) — latest stable | HIGH |
| Datadog CLI (pup) | 1.1.0 | [CITED: github.com/DataDog/pup/releases/tag/v1.1.0](https://github.com/DataDog/pup/releases/tag/v1.1.0) | HIGH |
### Base Image Tag Issue
**⚠ D-01 specifies `nikolaik/python-nodejs:python3.11-nodejs20` but this tag may no longer be available.** [VERIFIED: hub.docker.com/r/nikolaik/python-nodejs] The current tag table shows Python 3.11 is only available with Node.js 26, 24, or 22. The `nodejs20` tags were likely removed when Node.js 20 reached end of life (April 2026).
Available Python 3.11 tags (as of 2026-06-15):
- `python3.11-nodejs26` (Node.js 26.3.0, Debian trixie) — latest
- `python3.11-nodejs26-bookworm` (Node.js 26.3.0, Debian bookworm)
- `python3.11-nodejs26-slim`
- `python3.11-nodejs24` (Node.js 24.16.0, Debian trixie)
- `python3.11-nodejs24-bookworm` (Node.js 24.16.0, Debian bookworm)
- `python3.11-nodejs22` (Node.js 22.22.3, Debian trixie)
- `python3.11-nodejs22-bookworm` (Node.js 22.22.3, Debian bookworm)
**Recommendation:** Use `python3.11-nodejs22-bookworm` as a close match (Node.js 22 is still under LTS until Apr 2027). This needs user confirmation — flag for discuss-phase.
### Setup Script Standard Stack
| Library/Tool | Usage | Why |
|-------------|-------|-----|
| `bash` (built-in) | Script host | Zero dependencies, available on every macOS machine |
| `read -s` | Secret input | Masked input for passwords/tokens |
| `hermes config set` | Config YAML modification | Prefer over raw sed for individual key paths |
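| `sed -i ''` | Config YAML fallback | For complex multi-line YAML blocks (e.g., docker_volumes array). macOS ships BSD sed, which requires an explicit (possibly empty) backup suffix after `-i` |
| `crontab` | Cron job fallback | Last resort only if `hermes cron create` is not available; system crontab entries bypass Hermes delivery |
| `base64 --decode` | Embedded file extraction | Embeds skill files and scripts as base64 in setup script; the long option is accepted by both macOS and GNU `base64` |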
## Package Legitimacy Audit
> No external packages from package registries (npm/PyPI/crates) are installed in this phase. All tools are installed via OS-level package managers (apt, curl binary downloads) or built into the Docker image. No `npm install`, `pip install`, or `cargo install` is needed.
| Package | Registry | Verdict | Disposition |
|---------|----------|---------|-------------|
| (none) | — | — | No packages to audit |
## Architecture Patterns
### System Architecture Diagram
```
┌─────────────────────────────────────────────────────────────┐
│                   ngn-agent Project Repo                    │
│                                                             │
│  docker/                      setup-ngn-agent.sh            │
│  ├── Dockerfile               (portable setup script)       │
│  └── build.sh                                               │
│      (builds image)                                         │
└─────────┬───────────────────────────────────────┬───────────┘
          │                                       │
          ▼                                       ▼
┌─────────────────────┐   ┌─────────────────────────────────┐
│  Custom Docker Image│   │  Fresh macOS Machine            │
│  (tag: ngn-agent)   │   │                                 │
│                     │   │  ├─ Prerequisite checks         │
│  Base: nikolaik/    │   │  │   (Hermes installed?         │
│        python-nodejs│   │  │    Docker running?           │
│                     │   │  │    SSH keys exist?)          │
│  Installed tools:   │   │  ├─ Interactive secret prompts  │
│  ├─ aws-cli 2.27.41 │   │  │   (JIRA_API_TOKEN, etc.)     │
│  ├─ terraform 1.15.6│   │  ├─ Config file generation      │
│  ├─ helm 4.2.1      │   │  │   (config.yaml, .env, etc.)  │
│  ├─ kubectl 1.36.1  │   │  ├─ Script/skill copying        │
│  └─ pup 1.1.0       │   │  ├─ Cron registration           │
│                     │   │  └─ Gateway restart offer       │
└─────────────────────┘   └─────────────────────────────────┘
```
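
The prerequisite checks in the right-hand box might look like the following minimal sketch. The function names are hypothetical; the only external behavior relied on is that `command -v` reports whether a command is on PATH and that `docker info` exits non-zero when the daemon is unreachable.

```shell
# Return 0 when a command is on PATH; otherwise print a message and return 1.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "Missing prerequisite: $1" >&2
    return 1
}

# Check the tools the setup script depends on (D-07 assumes hermes is installed).
check_prereqs() {
    local failed=0
    require_cmd hermes || failed=1
    require_cmd docker || failed=1
    # Only probe the daemon when the docker CLI itself was found.
    if [ "$failed" -eq 0 ] && ! docker info >/dev/null 2>&1; then
        echo "Docker is installed but the daemon is not running" >&2
        failed=1
    fi
    return "$failed"
}
```

Collecting all failures before returning lets the user fix every missing prerequisite in one pass instead of rerunning the script after each error.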
### Recommended Project Structure
```
ngn-agent/
├── docker/
│ ├── Dockerfile # Custom Hermes image with added tools
│ └── build.sh # Single-command build script
├── setup-ngn-agent.sh # Portable setup script (standalone)
```
**Note:** The setup script embeds skill files and scripts as base64-encoded here-documents so it's fully self-contained. No external file dependencies needed.
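
As a rough illustration of the self-contained approach, an embedded file can be carried as a base64 here-document and decoded at setup time. The helper name and the payload are hypothetical (the payload is just the line `echo session-init`); the real script would embed the actual skill and script contents.

```shell
# Decode a base64 payload from stdin into the target file, creating its directory.
# The long option --decode is used because it works with both GNU and macOS base64.
extract_embedded() {
    local target="$1"
    mkdir -p "$(dirname "$target")"
    base64 --decode > "$target"
}

# Demo: write a placeholder script into a throwaway directory.
demo_dir="$(mktemp -d)"
extract_embedded "$demo_dir/scripts/session-init.sh" <<'EOF'
ZWNobyBzZXNzaW9uLWluaXQK
EOF
```

The quoted `'EOF'` delimiter keeps the shell from expanding anything inside the payload, which matters once real scripts containing `$` are embedded.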
### Pattern 1: Multi-Tool Dockerfile with Version Pinning
**What:** A Dockerfile that installs 5 platform engineering tools on top of the base Python+Node.js image, using version-pinned installations for reproducibility.
**When to use:** Any time a custom Hermes Docker image is built with additional CLI tools.
**Example:**
```dockerfile
# Source: [CITED: developer.hashicorp.com/terraform/install] + [CITED: docs.aws.amazon.com/cli/]
# Excerpt: assumes a preceding FROM line (see the complete Dockerfile under Code Examples)
# Use ARGs for version pinning
ARG AWSCLI_VERSION=2.27.41
ARG TERRAFORM_VERSION=1.15.6
ARG HELM_VERSION=4.2.1
ARG KUBECTL_VERSION=1.36.1
ARG PUP_VERSION=1.1.0
# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
        curl \
        ca-certificates \
        unzip \
        gnupg \
    && rm -rf /var/lib/apt/lists/*
# Install AWS CLI v2 (no apt repo for v2; the versioned zip name pins the release)
# NOTE: the x86_64 assets here and in the pup step assume a linux/amd64 build
RUN curl -fsSL "https://awscli.amazonaws.com/awscli-exe-linux-x86_64-${AWSCLI_VERSION}.zip" -o "awscliv2.zip" \
    && unzip -q awscliv2.zip \
    && ./aws/install --bin-dir /usr/local/bin --install-dir /usr/local/aws-cli \
    && rm -rf awscliv2.zip aws/
# Install Terraform via HashiCorp apt repo
# The codename comes from /etc/os-release because lsb_release may not be installed.
# The trailing * also matches a Debian revision suffix such as -1, if the repo adds one.
RUN curl -fsSL https://apt.releases.hashicorp.com/gpg | gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg \
    && echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(. /etc/os-release && echo "$VERSION_CODENAME") main" \
        | tee /etc/apt/sources.list.d/hashicorp.list \
    && apt-get update && apt-get install -y "terraform=${TERRAFORM_VERSION}*" \
    && rm -rf /var/lib/apt/lists/*
# Install kubectl via the Kubernetes community apt repo (the v1.36 path must match KUBECTL_VERSION's minor)
RUN curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.36/deb/Release.key | gpg --dearmor -o /usr/share/keyrings/kubernetes-apt-keyring.gpg \
    && echo 'deb [signed-by=/usr/share/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.36/deb/ /' \
        | tee /etc/apt/sources.list.d/kubernetes.list \
    && apt-get update && apt-get install -y "kubectl=${KUBECTL_VERSION}-*" \
    && rm -rf /var/lib/apt/lists/*
# Install Helm via Buildkite apt repo
RUN curl -fsSL https://packages.buildkite.com/helm-linux/helm-debian/gpgkey | gpg --dearmor -o /usr/share/keyrings/helm.gpg \
    && echo "deb [signed-by=/usr/share/keyrings/helm.gpg] https://packages.buildkite.com/helm-linux/helm-debian/any/ any main" \
        | tee /etc/apt/sources.list.d/helm-stable-debian.list \
    && apt-get update && apt-get install -y "helm=${HELM_VERSION}*" \
    && rm -rf /var/lib/apt/lists/*
# Install Datadog CLI (pup): Rust binary from GitHub releases
RUN curl -fsSL "https://github.com/DataDog/pup/releases/download/v${PUP_VERSION}/pup_${PUP_VERSION}_Linux_x86_64.tar.gz" \
        -o /tmp/pup.tar.gz \
    && tar xzf /tmp/pup.tar.gz -C /usr/local/bin/ pup \
    && rm -f /tmp/pup.tar.gz
# Verify all tools
RUN aws --version && terraform --version && helm version && kubectl version --client && pup --version
```
### Pattern 2: Interactive Secret Prompt with Validation
**What:** A bash function that prompts for a secret with masked input, validates it's non-empty, and offers to retry if empty.
**When to use:** Any bash setup script that needs to collect API tokens or passwords interactively.
**Example:**
```bash
# Source: Standard bash pattern; no single authoritative source
# Prompts and warnings go to stderr so that only the value reaches stdout,
# which keeps `token=$(prompt_secret ...)` free of stray newlines.
prompt_secret() {
    local var_name="$1"
    local prompt_text="$2"
    local default="${3:-}"
    local val=""
    while [ -z "$val" ]; do
        if [ -n "$default" ]; then
            # Do not print any part of the existing secret
            read -r -s -p "$prompt_text (press Enter to keep current value): " val
        else
            read -r -s -p "$prompt_text: " val
        fi
        echo >&2
        if [ -z "$val" ] && [ -n "$default" ]; then
            val="$default"
        elif [ -z "$val" ]; then
            echo "  ⚠ $var_name cannot be empty. Press Ctrl+C to cancel." >&2
        fi
    done
    printf '%s\n' "$val"
}
```
### Pattern 3: Using `hermes config set` vs `sed` for YAML
**What:** The Hermes CLI provides `hermes config set <path> <value>` for individual key-value pairs in config.yaml. For complex structures (arrays like `docker_volumes`), fall back to `sed` or YAML-aware tools.
**When to use:** Prefer `hermes config set` for simple key paths. Use `sed` for multi-line YAML sections (e.g., the entire `terminal:` block with docker_volumes list).
**Example:**
```bash
# Simple key-value: use hermes config set
hermes config set memory.provider hindsight
hermes config set terminal.backend docker
hermes config set terminal.timezone Asia/Singapore
hermes config set approvals.mode manual
# terminal.docker_image is also a scalar, so hermes config set is enough
hermes config set terminal.docker_image ngn-agent:latest
# docker_volumes is an array: hermes config set may not handle it,
# so build it with Python yaml or yq (sed only as a last resort)
```
### Anti-Patterns to Avoid
- **Installing tools via `apt-get install` without version pinning** — leads to unreproducible builds. Always pin versions with `=version` syntax or download specific releases.
- **Using `snap` in Docker** — snap requires systemd which doesn't run inside Docker containers. Use curl/apt binary installation instead.
- **Hardcoding user paths in config templates** — the setup script must parameterize all paths (SSH keys, repo paths).
- **Overwriting existing config without backup** — setup script should back up existing `~/.hermes/config.yaml` before modifying.
## Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| YAML manipulation in bash | Custom YAML parser with `sed`/`awk` | `hermes config set` (for simple keys), `yq` or Python `yaml` for complex structures | YAML is whitespace-sensitive; fragile with sed |
| Cron job management | Writing to crontab directly | `hermes cron create` CLI | Hermes cron is managed in its internal DB, not system crontab; crontab entries would bypass Hermes delivery |
| SSH key generation | Automated SSH key creation | Skip — require user to have keys; script just validates they exist | SSH keys with passphrase prompting would break headless operation |
| AWS credential handling | Storing AWS keys in .env | Use existing `~/.aws/` SSO config mounted as volume | ngn-agent uses SSO role chaining, not static keys |
| Docker image registry | Pushing to Docker Hub/GHCR | Tag as `ngn-agent:latest` local only | No CI/CD pipeline established; manual `docker build` in CONTEXT.md scope |
**Key insight:** Four things in this phase already have canonical solutions, either from Hermes itself (`hermes config set`, `hermes cron create`) or from the project's existing architecture (SSO-based AWS auth, SSH key assumption). Building alternatives to these is wasted effort.
## Common Pitfalls
### Pitfall 1: Base image tag `python3.11-nodejs20` is deprecated
**What goes wrong:** Docker build fails with `manifest for nikolaik/python-nodejs:python3.11-nodejs20 not found`.
**Why it happens:** The image maintainer drops tags when Node.js versions reach EOL. Node.js 20 reached EOL in April 2026.
**How to avoid:** Use `python3.11-nodejs22` or `python3.11-nodejs22-bookworm` instead — Node.js 22 is LTS until April 2027.
**Warning signs:** `docker pull nikolaik/python-nodejs:python3.11-nodejs20` fails with manifest error.
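
One way to catch this before a build is to probe the registry for each candidate tag and use the first one that resolves. This is a sketch under the assumption that `docker manifest inspect` can reach Docker Hub from the build machine; the function names are hypothetical.

```shell
# Succeeds when the registry serves a manifest for the given image reference.
tag_exists() {
    docker manifest inspect "$1" >/dev/null 2>&1
}

# Print the first candidate tag that exists for the repo, or return 1 if none do.
pick_base_tag() {
    local repo="$1" tag
    shift
    for tag in "$@"; do
        if tag_exists "${repo}:${tag}"; then
            echo "$tag"
            return 0
        fi
    done
    return 1
}
```

A build script could then call `pick_base_tag nikolaik/python-nodejs python3.11-nodejs20 python3.11-nodejs22-bookworm` and pass the result as `--build-arg PYTHON_NODEJS_TAG=...`, keeping D-01's preferred tag first in the list.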
### Pitfall 2: `hermes cron create` requires specific CLI syntax
**What goes wrong:** Cron job creation fails with CLI errors.
**Why it happens:** The `hermes cron create` CLI has evolved across Hermes versions. The Phase 8 summaries documented the correct syntax: `hermes cron create --deliver telegram --skill session '0 9 * * *' 'prompt text'`.
**How to avoid:** Use the exact patterns verified in Phase 8:
- Skill-backed: `hermes cron create --deliver telegram --skill <name> 'schedule' 'prompt'`
- No-agent: `hermes cron create --no-agent --script <path> 'schedule'`
**Warning signs:** `Unknown flag` error from hermes CLI.
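
Because the CLI syntax is the main risk here, a dry-run wrapper can print the exact commands before anything is registered. The sketch below reuses only the two invocation shapes quoted above; the wrapper and function names are hypothetical, and real schedules and prompts would come from the Phase 8 summaries.

```shell
# Run a hermes command, or only print it when DRY_RUN=1.
run_hermes() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "hermes $*"
    else
        hermes "$@"
    fi
}

# Skill-backed job: $1 = skill name, $2 = cron schedule, $3 = prompt text.
register_skill_cron() {
    run_hermes cron create --deliver telegram --skill "$1" "$2" "$3"
}

# No-agent job: $1 = script path, $2 = cron schedule.
register_script_cron() {
    run_hermes cron create --no-agent --script "$1" "$2"
}
```

Running the setup script once with `DRY_RUN=1` shows every `hermes cron create` call it would make, which makes an `Unknown flag` mismatch visible before any job is created.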
### Pitfall 3: `apt-get install terraform` version pinning format
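**What goes wrong:** Apt fails to find the exact version specified.
**Why it happens:** The apt version string is a Debian package version, which can differ from the release tag by a revision suffix (for example `1.15.6-1` rather than `1.15.6`). Which form HashiCorp's repo publishes for a given release has to be checked against the repo itself.
**How to avoid:** After adding the repo, run `apt-cache madison terraform` to list the installable version strings, then pin the exact one, or use a trailing glob such as `terraform=1.15.6*`.
**Warning signs:** `E: Version '1.15.6' for 'terraform' was not found`.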
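
To take the guesswork out of the version string, the output of `apt-cache madison <package>` (one `package | version | source` line per candidate) can be filtered for the wanted release. This is a minimal sketch; the function name is hypothetical.

```shell
# Read `apt-cache madison <pkg>` output on stdin and print the first version
# string that equals the wanted upstream version or extends it with "-<revision>".
match_apt_version() {
    awk -F'|' -v want="$1" '
        {
            gsub(/[ \t]/, "", $2)   # strip padding around the version column
            if ($2 == want || index($2, (want "-")) == 1) { print $2; exit }
        }'
}
```

Inside a Dockerfile `RUN` step this could be used as `apt-cache madison terraform | match_apt_version "$TERRAFORM_VERSION"`, with the result passed to `apt-get install -y terraform=<result>`.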
### Pitfall 4: Embedded script files in setup script become stale
**What goes wrong:** The setup script copies skill files and scripts that are included as base64-encoded content, but these drift from the actual source files in `~/.hermes/`.
**Why it happens:** The setup script is a snapshot at the time of creation; the skill files evolve independently.
**How to avoid:** Source the skill files from the project repo at setup time rather than embedding them. If the skills live in git, copy from the cloned repo. If not, add a `--snapshot-date` comment in the script header noting when the embedded content was frozen.
**Warning signs:** User runs setup in 3 months and gets outdated skills.
### Pitfall 5: Dockerfile RUN layer cache busting
**What goes wrong:** Changing one tool's version rebuilds every layer after it, including tools whose versions did not change.
**Why it happens:** Each RUN command creates a layer, and Docker invalidates the cache for a changed instruction and for all instructions that follow it.
**How to avoid:** Order installs from least-frequently-changed (terraform, aws-cli) to most-frequently-changed (pup, kubectl) so that a version bump invalidates as few layers as possible, or combine all apt installs into one RUN.
## Code Examples
### Dockerfile Complete — Multi-Tool Installation
```dockerfile
# Source: [CITED: Multiple official tool installation docs; see Standard Stack]
ARG PYTHON_NODEJS_TAG=python3.11-nodejs22-bookworm
FROM nikolaik/python-nodejs:${PYTHON_NODEJS_TAG}
LABEL description="ngn-agent: Custom Hermes Docker image with platform engineering tools"
LABEL maintainer="ngn-agent"
# Tool versions: pin for reproducibility
ARG AWSCLI_VERSION=2.27.41
ARG TERRAFORM_VERSION=1.15.6
ARG HELM_VERSION=4.2.1
ARG KUBECTL_VERSION=1.36.1
ARG PUP_VERSION=1.1.0
# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
        curl \
        ca-certificates \
        unzip \
        gnupg \
        wget \
    && rm -rf /var/lib/apt/lists/*
# Install AWS CLI v2 (official installer; the versioned zip name pins the release)
# NOTE: the x86_64 assets here and in the pup step assume a linux/amd64 build
RUN curl -fsSL "https://awscli.amazonaws.com/awscli-exe-linux-x86_64-${AWSCLI_VERSION}.zip" -o "awscliv2.zip" \
    && unzip -q awscliv2.zip \
    && ./aws/install --bin-dir /usr/local/bin --install-dir /usr/local/aws-cli \
    && rm -rf awscliv2.zip aws/
# Install Terraform (HashiCorp apt repo)
# The codename comes from /etc/os-release because lsb_release may not be installed.
# The trailing * also matches a Debian revision suffix such as -1, if the repo adds one.
RUN wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg \
    && echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(. /etc/os-release && echo "$VERSION_CODENAME") main" \
        | tee /etc/apt/sources.list.d/hashicorp.list \
    && apt-get update && apt-get install -y "terraform=${TERRAFORM_VERSION}*" \
    && rm -rf /var/lib/apt/lists/*
# Install kubectl (Kubernetes community apt repo; the v1.36 path must match KUBECTL_VERSION's minor)
RUN curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.36/deb/Release.key | gpg --dearmor -o /usr/share/keyrings/kubernetes-apt-keyring.gpg \
    && echo 'deb [signed-by=/usr/share/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.36/deb/ /' \
        | tee /etc/apt/sources.list.d/kubernetes.list \
    && apt-get update && apt-get install -y "kubectl=${KUBECTL_VERSION}-*" \
    && rm -rf /var/lib/apt/lists/*
# Install Helm (Buildkite apt repo)
RUN curl -fsSL https://packages.buildkite.com/helm-linux/helm-debian/gpgkey | gpg --dearmor -o /usr/share/keyrings/helm.gpg \
    && echo "deb [signed-by=/usr/share/keyrings/helm.gpg] https://packages.buildkite.com/helm-linux/helm-debian/any/ any main" \
        | tee /etc/apt/sources.list.d/helm-stable-debian.list \
    && apt-get update && apt-get install -y "helm=${HELM_VERSION}*" \
    && rm -rf /var/lib/apt/lists/*
# Install Datadog CLI (pup): Rust binary from GitHub releases
RUN curl -fsSL "https://github.com/DataDog/pup/releases/download/v${PUP_VERSION}/pup_${PUP_VERSION}_Linux_x86_64.tar.gz" \
        -o /tmp/pup.tar.gz \
    && tar xzf /tmp/pup.tar.gz -C /usr/local/bin/ pup \
    && rm -f /tmp/pup.tar.gz
# Verify all installations
RUN echo "=== Tool versions ===" \
    && aws --version \
    && terraform --version \
    && helm version --short \
    && kubectl version --client --output=yaml 2>/dev/null | grep gitVersion \
    && pup --version
# Default to an interactive shell
CMD ["bash"]
```
### Build Script
```bash
#!/bin/bash
# Source: Project convention
set -euo pipefail
IMAGE_NAME="ngn-agent"
IMAGE_TAG="latest"
DOCKER_DIR="$(cd "$(dirname "$0")" && pwd)"
echo "==> Building ${IMAGE_NAME}:${IMAGE_TAG}..."
docker build \
-t "${IMAGE_NAME}:${IMAGE_TAG}" \
-f "${DOCKER_DIR}/Dockerfile" \
"${DOCKER_DIR}"
echo "==> Build complete: ${IMAGE_NAME}:${IMAGE_TAG}"
docker images "${IMAGE_NAME}:${IMAGE_TAG}"
```
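
After a build, a quick smoke test can confirm that each tool actually runs in a fresh container rather than only during the image build. The sketch below assumes the image tag from D-05 and reuses the version commands from the Dockerfile's verify step; the function name is hypothetical.

```shell
# Run each tool's version command in a throwaway container and report the result.
# Returns 1 if any tool fails to run.
smoke_test_image() {
    local image="${1:-ngn-agent:latest}" failed=0 check
    for check in "aws --version" "terraform --version" "helm version --short" \
                 "kubectl version --client" "pup --version"; do
        if docker run --rm "$image" sh -c "$check" >/dev/null 2>&1; then
            echo "ok:   $check"
        else
            echo "FAIL: $check"
            failed=1
        fi
    done
    return "$failed"
}
```

Appending `smoke_test_image "${IMAGE_NAME}:${IMAGE_TAG}"` to the end of `build.sh` would make the build fail under `set -e` if any of the five tools is missing or cannot start.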
### Setup Script: Hermes Config Template Generation
```bash
# Source: Derived from current config.yaml state [VERIFIED: ~/.hermes/config.yaml]
generate_config_yaml() {
    local ssh_key_1="$1"
    local ssh_key_2="$2"
    local ssh_config="$3"
    local ssh_known_hosts="$4"
    local repo_ops="$5"
    local repo_deploy="$6"
    local repo_devtools="$7"
    local timezone="$8"
    local docker_image="$9"
    # Backup existing config
    if [ -f ~/.hermes/config.yaml ]; then
        cp ~/.hermes/config.yaml ~/.hermes/config.yaml.bak."$(date +%Y%m%d_%H%M%S)"
        echo "  → Backed up existing config.yaml"
    fi
    # Use hermes config set for simple keys
    hermes config set terminal.backend docker
    hermes config set terminal.docker_image "${docker_image}"
    hermes config set terminal.cwd /workspace
    hermes config set terminal.container_memory 5120
    hermes config set terminal.container_disk 51200
    hermes config set terminal.container_cpu 1
    hermes config set terminal.lifetime_seconds 300
    hermes config set memory.provider hindsight
    hermes config set timezone "${timezone}"
    hermes config set telegram.reactions false
    hermes config set terminal.docker_env.AWS_REGION us-east-1
    # docker_volumes and shell_init_files are arrays, so edit them with Python.
    # Assumes python3 with PyYAML on the macOS host (PyYAML is not bundled with python3).
    # Paths are passed as arguments instead of being interpolated into the source.
    python3 - "$ssh_key_1" "$ssh_key_2" "$ssh_config" "$ssh_known_hosts" \
        "$repo_ops" "$repo_deploy" "$repo_devtools" <<'PY'
import os
import sys

import yaml

key1, key2, ssh_config, known_hosts, ops, deploy, devtools = sys.argv[1:8]
home = os.path.expanduser('~')
path = os.path.join(home, '.hermes', 'config.yaml')
with open(path) as f:
    config = yaml.safe_load(f) or {}
terminal = config.setdefault('terminal', {})
terminal['docker_volumes'] = [
    f'{key1}:/root/.ssh/id_ed25519razer:ro',
    f'{key2}:/root/.ssh/id_rsa:ro',
    f'{ssh_config}:/root/.ssh/config:ro',
    f'{known_hosts}:/root/.ssh/known_hosts:ro',
    f'{home}/.aws/config:/root/.aws/config:ro',
    f'{home}/.aws/sso/cache:/root/.aws/sso/cache:rw',
    f'{ops}:/workspace/rai-ops:rw',
    f'{deploy}:/workspace/rai-deployment:rw',
    f'{devtools}:/workspace/rai-devtools:rw',
    # NOTE: this bind mount replaces /usr/local/bin inside the container, which is
    # where the Dockerfile installs aws and pup; confirm the target before use.
    f'{home}/.hermes/scripts:/usr/local/bin:ro',
]
terminal['docker_forward_env'] = ['JIRA_EMAIL', 'JIRA_API_TOKEN', 'DEFAULT_REPOS']
terminal['shell_init_files'] = ['/usr/local/bin/session-init.sh']
with open(path, 'w') as f:
    # Round-tripping through PyYAML drops comments; sort_keys=False keeps key order
    yaml.safe_dump(config, f, default_flow_style=False, sort_keys=False)
PY
}
```
## State of the Art
| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| Terraform 0.x (pre-1.0) CLI syntax | Terraform 1.x stable CLI | 2021 | Current terraform 1.15.6 uses stable HCL syntax, so there are no migration issues |
| AWS CLI v1 (Python pip package) | AWS CLI v2 (self-contained installer) | 2020 | v1 was `pip install awscli`; v2 uses curl→zip→install — Dockerfile must use official installer |
| Helm 2 (Tiller-based) | Helm 3/4 (client-only) | 2019/2025 | Helm 4.2.1 has no Tiller — simpler security model. APT repo install is same as Helm 3 |
| Pup CLI (pre-v1.0) | Pup 1.1.0 stable | 2026-06 | Pup now has stable release with OAuth2 auth; prebuilt binaries available |
**Deprecated/outdated:**
- `python3.11-nodejs20` base image tag: No longer published by maintainer. Use `python3.11-nodejs22-bookworm` instead.
- Snap packages in Docker: Snap requires systemd. Don't use `snap install aws-cli --classic` in Dockerfile.
## Assumptions Log
| # | Claim | Section | Risk if Wrong |
|---|-------|---------|---------------|
| A1 | The base image tag `python3.11-nodejs20` is no longer available on Docker Hub | Standard Stack | Build fails with manifest not found — must verify and update tag |
| A2 | Hermes v0.16+ `hermes config set` supports setting nested paths like `terminal.docker_image` | Don't Hand-Roll | May need to use Python/sed instead — minimal impact, fallback exists |
| A3 | Hermes v0.16+ `hermes cron create` matches Phase 8 documented syntax | Common Pitfalls | Cron job registration may fail with different syntax |
| A4 | All five skills can be embedded in the setup script | Architecture Patterns | Skills directory structure may have hidden files (metadata files) that aren't SKILL.md |
| A5 | The macOS host has `python3` with PyYAML available for YAML manipulation (the setup script runs on the host, not inside the `nikolaik/python-nodejs` image) | Code Examples | Would need `pip install pyyaml` in the setup script, or `yq` instead |
## Open Questions
1. **Is `hermes config set` capable of setting YAML array values (like `docker_volumes`)?**
   - What we know: `hermes config set terminal.docker_image ngn-agent:latest` works for simple key-value.
   - What's unclear: Whether it can set array elements or only scalar values.
   - Recommendation: Plan uses `hermes config set` for scalars, Python/sed for arrays. Test `hermes config set terminal.docker_volumes`; if it fails (likely), fall back to Python YAML manipulation.
2. **What is the correct kubectl apt package version string for version pinning?**
   - What we know: The Kubernetes apt repo at `pkgs.k8s.io` provides kubectl for v1.36.
   - What's unclear: Whether the pin must use the `kubectl=1.36.1-*` glob form or can use just `kubectl=1.36.1`.
   - Recommendation: Confirm the published string with `apt-cache madison kubectl` at build time. If pinning proves awkward, installing `kubectl` unpinned from the `v1.36` repo still fixes the minor version, since each `pkgs.k8s.io` repo serves a single minor release. The version to match is the cluster's, which may not be the absolute latest.
3. **Should the setup script embed skill/script content as base64 or reference files from the git repo?**
   - What we know: The setup script needs to create 5 skills + 2 scripts on a fresh machine.
   - What's unclear: Whether these files exist in the ngn-agent git repo or only in `~/.hermes/`.
   - Recommendation: If skills are in `.planning/phases/` commit history, extract them at setup time from the git repo. If not, embed as base64. The Hermes skills live at `~/.hermes/skills/ngn-agent/`; check if they're tracked in git.
## Environment Availability
> Skip this section — the phase has no external dependencies that need runtime probing. Docker image build requires Docker (verified: `Docker version 29.4.0, build 9d7ad9f` — available). Setup script runs on macOS target with bash built-in, `hermes` CLI assumed present (v0.16+).
## Validation Architecture
> Skipped — `workflow.nyquist_validation` is explicitly `false` in `.planning/config.json`.
## Security Domain
> The `security_enforcement` key is absent from `.planning/config.json` (default: enabled).
### Applicable ASVS Categories
| ASVS Category | Applies | Standard Control |
|---------------|---------|-----------------|
| V5 Input Validation | yes | Setup script validates secret non-empty before accepting; validates SSH key paths exist |
| V7 Cryptography at Rest | partial | Secrets stored in `~/.hermes/.env` (plaintext file at rest). Acceptable for local machine — Hermes itself manages file permissions. |
| V9 Cryptographic Architecture | no | No custom crypto — tools use their own auth mechanisms |
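
The path validation named under V5 could be sketched as follows. Defaults such as `~/.ssh/config` are stored as quoted strings, so the tilde has to be expanded by hand before testing the path; both function names are hypothetical.

```shell
# Expand a leading "~/" to $HOME, since quoted defaults do not get tilde expansion.
expand_path() {
    case "$1" in
        "~/"*) printf '%s\n' "$HOME/${1#"~/"}" ;;
        *)     printf '%s\n' "$1" ;;
    esac
}

# Report every path that is not a readable regular file; return 1 if any fail.
validate_files() {
    local missing=0 p resolved
    for p in "$@"; do
        resolved="$(expand_path "$p")"
        if [ ! -f "$resolved" ] || [ ! -r "$resolved" ]; then
            echo "Missing or unreadable: $resolved" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Validating the SSH key, config, and known_hosts paths before writing `docker_volumes` avoids a config that mounts non-existent host files into the container.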
### Known Threat Patterns for Dockerfile + Setup Script
| Pattern | STRIDE | Standard Mitigation |
|---------|--------|---------------------|
| Secret exposure in terminal history | Information Disclosure | Setup script uses `read -s` (masked input, no echo). Secrets are never echoed to terminal. |
| Config file world-readable permissions | Information Disclosure | Script sets `chmod 600` on `~/.hermes/.env` after writing |
| Man-in-the-middle on tool download | Tampering | Dockerfile uses HTTPS for all downloads (AWS S3, GitHub, HashiCorp, pkgs.k8s.io, Buildkite). GPG signature verification in apt repos. |
| Accidental build context leak | Information Disclosure | Dockerfile should not `COPY .` — use `COPY docker/` only to avoid leaking `.env` or other secrets into image layers |
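
The `.env` handling described in the table might be implemented along these lines: the value is written through a temporary file created next to the target, and the mode is restricted before the secret lands on disk. This is a sketch; the function name is hypothetical, and keys are assumed to be plain `NAME` identifiers without regex metacharacters.

```shell
# Insert or replace KEY=VALUE in an env file, keeping the file mode at 600.
upsert_env() {
    local file="$1" key="$2" value="$3" tmp
    mkdir -p "$(dirname "$file")"
    touch "$file"
    chmod 600 "$file"
    tmp="$(mktemp "${file}.XXXXXX")"
    # Copy every line except an existing entry for this key (grep exits 1 when
    # nothing is left, which is not an error here), then append the new entry.
    grep -v "^${key}=" "$file" > "$tmp" || true
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    chmod 600 "$tmp"
    mv "$tmp" "$file"
}
```

Rewriting through a temporary file sidesteps the `sed -i` differences between macOS and GNU sed, and rerunning the setup script replaces a key instead of appending a duplicate.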
## Sources
### Primary (HIGH confidence)
- [AWS CLI v2 Linux install](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) — verified official curl→unzip→install procedure
- [Terraform Linux install](https://developer.hashicorp.com/terraform/install#linux) — verified HashiCorp apt repo method + version 1.15.6
- [Helm install via apt](https://helm.sh/docs/intro/install/#from-apt-debianubuntu) — verified Buildkite apt repo method + version 4.2.1
- [kubectl Linux install](https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/): verified Kubernetes community apt repo (`pkgs.k8s.io`) method
- [Kubernetes releases](https://kubernetes.io/releases/) — verified latest stable v1.36.1
- [Pup CLI releases](https://github.com/DataDog/pup/releases/tag/v1.1.0) — verified latest version 1.1.0
- [nikolaik/python-nodejs Docker Hub](https://hub.docker.com/r/nikolaik/python-nodejs) — verified available tags and versions
- [Current ~/.hermes/config.yaml](file://~/.hermes/config.yaml) — verified 565-line source of truth for setup script
- [Current ~/.hermes/.env](file://~/.hermes/.env) — verified 484-line env template
- [Current ~/.hermes/hindsight/config.json](file://~/.hermes/hindsight/config.json) — verified JSON file content
- [Current session-init.sh](file://~/.hermes/scripts/session-init.sh) — verified 37-line script
- [Current archive-stale-sessions.sh](file://~/.hermes/scripts/archive-stale-sessions.sh) — verified 41-line script
- [Phase 8 cron registration patterns](file:///Users/bapung/Razer/ngn-agent/.planning/phases/08-cron-reporting/08-01-SUMMARY.md) — verified `hermes cron create` CLI syntax
### Secondary (MEDIUM confidence)
- [Hermes research: Extensibility](file:///Users/bapung/Razer/ngn-agent/.planning/research/hermes/EXTENSIBILITY.md) — verified hermes config set capability
- [Base image GitHub repo](https://github.com/nikolaik/docker-python-nodejs) — confirmed tag generation pattern
### Tertiary (LOW confidence)
- (none — all claims verified via official sources or current config files)
## Metadata
**Confidence breakdown:**
- Standard stack: HIGH — all tool versions verified from official sources
- Architecture: HIGH — patterns derived from current working configuration
- Pitfalls: HIGH — based on documented deprecations and Phase 8 execution knowledge
- Base image tag availability: MEDIUM — need to confirm `python3.11-nodejs20` tag status at build time
**Research date:** 2026-06-15
**Valid until:** 2026-07-15 (tool versions may receive patch updates within 30 days; base image tags may change if maintainer drops more versions)