ngn-agent/.planning/phases/09-tooling-portable-setup/09-01-SUMMARY.md

phase: 09-tooling-portable-setup
plan: 01
subsystem: infra
tags: docker, aws-cli, terraform, helm, kubectl, datadog-cli, arm64

requires:
  • phase: 09-tooling-portable-setup
    provides: Research on tool versions, architecture decisions, installation methods

provides:
  • Custom Hermes Docker image (ngn-agent:latest) with 5 platform engineering CLI tools
  • Version-pinned, reproducible Dockerfile with architecture detection (x86_64 + arm64)
  • Single-command build entry point (docker/build.sh)

affects:
  • 09-tooling-portable-setup (plan 02 - setup script)

tech-stack (added, patterns):
  • Dockerfile multi-tool build pattern
  • Architecture-detection case/esac for binary downloads
  • Version pinning via ARGs for reproducibility
  • Multi-architecture support via uname -m detection
  • GPG-verified apt repos for tool installation

key-files (created, modified):
  • ngn-agent/docker/Dockerfile
  • ngn-agent/docker/build.sh

key-decisions:
  • Helm version 4.2.1 not in Buildkite apt repo; pinned to 4.2.0 instead
  • Terraform apt version format requires -1 suffix (terraform=1.15.6-1)
  • Added architecture detection for AWS CLI and pup (x86_64 + aarch64) for native ARM64 support
  • Used /etc/os-release instead of lsb_release (not available in base image)

patterns-established:
  • Multi-tool Dockerfile: version-pinned ARGs, GPG-verified apt repos, architecture detection for binary downloads

requirements-completed: TOOL-01
duration: 6 min
completed: 2026-06-15

Phase 9 Plan 1: Custom Hermes Docker Image with Platform Engineering Tools

Version-pinned Docker image (ngn-agent:latest) with aws-cli, terraform, helm, kubectl, and datadog CLI (pup), buildable via a single docker/build.sh command — with native ARM64 support for Apple Silicon.

Performance

  • Duration: 6 min
  • Started: 2026-06-15T15:18:47Z
  • Completed: 2026-06-15T15:24:38Z
  • Tasks: 2
  • Files modified: 2

Accomplishments

  • Created ngn-agent/docker/Dockerfile — version-pinned installations of 5 platform engineering tools on top of nikolaik/python-nodejs:python3.11-nodejs20
  • Created ngn-agent/docker/build.sh — single-command build entry point with set -euo pipefail
  • Built and verified ngn-agent:latest image with all 5 tools working natively on ARM64 (Apple Silicon)
  • Added architecture detection (uname -m) for AWS CLI and pup binary downloads supporting both x86_64 and aarch64

Task Commits

Each task was committed atomically:

  1. Task 1: Create Dockerfile with version-pinned tool installations - 78fd400 (feat)
  2. Task 2: Create build.sh and verify image builds successfully - 2797a64 (feat, includes deviation fixes)
  3. Task 2 follow-up: Add D-04/D-05 references to build.sh - cc1da75 (docs)

Plan metadata: (committed as part of Task 2 commits)

Files Created/Modified

  • ngn-agent/docker/Dockerfile (112 lines) — Multi-architecture Dockerfile with 5 platform engineering tools, version-pinned via ARGs, GPG-verified apt repos, architecture detection for binary downloads
  • ngn-agent/docker/build.sh (26 lines) — Single-command build entry point, resolves script location for correct build context (T-09-02 mitigation)
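
As a rough illustration of the script-location technique mentioned for build.sh, the sketch below resolves the directory that contains a script so the build context does not depend on the caller's working directory. The function name script_dir is hypothetical, and the docker build line is an assumed invocation rather than the actual contents of build.sh.

```shell
# Print the absolute directory containing the given script path.
# Runs in a subshell so the caller's working directory is untouched.
script_dir() {
    ( cd "$(dirname "$1")" && pwd )
}

# Assumed usage inside a build script (left commented out so the sketch
# has no side effects when run):
#   dir="$(script_dir "$0")"
#   docker build -t ngn-agent:latest -f "$dir/Dockerfile" "$dir"
```

With this approach the script can be started from any directory and still hand Docker the directory that holds the Dockerfile.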

Decisions Made

  • Helm version 4.2.0 used instead of planned 4.2.1 — 4.2.1 doesn't exist in Buildkite apt repo
  • Architecture detection added for AWS CLI and pup — base image runs on ARM64 natively (Apple Silicon), x86_64 binaries would need QEMU emulation
  • Terraform version string uses -1 suffix for apt compatibility (terraform=1.15.6-1)
  • /etc/os-release used for codename detection instead of lsb_release (not shipped in base image)

Deviations from Plan

Auto-fixed Issues

1. [Rule 3 - Blocking] lsb_release not found in base image

  • Found during: Task 1 (Dockerfile creation, first build attempt)
  • Issue: lsb_release is not installed in the base image, causing the terraform apt repo codename resolution to fail with "Malformed entry"
  • Fix: Replaced $(lsb_release -cs) with a command substitution that sources /etc/os-release and echoes the codename, $(. /etc/os-release && echo ${VERSION_CODENAME}), for Debian codename detection
  • Files modified: ngn-agent/docker/Dockerfile
  • Verification: Build succeeded, codename resolved to trixie, terraform installed correctly
  • Committed in: 2797a64 (Task 2 commit)
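
The codename lookup described in this fix can be sketched as a small helper. This is a minimal sketch, not the Dockerfile's actual line; the function name codename_from is hypothetical, and it assumes the os-release file defines VERSION_CODENAME, as Debian-based images normally do.

```shell
# Print VERSION_CODENAME from an os-release style file.
# Defaults to /etc/os-release; the subshell keeps the sourced
# variables from leaking into the calling shell.
codename_from() {
    ( . "${1:-/etc/os-release}" && printf '%s\n' "${VERSION_CODENAME}" )
}

# Assumed usage when writing an apt source entry:
#   echo "deb [signed-by=<keyring>] <repo-url> $(codename_from) main"
```

Unlike lsb_release, this needs no extra package, because /etc/os-release is a plain file of shell-compatible assignments.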

2. [Rule 1 - Bug] Hardcoded x86_64 binary downloads fail on ARM64 (Apple Silicon)

  • Found during: Task 2 (build verification — tools hung via QEMU)
  • Issue: AWS CLI and pup binaries were hardcoded to x86_64 URLs. On Apple Silicon, the base image runs natively on ARM64, so the x86_64 binaries triggered QEMU emulation, which hung in the absence of a proper /lib64/ld-linux-x86-64.so.2 loader
  • Fix: Added architecture detection via uname -m with case/esac for both AWS CLI (awscli-exe-linux-{arch}.zip) and pup (pup_${version}_Linux_{arch}.tar.gz) downloads
  • Files modified: ngn-agent/docker/Dockerfile
  • Verification: All 5 tools now run natively on ARM64 (aarch64) without QEMU warnings. aws --version reports exe/aarch64.debian.13, terraform reports linux_arm64
  • Committed in: 2797a64 (Task 2 commit)
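
The uname -m with case/esac pattern from this fix might look roughly like the following. This is a sketch under assumptions: the function names are hypothetical, only the AWS CLI file name pattern awscli-exe-linux-{arch}.zip is taken from the text above, and the architecture tokens that pup's release archives use are not stated here, so pup is left out.

```shell
# Map a machine name (as printed by `uname -m`) to the architecture
# token used in the AWS CLI installer file name.
aws_arch() {
    case "$1" in
        x86_64)        echo "x86_64" ;;
        aarch64|arm64) echo "aarch64" ;;
        *)             echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Build the installer file name for a given machine name.
aws_cli_zip() {
    arch="$(aws_arch "$1")" || return 1
    echo "awscli-exe-linux-${arch}.zip"
}

# Assumed usage inside a Dockerfile RUN step:
#   zip="$(aws_cli_zip "$(uname -m)")"
```

Failing loudly on an unknown architecture keeps the build from silently downloading a binary that would only run under emulation.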

3. [Rule 1 - Bug] Terraform version string missing -1 suffix

  • Found during: Task 2 (build attempt — apt version not found)
  • Issue: apt-get install terraform=1.15.6 failed with "Version '1.15.6' not found" because HashiCorp apt repo uses version format 1.15.6-1
  • Fix: Changed install line to terraform=${TERRAFORM_VERSION}-1
  • Files modified: ngn-agent/docker/Dockerfile
  • Verification: Terraform 1.15.6-1 installed and runs successfully
  • Committed in: 2797a64 (Task 2 commit)
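
To make the version-string fix concrete, the sketch below builds an apt pin from a package name, an upstream version, and a package revision. The helper name apt_pin is hypothetical, and the default revision of 1 simply mirrors the terraform=1.15.6-1 format reported above; other repositories may use a different revision.

```shell
# Build an apt version pin of the form name=version-revision.
# The revision defaults to 1 when the third argument is omitted.
apt_pin() {
    printf '%s=%s-%s\n' "$1" "$2" "${3:-1}"
}

# Assumed usage, with TERRAFORM_VERSION supplied as a build ARG:
#   apt-get install -y "$(apt_pin terraform "$TERRAFORM_VERSION")"
```

Running apt-cache madison terraform inside the image lists the exact version strings the repository offers, which is a quick way to confirm the suffix before pinning.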

4. [Rule 1 - Bug] Helm version 4.2.1 not found in Buildkite apt repo

  • Found during: Task 2 (build attempt — apt version not found)
  • Issue: Helm 4.2.1 doesn't exist in the Buildkite apt repo; latest available is 4.2.0-1
  • Fix: Changed HELM_VERSION ARG from 4.2.1 to 4.2.0
  • Files modified: ngn-agent/docker/Dockerfile
  • Verification: Helm 4.2.0-1 installed successfully, helm version --short reports v4.2.0+g0646808
  • Committed in: 2797a64 (Task 2 commit)
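
A missing version like this one could be caught before the install step by checking the repository's version list. The following is a minimal sketch, assuming the pipe-separated output format of apt-cache madison (package | version | source); the function name has_apt_version is hypothetical and this check is not part of the Dockerfile described here.

```shell
# Read `apt-cache madison <package>` output on stdin and succeed only
# if the exact version string given as $1 appears in the version column.
has_apt_version() {
    awk -F'|' -v want="$1" '
        { v = $2; gsub(/[ \t]/, "", v); if (v == want) found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Assumed usage:
#   apt-cache madison helm | has_apt_version "4.2.1-1" \
#       || echo "helm 4.2.1-1 is not available in this repo" >&2
```

Failing early with a clear message is usually easier to diagnose than apt's generic "Version not found" error in the middle of a long RUN step.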

Total deviations: 4 auto-fixed (2 bugs, 1 blocking, 1 version correction)

Impact on plan: All fixes essential for build to succeed and tools to work correctly on the target architecture. No scope creep.

Issues Encountered

  • Base image nikolaik/python-nodejs:python3.11-nodejs20 is still available on Docker Hub (as of 2026-06-15) — the planned deprecation did not occur, so the original tag was used
  • The base image is multi-architecture (arm64 + amd64); on Apple Silicon, Docker selects the arm64 variant automatically. The binary downloads that were hardcoded to x86_64 (AWS CLI and pup) were fixed with architecture detection

User Setup Required

None - no external service configuration required. Run docker/build.sh to rebuild the image.

Next Phase Readiness

  • Docker image ngn-agent:latest is built and verified with all 5 tools
  • Ready for Plan 2 (portable setup script) which will reference this image in ~/.hermes/config.yaml
  • The base image tag nikolaik/python-nodejs:python3.11-nodejs20 should be monitored — if it gets deprecated, update to python3.11-nodejs22-bookworm as documented in Dockerfile comment

Self-Check: PASSED

All files and commits verified:

  • docker/Dockerfile exists (112 lines)
  • docker/build.sh exists (26 lines, executable)
  • 09-01-SUMMARY.md exists
  • Commit 78fd400 — feat: Dockerfile creation
  • Commit 2797a64 — feat: build.sh + verification + fixes
  • Commit cc1da75 — docs: D-04/D-05 references
  • Commit 717bb6f — docs: plan SUMMARY

Phase: 09-tooling-portable-setup
Completed: 2026-06-15