---
phase: 09-tooling-portable-setup
plan: 01
subsystem: infra
tags: docker, aws-cli, terraform, helm, kubectl, datadog-cli, arm64

# Dependency graph
requires:
  - phase: 09-tooling-portable-setup
    provides: Research on tool versions, architecture decisions, installation methods
provides:
  - Custom Hermes Docker image (ngn-agent:latest) with 5 platform engineering CLI tools
  - Version-pinned, reproducible Dockerfile with architecture detection (x86_64 + arm64)
  - Single-command build entry point (docker/build.sh)
affects: [09-tooling-portable-setup (plan 02 - setup script)]

# Tech tracking
tech-stack:
  added:
    - Dockerfile multi-tool build pattern
    - Architecture-detection case/esac for binary downloads
  patterns:
    - Version pinning via ARGs for reproducibility
    - Multi-architecture support via uname -m detection
    - GPG-verified apt repos for tool installation
key-files:
  created:
    - ngn-agent/docker/Dockerfile
    - ngn-agent/docker/build.sh
  modified: []
key-decisions:
  - "Helm version 4.2.1 not in Buildkite apt repo; pinned to 4.2.0 instead"
  - "Terraform apt version format requires -1 suffix (terraform=1.15.6-1)"
  - "Added architecture detection for AWS CLI and pup (x86_64 + aarch64) for native ARM64 support"
  - "Used /etc/os-release instead of lsb_release (not available in base image)"
patterns-established:
  - "Multi-tool Dockerfile: version-pinned ARGs, GPG-verified apt repos, architecture detection for binary downloads"
requirements-completed: [TOOL-01]

# Metrics
duration: 6 min
completed: 2026-06-15
---

# Phase 9 Plan 1: Custom Hermes Docker Image with Platform Engineering Tools

**Version-pinned Docker image (ngn-agent:latest) with aws-cli, terraform, helm, kubectl, and datadog CLI (pup), buildable via a single docker/build.sh command, with native ARM64 support for Apple Silicon.**

## Performance

- **Duration:** 6 min
- **Started:** 2026-06-15T15:18:47Z
- **Completed:** 2026-06-15T15:24:38Z
- **Tasks:** 2
- **Files modified:** 2

## Accomplishments

- Created `ngn-agent/docker/Dockerfile`: version-pinned installations of 5 platform engineering tools on top of `nikolaik/python-nodejs:python3.11-nodejs20`
- Created `ngn-agent/docker/build.sh`: single-command build entry point with `set -euo pipefail`
- Built and verified the `ngn-agent:latest` image with all 5 tools working natively on ARM64 (Apple Silicon)
- Added architecture detection (`uname -m`) for AWS CLI and pup binary downloads, supporting both x86_64 and aarch64

## Task Commits

Each task was committed atomically:

1. **Task 1: Create Dockerfile with version-pinned tool installations** - `78fd400` (feat)
2. **Task 2: Create build.sh and verify image builds successfully** - `2797a64` (feat, includes deviation fixes)
3. **Task 2 follow-up: Add D-04/D-05 references to build.sh** - `cc1da75` (docs)

**Plan metadata:** `(committed as part of Task 2 commits)`

## Files Created/Modified

- `ngn-agent/docker/Dockerfile` (112 lines): Multi-architecture Dockerfile with 5 platform engineering tools, version-pinned via ARGs, GPG-verified apt repos, architecture detection for binary downloads
- `ngn-agent/docker/build.sh` (26 lines): Single-command build entry point, resolves script location for correct build context (T-09-02 mitigation)

## Decisions Made

- **Helm version 4.2.0** used instead of the planned 4.2.1, because 4.2.1 doesn't exist in the Buildkite apt repo
- **Architecture detection** added for AWS CLI and pup: the base image runs natively on ARM64 (Apple Silicon), so x86_64 binaries would need QEMU emulation
- **Terraform version string** uses a `-1` suffix for apt compatibility (`terraform=1.15.6-1`)
- **`/etc/os-release`** used for codename detection instead of `lsb_release` (not shipped in the base image)

## Deviations from Plan

### Auto-fixed Issues

**1. [Rule 3 - Blocking] `lsb_release` not found in base image**

- **Found during:** Task 1 (Dockerfile creation, first build attempt)
- **Issue:** `lsb_release` is not installed in the base image, causing the terraform apt repo codename resolution to fail with "Malformed entry"
- **Fix:** Replaced `$(lsb_release -cs)` with `. /etc/os-release && echo ${VERSION_CODENAME}` for Debian codename detection
- **Files modified:** `ngn-agent/docker/Dockerfile`
- **Verification:** Build succeeded, codename resolved to `trixie`, terraform installed correctly
- **Committed in:** `2797a64` (Task 2 commit)

**2. [Rule 1 - Bug] Hardcoded x86_64 binary downloads fail on ARM64 (Apple Silicon)**

- **Found during:** Task 2 (build verification: tools hung via QEMU)
- **Issue:** AWS CLI and pup binaries were hardcoded to x86_64 URLs. On Apple Silicon, the base image runs natively on ARM64, and the x86_64 binaries triggered QEMU emulation that hung without a proper `/lib64/ld-linux-x86-64.so.2`
- **Fix:** Added architecture detection via `uname -m` with case/esac for both AWS CLI (`awscli-exe-linux-{arch}.zip`) and pup (`pup_${version}_Linux_{arch}.tar.gz`) downloads
- **Files modified:** `ngn-agent/docker/Dockerfile`
- **Verification:** All 5 tools now run natively on ARM64 (aarch64) without QEMU warnings. `aws --version` reports `exe/aarch64.debian.13`, terraform reports `linux_arm64`
- **Committed in:** `2797a64` (Task 2 commit)

**3. [Rule 1 - Bug] Terraform version string missing -1 suffix**

- **Found during:** Task 2 (build attempt: apt version not found)
- **Issue:** `apt-get install terraform=1.15.6` failed with "Version '1.15.6' not found" because the HashiCorp apt repo uses the version format `1.15.6-1`
- **Fix:** Changed the install line to `terraform=${TERRAFORM_VERSION}-1`
- **Files modified:** `ngn-agent/docker/Dockerfile`
- **Verification:** Terraform 1.15.6-1 installed and runs successfully
- **Committed in:** `2797a64` (Task 2 commit)

**4. [Rule 1 - Bug] Helm version 4.2.1 not found in Buildkite apt repo**

- **Found during:** Task 2 (build attempt: apt version not found)
- **Issue:** Helm 4.2.1 doesn't exist in the Buildkite apt repo; the latest available is 4.2.0-1
- **Fix:** Changed the `HELM_VERSION` ARG from `4.2.1` to `4.2.0`
- **Files modified:** `ngn-agent/docker/Dockerfile`
- **Verification:** Helm 4.2.0-1 installed successfully, `helm version --short` reports `v4.2.0+g0646808`
- **Committed in:** `2797a64` (Task 2 commit)

---

**Total deviations:** 4 auto-fixed (2 bugs, 1 blocking, 1 version correction)

**Impact on plan:** All fixes were essential for the build to succeed and for the tools to work correctly on the target architecture. No scope creep.

## Issues Encountered

- Base image `nikolaik/python-nodejs:python3.11-nodejs20` is still available on Docker Hub (as of 2026-06-15). The planned deprecation did not occur, so the original tag was used
- The base image is multi-architecture (arm64 + amd64); on Apple Silicon, Docker selects the arm64 variant automatically. The AWS CLI and pup downloads, which had been hardcoded to x86_64 URLs, were fixed with architecture detection

## User Setup Required

None. No external service configuration is required. Run `docker/build.sh` to rebuild the image.

## Next Phase Readiness

- Docker image `ngn-agent:latest` is built and verified with all 5 tools
- Ready for Plan 2 (portable setup script), which will reference this image in `~/.hermes/config.yaml`
- The base image tag `nikolaik/python-nodejs:python3.11-nodejs20` should be monitored. If it gets deprecated, update to `python3.11-nodejs22-bookworm` as documented in the Dockerfile comment

---

*Phase: 09-tooling-portable-setup*
*Completed: 2026-06-15*
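
The `uname -m` case/esac detection described in deviation 2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the contents of the actual Dockerfile: the function names `aws_cli_arch` and `aws_cli_zip` are hypothetical, and accepting `amd64` and `arm64` as aliases is an assumption added for hosts that report those names. Only the AWS CLI file name pattern named in the fix is shown, and nothing is downloaded.

```sh
#!/bin/sh
# Sketch only: map a machine name (as printed by `uname -m`) to the
# architecture label used in the AWS CLI archive name
# (awscli-exe-linux-{arch}.zip). Function names are hypothetical.
aws_cli_arch() {
  case "$1" in
    x86_64|amd64)  echo "x86_64" ;;   # amd64 alias is an assumption
    aarch64|arm64) echo "aarch64" ;;  # arm64 alias is an assumption
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Build the archive file name for a given machine name.
# Returns non-zero (and prints nothing on stdout) if the machine is unsupported.
aws_cli_zip() {
  arch="$(aws_cli_arch "$1")" || return 1
  echo "awscli-exe-linux-${arch}.zip"
}

# Example: report which archive the current machine would select.
aws_cli_zip "$(uname -m)" || echo "no AWS CLI archive mapped for this machine"
```

Failing loudly on an unknown machine name, rather than falling back to x86_64, is a deliberate choice in this sketch: a silent fallback is what led to the QEMU hang described in deviation 2.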