docs(09-tooling-portable-setup-01): complete custom Docker image plan

- SUMMARY.md with deviations, decisions, and verified tool versions
2026-06-15 23:25:11 +08:00
parent cc1da75700
commit 717bb6f35b


@@ -0,0 +1,149 @@
---
phase: 09-tooling-portable-setup
plan: 01
subsystem: infra
tags: docker, aws-cli, terraform, helm, kubectl, datadog-cli, arm64
# Dependency graph
requires:
  - phase: 09-tooling-portable-setup
    provides: Research on tool versions, architecture decisions, installation methods
provides:
  - Custom Hermes Docker image (ngn-agent:latest) with 5 platform engineering CLI tools
  - Version-pinned, reproducible Dockerfile with architecture detection (x86_64 + arm64)
  - Single-command build entry point (docker/build.sh)
affects: [09-tooling-portable-setup (plan 02 - setup script)]
# Tech tracking
tech-stack:
  added:
    - Dockerfile multi-tool build pattern
    - Architecture-detection case/esac for binary downloads
  patterns:
    - Version pinning via ARGs for reproducibility
    - Multi-architecture support via uname -m detection
    - GPG-verified apt repos for tool installation
key-files:
  created:
    - ngn-agent/docker/Dockerfile
    - ngn-agent/docker/build.sh
  modified: []
key-decisions:
- "Helm version 4.2.1 not in Buildkite apt repo; pinned to 4.2.0 instead"
- "Terraform apt version format requires -1 suffix (terraform=1.15.6-1)"
- "Added architecture detection for AWS CLI and pup (x86_64 + aarch64) for native ARM64 support"
- "Used /etc/os-release instead of lsb_release (not available in base image)"
patterns-established:
- "Multi-tool Dockerfile: version-pinned ARGs, GPG-verified apt repos, architecture detection for binary downloads"
requirements-completed: [TOOL-01]
# Metrics
duration: 6 min
completed: 2026-06-15
---
# Phase 9 Plan 1: Custom Hermes Docker Image with Platform Engineering Tools
**Version-pinned Docker image (ngn-agent:latest) with aws-cli, terraform, helm, kubectl, and datadog CLI (pup), buildable via a single docker/build.sh command — with native ARM64 support for Apple Silicon.**
## Performance
- **Duration:** 6 min
- **Started:** 2026-06-15T15:18:47Z
- **Completed:** 2026-06-15T15:24:38Z
- **Tasks:** 2
- **Files modified:** 2
## Accomplishments
- Created `ngn-agent/docker/Dockerfile` — version-pinned installations of 5 platform engineering tools on top of `nikolaik/python-nodejs:python3.11-nodejs20`
- Created `ngn-agent/docker/build.sh` — single-command build entry point with `set -euo pipefail`
- Built and verified `ngn-agent:latest` image with all 5 tools working natively on ARM64 (Apple Silicon)
- Added architecture detection (`uname -m`) for AWS CLI and pup binary downloads supporting both x86_64 and aarch64
## Task Commits
Each task was committed atomically:
1. **Task 1: Create Dockerfile with version-pinned tool installations** - `78fd400` (feat)
2. **Task 2: Create build.sh and verify image builds successfully** - `2797a64` (feat, includes deviation fixes)
3. **Task 2 follow-up: Add D-04/D-05 references to build.sh** - `cc1da75` (docs)
**Plan metadata:** `(committed as part of Task 2 commits)`
## Files Created/Modified
- `ngn-agent/docker/Dockerfile` (112 lines) — Multi-architecture Dockerfile with 5 platform engineering tools, version-pinned via ARGs, GPG-verified apt repos, architecture detection for binary downloads
- `ngn-agent/docker/build.sh` (26 lines) — Single-command build entry point, resolves script location for correct build context (T-09-02 mitigation)
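The script-location handling described for `build.sh` can be sketched roughly as follows. This is not the committed script: the helper name `resolve_script_dir` is hypothetical, and using the script's own directory as both the Dockerfile location and the build context is an assumption. The `docker build` command is echoed rather than executed so the sketch has no side effects.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the absolute directory that contains the given path.
# The subshell keeps the `cd` from changing the caller's working directory.
resolve_script_dir() {
  ( cd -- "$(dirname -- "$1")" && pwd )
}

SCRIPT_DIR="$(resolve_script_dir "$0")"

# Assumption: the Dockerfile sits next to the script and that directory is the
# build context. A real build script would run this command instead of echoing it.
echo docker build -t ngn-agent:latest -f "${SCRIPT_DIR}/Dockerfile" "${SCRIPT_DIR}"
```

Resolving the directory from the script path, rather than from the current working directory, means the build behaves the same whether it is started as `docker/build.sh` from the repository root or as `./build.sh` from inside `docker/`.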
## Decisions Made
- **Helm version 4.2.0** used instead of planned 4.2.1 — 4.2.1 doesn't exist in Buildkite apt repo
- **Architecture detection** added for AWS CLI and pup: the base image runs natively on ARM64 (Apple Silicon), so x86_64 binaries would need QEMU emulation
- **Terraform version string** uses `-1` suffix for apt compatibility (`terraform=1.15.6-1`)
- **`/etc/os-release`** used for codename detection instead of `lsb_release` (not shipped in base image)
## Deviations from Plan
### Auto-fixed Issues
**1. [Rule 3 - Blocking] `lsb_release` not found in base image**
- **Found during:** Task 1 (Dockerfile creation, first build attempt)
- **Issue:** `lsb_release` is not installed in the base image, causing the terraform apt repo codename resolution to fail with "Malformed entry"
- **Fix:** Replaced `$(lsb_release -cs)` with `. /etc/os-release && echo ${VERSION_CODENAME}` for Debian codename detection
- **Files modified:** `ngn-agent/docker/Dockerfile`
- **Verification:** Build succeeded, codename resolved to `trixie`, terraform installed correctly
- **Committed in:** `2797a64` (Task 2 commit)
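A minimal sketch of the codename lookup described in this fix, assuming a standard `os-release` file that defines `VERSION_CODENAME`. The function name `os_codename` and its optional file argument are hypothetical and exist only to make the sketch easy to try against a sample file.

```shell
# Hypothetical helper: print VERSION_CODENAME from an os-release style file.
# Sourcing happens in a subshell so the file's variables do not leak into
# the calling shell.
os_codename() {
  file="${1:-/etc/os-release}"
  ( . "$file" && printf '%s\n' "${VERSION_CODENAME:-}" )
}

# Example against a sample file (contents are illustrative).
sample="$(mktemp)"
printf 'ID=debian\nVERSION_CODENAME=trixie\n' > "$sample"
os_codename "$sample"   # prints: trixie
rm -f "$sample"
```

In an apt source line, the output of such a lookup takes the place that `$(lsb_release -cs)` would otherwise occupy.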
**2. [Rule 1 - Bug] Hardcoded x86_64 binary downloads fail on ARM64 (Apple Silicon)**
- **Found during:** Task 2 (build verification — tools hung via QEMU)
- **Issue:** AWS CLI and pup binaries were hardcoded to x86_64 URLs. On Apple Silicon the base image runs natively on ARM64, so the x86_64 binaries fell back to QEMU emulation and hung because a working `/lib64/ld-linux-x86-64.so.2` loader was not available
- **Fix:** Added architecture detection via `uname -m` with case/esac for both AWS CLI (`awscli-exe-linux-{arch}.zip`) and pup (`pup_${version}_Linux_{arch}.tar.gz`) downloads
- **Files modified:** `ngn-agent/docker/Dockerfile`
- **Verification:** All 5 tools now run natively on ARM64 (aarch64) without QEMU warnings. `aws --version` reports `exe/aarch64.debian.13`, terraform reports `linux_arm64`
- **Committed in:** `2797a64` (Task 2 commit)
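The `uname -m` detection with `case`/`esac` might look roughly like the sketch below. The helper names `map_arch` and `aws_cli_zip` are hypothetical. The AWS CLI archive names follow the `awscli-exe-linux-{arch}.zip` pattern quoted above; the tokens pup expects in its filename are not stated in this summary, so they are left out rather than guessed.

```shell
# Hypothetical helper: normalise a `uname -m` value to the architecture token
# used in the AWS CLI v2 archive name.
map_arch() {
  case "$1" in
    x86_64|amd64)  echo "x86_64" ;;
    aarch64|arm64) echo "aarch64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: build the archive filename for an architecture token.
aws_cli_zip() {
  printf 'awscli-exe-linux-%s.zip\n' "$1"
}

# Fixed input for a deterministic example; a Dockerfile RUN step would pass
# "$(uname -m)" instead.
aws_cli_zip "$(map_arch arm64)"   # prints: awscli-exe-linux-aarch64.zip
```

Failing loudly on an unrecognised architecture is a deliberate choice in this sketch: a build that stops is easier to diagnose than one that silently downloads a binary for the wrong platform.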
**3. [Rule 1 - Bug] Terraform version string missing -1 suffix**
- **Found during:** Task 2 (build attempt — apt version not found)
- **Issue:** `apt-get install terraform=1.15.6` failed with "Version '1.15.6' not found" because HashiCorp apt repo uses version format `1.15.6-1`
- **Fix:** Changed install line to `terraform=${TERRAFORM_VERSION}-1`
- **Files modified:** `ngn-agent/docker/Dockerfile`
- **Verification:** Terraform 1.15.6-1 installed and runs successfully
- **Committed in:** `2797a64` (Task 2 commit)
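The version pin from this fix can be illustrated with a small sketch. The helper `apt_pin` is hypothetical, and the default revision of `1` is an assumption taken from the `1.15.6-1` format reported above; other packages or releases may use a different revision.

```shell
# Hypothetical helper: compose an apt "package=version-revision" pin.
# The revision defaults to 1, matching the format observed for terraform.
apt_pin() {
  pkg="$1"
  version="$2"
  revision="${3:-1}"
  printf '%s=%s-%s\n' "$pkg" "$version" "$revision"
}

apt_pin terraform 1.15.6   # prints: terraform=1.15.6-1
```

When a pin is rejected with "Version ... not found", running `apt-cache madison terraform` inside the image after `apt-get update` lists the exact version strings the configured repository offers.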
**4. [Rule 1 - Bug] Helm version 4.2.1 not found in Buildkite apt repo**
- **Found during:** Task 2 (build attempt — apt version not found)
- **Issue:** Helm 4.2.1 doesn't exist in the Buildkite apt repo; latest available is 4.2.0-1
- **Fix:** Changed `HELM_VERSION` ARG from `4.2.1` to `4.2.0`
- **Files modified:** `ngn-agent/docker/Dockerfile`
- **Verification:** Helm 4.2.0-1 installed successfully, `helm version --short` reports `v4.2.0+g0646808`
- **Committed in:** `2797a64` (Task 2 commit)
---
**Total deviations:** 4 auto-fixed (2 bugs, 1 blocking, 1 version correction)
**Impact on plan:** All fixes essential for build to succeed and tools to work correctly on the target architecture. No scope creep.
## Issues Encountered
- Base image `nikolaik/python-nodejs:python3.11-nodejs20` is still available on Docker Hub (as of 2026-06-15) — the planned deprecation did not occur, so the original tag was used
- The base image is multi-architecture (arm64 + amd64); on Apple Silicon, Docker selects the arm64 variant automatically. The AWS CLI and pup downloads, which were originally hardcoded to x86_64 URLs, were fixed with architecture detection
## User Setup Required
None - no external service configuration required. Run `docker/build.sh` to rebuild the image.
## Next Phase Readiness
- Docker image `ngn-agent:latest` is built and verified with all 5 tools
- Ready for Plan 2 (portable setup script) which will reference this image in `~/.hermes/config.yaml`
- The base image tag `nikolaik/python-nodejs:python3.11-nodejs20` should be monitored — if it gets deprecated, update to `python3.11-nodejs22-bookworm` as documented in Dockerfile comment
---
*Phase: 09-tooling-portable-setup*
*Completed: 2026-06-15*