diff --git a/.planning/phases/09-tooling-portable-setup/09-01-SUMMARY.md b/.planning/phases/09-tooling-portable-setup/09-01-SUMMARY.md
new file mode 100644
index 0000000..0075e77
--- /dev/null
+++ b/.planning/phases/09-tooling-portable-setup/09-01-SUMMARY.md
@@ -0,0 +1,149 @@
+---
+phase: 09-tooling-portable-setup
+plan: 01
+subsystem: infra
+tags: docker, aws-cli, terraform, helm, kubectl, datadog-cli, arm64
+
+# Dependency graph
+requires:
+  - phase: 09-tooling-portable-setup
+    provides: Research on tool versions, architecture decisions, installation methods
+provides:
+  - Custom Hermes Docker image (ngn-agent:latest) with 5 platform engineering CLI tools
+  - Version-pinned, reproducible Dockerfile with architecture detection (x86_64 + arm64)
+  - Single-command build entry point (docker/build.sh)
+affects: [09-tooling-portable-setup (plan 02 - setup script)]
+
+# Tech tracking
+tech-stack:
+  added:
+    - Dockerfile multi-tool build pattern
+    - Architecture-detection case/esac for binary downloads
+  patterns:
+    - Version pinning via ARGs for reproducibility
+    - Multi-architecture support via uname -m detection
+    - GPG-verified apt repos for tool installation
+
+key-files:
+  created:
+    - ngn-agent/docker/Dockerfile
+    - ngn-agent/docker/build.sh
+  modified: []
+
+key-decisions:
+  - "Helm version 4.2.1 not in Buildkite apt repo; pinned to 4.2.0 instead"
+  - "Terraform apt version format requires -1 suffix (terraform=1.15.6-1)"
+  - "Added architecture detection for AWS CLI and pup (x86_64 + aarch64) for native ARM64 support"
+  - "Used /etc/os-release instead of lsb_release (not available in base image)"
+
+patterns-established:
+  - "Multi-tool Dockerfile: version-pinned ARGs, GPG-verified apt repos, architecture detection for binary downloads"
+
+requirements-completed: [TOOL-01]
+
+# Metrics
+duration: 6 min
+completed: 2026-06-15
+---
+
+# Phase 9 Plan 1: Custom Hermes Docker Image with Platform Engineering Tools
+
+**Version-pinned Docker image (ngn-agent:latest) with aws-cli, terraform, helm, kubectl, and datadog CLI (pup), buildable via a single docker/build.sh command, with native ARM64 support for Apple Silicon.**
+
+## Performance
+
+- **Duration:** 6 min
+- **Started:** 2026-06-15T15:18:47Z
+- **Completed:** 2026-06-15T15:24:38Z
+- **Tasks:** 2
+- **Files modified:** 2
+
+## Accomplishments
+
+- Created `ngn-agent/docker/Dockerfile`: version-pinned installations of 5 platform engineering tools on top of `nikolaik/python-nodejs:python3.11-nodejs20`
+- Created `ngn-agent/docker/build.sh`: single-command build entry point with `set -euo pipefail`
+- Built and verified the `ngn-agent:latest` image with all 5 tools working natively on ARM64 (Apple Silicon)
+- Added architecture detection (`uname -m`) for the AWS CLI and pup binary downloads, supporting both x86_64 and aarch64
+
+## Task Commits
+
+Each task was committed atomically:
+
+1. **Task 1: Create Dockerfile with version-pinned tool installations** - `78fd400` (feat)
+2. **Task 2: Create build.sh and verify image builds successfully** - `2797a64` (feat, includes deviation fixes)
+3. **Task 2 follow-up: Add D-04/D-05 references to build.sh** - `cc1da75` (docs)
+
+**Plan metadata:** `(committed as part of Task 2 commits)`
+
+## Files Created/Modified
+
+- `ngn-agent/docker/Dockerfile` (112 lines): multi-architecture Dockerfile with 5 platform engineering tools, version-pinned via ARGs, GPG-verified apt repos, architecture detection for binary downloads
+- `ngn-agent/docker/build.sh` (26 lines): single-command build entry point that resolves its own script location so the build context is correct (T-09-02 mitigation)
+
+## Decisions Made
+
+- **Helm version 4.2.0** used instead of the planned 4.2.1, because 4.2.1 does not exist in the Buildkite apt repo
+- **Architecture detection** added for AWS CLI and pup: the base image runs natively on ARM64 (Apple Silicon), so x86_64 binaries would need QEMU emulation
+- **Terraform version string** uses a `-1` suffix for apt compatibility (`terraform=1.15.6-1`)
+- **`/etc/os-release`** used for codename detection instead of `lsb_release` (not shipped in the base image)
+
+## Deviations from Plan
+
+### Auto-fixed Issues
+
+**1. [Rule 3 - Blocking] `lsb_release` not found in base image**
+- **Found during:** Task 1 (Dockerfile creation, first build attempt)
+- **Issue:** `lsb_release` is not installed in the base image, so codename resolution for the terraform apt repo failed with "Malformed entry"
+- **Fix:** Replaced `$(lsb_release -cs)` with `. /etc/os-release && echo ${VERSION_CODENAME}` for Debian codename detection
+- **Files modified:** `ngn-agent/docker/Dockerfile`
+- **Verification:** Build succeeded, codename resolved to `trixie`, terraform installed correctly
+- **Committed in:** `2797a64` (Task 2 commit)
+
+**2. [Rule 1 - Bug] Hardcoded x86_64 binary downloads fail on ARM64 (Apple Silicon)**
+- **Found during:** Task 2 (build verification: tools hung under QEMU)
+- **Issue:** AWS CLI and pup binaries were hardcoded to x86_64 URLs. On Apple Silicon, the base image runs natively on ARM64, and the x86_64 binaries triggered QEMU emulation that hung without a proper `/lib64/ld-linux-x86-64.so.2` loader
+- **Fix:** Added architecture detection via `uname -m` with case/esac for both the AWS CLI (`awscli-exe-linux-{arch}.zip`) and pup (`pup_${version}_Linux_{arch}.tar.gz`) downloads
+- **Files modified:** `ngn-agent/docker/Dockerfile`
+- **Verification:** All 5 tools now run natively on ARM64 (aarch64) without QEMU warnings. `aws --version` reports `exe/aarch64.debian.13`, and terraform reports `linux_arm64`
+- **Committed in:** `2797a64` (Task 2 commit)
+
+**3. [Rule 1 - Bug] Terraform version string missing -1 suffix**
+- **Found during:** Task 2 (build attempt: apt version not found)
+- **Issue:** `apt-get install terraform=1.15.6` failed with "Version '1.15.6' not found" because the HashiCorp apt repo uses the version format `1.15.6-1`
+- **Fix:** Changed the install line to `terraform=${TERRAFORM_VERSION}-1`
+- **Files modified:** `ngn-agent/docker/Dockerfile`
+- **Verification:** Terraform 1.15.6-1 installed and runs successfully
+- **Committed in:** `2797a64` (Task 2 commit)
+
+**4. [Rule 1 - Bug] Helm version 4.2.1 not found in Buildkite apt repo**
+- **Found during:** Task 2 (build attempt: apt version not found)
+- **Issue:** Helm 4.2.1 does not exist in the Buildkite apt repo; the latest available version is 4.2.0-1
+- **Fix:** Changed the `HELM_VERSION` ARG from `4.2.1` to `4.2.0`
+- **Files modified:** `ngn-agent/docker/Dockerfile`
+- **Verification:** Helm 4.2.0-1 installed successfully, `helm version --short` reports `v4.2.0+g0646808`
+- **Committed in:** `2797a64` (Task 2 commit)
+
+---
+
+**Total deviations:** 4 auto-fixed (3 bugs, 1 blocking)
+**Impact on plan:** All fixes were essential for the build to succeed and for the tools to work correctly on the target architecture. No scope creep.
+
+## Issues Encountered
+
+- Base image `nikolaik/python-nodejs:python3.11-nodejs20` is still available on Docker Hub (as of 2026-06-15); the planned deprecation did not occur, so the original tag was used
+- The base image is multi-architecture (arm64 + amd64); on Apple Silicon, Docker selects the arm64 variant automatically. The AWS CLI and pup downloads, which had been hardcoded to x86_64, were fixed with architecture detection
+
+## User Setup Required
+
+None - no external service configuration required. Run `docker/build.sh` to rebuild the image.
+
+## Next Phase Readiness
+
+- Docker image `ngn-agent:latest` is built and verified with all 5 tools
+- Ready for Plan 2 (portable setup script), which will reference this image in `~/.hermes/config.yaml`
+- The base image tag `nikolaik/python-nodejs:python3.11-nodejs20` should be monitored; if it gets deprecated, update to `python3.11-nodejs22-bookworm` as documented in the Dockerfile comment
+
+---
+
+*Phase: 09-tooling-portable-setup*
+*Completed: 2026-06-15*
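
Review note: the three Dockerfile fixes recorded under "Deviations from Plan" (architecture detection, codename lookup without `lsb_release`, and the apt revision suffix) can be sketched as small POSIX shell helpers. This is an illustration only, not the contents of `ngn-agent/docker/Dockerfile`. The function names `map_arch`, `aws_cli_zip`, `codename_from_os_release`, and `apt_version` are hypothetical, and the revision suffix is assumed to default to `1` because that is the value the summary reports for both Terraform and Helm.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the fixes described in the summary.
# None of these names are taken from the actual Dockerfile.

# Map a `uname -m` value to the architecture token used in download names.
# Unknown architectures fail loudly instead of silently falling back to x86_64.
map_arch() {
  case "$1" in
    x86_64|amd64)  echo "x86_64" ;;
    aarch64|arm64) echo "aarch64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Build the AWS CLI v2 installer file name for a mapped architecture,
# following the awscli-exe-linux-{arch}.zip pattern quoted in the summary.
aws_cli_zip() {
  echo "awscli-exe-linux-$1.zip"
}

# Read the distro codename from an os-release file without lsb_release.
# The path is a parameter (default /etc/os-release) so a sample file can be used.
# The subshell keeps the sourced variables out of the caller's environment.
codename_from_os_release() {
  ( . "${1:-/etc/os-release}" && echo "${VERSION_CODENAME}" )
}

# Build an apt package pin that includes the package revision suffix,
# e.g. terraform=1.15.6-1. Assumption: the revision defaults to 1.
apt_version() {
  echo "$1=$2-${3:-1}"
}
```

Inside a `RUN` step these would typically be combined along the lines of `arch="$(map_arch "$(uname -m)")"` followed by a download of `$(aws_cli_zip "$arch")`. Failing on an unrecognized architecture is a deliberate choice here, since a silent x86_64 fallback is what produced the QEMU hang described in deviation 2.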