docs(09-tooling-portable-setup-01): complete custom Docker image plan
- SUMMARY.md with deviations, decisions, and verified tool versions
.planning/phases/09-tooling-portable-setup/09-01-SUMMARY.md
@@ -0,0 +1,149 @@
---
phase: 09-tooling-portable-setup
plan: 01
subsystem: infra
tags: docker, aws-cli, terraform, helm, kubectl, datadog-cli, arm64

# Dependency graph
requires:
  - phase: 09-tooling-portable-setup
    provides: Research on tool versions, architecture decisions, installation methods
provides:
  - Custom Hermes Docker image (ngn-agent:latest) with 5 platform engineering CLI tools
  - Version-pinned, reproducible Dockerfile with architecture detection (x86_64 + arm64)
  - Single-command build entry point (docker/build.sh)
affects: [09-tooling-portable-setup (plan 02 - setup script)]

# Tech tracking
tech-stack:
  added:
    - Dockerfile multi-tool build pattern
    - Architecture-detection case/esac for binary downloads
  patterns:
    - Version pinning via ARGs for reproducibility
    - Multi-architecture support via uname -m detection
    - GPG-verified apt repos for tool installation

key-files:
  created:
    - ngn-agent/docker/Dockerfile
    - ngn-agent/docker/build.sh
  modified: []

key-decisions:
  - "Helm version 4.2.1 not in Buildkite apt repo; pinned to 4.2.0 instead"
  - "Terraform apt version format requires -1 suffix (terraform=1.15.6-1)"
  - "Added architecture detection for AWS CLI and pup (x86_64 + aarch64) for native ARM64 support"
  - "Used /etc/os-release instead of lsb_release (not available in base image)"

patterns-established:
  - "Multi-tool Dockerfile: version-pinned ARGs, GPG-verified apt repos, architecture detection for binary downloads"

requirements-completed: [TOOL-01]

# Metrics
duration: 6 min
completed: 2026-06-15
---

# Phase 9 Plan 1: Custom Hermes Docker Image with Platform Engineering Tools

**Version-pinned Docker image (ngn-agent:latest) with aws-cli, terraform, helm, kubectl, and datadog CLI (pup), buildable via a single docker/build.sh command, with native ARM64 support for Apple Silicon.**

## Performance

- **Duration:** 6 min
- **Started:** 2026-06-15T15:18:47Z
- **Completed:** 2026-06-15T15:24:38Z
- **Tasks:** 2
- **Files modified:** 2

## Accomplishments

- Created `ngn-agent/docker/Dockerfile`: version-pinned installations of 5 platform engineering tools on top of `nikolaik/python-nodejs:python3.11-nodejs20`
- Created `ngn-agent/docker/build.sh`: single-command build entry point with `set -euo pipefail`
- Built and verified `ngn-agent:latest` image with all 5 tools working natively on ARM64 (Apple Silicon)
- Added architecture detection (`uname -m`) for AWS CLI and pup binary downloads, supporting both x86_64 and aarch64

## Task Commits

Each task was committed atomically:

1. **Task 1: Create Dockerfile with version-pinned tool installations** - `78fd400` (feat)
2. **Task 2: Create build.sh and verify image builds successfully** - `2797a64` (feat, includes deviation fixes)
3. **Task 2 follow-up: Add D-04/D-05 references to build.sh** - `cc1da75` (docs)

**Plan metadata:** `(committed as part of Task 2 commits)`

## Files Created/Modified

- `ngn-agent/docker/Dockerfile` (112 lines): multi-architecture Dockerfile with 5 platform engineering tools, version-pinned via ARGs, GPG-verified apt repos, architecture detection for binary downloads
- `ngn-agent/docker/build.sh` (26 lines): single-command build entry point that resolves the script location for the correct build context (T-09-02 mitigation)
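
A minimal sketch of how an entry point like `docker/build.sh` might resolve its own location so the build works from any working directory. The function names `resolve_script_dir` and `build_image` are hypothetical, and using the script's own directory as the build context is an assumption; the real script is not reproduced here. The sketch uses `set -eu` so that it runs under plain POSIX `sh`, whereas the real script uses `set -euo pipefail`.

```shell
#!/bin/sh
# Illustrative sketch only; not the contents of the real build.sh.
set -eu

# Print the absolute directory that contains the given script path.
resolve_script_dir() {
    # Run cd in a subshell so the caller's working directory is unchanged.
    ( cd "$(dirname "$1")" && pwd )
}

# Build the image with the script's directory as the build context (assumed).
build_image() {
    script_dir="$(resolve_script_dir "$0")"
    docker build -t ngn-agent:latest -f "$script_dir/Dockerfile" "$script_dir"
}

# A real entry point would end with a single call:
# build_image
```

Resolving the path from the script rather than from the current directory means `docker build` always receives the same context, whether the script is started from the repository root or from inside `docker/`.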

## Decisions Made

- **Helm version 4.2.0** used instead of the planned 4.2.1, because 4.2.1 does not exist in the Buildkite apt repo
- **Architecture detection** added for AWS CLI and pup: the base image runs natively on ARM64 (Apple Silicon), so x86_64 binaries would need QEMU emulation
- **Terraform version string** uses a `-1` suffix for apt compatibility (`terraform=1.15.6-1`)
- **`/etc/os-release`** used for codename detection instead of `lsb_release` (not shipped in base image)

## Deviations from Plan

### Auto-fixed Issues

**1. [Rule 3 - Blocking] `lsb_release` not found in base image**
- **Found during:** Task 1 (Dockerfile creation, first build attempt)
- **Issue:** `lsb_release` is not installed in the base image, causing the terraform apt repo codename resolution to fail with "Malformed entry"
- **Fix:** Replaced `$(lsb_release -cs)` with `. /etc/os-release && echo ${VERSION_CODENAME}` for Debian codename detection
- **Files modified:** `ngn-agent/docker/Dockerfile`
- **Verification:** Build succeeded, codename resolved to `trixie`, terraform installed correctly
- **Committed in:** `2797a64` (Task 2 commit)
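
The codename lookup described above can be sketched as a small shell helper. The function name `codename_from_os_release` and its optional file argument are hypothetical; the argument exists only so the helper can be tried against a sample file instead of the real `/etc/os-release`.

```shell
#!/bin/sh
# Hypothetical helper: print the distro codename without relying on lsb_release.
codename_from_os_release() {
    os_release_file="${1:-/etc/os-release}"
    # Source the file in a subshell so its variables do not leak into the caller.
    (
        . "$os_release_file"
        printf '%s\n' "${VERSION_CODENAME:-}"
    )
}
```

On the image described here, calling the helper with no argument would be expected to print the same codename the build resolved, since `/etc/os-release` is a plain list of shell-compatible `KEY=value` assignments.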

**2. [Rule 1 - Bug] Hardcoded x86_64 binary downloads fail on ARM64 (Apple Silicon)**
- **Found during:** Task 2 (build verification: tools hung under QEMU)
- **Issue:** AWS CLI and pup binaries were hardcoded to x86_64 URLs. On Apple Silicon, the base image runs natively on ARM64, so the x86_64 binaries ran under QEMU emulation and hung because a proper x86_64 loader (`/lib64/ld-linux-x86-64.so.2`) was not available
- **Fix:** Added architecture detection via `uname -m` with case/esac for both AWS CLI (`awscli-exe-linux-{arch}.zip`) and pup (`pup_${version}_Linux_{arch}.tar.gz`) downloads
- **Files modified:** `ngn-agent/docker/Dockerfile`
- **Verification:** All 5 tools now run natively on ARM64 (aarch64) without QEMU warnings. `aws --version` reports `exe/aarch64.debian.13`, terraform reports `linux_arm64`
- **Committed in:** `2797a64` (Task 2 commit)
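
A minimal sketch of the `uname -m` case/esac pattern, limited to the AWS CLI file name. The function names `map_arch` and `awscli_zip_name` are hypothetical, and the extra `amd64` and `arm64` aliases are an assumption added for convenience. The pup archive is left out because its architecture tokens are not stated in this summary.

```shell
#!/bin/sh
# Map a machine name (as printed by `uname -m`) to the token used in the
# AWS CLI v2 archive name. Unknown values fail instead of guessing.
map_arch() {
    case "$1" in
        x86_64|amd64)  echo "x86_64" ;;
        aarch64|arm64) echo "aarch64" ;;
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Build the archive file name for a given machine name.
awscli_zip_name() {
    arch="$(map_arch "$1")" || return 1
    echo "awscli-exe-linux-${arch}.zip"
}

# Typical call inside a build step:
# awscli_zip_name "$(uname -m)"
```

Failing on an unrecognised architecture keeps the build from silently downloading a binary that cannot run on the host.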

**3. [Rule 1 - Bug] Terraform version string missing -1 suffix**
- **Found during:** Task 2 (build attempt: apt version not found)
- **Issue:** `apt-get install terraform=1.15.6` failed with "Version '1.15.6' not found" because HashiCorp apt repo uses version format `1.15.6-1`
- **Fix:** Changed install line to `terraform=${TERRAFORM_VERSION}-1`
- **Files modified:** `ngn-agent/docker/Dockerfile`
- **Verification:** Terraform 1.15.6-1 installed and runs successfully
- **Committed in:** `2797a64` (Task 2 commit)

**4. [Rule 1 - Bug] Helm version 4.2.1 not found in Buildkite apt repo**
- **Found during:** Task 2 (build attempt: apt version not found)
- **Issue:** Helm 4.2.1 doesn't exist in the Buildkite apt repo; latest available is 4.2.0-1
- **Fix:** Changed `HELM_VERSION` ARG from `4.2.1` to `4.2.0`
- **Files modified:** `ngn-agent/docker/Dockerfile`
- **Verification:** Helm 4.2.0-1 installed successfully, `helm version --short` reports `v4.2.0+g0646808`
- **Committed in:** `2797a64` (Task 2 commit)
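
Both version fixes come down to passing apt the full package version, which is the upstream version plus a packaging revision. A minimal sketch of composing such a pin follows; the helper name `apt_pin` and the default revision of `1` are assumptions for illustration, not taken from the Dockerfile.

```shell
#!/bin/sh
# Hypothetical helper: compose an apt pin such as terraform=1.15.6-1
# from a package name, an upstream version, and a packaging revision.
apt_pin() {
    pkg="$1"
    version="$2"
    revision="${3:-1}"   # assumed default; check the repo before relying on it
    printf '%s=%s-%s\n' "$pkg" "$version" "$revision"
}

# Example: apt-get install -y "$(apt_pin terraform 1.15.6)"
```

Running `apt-cache madison <package>` after `apt-get update` lists the exact version strings a repository offers, which is a quick way to confirm both the revision suffix and whether a planned version exists at all.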

---

**Total deviations:** 4 auto-fixed (3 bugs under Rule 1, one of which was a version correction; 1 blocking issue under Rule 3)
**Impact on plan:** All fixes were essential for the build to succeed and for the tools to work correctly on the target architecture. No scope creep.

## Issues Encountered

- Base image `nikolaik/python-nodejs:python3.11-nodejs20` is still available on Docker Hub (as of 2026-06-15). The planned deprecation did not occur, so the original tag was used
- The base image is multi-architecture (arm64 + amd64); on Apple Silicon, Docker selects the arm64 variant automatically. The binary downloads that had been hardcoded to x86_64 (AWS CLI and pup) were fixed with architecture detection

## User Setup Required

None. No external service configuration is required. Run `docker/build.sh` to rebuild the image.

## Next Phase Readiness

- Docker image `ngn-agent:latest` is built and verified with all 5 tools
- Ready for Plan 2 (portable setup script), which will reference this image in `~/.hermes/config.yaml`
- The base image tag `nikolaik/python-nodejs:python3.11-nodejs20` should be monitored. If it gets deprecated, update to `python3.11-nodejs22-bookworm` as documented in the Dockerfile comment

---

*Phase: 09-tooling-portable-setup*
*Completed: 2026-06-15*