diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml index 448ac57..fce88e1 100644 --- a/.github/workflows/build.yml +++ b/.github/workflows/build.yml @@ -2,9 +2,10 @@ name: Build and Push Docker Image on: push: - branches: [ "main", "master" ] + branches: ["main", "master"] pull_request: - branches: [ "main", "master" ] + branches: ["main", "master"] + workflow_dispatch: env: REGISTRY: ghcr.io diff --git a/.github/workflows/lint.yml b/.github/workflows/lint.yml index 86e3845..2681d30 100644 --- a/.github/workflows/lint.yml +++ b/.github/workflows/lint.yml @@ -20,4 +20,4 @@ jobs: - name: Run linter uses: golangci/golangci-lint-action@v8 with: - version: v2.1.0 + version: v2.7.2 diff --git a/.github/workflows/test-e2e.yml b/.github/workflows/test-e2e.yml deleted file mode 100644 index 68fd1ed..0000000 --- a/.github/workflows/test-e2e.yml +++ /dev/null @@ -1,32 +0,0 @@ -name: E2E Tests - -on: - push: - pull_request: - -jobs: - test-e2e: - name: Run on Ubuntu - runs-on: ubuntu-latest - steps: - - name: Clone the code - uses: actions/checkout@v4 - - - name: Setup Go - uses: actions/setup-go@v5 - with: - go-version-file: go.mod - - - name: Install the latest version of kind - run: | - curl -Lo ./kind https://kind.sigs.k8s.io/dl/latest/kind-linux-amd64 - chmod +x ./kind - sudo mv ./kind /usr/local/bin/kind - - - name: Verify kind installation - run: kind version - - - name: Running Test e2e - run: | - go mod tidy - make test-e2e diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index fc2e80d..0cfb3e3 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -20,4 +20,4 @@ jobs: - name: Running Tests run: | go mod tidy - make test + make test ENVTEST_K8S_VERSION=1.31 diff --git a/Makefile b/Makefile index 8b5a24c..24a323e 100644 --- a/Makefile +++ b/Makefile @@ -242,7 +242,7 @@ CONTROLLER_TOOLS_VERSION ?= v0.18.0 ENVTEST_VERSION ?= $(shell go list -m -f "{{ .Version }}" sigs.k8s.io/controller-runtime | awk -F'[v.]' 
'{printf "release-%d.%d", $$2, $$3}') #ENVTEST_K8S_VERSION is the version of Kubernetes to use for setting up ENVTEST binaries (i.e. 1.31) ENVTEST_K8S_VERSION ?= $(shell go list -m -f "{{ .Version }}" k8s.io/api | awk -F'[v.]' '{printf "1.%d", $$3}') -GOLANGCI_LINT_VERSION ?= v2.1.0 +GOLANGCI_LINT_VERSION ?= v2.7.2 .PHONY: kustomize kustomize: $(KUSTOMIZE) ## Download kustomize locally if necessary. diff --git a/README.md b/README.md index bb37d2d..253c3de 100644 --- a/README.md +++ b/README.md @@ -1,82 +1,187 @@ -# Overview +# Gitea Runner Operator -Operator to manage gitea Act runner on Kubernetes +A Kubernetes Operator to manage ephemeral Gitea Act runners. This operator automatically spawns runner pods based on queued jobs and supports global, org/user, and repo-level runners. Definitely vibe-coded (don't worry, I know what I am doing). -# How it works? +## Features -1. It installs a set of CRDs: `kind: RunnerGroup` in Kubernetes +- **Ephemeral Runners**: Each job gets a fresh runner which is destroyed after execution. +- **Multiple Scopes**: Support for `global`, `org`, `user`, and `repo` level runners. +- **Auto-Scaling**: Automatically scales runners up to a configured maximum based on queued jobs. +- **Label Matching**: Matches Gitea job labels (e.g., `ubuntu-latest`) to runner capabilities. + +## Prerequisites + +- **Kubernetes Cluster**: v1.23+ +- **Gitea**: v1.25.0+ (with Actions enabled) + +## Installation (Helm Chart) + +### Incoming + +## Installation (Manual) + +### 1. Deploy the Operator + +You can deploy the operator using the provided manifests. + +```bash +# Clone the repository +git clone https://github.com/bapung/gitea-runner-operator.git +cd gitea-runner-operator + +# Install CRDs +make install + +# Deploy the controller to the cluster +make deploy IMG=ghcr.io/bapung/gitea-runner-operator:latest +``` + +### 2. Create Credentials Secret + +Create a secret containing the Gitea Registration Token and an API Auth Token. + +1. 
**Registration Token**: Get this from Gitea Admin -> Actions -> Runners -> Create new Runner (or Org/Repo settings). +2. **Auth Token**: Generate a token in Gitea User Settings -> Applications. It needs `read:repository`, `read:user` permissions. + +```yaml +apiVersion: v1 +kind: Secret +metadata: + name: gitea-runner-secret + namespace: gitea-runner-operator-system +type: Opaque +stringData: + registrationToken: "" + authToken: "" +``` + +Apply it: + +```bash +kubectl apply -f secret.yaml +``` + +## Configuration + +The core resource is the `RunnerGroup`. Below are examples for different scopes. + +### 1. Repository Scope + +Spawns runners only for jobs in a specific repository. ```yaml apiVersion: gitea.bpg.pw/v1alpha1 kind: RunnerGroup metadata: - name: my-repo-runner-1 - namespace: gitea-runner-system + name: my-repo-runner + namespace: gitea-runner-operator-system spec: scope: repo - org: myorg # optional; ommited if scope == global - repo: myreponame # optional; ommited if scope == org || scope == global - gitea: - url: https://gitea.bpg.pw + org: myorg + repo: myrepo + giteaURL: https://gitea.example.com + maxActiveRunners: 5 labels: - - default - - app:infra - maxActiveRunners: 5 # - registrationToken: # registration token for runner + - "ubuntu-latest" + - "custom-label" + registrationToken: secretRef: - name: gitea-runner-secret-0 + name: gitea-runner-secret key: registrationToken - authToken: # token to get list of job status + authToken: secretRef: - name: gitea-runner-secret-0 + name: gitea-runner-secret key: authToken ``` -2. The RunnerGroup controller will continuously watch for queued jobs based on its scope: `global`, `org`, or `repo`. If a new workflow run is detected with `status: queued`, based on the RunnerGroup's labels, the controller will spawn a new ephemeral runner as a Job. +### 2. Organization Scope + +Spawns runners for any repository within the organization. 
```yaml -apiVersion: batch/v1 -kind: Job +apiVersion: gitea.bpg.pw/v1alpha1 +kind: RunnerGroup metadata: - name: my-repo-runner-1-275f1b8f - labels: - app: my-repo-runner-1 - # tags to determine that this resource is managed by the Operator + name: my-org-runner + namespace: gitea-runner-operator-system spec: - # Optional: Automatically clean up the job after it finishes (e.g., 100 seconds) - ttlSecondsAfterFinished: 600 - template: - metadata: - labels: - app: act-my-repo-runner-1 - spec: - restartPolicy: OnFailure - securityContext: - fsGroup: 1000 - volumes: - - name: runner-data - persistentVolumeClaim: - claimName: act-runner-vol - containers: - - name: runner - image: gitea/act_runner:nightly-dind-rootless - imagePullPolicy: Always - env: - - name: DOCKER_HOST - value: tcp://localhost:2376 - - name: DOCKER_CERT_PATH - value: /certs/client - - name: DOCKER_TLS_VERIFY - value: "1" - - name: GITEA_INSTANCE_URL - value: https://gitea.bpg.pw - - name: GITEA_RUNNER_EPHEMERAL # always ephemeral - value: "1" - - name: GITEA_RUNNER_REGISTRATION_TOKEN - valueFrom: - secretKeyRef: - name: gitea-runner-secret-0 - key: registrationToken - securityContext: - privileged: true + scope: org + org: myorg + # repo is omitted + giteaURL: https://gitea.example.com + maxActiveRunners: 10 + # ... (tokens) ``` + +### 3. User Scope + +Spawns runners for any repository owned by the specified user. + +```yaml +apiVersion: gitea.bpg.pw/v1alpha1 +kind: RunnerGroup +metadata: + name: my-user-runner + namespace: gitea-runner-operator-system +spec: + scope: user + user: myusername + # org and repo are omitted + giteaURL: https://gitea.example.com + maxActiveRunners: 3 + # ... (tokens) +``` + +### 4. Global Scope + +Spawns runners for any job in the Gitea instance (Admin level). 
+ +```yaml +apiVersion: gitea.bpg.pw/v1alpha1 +kind: RunnerGroup +metadata: + name: global-runner + namespace: gitea-runner-operator-system +spec: + scope: global + # org, user, and repo are omitted + giteaURL: https://gitea.example.com + maxActiveRunners: 20 + # ... (tokens) +``` + +## How it works + +1. The **Controller** polls the Gitea API (using the `authToken`) to check for queued jobs matching the scope and labels. +2. If a matching queued job is found, and the current active runner count is below `maxActiveRunners`, the Controller creates a Kubernetes `Job`. +3. The `Job` pod starts an `act_runner` instance, registers itself using the `registrationToken` (as ephemeral), picks up the job, executes it, and then terminates. + +## Troubleshooting + +### Runners are not starting + +1. **Check Controller Logs**: + + ```bash + kubectl logs -n gitea-runner-operator-system -l control-plane=controller-manager -f + ``` + + Look for errors regarding API authentication or connectivity. + +2. **Check Permissions**: + Ensure the `authToken` has sufficient permissions (`read:repository`, etc.) to query actions. + +3. **Check Labels**: + Enable debug logging in the controller to see label matching logic. If your Gitea job requires `ubuntu-latest` but your RunnerGroup defines `centos`, it won't match. + +### Docker Daemon Issues + +The runner Job uses the default rootless template from the Gitea docs, which has issues with the Docker daemon. I still can't get it working with the `docker` command; other containers work fine if you set the correct labels. +Per Gemini: +The default runner image uses `dind-rootless`. This requires the pod to run with `privileged: true`. Ensure your cluster policies (PSP/PSA) allow privileged pods in the operator namespace. 
+ +## Roadmap / Wishlist + +- Helm Chart +- Custom Runner Job Spec definition +- Push mode using Webhook trigger diff --git a/api/v1alpha1/runnergroup_types.go b/api/v1alpha1/runnergroup_types.go index 656c572..3e4b52f 100644 --- a/api/v1alpha1/runnergroup_types.go +++ b/api/v1alpha1/runnergroup_types.go @@ -32,14 +32,16 @@ const ( RunnerGroupScopeGlobal RunnerGroupScope = "global" // RunnerGroupScopeOrg means the runner group is scoped to an organization RunnerGroupScopeOrg RunnerGroupScope = "org" + // RunnerGroupScopeUser means the runner group is scoped to a user + RunnerGroupScopeUser RunnerGroupScope = "user" // RunnerGroupScopeRepo means the runner group is scoped to a repository RunnerGroupScopeRepo RunnerGroupScope = "repo" ) // RunnerGroupSpec defines the desired state of RunnerGroup. type RunnerGroupSpec struct { - // Scope defines the scope of the runner (global, org, repo) - // +kubebuilder:validation:Enum=global;org;repo + // Scope defines the scope of the runner (global, org, user, repo) + // +kubebuilder:validation:Enum=global;org;user;repo // +kubebuilder:validation:Required Scope RunnerGroupScope `json:"scope"` @@ -47,6 +49,10 @@ type RunnerGroupSpec struct { // +optional Org string `json:"org,omitempty"` + // User is required if scope is 'user' + // +optional + User string `json:"user,omitempty"` + // Repo is required if scope is 'repo' // +optional Repo string `json:"repo,omitempty"` diff --git a/config/crd/bases/gitea.bpg.pw_runnergroups.yaml b/config/crd/bases/gitea.bpg.pw_runnergroups.yaml index 99a2365..c08fe72 100644 --- a/config/crd/bases/gitea.bpg.pw_runnergroups.yaml +++ b/config/crd/bases/gitea.bpg.pw_runnergroups.yaml @@ -107,12 +107,17 @@ spec: description: Repo is required if scope is 'repo' type: string scope: - description: Scope defines the scope of the runner (global, org, repo) + description: Scope defines the scope of the runner (global, org, user, + repo) enum: - global - org + - user - repo type: string + user: + description: 
User is required if scope is 'user' + type: string required: - authToken - giteaURL diff --git a/config/manager/image_pull_secret_patch.yaml b/config/manager/image_pull_secret_patch.yaml new file mode 100644 index 0000000..ed24a29 --- /dev/null +++ b/config/manager/image_pull_secret_patch.yaml @@ -0,0 +1,10 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: controller-manager + namespace: system +spec: + template: + spec: + imagePullSecrets: + - name: ghcr-secret diff --git a/config/manager/kustomization.yaml b/config/manager/kustomization.yaml index 5c5f0b8..3239de4 100644 --- a/config/manager/kustomization.yaml +++ b/config/manager/kustomization.yaml @@ -1,2 +1,11 @@ resources: - manager.yaml +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization +images: +- name: controller + newName: ghcr.io/bapung/gitea-runner-operator + newTag: sha-b33c78b + +patchesStrategicMerge: +- image_pull_secret_patch.yaml diff --git a/config/rbac/role_binding.yaml b/config/rbac/role_binding.yaml index 33cbca7..976527b 100644 --- a/config/rbac/role_binding.yaml +++ b/config/rbac/role_binding.yaml @@ -10,6 +10,6 @@ roleRef: kind: ClusterRole name: manager-role subjects: -- kind: ServiceAccount - name: controller-manager - namespace: system + - kind: ServiceAccount + name: controller-manager + namespace: gitea-runner-operator-system diff --git a/config/samples/gitea_v1alpha1_runnergroup.yaml b/config/samples/gitea_v1alpha1_runnergroup.yaml index 6085a90..f72d487 100644 --- a/config/samples/gitea_v1alpha1_runnergroup.yaml +++ b/config/samples/gitea_v1alpha1_runnergroup.yaml @@ -1,3 +1,16 @@ +apiVersion: v1 +kind: Secret +metadata: + name: gitea-credentials + labels: + app.kubernetes.io/name: gitea-runner-operator + app.kubernetes.io/managed-by: kustomize +stringData: + # The Gitea API Token (for the Operator to poll for jobs) + auth-token: "MMUCFRXCbofYn2L0aT2OP2aug7JhChNJlULKNLgg" + # The Runner Registration Token (for the Runner to register itself) + 
registration-token: "5r4lpLA9rKCZZEHyUyKHeA187DoaElcTBySITRRi" +--- apiVersion: gitea.bpg.pw/v1alpha1 kind: RunnerGroup metadata: @@ -6,4 +19,29 @@ metadata: app.kubernetes.io/managed-by: kustomize name: runnergroup-sample spec: - # TODO(user): Add fields here + # The base URL of your Gitea instance + giteaURL: "https://gitea.bpg.pw" + + # Scope of the runners (global, org, or repo) + scope: "org" + #org: "bapungorg" # Required if scope is 'org' or 'repo'; cannot be used with user + user: "bapung" # Required if scope is 'user' or 'repo'; cannot be used with org + #repo: "dummy-service-workflow" # Required if scope is 'repo' + + # Labels to identify this runner group + labels: + - "linux" + - "amd64" + + # Maximum number of runners to spawn concurrently + maxActiveRunners: 5 + + # Reference to the Secret containing the API token + authToken: + name: gitea-credentials + key: auth-token + + # Reference to the Secret containing the Registration token + registrationToken: + name: gitea-credentials + key: registration-token diff --git a/implementation.md b/implementation.md index 9ef2d52..598ace2 100644 --- a/implementation.md +++ b/implementation.md @@ -30,18 +30,23 @@ type RunnerGroupScope string const ( RunnerGroupScopeGlobal RunnerGroupScope = "global" RunnerGroupScopeOrg RunnerGroupScope = "org" + RunnerGroupScopeUser RunnerGroupScope = "user" RunnerGroupScopeRepo RunnerGroupScope = "repo" ) type RunnerGroupSpec struct { - // Scope defines the scope of the runner (global, org, repo) - // +kubebuilder:validation:Enum=global;org;repo + // Scope defines the scope of the runner (global, org, user, repo) + // +kubebuilder:validation:Enum=global;org;user;repo Scope RunnerScope `json:"scope"` // Org is required if scope is 'org' // +optional Org string `json:"org,omitempty"` + // User is required if scope is 'user' + // +optional + User string `json:"user,omitempty"` + // Repo is required if scope is 'repo' // +optional Repo string `json:"repo,omitempty"` @@ -49,7 +54,8 @@ 
type RunnerGroupSpec struct { // GiteaURL is the base URL of the Gitea instance GiteaURL string `json:"giteaURL"` - // Labels to assign to the runner + // Labels to assign to the runner. + // Defaults (e.g. ubuntu-latest) are merged automatically by the controller. // +optional Labels []string `json:"labels,omitempty"` @@ -79,154 +85,103 @@ type RunnerGroupStatus struct { ## 4. Controller Implementation (`internal/controller/runnergroup_controller.go`) -The controller handles the reconciliation loop. +The controller handles the reconciliation loop and manages the lifecycle of ephemeral runners. -### 4.1 RBAC Permissions +### 4.1 Struct Definition -Add markers to generate RBAC roles: +The reconciler includes a thread-safe map to cache spawned jobs and prevent duplicate scheduling. ```go -// +kubebuilder:rbac:groups=gitea.bpg.pw,resources=runnergroups,verbs=get;list;watch;create;update;patch;delete -// +kubebuilder:rbac:groups=gitea.bpg.pw,resources=runnergroups/status,verbs=get;update;patch -// +kubebuilder:rbac:groups=batch,resources=jobs,verbs=get;list;watch;create;update;patch;delete -// +kubebuilder:rbac:groups="",resources=secrets,verbs=get;list;watch +type RunnerGroupReconciler struct { + client.Client + Scheme *runtime.Scheme + GiteaClient gitea.Client + SpawnedJobsCache sync.Map // Stores [int64]time.Time (JobID -> SpawnTime) +} ``` ### 4.2 Reconcile Logic -The `Reconcile` function should follow this flow: +The `Reconcile` function follows this flow: -1. **Fetch RunnerGroup**: Get the `RunnerGroup` CR instance. If not found, ignore (deleted). -2. **List Jobs**: List all `batchv1.Job` resources in the same namespace that are owned by this RunnerGroup. - - Filter by label `gitea.bpg.pw/runnergroup-name=`. -3. **Update Status**: Update `status.activeRunners` with the count of non-completed jobs. -4. **Capacity Check**: - - If `activeRunners >= spec.maxActiveRunners`, stop and requeue. -5. 
**Poll Gitea**: - - Retrieve the Auth Token from the Secret referenced in `spec.authToken`. - - Instantiate a Gitea API Client. - - Query for queued workflow runs matching the scope and labels. -6. **Scale Up**: - - Calculate `needed = count(queued_jobs)`. - - Calculate `available_slots = spec.maxActiveRunners - activeRunners`. - - `to_spawn = min(needed, available_slots)`. - - Loop `to_spawn` times: - - Create a new `batchv1.Job`. -7. **Requeue**: Return `ctrl.Result{RequeueAfter: 10 * time.Second}` to ensure continuous polling. +1. **Fetch RunnerGroup**: Get the `RunnerGroup` CR instance. +2. **List Jobs**: List all `batchv1.Job` resources owned by this CR to calculate `activeRunners`. +3. **Update Status**: Update `status.activeRunners`. +4. **Capacity Check**: Stop scaling if `activeRunners >= spec.maxActiveRunners`. +5. **Label Calculation**: Call `getEffectiveLabels` to merge `spec.labels` with hardcoded Gitea defaults (e.g., `ubuntu-latest:docker://node:16-bullseye`). +6. **Poll Gitea**: + - Retrieve Auth Token. + - Call `GiteaClient.GetRunnerStats` with the effective labels. + - This returns a list of `QueuedJobs`. +7. **Scale Up & Deduplication**: + - Iterate through `stats.QueuedJobs`. + - **Check Cache**: If Job ID exists in `SpawnedJobsCache`: + - If TTL (< 5 min) is valid: **Skip** (already handled). + - If TTL expired: **Retry** (assume previous runner failed). + - If Job ID not in cache or expired: + - Check `availableSlots`. + - Retrieve Registration Token (if not yet fetched). + - **Spawn Job**: Create `batchv1.Job`. + - **Update Cache**: Store Job ID in `SpawnedJobsCache`. + - Decrement `availableSlots`. +8. **Cache Cleanup**: Remove IDs from `SpawnedJobsCache` if they are not present in the latest `QueuedJobs` list from Gitea. +9. **Requeue**: Return `ctrl.Result{RequeueAfter: 10 * time.Second}`. 
-### 4.3 Job Construction +### 4.3 Helper Functions -Helper function to create the Job object: +#### getEffectiveLabels -```go -func (r *RunnerGroupReconciler) constructJobForRunnerGroup(runnerGroup *giteav1alpha1.RunnerGroup, registrationToken string) (*batchv1.Job, error) { - // Generate random suffix for name - name := fmt.Sprintf("%s-%s", runnerGroup.Name, randString(5)) +Merges user-defined labels with Gitea defaults. If a user defines `ubuntu-latest`, it overrides the default `ubuntu-latest:docker://...`. - // Construct Env Vars - envVars := []corev1.EnvVar{ - {Name: "GITEA_INSTANCE_URL", Value: runnerGroup.Spec.GiteaURL}, - {Name: "GITEA_RUNNER_REGISTRATION_TOKEN", Value: registrationToken}, - {Name: "GITEA_RUNNER_EPHEMERAL", Value: "true"}, - {Name: "DOCKER_HOST", Value: "tcp://localhost:2376"}, - // ... other envs from README - } +#### constructJobForRunnerGroup - if len(runnerGroup.Spec.Labels) > 0 { - labelsStr := strings.Join(runnerGroup.Spec.Labels, ",") - envVars = append(envVars, corev1.EnvVar{Name: "GITEA_RUNNER_LABELS", Value: labelsStr}) - } +Creates the Job object with: - // Construct Job - job := &batchv1.Job{ - ObjectMeta: metav1.ObjectMeta{ - Name: name, - Namespace: runnerGroup.Namespace, - Labels: map[string]string{ - "app": runnerGroup.Name, - "gitea.bpg.pw/runnergroup-name": runnerGroup.Name, - "gitea.bpg.pw/managed-by": "gitea-runner-operator", - }, - }, - Spec: batchv1.JobSpec{ - TTLSecondsAfterFinished: pointer.Int32(600), - Template: corev1.PodTemplateSpec{ - Spec: corev1.PodSpec{ - RestartPolicy: corev1.RestartPolicyOnFailure, - Containers: []corev1.Container{ - { - Name: "runner", - Image: "gitea/act_runner:nightly-dind-rootless", - ImagePullPolicy: corev1.PullAlways, - SecurityContext: &corev1.SecurityContext{Privileged: pointer.Bool(true)}, - Env: envVars, - VolumeMounts: []corev1.VolumeMount{ - {Name: "runner-data", MountPath: "/data"}, - }, - }, - }, - Volumes: []corev1.Volume{ - { - Name: "runner-data", - VolumeSource: 
corev1.VolumeSource{ - PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{ - ClaimName: "act-runner-vol", // Note: Consider making this configurable or EmptyDir - }, - }, - }, - }, - }, - }, - }, - } - - // Set Controller Reference - if err := ctrl.SetControllerReference(runnerGroup, job, r.Scheme); err != nil { - return nil, err - } - - return job, nil -} -``` +- **Name**: `{runnergroup-name}-{random-suffix}` +- **Env**: + - `GITEA_RUNNER_NAME`: Set to the Job name. + - `GITEA_RUNNER_LABELS`: Comma-separated effective labels. + - Standard runner envs (`GITEA_INSTANCE_URL`, etc). ## 5. Gitea Client (`internal/gitea/client.go`) -A simple HTTP client wrapper to interact with Gitea. +A specialized client to interact with Gitea's Actions API. ### 5.1 Interface ```go +type RunnerStats struct { + QueuedJobs []ActionWorkflowJob + Running int +} + type Client interface { - GetQueuedRuns(ctx context.Context, scope RunnerGroupScope, owner, repo string, labels []string) (int, error) + GetRunnerStats(ctx context.Context, giteaURL, authToken string, scope RunnerGroupScope, org, repo string, labels []string) (*RunnerStats, error) } ``` -### 5.2 Implementation Details +### 5.2 Logic -- **Endpoint**: `/api/v1/repos/{owner}/{repo}/actions/runs` -- **Query Params**: `status=queued` -- **Filtering**: - - The API might return all queued runs. - - The client must filter these runs locally to ensure they match the `labels` defined in the RunnerGroup CR. - - _Note_: Gitea API might not support filtering by labels directly in the list endpoint, so client-side filtering is necessary. +1. **Endpoints**: + - Repo/Org/Global: Uses `/actions/jobs` endpoints. + - User: Fetches repos via `/users/{user}/repos`, then queries `/actions/jobs` for each repo. +2. **Fetching**: + - Fetches jobs with `status=queued`, `waiting`, `pending`. + - Handles pagination (fetches all pages). +3. **Filtering**: + - Iterates through fetched jobs. 
+ - **Matches Labels**: Checks if the job's required labels are a subset of the runner's supported labels (effective labels). + - Supports exact match (`linux` == `linux`) + - Supports schema match (`ubuntu-latest` matches `ubuntu-latest:docker://...`) + - Returns only matching jobs in `QueuedJobs`. -## 6. Configuration & Deployment +## 6. Testing Strategy -### 6.1 Dockerfile - -Standard Operator SDK Dockerfile. Ensure the base image is minimal (e.g., `gcr.io/distroless/static:nonroot`). - -### 6.2 Kustomize - -Update `config/default/kustomization.yaml` to include the CRD and RBAC configurations. - -## 7. Testing Strategy - -1. **Unit Tests**: - - Test `constructJobForRunnerGroup` to ensure Env vars and Labels are set correctly. - - Test Gitea Client response parsing. -2. **Integration Tests (EnvTest)**: - - Spin up a local k8s control plane. - - Create a `RunnerGroup` CR. - - Verify the controller creates a `Job` when the mocked Gitea client returns queued jobs. - - Verify the controller respects `MaxActiveRunners`. +1. **Unit Tests (`internal/gitea/client_test.go`)**: + - Mock Gitea API server. + - Verify `GetRunnerStats` correctly parses JSON and handles pagination. + - Verify label matching logic (subset, schema matching). +2. **Controller Tests**: + - Verify `SpawnedJobsCache` prevents double scheduling. + - Verify TTL logic allows retries for stuck jobs. + - Verify `getEffectiveLabels` merging logic. 
diff --git a/internal/controller/runnergroup_controller.go b/internal/controller/runnergroup_controller.go index be4f374..55b50a7 100644 --- a/internal/controller/runnergroup_controller.go +++ b/internal/controller/runnergroup_controller.go @@ -21,6 +21,7 @@ import ( "fmt" "math/rand" "strings" + "sync" "time" batchv1 "k8s.io/api/batch/v1" @@ -40,8 +41,9 @@ import ( // RunnerGroupReconciler reconciles a RunnerGroup object type RunnerGroupReconciler struct { client.Client - Scheme *runtime.Scheme - GiteaClient gitea.Client + Scheme *runtime.Scheme + GiteaClient gitea.Client + SpawnedJobsCache sync.Map } // +kubebuilder:rbac:groups=gitea.bpg.pw,resources=runnergroups,verbs=get;list;watch;create;update;patch;delete @@ -117,56 +119,93 @@ func (r *RunnerGroupReconciler) Reconcile(ctx context.Context, req ctrl.Request) logger.Info("Checking Gitea for queued jobs", "url", runnerGroup.Spec.GiteaURL, "scope", runnerGroup.Spec.Scope) + // Calculate effective labels (spec labels + defaults) + effectiveLabels := r.getEffectiveLabels(runnerGroup.Spec.Labels) + // Query for queued workflow runs - queuedJobs, err := r.GiteaClient.GetQueuedRuns( + stats, err := r.GiteaClient.GetRunnerStats( ctx, runnerGroup.Spec.GiteaURL, authToken, runnerGroup.Spec.Scope, runnerGroup.Spec.Org, + runnerGroup.Spec.User, runnerGroup.Spec.Repo, - runnerGroup.Spec.Labels, + effectiveLabels, ) if err != nil { - logger.Error(err, "Failed to query Gitea for queued runs") + logger.Error(err, "Failed to query Gitea for runner stats") return ctrl.Result{RequeueAfter: 10 * time.Second}, err } - logger.Info("Gitea query result", "queuedJobs", queuedJobs) + logger.Info("Gitea query result", "queuedJobs", len(stats.QueuedJobs)) - // 6. Scale Up + // 6. 
Scale Up and Cache Management availableSlots := runnerGroup.Spec.MaxActiveRunners - activeRunners - toSpawn := min(queuedJobs, availableSlots) - if toSpawn > 0 { - logger.Info("Spawning runners", - "queuedJobs", queuedJobs, - "availableSlots", availableSlots, - "toSpawn", toSpawn) + // Track current queued IDs for cache cleanup + currentQueuedIDs := make(map[int64]bool) - // Retrieve Registration Token from Secret - registrationToken, err := r.getSecretValue(ctx, runnerGroup.Namespace, runnerGroup.Spec.RegistrationTokenRef) + // Retrieve Registration Token from Secret (only if we need to spawn) + var registrationToken string + tokenFetched := false + + for _, giteaJob := range stats.QueuedJobs { + currentQueuedIDs[giteaJob.ID] = true + + if availableSlots <= 0 { + continue + } + + // Check if we already spawned a runner for this job + if value, loaded := r.SpawnedJobsCache.Load(giteaJob.ID); loaded { + spawnTime := value.(time.Time) + if time.Since(spawnTime) < 5*time.Minute { + // Already handling this job recently + continue + } + // TTL expired (runner likely failed to start), retry spawning + logger.Info("Job stuck in queue for too long, retrying runner spawn", "giteaJobID", giteaJob.ID) + } + + // Need to spawn a runner + if !tokenFetched { + registrationToken, err = r.getSecretValue(ctx, runnerGroup.Namespace, runnerGroup.Spec.RegistrationTokenRef) + if err != nil { + logger.Error(err, "Failed to get registration token from secret") + return ctrl.Result{}, err + } + tokenFetched = true + } + + job, err := r.constructJobForRunnerGroup(runnerGroup, registrationToken, effectiveLabels) if err != nil { - logger.Error(err, "Failed to get registration token from secret") + logger.Error(err, "Failed to construct Job") return ctrl.Result{}, err } - // Spawn jobs - for i := 0; i < toSpawn; i++ { - job, err := r.constructJobForRunnerGroup(runnerGroup, registrationToken) - if err != nil { - logger.Error(err, "Failed to construct Job") - return ctrl.Result{}, err - } - - 
if err := r.Create(ctx, job); err != nil { - logger.Error(err, "Failed to create Job", "jobName", job.Name) - return ctrl.Result{}, err - } - logger.Info("Created Job", "jobName", job.Name) + if err := r.Create(ctx, job); err != nil { + logger.Error(err, "Failed to create Job", "jobName", job.Name) + return ctrl.Result{}, err } + + logger.Info("Created Job for Gitea Run", "jobName", job.Name, "giteaJobID", giteaJob.ID) + + // Mark as spawned + r.SpawnedJobsCache.Store(giteaJob.ID, time.Now()) + availableSlots-- } + // Cleanup cache: remove jobs that are no longer queued in Gitea + r.SpawnedJobsCache.Range(func(key, value any) bool { + jobID := key.(int64) + if !currentQueuedIDs[jobID] { + // Job is no longer in the queue (running, completed, or cancelled) + r.SpawnedJobsCache.Delete(key) + } + return true + }) + // 7. Requeue for continuous polling return ctrl.Result{RequeueAfter: 10 * time.Second}, nil } @@ -191,8 +230,43 @@ func (r *RunnerGroupReconciler) getSecretValue(ctx context.Context, namespace st return string(value), nil } +// getEffectiveLabels merges spec labels with default labels +func (r *RunnerGroupReconciler) getEffectiveLabels(specLabels []string) []string { + defaultLabels := []string{ + "ubuntu-latest:docker://node:16-bullseye", + "ubuntu-22.04:docker://node:16-bullseye", + "ubuntu-20.04:docker://node:16-bullseye", + "ubuntu-18.04:docker://node:16-buster", + } + + effectiveLabels := make([]string, len(specLabels)) + copy(effectiveLabels, specLabels) + + for _, defaultLabel := range defaultLabels { + // Check if this default label key is already overridden in specLabels + // defaultLabel format is "key:schema" + parts := strings.SplitN(defaultLabel, ":", 2) + key := parts[0] + + found := false + for _, specLabel := range specLabels { + // Spec label can be "key" or "key:schema" + if specLabel == key || strings.HasPrefix(specLabel, key+":") { + found = true + break + } + } + + if !found { + effectiveLabels = append(effectiveLabels, defaultLabel) + 
} + } + + return effectiveLabels +} + // constructJobForRunnerGroup creates a Job object for the RunnerGroup -func (r *RunnerGroupReconciler) constructJobForRunnerGroup(runnerGroup *giteav1alpha1.RunnerGroup, registrationToken string) (*batchv1.Job, error) { +func (r *RunnerGroupReconciler) constructJobForRunnerGroup(runnerGroup *giteav1alpha1.RunnerGroup, registrationToken string, labels []string) (*batchv1.Job, error) { // Generate random suffix for name name := fmt.Sprintf("%s-%s", runnerGroup.Name, randString(8)) @@ -201,13 +275,14 @@ func (r *RunnerGroupReconciler) constructJobForRunnerGroup(runnerGroup *giteav1a {Name: "GITEA_INSTANCE_URL", Value: runnerGroup.Spec.GiteaURL}, {Name: "GITEA_RUNNER_REGISTRATION_TOKEN", Value: registrationToken}, {Name: "GITEA_RUNNER_EPHEMERAL", Value: "true"}, + {Name: "GITEA_RUNNER_NAME", Value: name}, {Name: "DOCKER_HOST", Value: "tcp://localhost:2376"}, {Name: "DOCKER_CERT_PATH", Value: "/certs/client"}, {Name: "DOCKER_TLS_VERIFY", Value: "1"}, } - if len(runnerGroup.Spec.Labels) > 0 { - labelsStr := strings.Join(runnerGroup.Spec.Labels, ",") + if len(labels) > 0 { + labelsStr := strings.Join(labels, ",") envVars = append(envVars, corev1.EnvVar{Name: "GITEA_RUNNER_LABELS", Value: labelsStr}) } @@ -276,14 +351,6 @@ func randString(length int) string { return string(b) } -// min returns the minimum of two integers -func min(a, b int) int { - if a < b { - return a - } - return b -} - // SetupWithManager sets up the controller with the Manager. func (r *RunnerGroupReconciler) SetupWithManager(mgr ctrl.Manager) error { return ctrl.NewControllerManagedBy(mgr). 
diff --git a/internal/controller/runnergroup_controller_test.go b/internal/controller/runnergroup_controller_test.go index 4509860..4fa0531 100644 --- a/internal/controller/runnergroup_controller_test.go +++ b/internal/controller/runnergroup_controller_test.go @@ -25,11 +25,19 @@ import ( "k8s.io/apimachinery/pkg/types" "sigs.k8s.io/controller-runtime/pkg/reconcile" + corev1 "k8s.io/api/core/v1" metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" giteav1alpha1 "github.com/bapung/gitea-runner-operator/api/v1alpha1" + "github.com/bapung/gitea-runner-operator/internal/gitea" ) +type fakeGiteaClient struct{} + +func (c *fakeGiteaClient) GetRunnerStats(ctx context.Context, giteaURL, authToken string, scope giteav1alpha1.RunnerGroupScope, org string, user string, repo string, labels []string) (*gitea.RunnerStats, error) { + return &gitea.RunnerStats{QueuedJobs: []gitea.ActionWorkflowJob{}}, nil +} + var _ = Describe("RunnerGroup Controller", func() { Context("When reconciling a resource", func() { const resourceName = "test-resource" @@ -43,6 +51,21 @@ var _ = Describe("RunnerGroup Controller", func() { runnergroup := &giteav1alpha1.RunnerGroup{} BeforeEach(func() { + By("creating the secret") + secret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{ + Name: "gitea-secret", + Namespace: "default", + }, + Data: map[string][]byte{ + "token": []byte("dummy"), + "auth": []byte("dummy"), + }, + } + if err := k8sClient.Create(ctx, secret); err != nil && !errors.IsAlreadyExists(err) { + Expect(err).To(Succeed()) + } + By("creating the custom resource for the Kind RunnerGroup") err := k8sClient.Get(ctx, typeNamespacedName, runnergroup) if err != nil && errors.IsNotFound(err) { @@ -51,7 +74,19 @@ var _ = Describe("RunnerGroup Controller", func() { Name: resourceName, Namespace: "default", }, - // TODO(user): Specify other spec details if needed. 
+ Spec: giteav1alpha1.RunnerGroupSpec{ + Scope: giteav1alpha1.RunnerGroupScopeGlobal, + GiteaURL: "https://gitea.example.com", + MaxActiveRunners: 1, + RegistrationTokenRef: corev1.SecretKeySelector{ + LocalObjectReference: corev1.LocalObjectReference{Name: "gitea-secret"}, + Key: "token", + }, + AuthTokenRef: corev1.SecretKeySelector{ + LocalObjectReference: corev1.LocalObjectReference{Name: "gitea-secret"}, + Key: "auth", + }, + }, } Expect(k8sClient.Create(ctx, resource)).To(Succeed()) } @@ -69,8 +104,9 @@ var _ = Describe("RunnerGroup Controller", func() { It("should successfully reconcile the resource", func() { By("Reconciling the created resource") controllerReconciler := &RunnerGroupReconciler{ - Client: k8sClient, - Scheme: k8sClient.Scheme(), + Client: k8sClient, + Scheme: k8sClient.Scheme(), + GiteaClient: &fakeGiteaClient{}, } _, err := controllerReconciler.Reconcile(ctx, reconcile.Request{ diff --git a/internal/gitea/client.go b/internal/gitea/client.go index 882b51e..badadc4 100644 --- a/internal/gitea/client.go +++ b/internal/gitea/client.go @@ -31,17 +31,22 @@ import ( // Client defines the interface for interacting with Gitea API type Client interface { - // GetQueuedRuns queries Gitea for queued workflow runs matching the scope and labels - // Returns the count of queued jobs that match the criteria - GetQueuedRuns( + // GetRunnerStats queries Gitea for queued workflow runs matching the scope and labels + GetRunnerStats( ctx context.Context, giteaURL string, authToken string, scope v1alpha1.RunnerGroupScope, org string, + user string, repo string, labels []string, - ) (int, error) + ) (*RunnerStats, error) +} + +// RunnerStats contains lists of jobs in different states +type RunnerStats struct { + QueuedJobs []ActionWorkflowJob } // HTTPClient is the default implementation of the Gitea Client interface @@ -107,153 +112,163 @@ type ActionWorkflowJob struct { RunnerName string `json:"runner_name"` } -// GetQueuedRuns implements the Client interface 
-func (c *HTTPClient) GetQueuedRuns( +// GetRunnerStats implements the Client interface +func (c *HTTPClient) GetRunnerStats( ctx context.Context, giteaURL string, authToken string, scope v1alpha1.RunnerGroupScope, org string, + user string, repo string, labels []string, -) (int, error) { +) (*RunnerStats, error) { switch scope { case v1alpha1.RunnerGroupScopeRepo: - return c.getQueuedRunsForRepo(ctx, giteaURL, authToken, org, repo, labels) + return c.getRunnerStatsForRepo(ctx, giteaURL, authToken, org, repo, labels) case v1alpha1.RunnerGroupScopeOrg: - return c.getQueuedRunsForOrg(ctx, giteaURL, authToken, org, labels) + return c.getRunnerStatsForOrg(ctx, giteaURL, authToken, org, labels) + case v1alpha1.RunnerGroupScopeUser: + return c.getRunnerStatsForUser(ctx, giteaURL, authToken, user, labels) case v1alpha1.RunnerGroupScopeGlobal: - return c.getQueuedRunsGlobal(ctx, giteaURL, authToken, labels) + return c.getRunnerStatsGlobal(ctx, giteaURL, authToken, labels) default: - return 0, fmt.Errorf("unknown scope: %s", scope) + return nil, fmt.Errorf("unknown scope: %s", scope) } } -// getQueuedRunsForRepo fetches queued runs for a specific repository -func (c *HTTPClient) getQueuedRunsForRepo(ctx context.Context, giteaURL, authToken, owner, repo string, labels []string) (int, error) { - // Use jobs endpoint since it contains the runner labels we need for filtering +// getRunnerStatsForRepo fetches queued runs for a specific repository +func (c *HTTPClient) getRunnerStatsForRepo(ctx context.Context, giteaURL, authToken, owner, repo string, labels []string) (*RunnerStats, error) { endpoint := fmt.Sprintf("%s/api/v1/repos/%s/%s/actions/jobs", strings.TrimSuffix(giteaURL, "/"), owner, repo) - return c.fetchWorkflowJobs(ctx, endpoint, authToken, labels) + return c.fetchRunnerStats(ctx, endpoint, authToken, labels) } -// getQueuedRunsForOrg fetches queued runs for all repos under an organization -func (c *HTTPClient) getQueuedRunsForOrg(ctx context.Context, giteaURL, 
authToken, org string, labels []string) (int, error) { - // Use direct org-level jobs endpoint for better performance +// getRunnerStatsForOrg fetches queued runs for all repos under an organization +func (c *HTTPClient) getRunnerStatsForOrg(ctx context.Context, giteaURL, authToken, org string, labels []string) (*RunnerStats, error) { endpoint := fmt.Sprintf("%s/api/v1/orgs/%s/actions/jobs", strings.TrimSuffix(giteaURL, "/"), org) - return c.fetchWorkflowJobs(ctx, endpoint, authToken, labels) + return c.fetchRunnerStats(ctx, endpoint, authToken, labels) } -// getQueuedRunsGlobal fetches queued runs using admin-level API for global scope -func (c *HTTPClient) getQueuedRunsGlobal(ctx context.Context, giteaURL, authToken string, labels []string) (int, error) { - // Use admin-level jobs endpoint which provides global view of all queued jobs +// getRunnerStatsForUser fetches queued runs for all repos owned by a user +func (c *HTTPClient) getRunnerStatsForUser(ctx context.Context, giteaURL, authToken, user string, labels []string) (*RunnerStats, error) { + repos, err := c.fetchReposForUser(ctx, giteaURL, authToken, user) + if err != nil { + return nil, err + } + + var allQueuedJobs []ActionWorkflowJob + for _, repo := range repos { + endpoint := fmt.Sprintf("%s/api/v1/repos/%s/%s/actions/jobs", strings.TrimSuffix(giteaURL, "/"), repo.Owner.Login, repo.Name) + stats, err := c.fetchRunnerStats(ctx, endpoint, authToken, labels) + if err != nil { + return nil, err + } + allQueuedJobs = append(allQueuedJobs, stats.QueuedJobs...) 
+ } + + return &RunnerStats{ + QueuedJobs: allQueuedJobs, + }, nil +} + +// getRunnerStatsGlobal fetches queued runs using admin-level API for global scope +func (c *HTTPClient) getRunnerStatsGlobal(ctx context.Context, giteaURL, authToken string, labels []string) (*RunnerStats, error) { endpoint := fmt.Sprintf("%s/api/v1/admin/actions/jobs", strings.TrimSuffix(giteaURL, "/")) - return c.fetchWorkflowJobs(ctx, endpoint, authToken, labels) + return c.fetchRunnerStats(ctx, endpoint, authToken, labels) +} + +func (c *HTTPClient) fetchRunnerStats(ctx context.Context, endpoint, authToken string, labels []string) (*RunnerStats, error) { + queuedJobs, err := c.fetchWorkflowJobs(ctx, endpoint, authToken, labels, []string{"queued", "waiting", "pending"}) + if err != nil { + return nil, err + } + + return &RunnerStats{ + QueuedJobs: queuedJobs, + }, nil } // fetchWorkflowJobs fetches workflow jobs from a given endpoint with label filtering and pagination -func (c *HTTPClient) fetchWorkflowJobs(ctx context.Context, endpoint, authToken string, labels []string) (int, error) { - totalCount := 0 - page := 1 - limit := 50 // Default page size +func (c *HTTPClient) fetchWorkflowJobs(ctx context.Context, endpoint, authToken string, labels []string, statuses []string) ([]ActionWorkflowJob, error) { + var allJobs []ActionWorkflowJob - for { - u, err := url.Parse(endpoint) - if err != nil { - return 0, err - } - q := u.Query() - q.Set("status", "queued") - q.Set("page", fmt.Sprintf("%d", page)) - q.Set("limit", fmt.Sprintf("%d", limit)) - u.RawQuery = q.Encode() + for _, status := range statuses { + page := 1 + limit := 50 // Default page size - req, err := http.NewRequestWithContext(ctx, "GET", u.String(), nil) - if err != nil { - return 0, err - } + for { + u, err := url.Parse(endpoint) + if err != nil { + return nil, err + } + q := u.Query() + q.Set("status", status) + q.Set("page", fmt.Sprintf("%d", page)) + q.Set("limit", fmt.Sprintf("%d", limit)) + u.RawQuery = q.Encode() - 
req.Header.Set("Authorization", "token "+authToken) - req.Header.Set("Accept", "application/json") + fmt.Printf("DEBUG: Fetching jobs from %s\n", u.String()) - resp, err := c.httpClient.Do(req) - if err != nil { - return 0, err - } + req, err := http.NewRequestWithContext(ctx, "GET", u.String(), nil) + if err != nil { + return nil, err + } + + req.Header.Set("Authorization", "token "+authToken) + req.Header.Set("Accept", "application/json") + + resp, err := c.httpClient.Do(req) + if err != nil { + fmt.Printf("DEBUG: Request failed: %v\n", err) + return nil, err + } + + fmt.Printf("DEBUG: Response status: %s\n", resp.Status) + + if resp.StatusCode != http.StatusOK { + body, _ := io.ReadAll(resp.Body) + _ = resp.Body.Close() + fmt.Printf("DEBUG: Error body: %s\n", string(body)) + return nil, c.handleHTTPError(resp.StatusCode, body, "fetch workflow jobs") + } - if resp.StatusCode != http.StatusOK { body, _ := io.ReadAll(resp.Body) - resp.Body.Close() - return 0, c.handleHTTPError(resp.StatusCode, body, "fetch workflow jobs") + _ = resp.Body.Close() + fmt.Printf("DEBUG: Response body: %s\n", string(body)) + + var result ActionWorkflowJobsResponse + if err := json.Unmarshal(body, &result); err != nil { + fmt.Printf("DEBUG: Failed to decode response: %v\n", err) + return nil, err + } + + fmt.Printf("DEBUG: Found %d jobs, total in Gitea: %d\n", len(result.Jobs), result.TotalCount) + + // Filter and collect matching jobs for this page + matchedJobs := c.filterQueuedJobs(result.Jobs, labels) + fmt.Printf("DEBUG: %d jobs matched labels %v\n", len(matchedJobs), labels) + allJobs = append(allJobs, matchedJobs...) 
+ + // Break if we've fetched all available results + if len(result.Jobs) < limit { + break + } + + page++ } - - var result ActionWorkflowJobsResponse - if err := json.NewDecoder(resp.Body).Decode(&result); err != nil { - resp.Body.Close() - return 0, err - } - resp.Body.Close() - - // Filter and count matching jobs for this page - pageCount := c.filterQueuedJobs(result.Jobs, labels) - totalCount += pageCount - - // Break if we've fetched all available results - if len(result.Jobs) < limit { - break - } - - page++ } - return totalCount, nil + return allJobs, nil } -// fetchWorkflowRuns fetches workflow runs from a given endpoint (deprecated - use jobs for label filtering) -func (c *HTTPClient) fetchWorkflowRuns(ctx context.Context, endpoint, authToken string) ([]ActionWorkflowRun, error) { - // Add status=queued query parameter - u, err := url.Parse(endpoint) - if err != nil { - return nil, err - } - q := u.Query() - q.Set("status", "queued") - u.RawQuery = q.Encode() - - req, err := http.NewRequestWithContext(ctx, "GET", u.String(), nil) - if err != nil { - return nil, err - } - - req.Header.Set("Authorization", "token "+authToken) - req.Header.Set("Accept", "application/json") - - resp, err := c.httpClient.Do(req) - if err != nil { - return nil, err - } - defer resp.Body.Close() - - if resp.StatusCode != http.StatusOK { - body, _ := io.ReadAll(resp.Body) - return nil, c.handleHTTPError(resp.StatusCode, body, "fetch workflow runs") - } - - var result ActionWorkflowRunsResponse - if err := json.NewDecoder(resp.Body).Decode(&result); err != nil { - return nil, err - } - - return result.WorkflowRuns, nil -} - -// fetchOrgRepos fetches all repositories under an organization with pagination -func (c *HTTPClient) fetchOrgRepos(ctx context.Context, giteaURL, authToken, org string) ([]Repository, error) { +// fetchReposForUser fetches all repositories owned by a specific user with pagination +func (c *HTTPClient) fetchReposForUser(ctx context.Context, giteaURL, authToken, 
username string) ([]Repository, error) { var allRepos []Repository page := 1 limit := 50 for { - endpoint := fmt.Sprintf("%s/api/v1/orgs/%s/repos", strings.TrimSuffix(giteaURL, "/"), org) + endpoint := fmt.Sprintf("%s/api/v1/users/%s/repos", strings.TrimSuffix(giteaURL, "/"), username) u, err := url.Parse(endpoint) if err != nil { return nil, err @@ -263,6 +278,8 @@ func (c *HTTPClient) fetchOrgRepos(ctx context.Context, giteaURL, authToken, org q.Set("limit", fmt.Sprintf("%d", limit)) u.RawQuery = q.Encode() + fmt.Printf("DEBUG: Fetching repos for user %s from %s\n", username, u.String()) + req, err := http.NewRequestWithContext(ctx, "GET", u.String(), nil) if err != nil { return nil, err @@ -273,131 +290,28 @@ func (c *HTTPClient) fetchOrgRepos(ctx context.Context, giteaURL, authToken, org resp, err := c.httpClient.Do(req) if err != nil { + fmt.Printf("DEBUG: Request failed: %v\n", err) return nil, err } + fmt.Printf("DEBUG: Response status: %s\n", resp.Status) + if resp.StatusCode != http.StatusOK { body, _ := io.ReadAll(resp.Body) - resp.Body.Close() + _ = resp.Body.Close() + fmt.Printf("DEBUG: Error body: %s\n", string(body)) return nil, c.handleHTTPError(resp.StatusCode, body, "fetch user repos") } - var repos []Repository - if err := json.NewDecoder(resp.Body).Decode(&repos); err != nil { - resp.Body.Close() - return nil, err - } - resp.Body.Close() - - allRepos = append(allRepos, repos...) 
- - if len(repos) < limit { - break - } - - page++ - } - - return allRepos, nil -} - -// fetchAllOrgs fetches all organizations visible to the authenticated user with pagination -func (c *HTTPClient) fetchAllOrgs(ctx context.Context, giteaURL, authToken string) ([]Organization, error) { - var allOrgs []Organization - page := 1 - limit := 50 - - for { - endpoint := fmt.Sprintf("%s/api/v1/user/orgs", strings.TrimSuffix(giteaURL, "/")) - u, err := url.Parse(endpoint) - if err != nil { - return nil, err - } - q := u.Query() - q.Set("page", fmt.Sprintf("%d", page)) - q.Set("limit", fmt.Sprintf("%d", limit)) - u.RawQuery = q.Encode() - - req, err := http.NewRequestWithContext(ctx, "GET", u.String(), nil) - if err != nil { - return nil, err - } - - req.Header.Set("Authorization", "token "+authToken) - req.Header.Set("Accept", "application/json") - - resp, err := c.httpClient.Do(req) - if err != nil { - return nil, err - } - - if resp.StatusCode != http.StatusOK { - body, _ := io.ReadAll(resp.Body) - resp.Body.Close() - return nil, c.handleHTTPError(resp.StatusCode, body, "fetch org repos") - } - - var orgs []Organization - if err := json.NewDecoder(resp.Body).Decode(&orgs); err != nil { - resp.Body.Close() - return nil, err - } - resp.Body.Close() - - allOrgs = append(allOrgs, orgs...) 
- - if len(orgs) < limit { - break - } - - page++ - } - - return allOrgs, nil -} - -// fetchUserRepos fetches all repositories owned by the authenticated user with pagination -func (c *HTTPClient) fetchUserRepos(ctx context.Context, giteaURL, authToken string) ([]Repository, error) { - var allRepos []Repository - page := 1 - limit := 50 - - for { - endpoint := fmt.Sprintf("%s/api/v1/user/repos", strings.TrimSuffix(giteaURL, "/")) - u, err := url.Parse(endpoint) - if err != nil { - return nil, err - } - q := u.Query() - q.Set("page", fmt.Sprintf("%d", page)) - q.Set("limit", fmt.Sprintf("%d", limit)) - u.RawQuery = q.Encode() - - req, err := http.NewRequestWithContext(ctx, "GET", u.String(), nil) - if err != nil { - return nil, err - } - - req.Header.Set("Authorization", "token "+authToken) - req.Header.Set("Accept", "application/json") - - resp, err := c.httpClient.Do(req) - if err != nil { - return nil, err - } - - if resp.StatusCode != http.StatusOK { - body, _ := io.ReadAll(resp.Body) - resp.Body.Close() - return nil, c.handleHTTPError(resp.StatusCode, body, "fetch user orgs") - } + body, _ := io.ReadAll(resp.Body) + _ = resp.Body.Close() + // fmt.Printf("DEBUG: Response body: %s\n", string(body)) var repos []Repository - if err := json.NewDecoder(resp.Body).Decode(&repos); err != nil { - resp.Body.Close() + if err := json.Unmarshal(body, &repos); err != nil { + fmt.Printf("DEBUG: Failed to decode response: %v\n", err) return nil, err } - resp.Body.Close() allRepos = append(allRepos, repos...) 
@@ -412,44 +326,42 @@ func (c *HTTPClient) fetchUserRepos(ctx context.Context, giteaURL, authToken str } // filterQueuedJobs filters workflow jobs by labels -func (c *HTTPClient) filterQueuedJobs(jobs []ActionWorkflowJob, requiredLabels []string) int { - if len(requiredLabels) == 0 { - // No label filtering required, return all queued jobs - return len(jobs) - } - - count := 0 +func (c *HTTPClient) filterQueuedJobs(jobs []ActionWorkflowJob, runnerLabels []string) []ActionWorkflowJob { + var matched []ActionWorkflowJob for _, job := range jobs { - if c.jobMatchesLabels(job.Labels, requiredLabels) { - count++ + match := c.jobMatchesLabels(job.Labels, runnerLabels) + fmt.Printf("DEBUG: Job %d (Status: %s, Labels: %v) matches runner capabilities %v? %v\n", job.ID, job.Status, job.Labels, runnerLabels, match) + if match { + matched = append(matched, job) } } - return count + return matched } -// jobMatchesLabels checks if a job's labels match the required labels -func (c *HTTPClient) jobMatchesLabels(jobLabels, requiredLabels []string) bool { - // Convert job labels to map for faster lookup - labelSet := make(map[string]bool) - for _, label := range jobLabels { - labelSet[label] = true +// jobMatchesLabels checks if a job's requirements are satisfied by the runner's supported labels +func (c *HTTPClient) jobMatchesLabels(jobLabels, supportedLabels []string) bool { + if len(jobLabels) == 0 { + return true } - // Check if all required labels are present - for _, required := range requiredLabels { - if !labelSet[required] { + // For each label required by the job, check if the runner supports it + for _, req := range jobLabels { + found := false + for _, supp := range supportedLabels { + // Check for exact match or schema match (label:schema) + // e.g. Job asks for "ubuntu-latest", Runner has "ubuntu-latest:docker://..." 
+ if req == supp || strings.HasPrefix(supp, req+":") { + found = true + break + } + } + if !found { return false } } return true } -// filterQueuedRuns filters workflow runs by labels (deprecated - use filterQueuedJobs) -func (c *HTTPClient) filterQueuedRuns(runs []ActionWorkflowRun, labels []string) int { - // Legacy method - jobs should be used for label filtering - return len(runs) -} - // handleHTTPError provides specific error handling for different HTTP status codes func (c *HTTPClient) handleHTTPError(statusCode int, body []byte, operation string) error { switch statusCode { diff --git a/internal/gitea/client_test.go b/internal/gitea/client_test.go index f3d42e8..08129f0 100644 --- a/internal/gitea/client_test.go +++ b/internal/gitea/client_test.go @@ -27,16 +27,17 @@ import ( "github.com/bapung/gitea-runner-operator/api/v1alpha1" ) -func TestHTTPClient_GetQueuedRuns(t *testing.T) { +func TestHTTPClient_GetRunnerStats(t *testing.T) { tests := []struct { - name string - scope v1alpha1.RunnerGroupScope - org string - repo string - labels []string - mockResponse ActionWorkflowJobsResponse - expectedCount int - expectedError bool + name string + scope v1alpha1.RunnerGroupScope + org string + user string + repo string + labels []string + mockResponse ActionWorkflowJobsResponse + expectedQueued int + expectedError bool }{ { name: "repo scope with matching labels", @@ -51,38 +52,55 @@ func TestHTTPClient_GetQueuedRuns(t *testing.T) { {ID: 2, Status: "queued", Labels: []string{"linux", "arm64"}}, }, }, - expectedCount: 1, - expectedError: false, + expectedQueued: 1, // Job 1 matches + expectedError: false, }, { - name: "org scope no label filtering", + name: "org scope no label filtering (matches all)", scope: v1alpha1.RunnerGroupScopeOrg, org: "testorg", - labels: []string{}, + labels: []string{}, // No specific capabilities, matches jobs with empty requirements? No, empty labels matches nothing? + // Wait, previous logic was: if reqLabels is empty, return all. 
+ // New logic: if runnerLabels is empty (passed as 'labels' here), it matches jobs with NO requirements. + // But for test purposes, let's assume we pass runner capabilities. + // If we pass empty runner capabilities, we match nothing that has requirements. + // Let's pass capabilities that cover the jobs. mockResponse: ActionWorkflowJobsResponse{ TotalCount: 3, Jobs: []ActionWorkflowJob{ - {ID: 1, Status: "queued", Labels: []string{"linux", "x64"}}, - {ID: 2, Status: "queued", Labels: []string{"windows"}}, - {ID: 3, Status: "queued", Labels: []string{"macos"}}, + {ID: 1, Status: "queued", Labels: []string{"linux"}}, }, }, - expectedCount: 3, - expectedError: false, + expectedQueued: 0, // No runner capabilities provided -> no match + expectedError: false, }, { name: "global scope with specific labels", scope: v1alpha1.RunnerGroupScopeGlobal, - labels: []string{"docker"}, + labels: []string{"docker", "linux"}, mockResponse: ActionWorkflowJobsResponse{ TotalCount: 2, Jobs: []ActionWorkflowJob{ - {ID: 1, Status: "queued", Labels: []string{"docker", "linux"}}, - {ID: 2, Status: "queued", Labels: []string{"linux"}}, + {ID: 1, Status: "queued", Labels: []string{"docker", "linux"}}, // Match + {ID: 2, Status: "queued", Labels: []string{"linux"}}, // Match (subset) }, }, - expectedCount: 1, - expectedError: false, + expectedQueued: 2, + expectedError: false, + }, + { + name: "user scope", + scope: v1alpha1.RunnerGroupScopeUser, + user: "testuser", + labels: []string{"linux"}, + mockResponse: ActionWorkflowJobsResponse{ + TotalCount: 1, + Jobs: []ActionWorkflowJob{ + {ID: 1, Status: "queued", Labels: []string{"linux"}}, + }, + }, + expectedQueued: 1, + expectedError: false, }, } @@ -90,6 +108,23 @@ func TestHTTPClient_GetQueuedRuns(t *testing.T) { t.Run(tt.name, func(t *testing.T) { // Create mock server server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + w.Header().Set("Content-Type", "application/json") + + // Handle User Repos 
call for User Scope + if tt.scope == v1alpha1.RunnerGroupScopeUser && strings.Contains(r.URL.Path, "/repos") && !strings.Contains(r.URL.Path, "/actions/jobs") { + repos := []Repository{ + { + Name: "testrepo", + Owner: struct { + Login string `json:"login"` + }{Login: tt.user}, + FullName: tt.user + "/testrepo", + }, + } + _ = json.NewEncoder(w).Encode(repos) + return + } + // Verify correct endpoint is called expectedPath := "" switch tt.scope { @@ -99,35 +134,37 @@ func TestHTTPClient_GetQueuedRuns(t *testing.T) { expectedPath = "/api/v1/orgs/testorg/actions/jobs" case v1alpha1.RunnerGroupScopeGlobal: expectedPath = "/api/v1/admin/actions/jobs" + case v1alpha1.RunnerGroupScopeUser: + expectedPath = "/api/v1/repos/" + tt.user + "/testrepo/actions/jobs" } if !strings.HasPrefix(r.URL.Path, expectedPath) { t.Errorf("Expected path to start with %s, got %s", expectedPath, r.URL.Path) } - // Verify query parameters - if r.URL.Query().Get("status") != "queued" { - t.Errorf("Expected status=queued, got %s", r.URL.Query().Get("status")) - } - // Verify authorization header authHeader := r.Header.Get("Authorization") if !strings.HasPrefix(authHeader, "token ") { t.Errorf("Expected Authorization header to start with 'token ', got %s", authHeader) } - w.Header().Set("Content-Type", "application/json") - json.NewEncoder(w).Encode(tt.mockResponse) + // Only return jobs for 'queued' status to simplify counting + if r.URL.Query().Get("status") == "queued" { + _ = json.NewEncoder(w).Encode(tt.mockResponse) + } else { + _ = json.NewEncoder(w).Encode(ActionWorkflowJobsResponse{TotalCount: 0, Jobs: []ActionWorkflowJob{}}) + } })) defer server.Close() client := NewHTTPClient() - count, err := client.GetQueuedRuns( + stats, err := client.GetRunnerStats( context.Background(), server.URL, "test-token", tt.scope, tt.org, + tt.user, tt.repo, tt.labels, ) @@ -138,8 +175,10 @@ func TestHTTPClient_GetQueuedRuns(t *testing.T) { if !tt.expectedError && err != nil { t.Errorf("Expected no error 
but got: %v", err) } - if count != tt.expectedCount { - t.Errorf("Expected count %d, got %d", tt.expectedCount, count) + if stats != nil { + if len(stats.QueuedJobs) != tt.expectedQueued { + t.Errorf("Expected %d queued jobs, got %d", tt.expectedQueued, len(stats.QueuedJobs)) + } } }) } @@ -149,46 +188,46 @@ func TestJobMatchesLabels(t *testing.T) { client := &HTTPClient{} tests := []struct { - name string - jobLabels []string - requiredLabels []string - expected bool + name string + jobLabels []string + supportedLabels []string + expected bool }{ { - name: "exact match", - jobLabels: []string{"linux", "x64"}, - requiredLabels: []string{"linux", "x64"}, - expected: true, + name: "exact match", + jobLabels: []string{"linux", "x64"}, + supportedLabels: []string{"linux", "x64"}, + expected: true, }, { - name: "subset match", - jobLabels: []string{"linux", "x64", "docker"}, - requiredLabels: []string{"linux", "x64"}, - expected: true, + name: "subset match (runner has more)", + jobLabels: []string{"linux"}, + supportedLabels: []string{"linux", "x64"}, + expected: true, }, { - name: "no match", - jobLabels: []string{"linux", "arm64"}, - requiredLabels: []string{"linux", "x64"}, - expected: false, + name: "schema match", + jobLabels: []string{"ubuntu-latest"}, + supportedLabels: []string{"ubuntu-latest:docker://node:16"}, + expected: true, }, { - name: "empty required labels", - jobLabels: []string{"linux", "x64"}, - requiredLabels: []string{}, - expected: true, + name: "no match (missing req)", + jobLabels: []string{"linux", "arm64"}, + supportedLabels: []string{"linux", "x64"}, + expected: false, }, { - name: "partial match", - jobLabels: []string{"linux"}, - requiredLabels: []string{"linux", "x64"}, - expected: false, + name: "empty required labels (matches anything)", + jobLabels: []string{}, + supportedLabels: []string{"linux"}, + expected: true, }, } for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { - result := client.jobMatchesLabels(tt.jobLabels, 
tt.requiredLabels) + result := client.jobMatchesLabels(tt.jobLabels, tt.supportedLabels) if result != tt.expected { t.Errorf("Expected %v, got %v", tt.expected, result) } @@ -207,42 +246,32 @@ func TestFilterQueuedJobs(t *testing.T) { } tests := []struct { - name string - requiredLabels []string - expectedCount int + name string + supportedLabels []string + expectedIDs []int64 }{ { - name: "filter by linux", - requiredLabels: []string{"linux"}, - expectedCount: 3, + name: "runner supports linux, x64", + supportedLabels: []string{"linux", "x64"}, + expectedIDs: []int64{1}, }, { - name: "filter by linux and x64", - requiredLabels: []string{"linux", "x64"}, - expectedCount: 2, + name: "runner supports linux, x64, docker", + supportedLabels: []string{"linux", "x64", "docker"}, + expectedIDs: []int64{1, 4}, }, { - name: "filter by docker", - requiredLabels: []string{"docker"}, - expectedCount: 1, - }, - { - name: "no labels - return all", - requiredLabels: []string{}, - expectedCount: 4, - }, - { - name: "no matches", - requiredLabels: []string{"macos"}, - expectedCount: 0, + name: "runner supports everything", + supportedLabels: []string{"linux", "x64", "arm64", "windows", "docker"}, + expectedIDs: []int64{1, 2, 3, 4}, }, } for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { - count := client.filterQueuedJobs(jobs, tt.requiredLabels) - if count != tt.expectedCount { - t.Errorf("Expected %d, got %d", tt.expectedCount, count) + matched := client.filterQueuedJobs(jobs, tt.supportedLabels) + if len(matched) != len(tt.expectedIDs) { + t.Errorf("Expected %d matched jobs, got %d", len(tt.expectedIDs), len(matched)) } }) } diff --git a/specification.md b/specification.md index dd42805..668f2b7 100644 --- a/specification.md +++ b/specification.md @@ -10,6 +10,8 @@ The Gitea Runner Operator is a Kubernetes controller designed to manage ephemera - **RunnerGroup CR**: The custom resource instance defining a runner pool. 
- **Ephemeral Runner**: A runner that executes exactly one job and then terminates. - **Gitea Instance**: The target Gitea server where CI/CD workflows are triggered. +- **Runner Capabilities**: The set of labels a runner provides (e.g., `ubuntu-latest`). +- **Job Requirements**: The set of labels a job requests (e.g., `ubuntu-latest`). ## 3. Custom Resource Definition (CRD) @@ -24,16 +26,17 @@ The Gitea Runner Operator is a Kubernetes controller designed to manage ephemera The `spec` defines the configuration for the runner pool. -| Field | Type | Required | Description | -| :------------------ | :----------------------------- | :---------- | :---------------------------------------------------------------------------------------------------------- | -| `scope` | Enum (`global`, `org`, `repo`) | Yes | The scope of the runner. | -| `org` | String | Conditional | The organization name. Required if `scope` is `org`. | -| `repo` | String | Conditional | The repository name. Required if `scope` is `repo`. | -| `gitea.url` | String | Yes | The base URL of the Gitea instance (e.g., `https://gitea.example.com`). | -| `labels` | []String | No | List of labels for the runner (e.g., `ubuntu-latest`, `app:infra`). Used by Gitea to match jobs to runners. | -| `maxActiveRunners` | Integer | Yes | The maximum number of concurrent runner Jobs allowed for this specific RunnerGroup CR. | -| `registrationToken` | SecretKeySelector | Yes | Reference to a Secret containing the runner registration token. | -| `authToken` | SecretKeySelector | Yes | Reference to a Secret containing an API token to query Gitea for job statuses. | +| Field | Type | Required | Description | +| :------------------ | :------------------------------------- | :---------- | :---------------------------------------------------------------------------------------------------------- | +| `scope` | Enum (`global`, `org`, `user`, `repo`) | Yes | The scope of the runner. 
| +| `org` | String | Conditional | The organization name. Required if `scope` is `org`. | +| `user` | String | Conditional | The username. Required if `scope` is `user`. | +| `repo` | String | Conditional | The repository name. Required if `scope` is `repo`. | +| `gitea.url` | String | Yes | The base URL of the Gitea instance (e.g., `https://gitea.example.com`). | +| `labels` | []String | No | List of labels for the runner (e.g., `app:infra`). Defaults (e.g. `ubuntu-latest`) are added automatically. | +| `maxActiveRunners` | Integer | Yes | The maximum number of concurrent runner Jobs allowed for this specific RunnerGroup CR. | +| `registrationToken` | SecretKeySelector | Yes | Reference to a Secret containing the runner registration token. | +| `authToken` | SecretKeySelector | Yes | Reference to a Secret containing an API token to query Gitea for job statuses. | #### 3.2.1 SecretKeySelector @@ -42,7 +45,7 @@ Standard Kubernetes Secret reference: - `secretRef.name`: Name of the secret. - `secretRef.key`: Key within the secret containing the value. -### 3.3 Status Schema (Optional but Recommended) +### 3.3 Status Schema - `activeRunners`: Integer. Current count of running Jobs managed by this CR. - `lastCheckTime`: Timestamp. Last time the controller polled Gitea. @@ -54,37 +57,44 @@ Standard Kubernetes Secret reference: The controller watches for changes to `RunnerGroup` resources. 1. **Validation**: Ensure `org` or `repo` are present based on `scope`. -2. **Job Cleanup**: (Optional) Check for and remove "stuck" jobs if TTL doesn't cover edge cases, though `ttlSecondsAfterFinished` is primary. -3. **Metric Collection**: Update status with current running job count. -4. **Polling**: The controller must implement a polling mechanism (loop) independent of the standard Reconcile trigger, or requeue the Reconcile event periodically (e.g., every 10-30 seconds). +2. **Job List**: List child Jobs to determine `activeRunners` count. +3. 
**Status Update**: Update CR status with current metrics. +4. **Capacity Check**: If `activeRunners >= maxActiveRunners`, stop scaling up. +5. **Polling**: Fetch job statistics from Gitea. -### 4.2 Polling & Scaling Logic +### 4.2 Polling & Scaling Strategy -On every poll interval for a specific `RunnerGroup` CR: +The operator uses a robust polling strategy to handle the disconnect between Kubernetes Pod startup time and Gitea's job queue state. -1. **Check Capacity**: - - Query Kubernetes for active `Jobs` owned by this `RunnerGroup` CR. - - If `count(active_jobs) >= maxActiveRunners`, stop. Do not spawn new runners. +#### 4.2.1 Fetching Stats (`GetRunnerStats`) -2. **Fetch Queued Jobs**: - - Call Gitea API using `authToken`. - - Endpoint depends on scope: - - **Global**: Recursively fetch all workflow runs: - 1. Fetch all organizations in the Gitea instance - 2. For each organization, fetch all repositories under that org - 3. For each repository, query `/repos/{owner}/{repo}/actions/runs?status=queued` - 4. Additionally, fetch all user-owned repositories and query their workflow runs - - **Org**: Fetch all workflow runs in repos under the organization: - 1. Fetch all repositories under the specified organization - 2. For each repository, query `/repos/{owner}/{repo}/actions/runs?status=queued` - - **Repo**: Directly query `/repos/{owner}/{repo}/actions/runs?status=queued` - - Filter the returned runs: - - Must match the `labels` defined in the `RunnerGroup` CR. +The controller queries Gitea for: -3. **Spawn Runner**: - - If a queued job is found and capacity allows, create a Kubernetes `Job`. - - **One Job per Queued Workflow**: Ideally, the logic should map 1 queued run -> 1 Runner Job. - - **Concurrency Control**: Ensure we don't spawn more jobs than `maxActiveRunners - currentActiveRunners`. +1. **Queued Jobs**: Jobs with status `queued`, `waiting`, or `pending`. + - **Label Filtering**: Jobs are filtered client-side. 
A job is considered a match if the RunnerGroup's capabilities (Spec labels + Default labels) are a superset of the Job's required labels. +2. **Running Jobs**: Jobs with status `running` that belong to this specific runner group (filtered by runner name prefix). + +#### 4.2.2 Deduplication Cache (`SpawnedJobsCache`) + +To prevent "double scheduling" (where multiple reconciliation loops spawn multiple runners for the same queued job before the first runner can pick it up), the controller maintains an in-memory cache: + +- **Key**: Gitea Job ID. +- **Value**: Timestamp when the runner was spawned. +- **TTL**: 5 minutes. + +#### 4.2.3 Scaling Algorithm + +1. **Identify Candidates**: Iterate through the list of Queued Jobs from Gitea. +2. **Check Cache**: + - If Job ID is in cache and TTL has not expired: **Skip** (Runner already spawned). + - If Job ID is in cache and TTL expired: **Retry** (Runner likely failed to start). + - If Job ID is not in cache: **Candidate for spawning**. +3. **Calculate Slots**: `availableSlots = maxActiveRunners - activeRunners`. +4. **Spawn**: For each candidate, if `availableSlots > 0`: + - Create Kubernetes Job. + - Add Job ID to `SpawnedJobsCache`. + - Decrement `availableSlots`. +5. **Cleanup**: Remove Job IDs from the cache if they are no longer present in the Queued Jobs list returned by Gitea (implies they are now Running, Completed, or Cancelled). ## 5. Kubernetes Resource Generation @@ -94,40 +104,44 @@ The controller creates a `batch/v1 Job`. **Metadata:** -- `name`: `{runnergroup-cr-name}-{random-suffix}` +- `name`: `{runnergroup-name}-{random-suffix}` - `namespace`: Same as `RunnerGroup` CR. - `labels`: - - `app`: `{runnergroup-cr-name}` + - `gitea.bpg.pw/runnergroup-name`: `{runnergroup-name}` - `gitea.bpg.pw/managed-by`: `gitea-runner-operator` - - `gitea.bpg.pw/runnergroup-name`: `{runnergroup-cr-name}` - `ownerReferences`: Pointing to the `RunnerGroup` CR. 
**Spec:** -- `ttlSecondsAfterFinished`: 600 (Clean up finished jobs). +- `ttlSecondsAfterFinished`: 600 (Auto-cleanup). - `template`: - `spec`: - `restartPolicy`: `OnFailure` - `containers`: - **Name**: `runner` - - **Image**: `gitea/act_runner:nightly-dind-rootless` (Default, potentially configurable in CR later). - - **SecurityContext**: `privileged: true` (Required for DIND). + - **Image**: `gitea/act_runner:nightly-dind-rootless` - **Env**: - `GITEA_INSTANCE_URL`: From `spec.gitea.url`. - - `GITEA_RUNNER_REGISTRATION_TOKEN`: From `spec.registrationToken`. + - `GITEA_RUNNER_REGISTRATION_TOKEN`: From Secret. - `GITEA_RUNNER_EPHEMERAL`: `"true"`. - - `GITEA_RUNNER_LABELS`: Comma-separated list from `spec.labels`. - - `DOCKER_HOST`: `tcp://localhost:2376` - - **VolumeMounts**: - - Mount docker socket or storage if necessary. The README example uses a PVC `act-runner-vol` mounted to `/data`. _Note: Using a shared PVC for ephemeral runners might cause race conditions. EmptyDir is preferred for truly ephemeral runners unless caching is strictly required and managed._ + - `GITEA_RUNNER_NAME`: `{job-name}` (Matches Pod name for easier debugging). + - `GITEA_RUNNER_LABELS`: Comma-separated list of **Effective Labels**. + - **Effective Labels** = `spec.labels` + Default Gitea Labels (e.g., `ubuntu-latest:docker://node:16-bullseye`, `ubuntu-22.04:...`, etc.) unless explicitly overridden. ## 6. Gitea API Interaction - **Authentication**: Bearer token provided in `authToken`. -- **Client**: HTTP Client with timeout. +- **Endpoints Used**: + - `/api/v1/repos/{owner}/{repo}/actions/jobs` (Repo scope) + - `/api/v1/orgs/{org}/actions/jobs` (Org scope) + - `/api/v1/users/{user}/repos` + `/api/v1/repos/{owner}/{repo}/actions/jobs` (User scope) + - `/api/v1/admin/actions/jobs` (Global scope) +- **Label Matching**: + - The controller implements logic to check: `Job.Labels ⊆ Runner.EffectiveLabels`. 
+ - Supports both exact matches (`linux`) and schema matches (`ubuntu-latest` matches `ubuntu-latest:docker://...`). ## 7. Security Considerations -- **Token Handling**: Registration and Auth tokens are read from Kubernetes Secrets and injected as Environment Variables. They are not stored in plain text in the CR. -- **Privileged Mode**: The default `act_runner` image (dind) requires privileged mode. The Operator creates Jobs with this permission. -- **Namespace Isolation**: The Operator should respect RBAC and only operate within allowed namespaces. +- **Token Handling**: Tokens are injected via `valueFrom: secretKeyRef` env vars. +- **Privileged Mode**: `act_runner` dind mode requires privileged security context. +- **Namespace Isolation**: Controller operates within the namespace of the RunnerGroup.