# JEP-0014: Virtual Scalable Exporters | Field | Value | | ----------------- | -------------------------------------------------------------- | | **JEP** | 0014 | | **Title** | Virtual Scalable Exporters | | **Author(s)** | @mangelajo (Miguel Angel Ajo Pelayo) | | **Status** | Draft | | **Type** | Standards Track | | **Created** | 2026-06-03 | | **Updated** | 2026-06-18 | | **Discussion** | https://github.com/jumpstarter-dev/jumpstarter/issues/41 | | **Requires** | | | **Supersedes** | | | **Superseded-By** | | --- ## Abstract This JEP proposes a Virtual Scalable Exporter subsystem for Jumpstarter that manages pools of virtual targets with configurable autoscaling. Conceptually, the system scales **virtual targets**; the **Exporter** is the scheduling and leasing unit (the Pod analog). Each `ExporterSet` declares scaling bounds using familiar Kubernetes vocabulary (`minReplicas`, `maxReplicas`, `minAvailableReplicas`); the controller maintains a warm pool of ready exporters to absorb the 10-60s cold-start latency of VM boot and exporter registration. This enables low-latency lease acquisition, massive scalability, resource efficiency, and simplified orchestration of mixed physical/virtual test topologies — while allowing administrators to tune the trade-off between responsiveness and resource consumption on a per-target basis. ## Motivation Jumpstarter currently excels at managing scarce, physical hardware targets. However, testing and development often require a mix of physical devices and scalable, virtual resources. Today, virtual targets must be manually deployed as static exporters with a fixed count — there is no mechanism for the system to maintain or scale a pool of virtual instances based on demand. This model has several limitations: - **Artificial scarcity:** Virtual targets are treated as a fixed-size pool, just like physical ones, which defeats their "virtually unlimited" potential. - **No elasticity:** The pool cannot grow when demand spikes (CI burst) or shrink when idle, leading to either queuing or waste. - **Manual lifecycle:** Administrators must manually deploy, monitor, and scale virtual exporter instances — there is no declarative "desired state" for a virtual target pool. - **Cold-start penalty vs. waste trade-off:** Users must choose between pre-spawning many instances (wasting resources when idle) or spawning on demand (high latency at lease time). There is no middle ground. The core problem is that virtual targets lack a pool manager that can maintain a configurable warm pool while autoscaling to meet demand. ### Fidelity / Cost Ladder One logical target can be served by multiple backends at different fidelity and cost tiers. Users select via labels through `jmp lease`; the same workflow applies regardless of backend: | class (provisioner) | fidelity | scale/cost | role | | --- | --- | --- | --- | | container sim (`qemu.jumpstarter.dev`) | low | cheap / CI-scale | functional checks | | cloud virtual device (`corellium.jumpstarter.dev`) | high | metered | higher-fidelity behavior | | real hardware (Exporter) | full | scarce | ground truth | For example, a target that needs GPU or specialized I/O can run functional checks cheaply on a QEMU class in CI, validate higher-fidelity behavior on a cloud-backed virtual device, and use real hardware as ground truth. The The `VirtualTargetClass` abstraction makes this ladder explicit without changing the lease experience. ### User Stories - **As a** CI pipeline author, **I want to** lease N virtual targets instantly from a warm pool, **so that** my pipeline doesn't block on provisioning latency during burst periods. - **As a** developer, **I want to** lease a virtual target matching a known physical board's properties with near-zero wait time, **so that** I can iterate quickly without waiting for scarce hardware. - **As a** platform engineer, **I want to** declare an `ExporterSet` with `minAvailableReplicas: 2, maxReplicas: 20`, **so that** there are always warm instances ready while the system scales up on demand and scales down when idle. - **As a** cost-conscious operator, **I want to** set `minAvailableReplicas: 0` for rarely-used target types, **so that** they consume no resources until actually requested, accepting a cold-start delay. ## Proposal The proposal introduces **Virtual Scalable Exporters** — a controller-managed pool of virtual target instances with configurable autoscaling. Rather than treating virtual targets as purely on-demand or purely static, each `ExporterSet` declares scaling parameters that let administrators tune the trade-off between instant availability and resource consumption. ### Resource Hierarchy Virtual scalable exporters are modeled on familiar Kubernetes workload primitives: ```text VirtualTargetClass ←── referenced by ── ExporterSet │ ▼ Exporter ──► Pod (exporter sidecar + target runtime) ``` - **`VirtualTargetClass`** — **namespaced** configuration for a backend (`provisioner`, nested `parameters`, credentials, scheduling, binding mode). Lives in the same namespace as referencing `ExporterSet` resources. Admins own classes; `ExporterSet` authors never touch credentials. - **`ExporterSet`** — namespaced generic scaling resource with `selector` + inline `template`. References a `VirtualTargetClass` by name in the **same namespace**. Optional nested `parameters` deep-merge over the class defaults. One mental model for all backends. - **`Exporter`** — the minimum leased unit. Exposes drivers that connect to the virtual target provisioned from the class. ### Core Concept: ExporterSet with Kubernetes-Native Scaling `ExporterSet` is a generic CRD (ReplicaSet + HPA analog) with familiar scaling vocabulary. Provider typing lives in `VirtualTargetClass`, not in the pool CRD itself. **Example: VirtualTargetClass (namespaced backend profile)** ```yaml apiVersion: jumpstarter.dev/v1alpha1 kind: VirtualTargetClass metadata: name: qemu-rpi4 namespace: jumpstarter spec: provisioner: qemu.jumpstarter.dev bindingMode: Immediate # warm pool; WaitForFirstConsumer = on-demand reclaimPolicy: Delete scheduling: # inherited by rendered exporter Pods nodeSelector: kubernetes.io/arch: arm64 tolerations: - key: jumpstarter.dev/kvm operator: Exists effect: NoSchedule resources: limits: devices.kubevirt.io/kvm: "1" parameters: # nested object; provisioner interprets machineType: virt firmware: url: registry.example.com/firmware/rpi4:latest digest: sha256:abc... resources: cpu: 4 memory: 4Gi storage: 16Gi ``` **Example: ExporterSet (generic scaling resource)** ```yaml apiVersion: jumpstarter.dev/v1alpha1 kind: ExporterSet metadata: name: rpi4-virtual namespace: jumpstarter spec: minReplicas: 0 maxReplicas: 20 minAvailableReplicas: 2 # PDB-style warm buffer (ready & unleased) scaleDownCooldown: 5m recycleStrategy: ExitAndReplace # or InPlaceReuse virtualTargetClassName: qemu-rpi4 # same-namespace VirtualTargetClass name parameters: # optional; deep-merged over class parameters resources: memory: 8Gi # override only memory; cpu/storage inherited selector: matchLabels: board: rpi4 template: # embedded template (Deployment idiom) metadata: labels: board: rpi4 arch: aarch64 virtual: "true" spec: drivers: - type: jumpstarter_driver_power.driver.QemuPower - type: jumpstarter_driver_network.driver.TcpNetwork config: port: 22 - type: jumpstarter_driver_serial.driver.QemuSerial status: replicas: 5 readyReplicas: 3 availableReplicas: 1 # warm (ready & unleased) leasedReplicas: 2 # scale subresource: specReplicasPath=.spec.maxReplicas ``` **Example: Corellium VirtualTargetClass** ```yaml apiVersion: jumpstarter.dev/v1alpha1 kind: VirtualTargetClass metadata: name: corellium-kronos namespace: jumpstarter spec: provisioner: corellium.jumpstarter.dev credentialsSecretRef: name: corellium-creds # Secret in same namespace bindingMode: WaitForFirstConsumer # provision on lease reclaimPolicy: Delete parameters: api: host: app.corellium.com projectId: "778f00af-5e9b-40e6-8e7f-c4f14b632e9c" device: flavor: kronos os: "1.1.1" build: "Critical Application Monitor (Baremetal)" ``` The Corellium driver (`jumpstarter_driver_corellium.driver.Corellium`) manages the full virtual instance lifecycle through the Corellium REST API — it creates instances on power-on and destroys them on power-off. Device parameters live in `VirtualTargetClass.spec.parameters` and may be overridden per pool via `ExporterSet.spec.parameters` (deep-merged). The provisioner injects API credentials from `VirtualTargetClass.credentialsSecretRef` into the exporter Pod; `ExporterSet` authors never see credentials. **Example: Android ExporterSet** ```yaml apiVersion: jumpstarter.dev/v1alpha1 kind: ExporterSet metadata: name: pixel7-emulator namespace: jumpstarter spec: minReplicas: 0 maxReplicas: 10 minAvailableReplicas: 0 # fully on-demand virtualTargetClassName: android-pixel7 selector: matchLabels: device: pixel7 template: metadata: labels: device: pixel7 os: android api-level: "34" virtual: "true" spec: drivers: - type: jumpstarter_driver_android.driver.AdbDriver - type: jumpstarter_driver_power.driver.EmulatorPower ``` An `ExporterSet` with `minAvailableReplicas: 0` consumes no resources until a lease is requested, accepting cold-start latency. An `ExporterSet` with `minAvailableReplicas: 3` always has 3 ready-to-lease exporters — leases are fulfilled instantly from the warm pool, and the controller scales up if more are needed. ### Container-Backed Targets: Sidecar Pattern For container-backed provisioners (`qemu.jumpstarter.dev`, Android emulator, etc.), the provisioner renders each instance Pod from independently shipped artifacts. The sketch below uses **native sidecar init containers** (`restartPolicy: Always`, [KEP-753](https://github.com/kubernetes/enhancements/issues/753)) as the **proposed** co-location model — **init containers vs. lifecycle hooks** is unresolved; see *Unresolved Questions*. ```yaml # rendered by qemu.jumpstarter.dev provisioner spec: initContainers: - name: exporter # native sidecar (starts first, drains last) restartPolicy: Always image: quay.io/jumpstarter-dev/exporter:latest containers: - name: target-runtime # QEMU/Cuttlefish — independent image image: quay.io/jumpstarter-dev/qemu-runtime:latest volumeMounts: - name: os mountPath: /os - name: shared mountPath: /shared volumes: - name: os image: reference: registry.example.com/os/rpi4:latest # OS as OCI artifact - name: shared emptyDir: {} ``` Benefits: - **Independent release cadence** — exporter, runtime, and OS image version independently. - **Fault isolation** — exporter survives target-runtime crashes and can drain or report failure. - **Standard interfaces** — drivers attach over virtio (serial/SPI/CAN/GPIO) or Unix sockets on shared volumes; same driver code works physical + virtual. - **Unprivileged Pods** — virtio-backed guests avoid privileged containers when the host supports it. The exporter sidecar communicates with the target-runtime container via Unix sockets on a shared `emptyDir` volume (QMP for QEMU control, serial console, launcher socket for dynamic argv). API-backed provisioners (`corellium`, `ec2`) and off-cluster provisioners (`qemu-baremetal.jumpstarter.dev`) skip the in-cluster runtime container — see *External and Off-Cluster Provisioning*. ### User Experience From the user's perspective, virtual scalable exporters appear as regular exporters in the pool. The lease experience is unchanged: ```bash # Lease any rpi4 target — may match physical or virtual jmp lease -l board=rpi4 # Lease explicitly virtual targets jmp lease -l board=rpi4,virtual=true # Prefer ground truth when fidelity matters jmp lease -l board=rpi4,fidelity=full ``` The guiding principle is: **"Get me a target that matches my requirements."** The distinction between physical and virtual is an implementation detail, not a primary concern for the user. Virtual exporters simply appear in the same pool as physical ones, differentiated only by labels. ### End-to-End Flow (QEMU Example) This section walks through a complete **in-cluster QEMU warm-pool** scenario: what each actor does, which CRDs are involved, and how control passes between components. The flow uses only **two admin-configured CRDs** — no per-instance claim resources: | Admin CRD | Role in this flow | | --- | --- | | `VirtualTargetClass` | Backend profile: provisioner, scheduling, nested `parameters` | | `ExporterSet` | Pool scaling, labels, drivers, optional parameter overrides | Everything else (`Exporter`, `Lease`, `Pod`) is created and managed by controllers at runtime. Relationships use a **reference graph** (not a strict ownership tree): ```text VirtualTargetClass ←── referenced by ── ExporterSet │ ▼ Exporter ──► Pod (exporter sidecar + QEMU runtime) ``` Homogeneous QEMU pools configure **`VirtualTargetClass` + `ExporterSet` only**. The provisioner deep-merges parameters, materializes Pods, and registers `Exporter` CRs. **OS images are not pre-selected by the pool** — lessees flash and boot what they need after leasing (see Phase 4 and DD-7). #### Actors | Actor | Component | Responsibility | | --- | --- | --- | | **Administrator** | Human / GitOps | Cluster bootstrap, class + set CRs | | **Jumpstarter operator** | `Jumpstarter` CR | Deploys `jumpstarter-controller`, routers, exporter-set controllers | | **Exporter-set controller** | `qemu.jumpstarter.dev` Deployment | Reconciles `ExporterSet`, creates Exporters/Pods, scales pool | | **Jumpstarter controller** | Existing controller | Assigns `Lease` → `Exporter`, unchanged lease semantics | | **User** | CLI / CI (`jmp lease`, drivers) | Requests leases, flashes images, runs tests | #### Phase 0 — Cluster bootstrap (admin, one-time) **Admin actions:** 1. Install Jumpstarter operator (if not already present). 2. Configure the `Jumpstarter` CR with `spec.exporterSets.provisioners` listing `qemu.jumpstarter.dev` (and any other provisioners). **Controller actions:** - Operator creates the exporter-set controller Deployment (`--provisioner=qemu.jumpstarter.dev`). - Operator ensures `jumpstarter-controller` is running (existing behavior). **Result:** Provisioner controller is watching for `ExporterSet` CRs whose `virtualTargetClassName` references a class handled by that provisioner. #### Phase 1 — Define the virtual target profile (admin, two CRs) **Admin actions:** 1. Create a `VirtualTargetClass` describing the QEMU backend (same namespace as the `ExporterSet` that will reference it): ```yaml apiVersion: jumpstarter.dev/v1alpha1 kind: VirtualTargetClass metadata: name: qemu-rpi4 namespace: jumpstarter spec: provisioner: qemu.jumpstarter.dev bindingMode: Immediate reclaimPolicy: Delete scheduling: nodeSelector: kubernetes.io/arch: arm64 tolerations: - key: jumpstarter.dev/kvm operator: Exists effect: NoSchedule resources: limits: devices.kubevirt.io/kvm: "1" parameters: machineType: virt firmware: url: registry.example.com/firmware/rpi4:latest digest: sha256:abc... resources: cpu: 4 memory: 4Gi storage: 16Gi ``` 2. Create an `ExporterSet` in the **same namespace** that references the class by name and declares scaling + lease-matching labels: ```yaml apiVersion: jumpstarter.dev/v1alpha1 kind: ExporterSet metadata: name: rpi4-virtual namespace: jumpstarter spec: minReplicas: 0 maxReplicas: 20 minAvailableReplicas: 2 scaleDownCooldown: 5m recycleStrategy: ExitAndReplace virtualTargetClassName: qemu-rpi4 parameters: resources: memory: 8Gi selector: matchLabels: board: rpi4 template: metadata: labels: board: rpi4 arch: aarch64 virtual: "true" spec: drivers: - type: jumpstarter_driver_power.driver.QemuPower - type: jumpstarter_driver_network.driver.TcpNetwork config: port: 22 - type: jumpstarter_driver_serial.driver.QemuSerial ``` **User actions:** None. **Controller actions:** None yet — exporter-set controller waits until `ExporterSet` exists and resolves `virtualTargetClassName` to the class above. #### Phase 2 — Warm pool provisioning (exporter-set controller) **Trigger:** `ExporterSet` CR created or updated; `minAvailableReplicas: 2`. **Exporter-set controller actions (reconcile loop):** 1. Resolve `ExporterSet.spec.virtualTargetClassName` to `VirtualTargetClass` `qemu-rpi4` in the same namespace; compute merged parameters (deep-merge of class + set overrides). 2. Count owned `Exporter` CRs: `replicas`, `readyReplicas`, `leasedReplicas`, `availableReplicas` (= ready − leased). 3. If `availableReplicas < minAvailableReplicas` and `replicas < maxReplicas`, scale up by creating new instances. For each new instance: - Create an `Exporter` CR with labels from `spec.template.metadata` and drivers from `spec.template.spec`. - Render a Kubernetes Pod (sidecar pattern): - **Exporter sidecar** (native sidecar, `restartPolicy: Always`) — starts first, registers with `jumpstarter-controller`. - **QEMU runtime container** — baseline virt machine from merged `parameters` (CPU, memory, firmware blob); **empty disk** ready for user flash at lease time. - Exporter talks to runtime via Unix sockets on a shared `emptyDir` (QMP, serial, launcher). - Apply scheduling from `VirtualTargetClass.scheduling` to the Pod. 4. Update `ExporterSet.status` (`replicas`, `readyReplicas`, `availableReplicas`, `leasedReplicas`, conditions). **Jumpstarter-controller actions:** - Accepts exporter registrations from the sidecar processes (existing gRPC flow). - Marks exporters as available for lease assignment when ready. **User actions:** None. **Result:** Two warm exporters appear in the pool, labeled `board=rpi4, virtual=true`. `ExporterSet.status.availableReplicas: 2`. ```text ExporterSet rpi4-virtual ├── Exporter rpi4-virtual-aaa [Ready, unleased] → Pod (exporter + QEMU) └── Exporter rpi4-virtual-bbb [Ready, unleased] → Pod (exporter + QEMU) ``` #### Phase 3 — User requests a lease (user + jumpstarter-controller) **User actions:** ```bash jmp lease -l board=rpi4,virtual=true ``` **Jumpstarter-controller actions:** 1. Create a `Lease` CR with `spec.selector.matchLabels: {board: rpi4, virtual: "true"}`. 2. Scan available `Exporter` CRs matching the selector (enabled, no active `leaseRef`, ready). 3. Pick one (e.g. `rpi4-virtual-aaa`) and set `Exporter.status.leaseRef` to the lease name. 4. Return connection details to the user (existing flow). **Exporter-set controller actions:** - Observes `leasedReplicas` increased, `availableReplicas` decreased. - If `availableReplicas < minAvailableReplicas`, begins scale-up (create another instance to refill the warm buffer). - Does **not** participate in lease assignment. **Result:** User holds an active lease on `rpi4-virtual-aaa`. Pool still maintains warm capacity via background scale-up. #### Phase 4 — User session: flash, boot, test (user + exporter sidecar) The warm pool provides **instant lease assignment**; image selection happens **after** lease — same workflow as a physical bench (DD-7). The pool does not pre-flash an OS onto instances. **User actions** (via leased client): ```python with env() as client: client.storage.flash("/path/to/image.raw") # write disk image client.power.on() # boot QEMU via QemuPower driver client.serial.read() # interact over serial # ... run tests ... ``` **Exporter sidecar actions:** - `storage.flash` writes the image to shared storage (or tells QEMU runtime via QMP/`blockdev-add`). - `power.on` sends QEMU start via QMP or launcher socket on shared volume. - Serial/network drivers proxy to the QEMU runtime container. **Controller actions:** None during the session (lease is held). #### Phase 5 — Lease release and recycle (user + controllers) **User actions:** ```bash jmp delete-lease # or lease TTL expires ``` **Jumpstarter-controller actions:** 1. Clear `Exporter.status.leaseRef` on `rpi4-virtual-aaa`. 2. Mark lease as released. **Exporter-set controller actions:** 1. Observe exporter is unleased; update `availableReplicas` / `leasedReplicas`. 2. Apply `recycleStrategy`: - **ExitAndReplace (default):** exporter sidecar exits after cleanup → Pod terminates → controller deletes `Exporter` CR → creates a fresh replacement with empty baseline storage to maintain `minAvailableReplicas` (next lessee flashes again). - **InPlaceReuse:** exporter resets QEMU state in place → same Pod returns to Ready without restart (lessee may re-flash before next session). 3. If `availableReplicas > minAvailableReplicas` for longer than `scaleDownCooldown`, gracefully scale down an excess replica: - Set `Exporter.spec.enabled: false` - Wait until no lease assigned - Delete Pod + `Exporter` CR **Result:** Pool returns to steady state with `minAvailableReplicas` warm, unleased exporters. #### Phase 6 — Demand spike (scale-up under load) **Trigger:** Three users (or CI jobs) request leases simultaneously; only one warm exporter remains. **User actions:** Three concurrent `jmp lease -l board=rpi4,virtual=true`. **Jumpstarter-controller actions:** - Assigns the one available exporter immediately. - Sets `Pending` condition on the other two leases (existing behavior when no exporter is available). **Exporter-set controller actions:** 1. Sees pending leases matching `spec.selector` with no available exporters. 2. Scales up: creates new `Exporter` + Pod instances (up to `maxReplicas`). 3. As new exporters register and become ready, jumpstarter-controller assigns pending leases. **Result:** Pool grows to meet demand, then shrinks back after cooldown when leases are released. #### Summary: CRDs and runtime objects **Admin-configured (2 CRDs — the full pool definition):** | CRD | Scope | Created by | Observed by | Relationship | | --- | --- | --- | --- | --- | | `VirtualTargetClass` | Namespaced | Admin | Exporter-set controller | Referenced by `ExporterSet` (same namespace) | | `ExporterSet` | Namespaced | Admin | Exporter-set controller | References class; owns runtime objects below | **Platform and runtime (created by controllers):** | Resource | Created by | Observed by | User-visible? | | --- | --- | --- | --- | | `Jumpstarter` | Admin | Operator | No | | `Exporter` | Exporter-set controller | Jumpstarter-controller, exporter-set controller | Indirectly (via lease) | | `Lease` | User (via CLI) | Jumpstarter-controller, exporter-set controller | Yes | | `Pod` | Exporter-set controller | Kubernetes, exporter-set controller | No | #### QEMU vs API-backed vs off-cluster backends The flow above applies to **in-cluster container-backed** provisioners (`qemu.jumpstarter.dev`). Other provisioner strings reuse the same `ExporterSet` + `jumpstarter-controller` lease flow with different placement: | Topology | Example provisioner | Where the target runs | | --- | --- | --- | | In-cluster container | `qemu.jumpstarter.dev` | Pod on Kubernetes (sidecar + runtime) | | API-backed cloud | `corellium.jumpstarter.dev` | External SaaS API; lightweight exporter Pod | | Off-cluster bare metal | `qemu-baremetal.jumpstarter.dev` | QEMU/emulator on lab hosts outside the cluster | For **API-backed** backends: - `VirtualTargetClass` holds `credentialsSecretRef` and shared backend `parameters`. - Per-pool overrides are expressed via `ExporterSet.spec.parameters` (deep-merged over the class). - The exporter Pod is lighter (API client only; no QEMU runtime container). For **off-cluster** backends, see *External and Off-Cluster Provisioning*. The `ExporterSet` + `jumpstarter-controller` lease flow is identical for all topologies. ### Architecture Overview ```text ┌─────────────────────────┐ │ jumpstarter-controller │ │ (creates Leases, │ │ assigns Exporters) │ └──────────┬──────────────┘ │ creates/updates Lease & Exporter objects │ ▼ ┌────────────────────────────────────┐ │ Kubernetes API │ │ (Lease, Exporter, ExporterSet, │ │ VirtualTargetClass) │ └─┬──────────────┬──────────────┬────┘ │ │ │ watches │ watches │ watches │ Leases + │ Leases + │ Leases + │ Exporters │ Exporters │ Exporters │ │ │ │ ┌─────────────────▼┐ ┌───────────▼──────────┐┌──▼──────────────────────┐ │ qemu provisioner │ │ android provisioner │ │ corellium provisioner │ │ (ExporterSet │ │ (ExporterSet │ │ (ExporterSet │ │ controller) │ │ controller) │ │ controller) │ └────────┬─────────┘ └──────────┬──────────┘ └────────────┬────────────┘ │ │ │ │ manages │ manages │ manages ▼ ▼ ▼ ┌──────────────────┐ ┌───────────────────────┐ ┌────────────────────────┐ │ Warm Pool │ │ Warm Pool │ │ Warm Pool │ │ [Exporter].. │ │ [Exporter].. │ │ [Exporter].. │ └────────┬─────────┘ └───────────┬───────────┘ └────────────┬───────────┘ │ │ │ └───────────────────────┼──────────────────────────┘ │ register as standard Exporter CRs ▼ Kubernetes API (Exporters) ``` **Scaling Inputs — Watches on Leases and Exporters:** Each `ExporterSet` controller watches two key resources to make scaling decisions: 1. **Leases** — The controller watches for pending Leases whose label selectors match the set's selector. Pending leases with no available exporter signal demand and trigger scale-up. 2. **Exporters** — The controller watches owned Exporter objects to track which instances are available (no active lease) vs. occupied (leased). This determines the current pool utilization. Together these inputs feed the scaling logic: if there are pending leases that match this set and no available instances to serve them, scale up. If there are excess idle instances beyond `minAvailableReplicas` for a sustained period, scale down. **Per-Provisioner Deployments (single image by default):** All provisioner controllers are compiled into a single binary. Each Deployment in the cluster passes a `--provisioner=` flag to activate the corresponding reconciler (e.g., `qemu.jumpstarter.dev`). This gives each provisioner isolated logs and independent restarts while maintaining a single image to build and release. The Jumpstarter operator deploys provisioner controllers based on the `Jumpstarter` CR configuration. A new `exporterSets` section lists which provisioners to enable: ```yaml apiVersion: operator.jumpstarter.dev/v1alpha1 kind: Jumpstarter metadata: name: jumpstarter namespace: jumpstarter spec: # ... existing controller, routers, authentication config ... exporterSets: image: quay.io/jumpstarter-dev/exporter-set-controller:latest imagePullPolicy: IfNotPresent provisioners: - name: qemu.jumpstarter.dev enabled: true - name: corellium.jumpstarter.dev enabled: false image: quay.io/jumpstarter-dev/exporter-set-controller-corellium:latest ``` **Scaling Logic:** Each `ExporterSet` controller monitors its instances and scales based on available (unleased) replicas: - If `availableReplicas` drops below `minAvailableReplicas`, scale up. - If `availableReplicas` exceeds demand for a cooldown period, scale down (never below `minAvailableReplicas`). - Never exceed `maxReplicas` (if set; 0 or omitted means no upper bound). - `kubectl scale exporterset/ --replicas=N` works via the `scale` subresource (`specReplicasPath=.spec.maxReplicas`). **Instance Lifecycle:** 1. `ExporterSet` controller creates an `Exporter` from the set template (provisioner renders the Pod). 2. The Pod starts the virtual target (sidecar pattern for container backends, or API call for external backends) and runs the Jumpstarter exporter, registering with the controller like any other exporter. 3. The instance becomes available in the pool for lease assignment. 4. When a lease is released, the exporter handles cleanup/reset per `recycleStrategy`. The instance returns to the available pool or is replaced. ### API / Protocol Changes **New CRDs** | CRD | Scope | Role | | --- | --- | --- | | `VirtualTargetClass` | Namespaced | Backend profile — provisioner, credentials, scheduling, binding, nested `parameters` | | `ExporterSet` | Namespaced | Generic scaling resource (ReplicaSet + HPA analog) | **Reference rule:** `ExporterSet.spec.virtualTargetClassName` must name a `VirtualTargetClass` in the **same namespace**. Cross-namespace references are rejected at admission. `credentialsSecretRef.name` must refer to a Secret in that same namespace. **VirtualTargetClass (common fields):** ```yaml spec: provisioner: # e.g. qemu.jumpstarter.dev credentialsSecretRef: # optional; for API-backed provisioners name: # Secret in same namespace as this class parameters: # nested YAML object; provisioner-specific : bindingMode: Immediate | WaitForFirstConsumer reclaimPolicy: Delete | Retain scheduling: # inherited by rendered exporter Pods nodeSelector: : nodeAffinity: { ... } tolerations: [ ... ] resources: limits: devices.kubevirt.io/kvm: "1" ``` **ExporterSet (common fields):** ```yaml spec: minReplicas: # floor (default: 0) maxReplicas: # ceiling (0 or omitted = no limit) minAvailableReplicas: # warm buffer: ready & unleased (default: 0) scaleDownCooldown: # default: 5m recycleStrategy: ExitAndReplace | InPlaceReuse virtualTargetClassName: # VirtualTargetClass name in same namespace parameters: # optional nested overrides (deep-merged with class) : selector: matchLabels: : template: metadata: labels: { ... } spec: drivers: [ ... ] ``` ### Dictionary-Based Parameters Both `VirtualTargetClass` and `ExporterSet` expose a `spec.parameters` field carrying provisioner-specific configuration as a **nested YAML object** (maps, lists, and scalars) — not a flat `map[string]string`. This reads like normal exporter/driver config rather than CSI's intentionally opaque string map. **CRD representation:** The field is schemaless at the API level (`type: object` with `x-kubernetes-preserve-unknown-fields: true`, or `apiextensionsv1.JSON` in Go). OpenAPI does not validate nested structure at `kubectl apply` time. **Validation:** The active provisioner validates merged parameters during reconcile and sets `ExporterSet` status conditions on error. Optional future: `VirtualTargetClass.spec.parametersSchemaRef` pointing to a JSON Schema ConfigMap per provisioner. **Merge semantics:** When provisioning an instance, the controller computes: ```text mergedParameters = deepMerge(VirtualTargetClass.spec.parameters, ExporterSet.spec.parameters) ``` - **Maps** merge recursively — set keys override class keys at the same path. - **Scalars and lists** in `ExporterSet.spec.parameters` replace the class value at that path entirely (lists are not concatenated). **Example:** ```yaml # VirtualTargetClass.spec.parameters resources: cpu: 4 memory: 4Gi storage: 16Gi firmware: url: registry.example.com/firmware/rpi4:v1 digest: sha256:abc... # ExporterSet.spec.parameters (override memory only) resources: memory: 8Gi # mergedParameters passed to provisioner resources: cpu: 4 # inherited from class memory: 8Gi # overridden by set storage: 16Gi # inherited from class firmware: # unchanged — set did not specify firmware url: registry.example.com/firmware/rpi4:v1 digest: sha256:abc... ``` **Status subresource (ExporterSet):** ```yaml status: replicas: 5 readyReplicas: 3 availableReplicas: 1 # warm (ready & unleased) leasedReplicas: 2 conditions: - type: SetHealthy status: "True" - type: ScalingLimited status: "False" ``` **Scale subresource:** `specReplicasPath=.spec.maxReplicas` enables `kubectl scale` and HPA/KEDA interoperability. **Pluggable provisioners:** ```text VirtualTargetClass.provisioner → qemu.jumpstarter.dev → k8s Pod (sidecar + runtime container) qemu-baremetal.jumpstarter.dev → QEMU on off-cluster lab hosts (SSH/API) ec2.jumpstarter.dev → AWS API corellium.jumpstarter.dev → Corellium REST API # backend is pluggable via provisioner string ``` **Changes to existing CRDs:** **Exporter — new `enabled` field:** Exporters gain an `enabled` boolean field (default: `true`). When set to `false`, the `jumpstarter-controller` will not assign new leases to this exporter. This is useful for: - **Lab operations:** Temporarily taking a physical exporter offline for maintenance without deleting it. - **Graceful scale-down:** `ExporterSet` controllers set `enabled: false` before terminating an instance, ensuring the controller doesn't race to assign a lease to an exporter that is about to be deleted. ```yaml apiVersion: jumpstarter.dev/v1alpha1 kind: Exporter metadata: name: qemu-rpi4-instance-3 spec: enabled: false # Controller will not assign new leases to this exporter ``` The graceful scale-down sequence becomes: 1. `ExporterSet` controller sets `enabled: false` on the target exporter. 2. Controller waits to confirm no lease was assigned (watches for `status.leaseRef` to remain empty). 3. Controller deletes the Pod and Exporter CR. ### Hardware Considerations This proposal is specifically designed to reduce reliance on physical hardware for scalable testing. However: - Virtual targets must faithfully emulate the interfaces exposed by physical hardware (serial, network, storage, power) through the existing driver model. - Container-backed provisioners require `/dev/kvm` or equivalent; scheduling is expressed on `VirtualTargetClass.scheduling`. - Timing-sensitive tests (USB/IP latency, boot ROM timeouts) may behave differently on virtual targets — the system should expose labels indicating whether a target is physical or virtual so users can filter when fidelity matters. ### External and Off-Cluster Provisioning Provisioners are **not** limited to in-cluster Pods. The same `VirtualTargetClass` + `ExporterSet` model applies whether the virtual target runs as a Kubernetes Pod, on a cloud virtual-device API, or on **bare-metal lab hosts** outside the cluster. `VirtualTargetClass.provisioner` selects the backend implementation; `credentialsSecretRef` and nested `parameters` carry everything the provisioner needs to reach remote infrastructure (API tokens, SSH keys, host lists, board profiles). **Design intent:** Scale a **logical pool** of exporters through familiar `ExporterSet` semantics while placing workloads where fidelity or hardware requires it — e.g. a high-fidelity automotive emulator that needs bare-metal KVM, GPU passthrough, or vendor-specific tooling unavailable in the cluster. **What stays the same:** - Users lease with labels (`jmp lease -l board=sa8295,fidelity=high`) — no awareness of placement. - Each pool member registers as a standard `Exporter` CR with `jumpstarter-controller`. - Lessees flash and boot images via existing drivers after lease (see DD-7). **What differs per provisioner:** - **In-cluster (`qemu.jumpstarter.dev`):** exporter-set controller creates Pod + sidecar; scheduling from `VirtualTargetClass.scheduling`. - **API-backed (`corellium.jumpstarter.dev`):** exporter Pod is a thin API client; cloud device lifecycle managed externally. - **Off-cluster (`qemu-baremetal.jumpstarter.dev`):** exporter-set controller provisions exporter + QEMU (or vendor emulator) on remote hosts via SSH or a lab agent API; may run exporter as a local process on the host rather than a Pod. The controller still owns `Exporter` CRs in the cluster for lease assignment. **Automotive example — Qualcomm reference board on bare metal:** An automotive team runs SA8295-class targets on dedicated lab servers for higher-fidelity behavior than in-cluster QEMU. The cluster hosts orchestration only; emulators run on the bench network. ```yaml apiVersion: jumpstarter.dev/v1alpha1 kind: VirtualTargetClass metadata: name: qcom-sa8295-baremetal namespace: jumpstarter spec: provisioner: qemu-baremetal.jumpstarter.dev credentialsSecretRef: name: automotive-lab-ssh bindingMode: Immediate parameters: hosts: - name: bench-01.automotive.example.com arch: aarch64 slots: 2 # concurrent instances per host - name: bench-02.automotive.example.com arch: aarch64 slots: 2 runtime: binary: /usr/bin/qemu-system-aarch64 kvm: true board: soc: sa8295 --- apiVersion: jumpstarter.dev/v1alpha1 kind: ExporterSet metadata: name: qcom-sa8295-hifi namespace: jumpstarter spec: minReplicas: 0 maxReplicas: 4 minAvailableReplicas: 1 virtualTargetClassName: qcom-sa8295-baremetal parameters: board: fidelity: high # deep-merged over class board defaults selector: matchLabels: board: sa8295 fidelity: high virtual: "true" template: metadata: labels: board: sa8295 fidelity: high virtual: "true" spec: drivers: - type: jumpstarter_driver_power.driver.QemuPower - type: jumpstarter_driver_network.driver.TcpNetwork config: port: 22 - type: jumpstarter_driver_serial.driver.QemuSerial ``` **Provisioner actions (off-cluster):** 1. Read merged `parameters` and `credentialsSecretRef`. 2. Select a host with free capacity (`slots`). 3. Deploy or attach exporter + runtime on the host (SSH, systemd, or lab agent). 4. Create an `Exporter` CR in the cluster with template labels; register with `jumpstarter-controller`. 5. On scale-down or failure, tear down the remote instance and delete the `Exporter` CR. Physical reference boards on the same lab network can coexist in the pool — users distinguish them with labels (`virtual=false` vs `virtual=true`) without changing the lease workflow. ## Design Decisions ### DD-1: Pool-based scaling vs. purely on-demand provisioning **Alternatives considered:** 1. **Pool-based with configurable min/max** — Maintain a warm pool of pre-spawned instances; scale between `minAvailableReplicas` and `maxReplicas`. 2. **Purely on-demand** — Spawn a new instance only when a lease request arrives; destroy it when the lease is released. **Decision:** Pool-based with configurable min/max. **Rationale:** Purely on-demand provisioning introduces noticeable latency for CI pipelines (Pod scheduling + image pull + VM boot + exporter registration typically takes 10-15s, and up to 60s with cold image pulls or heavy provisioners). A warm pool provides instant lease fulfillment for the common case. Setting `minAvailableReplicas: 0` still allows purely on-demand behavior for rarely-used targets. `VirtualTargetClass.bindingMode: WaitForFirstConsumer` maps to on-demand provisioning; `Immediate` maps to warm pools. ### DD-2: Provisioner controller deployment model **Alternatives considered:** 1. **Separate binary per provisioner** — Each provisioner is a completely independent binary/image. 2. **Single binary, one deployment per provisioner** — One image contains all provisioner reconcilers; a CLI flag (`--provisioner=qemu.jumpstarter.dev`) selects which one to activate. 3. **Single binary, single deployment** — One Deployment runs all provisioners. 4. **Integrated into jumpstarter-controller** — Add reconcilers directly into the existing operator. **Decision:** Option 2 — single binary, one Deployment per provisioner. **Rationale:** A single image is cheaper to build, test, and productize. Deploying as separate Deployments gives operational benefits: isolated logs, independent restarts, and explicit `--provisioner` selection. Adding a new backend means adding a Deployment manifest with a different flag — no new image build required. ### DD-3: Pluggable provisioner vs. CRD-per-pool vs. typed claims **Alternatives considered:** 1. **CRD per provider pool** (`QEMUExporterPool`, `AndroidExporterPool`, etc.) — provider typing at the pool CRD level. 2. **Generic `ExporterSet` + pluggable `VirtualTargetClass.provisioner` + nested `parameters`** — orchestration generic; backend selected by provisioner string; device config as nested YAML on class + set (deep-merge). 3. **Typed `*VirtualTarget` CRDs per provider** (`QEMUVirtualTarget`, `CorelliumVirtualTarget`, etc.) — strong schema per backend, referenced from `ExporterSet`. 4. **Fully generic opaque config** — single CRD with flat `provider.config` map. **Decision:** Option 2 — generic `ExporterSet` + pluggable provisioner on `VirtualTargetClass` with **dictionary-based nested `parameters`**. Reject options 1 and 3. **Rationale:** Separating orchestration (scaling, lease matching, graceful shutdown) from provisioning (QEMU container, Corellium API, off-cluster hosts) lets each provisioner implement backend-appropriate scaling while exposing an identical scaling surface (`minReplicas`/`maxReplicas`/`minAvailableReplicas`). Nested `parameters` on `VirtualTargetClass` and optional `ExporterSet` overrides replace per-provider claim CRDs — homogeneous pools need only two admin CRDs. Typed `*VirtualTarget` claims add maintenance overhead without benefit when pools share one backend profile (2026-06 team review). New backends add a provisioner string and parameter conventions, not pool-tier or claim-kind changes. ### DD-4: Per-lease parameters vs. pool flavors **Alternatives considered:** 1. **Per-lease `parameters` dictionary** — Leases carry opaque hints (CPU, memory, storage) interpreted by provisioners. 2. **Multiple `ExporterSet` flavors** — Administrators create separate sets for different resource profiles; users select via label matching. **Decision:** Option 2 — multiple set flavors via separate `ExporterSet` CRs. **Rationale:** Per-lease parameters add complexity across every layer for a use case already satisfied by separate sets with different labels and `VirtualTargetClass` parameters. Per-lease parameters can be revisited in a future JEP if needed. ### DD-5: Built-in scaling vs. HPA / KEDA **Alternatives considered:** 1. **Built-in scaling logic** — Each provisioner implements lease-aware reconciliation with a consistent scaling API. 2. **Kubernetes HPA** — Horizontal Pod Autoscaler with custom metrics. 3. **KEDA** — Event-driven autoscaler with a custom Jumpstarter scaler. **Decision:** Option 1 — built-in scaling logic with consistent API surface; HPA/KEDA as complementary via `scale` subresource and exposed metrics. **Rationale:** Each provisioner should implement autoscaling appropriate to its backend (local container churn vs. EC2 quotas vs. external API rate limits). A single generic autoscaler cannot express lease-aware matching, graceful disable-before-delete, or `minAvailableReplicas` invariants. However, the **same scaling vocabulary** (`minReplicas`/`maxReplicas`/`minAvailableReplicas`) and the `scale` subresource apply across all provisioners — one mental model for users, backend-specific logic underneath. Pool metrics for HPA/KEDA are listed in *Future Possibilities*. ### DD-6: VirtualTargetClass vs. inline credentials **Alternatives considered:** 1. **Inline credentials in every `ExporterSet`** — simple but duplicates secrets across pools sharing the same backend account. 2. **`VirtualTargetClass` (namespaced backend profile)** — class in the same namespace as the referencing `ExporterSet` holds credentials, nested `parameters`, and scheduling; `ExporterSet.spec.virtualTargetClassName` references the class by local name. 3. **Separate `ProviderConfig` CRD** — lighter-weight credential sharing without full class semantics. **Decision:** Option 2 — **namespaced** `VirtualTargetClass` with optional future `ProviderConfig` for multi-account credential reuse. **Rationale:** Unlike CSI `StorageClass` (cluster-scoped), `VirtualTargetClass` is **namespaced** so teams define isolated backend profiles, credentials, and scheduling per namespace without cluster-admin involvement. `ExporterSet` may only reference a class in the **same namespace**; `credentialsSecretRef` points to a Secret in that namespace — credentials never appear on `ExporterSet`. `bindingMode` and `reclaimPolicy` still map to warm-pool vs. on-demand and external target retention. The StorageClass/PVC *separation of class and consumer* is retained; only scope differs. ### DD-7: Instance TTL and image refresh (deferred) **Alternatives considered:** 1. **`ExporterSet.spec.ttl` with image refresh** — declarative `maxAge`, `maxIdleAge`, and `imageRefreshPolicy` on the pool CRD; controller recycles instances and re-pulls container/firmware images to keep warm pools fresh. 2. **Manual / CronJob pool flush** — operators restart pools or delete Pods on a schedule outside Jumpstarter. 3. **Admin-pinned images in `parameters`** — declare expected OS/firmware refs on `VirtualTargetClass` / `ExporterSet`; provisioner always boots those images. 4. **User flash at lease time (v1)** — warm pool instances are provisioned with baseline runtime only; the lessee flashes and boots the image they want via existing drivers (`storage.flash`, power cycle) — same workflow as physical targets. 5. **Separate lifecycle controller (future)** — a cross-cutting controller that periodically visits **physical and virtual** exporters and flashes the expected image, without virtual-only fields on `ExporterSet`. **Decision:** Reject options 1–3 for v1 — **no TTL, image-refresh, or admin-pinned boot images on `ExporterSet` / `VirtualTargetClass`**. Option 4 matches current Jumpstarter behavior: users flash and boot what they need after leasing. Option 5 remains the preferred direction for automated image hygiene later. **Rationale:** Time-based Pod recycle and provisioner-driven image re-pull are virtual-pool mechanics that **physical exporters do not share**. Physical machines have no `maxAge`; their OS changes when someone flashes them, not when a pool controller rotates Pods. Putting TTL or pinned boot images on `ExporterSet` alone would split the lease experience. In v1, virtual targets in the warm pool behave like physical benches: the lessee selects and flashes the desired image. A future **separate lifecycle controller** can watch `Exporter` resources regardless of origin and apply uniform policies — e.g. periodic flash of a lab-defined expected image to idle exporters, scheduled maintenance windows — combining long-lived (non-refreshed) exporter instances with automated image updates when operators choose to enable them. ## Design Details ### Reconciliation Loop Each `ExporterSet` controller runs a continuous reconciliation loop, triggered by changes to the set CR, owned Exporters, or matching Leases: ```text for each ExporterSet CR: mergedParameters = deepMerge(class.parameters, set.parameters) ownedExporters = list Exporters owned by this CR replicas = count ownedExporters in Ready state leasedReplicas = count ownedExporters with an active LeaseRef availableReplicas = replicas - leasedReplicas pendingLeases = count pending Leases matching spec.selector # Invariant: maintain minAvailableReplicas warm buffer if availableReplicas < spec.minAvailableReplicas AND replicas < spec.maxReplicas: scale up to restore availableReplicas # Demand-driven scale-up elif pendingLeases > 0 AND replicas < spec.maxReplicas: scale up by min(pendingLeases, spec.maxReplicas - replicas) # Scale-down: excess idle replicas elif availableReplicas > spec.minAvailableReplicas AND cooldown elapsed: graceful scale down: 1. set exporter.spec.enabled = false 2. wait until leaseRef remains empty 3. delete Pod and Exporter CR (never below minAvailableReplicas) ``` ### Instance States Each virtual exporter instance transitions through: ```text Provisioning → Ready (warm pool) → Leased → Ready └→ Terminating → (deleted if available>min) ``` - **Provisioning:** Pod starting, virtual target provisioning, exporter registering. - **Ready:** Exporter registered and available for lease. - **Leased:** Exporter assigned to an active lease. - **Terminating:** Instance being deleted (scale-down or failure replace). ### Component Interaction 1. Administrator creates `VirtualTargetClass` and `ExporterSet` resources. 2. The provisioner controller provisions `minAvailableReplicas` Exporters. 3. Each instance Pod boots the virtual target and runs the Jumpstarter exporter, registering with the existing `jumpstarter-controller`. 4. Instances appear as regular exporters with labels from `spec.template.metadata`. 5. Users lease them normally — the existing controller handles assignment. 6. On lease release, the instance is recycled per `recycleStrategy`: - **Exit-and-replace (default):** Exporter exits; controller replaces the instance proactively to maintain `minAvailableReplicas`. - **In-place reuse:** Exporter resets internal state without exiting; Pod remains running and transitions back to Ready immediately. 7. The `ExporterSet` controller continuously monitors utilization and scales. ### Failure Modes - **Pod crash:** Controller detects failure via Pod status, replaces the instance, maintains `minAvailableReplicas` invariant. - **Resource exhaustion:** Cannot scale beyond cluster capacity; set stays at current size, new leases queue as for physical targets. - **Provisioner startup failure:** Instance marked failed, controller retries with backoff, alerts via conditions on the set status. - **Scaling storm:** Rate limiting on scale-up prevents creating too many instances simultaneously. ## Test Plan ### Unit Tests Unit tests should meet the project test coverage requirements. ### Integration Tests - End-to-end lease lifecycle with QEMU provisioner in a test cluster - Mixed physical/virtual lease orchestration - Provisioner failure and recovery scenarios - Parameter deep-merge and provisioner-side validation - `VirtualTargetClass` credential injection ## Acceptance Criteria - [ ] `VirtualTargetClass` and `ExporterSet` CRDs defined - [ ] `ExporterSet` controller maintains `minAvailableReplicas` warm buffer - [ ] Controller scales up when available pool is depleted (up to `maxReplicas`) - [ ] Controller scales down idle replicas after cooldown (never below `minAvailableReplicas`) - [ ] QEMU provisioner (`qemu.jumpstarter.dev`) fully implemented and tested - [ ] Virtual instances register as standard exporters and are leasable without changes to the existing lease flow - [ ] Pod failures detected and reported in `ExporterSet` status - [ ] An `ExporterSet` with `minAvailableReplicas: 0` provisions on demand only - [ ] Status subresource reports Deployment-style counters and health conditions - [ ] `scale` subresource enables `kubectl scale` interoperability - [ ] `parameters` deep-merge produces correct merged config for provisioner - [ ] Provisioner validates merged `parameters` and surfaces errors via conditions - [ ] Documentation covers `VirtualTargetClass` and `ExporterSet` configuration ## Graduation Criteria ### Experimental - QEMU provisioner functional in a development cluster - Basic set lifecycle works end-to-end (scale up, lease, release, scale down) - Community feedback on CRD schema and scaling behavior ### Stable - QEMU reference provisioner (`qemu.jumpstarter.dev`) production-ready; at least one additional topology validated (e.g. off-cluster bare metal or API-backed) - Production usage by at least one team for >1 month - Performance benchmarks documented (cold-start latency, scaling responsiveness) - Provisioner authoring guide published (how to add a new provisioner) ## Backward Compatibility - Existing physical-only workflows are unaffected; lease requests without virtual-specific labels continue to work as before. - No changes to the existing gRPC protocol for physical exporters. - New CRDs (`VirtualTargetClass`, `ExporterSet`) are additive. - **Exporter `enabled` field:** Defaults to `true`, so all existing Exporters continue to behave exactly as before. - Administrators upgrading see no behavior change until they explicitly deploy `ExporterSet` and `VirtualTargetClass` resources. ## Consequences ### Positive - **Instant lease fulfillment:** Warm pools eliminate provisioning latency. - **Elastic scaling:** Sets grow and shrink with demand. - **Unified user experience:** Virtual and physical targets leased the same way. - **Kubernetes-native UX:** `minReplicas`/`maxReplicas`/`minAvailableReplicas`, Deployment-style status, `kubectl scale` — familiar to cluster admins. - **Pluggable backends:** New provisioners add a provisioner string. - **Credential separation:** `VirtualTargetClass` keeps secrets off `ExporterSet` resources. - **Fidelity ladder:** Same lease flow across sim, cloud virtual, and hardware tiers. ### Negative - **Increased CRD surface:** `VirtualTargetClass` and `ExporterSet` add more resources to manage than a single pool CRD per provider. - **Resource consumption:** Warm pools consume cluster resources when idle. - **Sidecar complexity:** Container-backed provisioners require multi-container Pod orchestration and shared-volume protocols. ### Risks - **Scaling storms:** Burst demand could exhaust cluster resources; rate limiting mitigates but may delay lease fulfillment. - **Provisioner reliability:** Failed startups can cause crash-replace loops. ## Rejected Alternatives - **Static fixed-size pools (status quo):** Cannot scale with demand. - **External orchestration (Terraform/Ansible):** Breaks lease semantics integration. - **Per-lease `parameters` dictionary:** See DD-4. - **CRD-per-pool without VirtualTarget separation:** Couples scaling and provider config; rejected in favor of generic `ExporterSet` + pluggable provisioner. - **Typed `*VirtualTarget` CRDs per provider:** Rejected at 2026-06 team review; see DD-3. Dictionary `parameters` on class + set suffice for homogeneous pools. - **`ExporterSet.spec.ttl` and image refresh:** Rejected for v1; see DD-7. Would create virtual-only lifecycle semantics unlike physical exporters. ## Prior Art - **LAVA:** Virtual DUTs via QEMU with static configuration; no on-demand scaling. - **Crossplane:** General-purpose cloud composition; no Jumpstarter lease semantics. Useful reference for external API integration (e.g., Corellium) but does not replace pool-specific scaling logic. - **CSI (StorageClass/PVC):** Class/consumer separation adopted; scope is namespaced rather than cluster-scoped (see DD-6). - **KubeVirt:** VM orchestration with pre-mounted images; Jumpstarter differs by flash-at-runtime model and exporter-as-sidecar pattern. ## Unresolved Questions - What is the exact scaling algorithm (proportional, step-based, predictive)? - **Pod initialization for container-backed provisioners:** Native sidecar init containers (`restartPolicy: Always`, KEP-753) vs. lifecycle hooks vs. other co-location patterns for exporter + target-runtime. The sidecar sketch in this JEP is provisional; resolve in the QEMU provisioner implementation PR. ### Resolved - **Observability (JEP-0013):** Provisioner controllers emit metrics per JEP-0013. - **Lease release detection:** Controllers watch Lease objects directly. - **Scheduled leases:** `Spec.BeginTime` on Lease CRs; controllers ignore future-dated leases until effective. ## Future Possibilities The following extensions are explicitly **not** part of this JEP but the model stays open to them: - **Disaggregated/cross-node accelerators** — ARM64 runtime bridged to a remote GPU via virtio-gpu/RDMA. - **Separate `ProviderConfig` CRD** — multi-account credential reuse and rotation referenced by multiple `VirtualTargetClass` resources. - **Realized-instance CRD (PV analog)** — for static/pre-provisioned devices that exist outside the dynamic provisioning flow. - **`ExporterDeployment` rollout tier** — Deployment analog for rolling updates across pool instances (versioned template changes). - **Multiple/spawned-on-lease VirtualTargets per Exporter** — composite benches and multi-device topologies. - **Universal physical+virtual `Target` abstraction** — single resource type spanning hardware and virtual backends. - **Priority selectors / DeviceClass** — ordered label fallback ("prefer hardware, fall back to QEMU") at lease time. - **HPA/KEDA metric exposure** — complementary external autoscaling once core provisioner controllers are stable. - **Renode provider** — `renode.jumpstarter.dev` provisioner leveraging JEP-0010. - **Additional cloud/container provisioners** — `corellium.jumpstarter.dev`, `android.jumpstarter.dev`, `ec2.jumpstarter.dev` (no typed claim CRDs). - **Composite leases** — multiple exporters linked into one logical lease. - **Cross-cutting lifecycle controller** — periodic flash of lab-defined expected images to idle **physical and virtual** exporters (see DD-7); long-lived pool instances combined with optional automated image updates, not virtual-only TTL on `ExporterSet`. ## Implementation Plan The implementation is broken into phases. Each phase delivers a usable increment and can be merged independently. **v1 focuses on the QEMU reference implementation**; additional provisioners and lifecycle automation are deferred. | Phase | Scope | Status | | --- | --- | --- | | 1 | Exporter `enabled` field | Near-term | | 2 | `VirtualTargetClass` + `ExporterSet` CRDs; nested `parameters`; `qemu.jumpstarter.dev` | Near-term (v1) | | 3 | External/off-cluster provisioning (`qemu-baremetal.jumpstarter.dev`) | Near-term | | 4+ | Lifecycle controller, Corellium/Android, etc. | Deferred — see *Future phases* | ### Phase 1: Exporter `enabled` field Add the `enabled` boolean field to the Exporter CRD and update the `jumpstarter-controller` lease assignment logic to skip disabled exporters. **Deliverables:** - [ ] Add `spec.enabled` field to Exporter CRD (default: `true`) - [ ] Update lease assignment in `jumpstarter-controller` to filter out disabled exporters - [ ] Unit tests for the filtering logic - [ ] Integration test: disable an exporter, verify it gets no new leases ### Phase 2: Core CRDs and QEMU reference provisioner Define namespaced `VirtualTargetClass` and `ExporterSet` CRDs. Implement **only** the `qemu.jumpstarter.dev` in-cluster provisioner — the reference implementation for the 2-CRD model, parameter deep-merge, warm pool, and flash-at-lease workflow (DD-7). **Deliverables:** - [ ] Define `VirtualTargetClass` and `ExporterSet` CRD schemas (namespaced; nested `parameters` with schemaless object fields; same-namespace reference rule) - [ ] Implement parameter deep-merge and provisioner-side validation - [ ] Implement exporter-set controller binary with `--provisioner=qemu.jumpstarter.dev` - [ ] Sidecar Pod rendering (provisional init-container model — see Unresolved Questions) - [ ] Core scaling logic: `minAvailableReplicas`, demand-driven scale-up, graceful scale-down - [ ] Deployment-style status + `scale` subresource - [ ] Watch Leases and Exporters for scaling decisions - [ ] Add `exporterSets` section to `Jumpstarter` operator CR - [ ] Integration test: deploy `ExporterSet`, lease, flash, boot, release, observe scaling ### Phase 3: External / off-cluster provisioning Extend the exporter-set controller with an off-cluster QEMU provisioner to validate the pluggable backend model beyond in-cluster Pods. Documents and implements the flow in *External and Off-Cluster Provisioning*. **Deliverables:** - [ ] `qemu-baremetal.jumpstarter.dev` provisioner (or equivalent off-cluster stub) using the same binary with `--provisioner=qemu-baremetal.jumpstarter.dev` - [ ] Remote host selection, SSH/agent deploy, and `Exporter` CR registration from off-cluster instances - [ ] Example `VirtualTargetClass` + `ExporterSet` manifests for lab bare-metal (automotive profile) - [ ] Integration test or documented manual test plan for off-cluster scale-up and lease ### Future phases (deferred) The following are **explicitly out of v1** scope. They reuse the same `VirtualTargetClass` + `ExporterSet` CRDs and nested `parameters` — no typed claim CRDs. **Additional provisioners** - [ ] `corellium.jumpstarter.dev` — API-backed cloud virtual devices - [ ] `android.jumpstarter.dev` — in-cluster Android emulator pools - [ ] `ec2.jumpstarter.dev` — AWS-backed targets - [ ] Provisioner authoring guide **Cross-cutting lifecycle controller (DD-7)** - [ ] Separate controller for periodic flash / maintenance on **physical and virtual** exporters — not `ExporterSet.spec.ttl` ## Implementation History - 2025-10-30: RFE filed upstream (GitHub #41) - 2026-06-03: JEP proposed - 2026-06-18: Revised per review — ExporterSet, VirtualTargetClass, pluggable provisioner model; added end-to-end flow section - 2026-06-18: Team review — dictionary `parameters`, removed typed VirtualTarget CRDs, namespaced `VirtualTargetClass`, deferred TTL (DD-7) ## References - [GitHub Issue #41: RFE: On-Demand Virtual Target Provisioning](https://github.com/jumpstarter-dev/jumpstarter/issues/41) - [PITCREW-409: jumpstarter JEP: virtual scalable exporters](https://redhat.atlassian.net/browse/PITCREW-409) - [JEP-0010: Renode Integration](JEP-0010-renode-integration.md) — Related provider - [JEP-0013: Observability](JEP-0013-observability-telemetry-logs.md) — Integration point --- This JEP is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0),