Package the job as a Docker image and run it on a GPU VM, a managed batch service, or a job runner. Kubernetes is useful for shared platforms and long-running services, but most batch GPU jobs only need to start a GPU machine, run a container, save the outputs, and shut down.
The job-runner model
Many GPU workloads are batch jobs. They do not need service discovery, ingress, rolling deploys, or always-on nodes. They need:
- A container image with CUDA libraries compatible with the host's GPU driver, plus the job's dependencies
- A machine shape with the right GPU and memory
- A command to run
- A way to get input data in
- A way to persist outputs and checkpoints
- Logs, status reporting, retries, and cleanup
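One way to make that list concrete is a small job description. The sketch below is a hypothetical Python dataclass, not the format of any particular service; field names such as `gpu_type` and `output_uri` are illustrative assumptions. Its `validate` method checks the basics, including that outputs have somewhere to go once the machine is gone.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GpuJobSpec:
    """Hypothetical description of one batch GPU job (field names are illustrative)."""
    image: str                        # container image with CUDA and dependencies
    gpu_type: str                     # e.g. "A100"; naming varies by provider
    gpu_count: int                    # GPUs required on the machine
    memory_gb: int                    # host memory required
    command: List[str]                # command to run inside the container
    input_uri: Optional[str] = None   # where input data comes from
    output_uri: Optional[str] = None  # where outputs and checkpoints are persisted
    max_retries: int = 2              # extra attempts after a failed run

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the spec looks usable."""
        problems = []
        if not self.image:
            problems.append("image is required")
        if self.gpu_count < 1:
            problems.append("gpu_count must be at least 1")
        if self.memory_gb <= 0:
            problems.append("memory_gb must be positive")
        if not self.command:
            problems.append("command is required")
        if self.output_uri is None:
            problems.append("output_uri is required so results outlive the machine")
        if self.max_retries < 0:
            problems.append("max_retries cannot be negative")
        return problems
```

For example, `GpuJobSpec(image="example.com/train:latest", gpu_type="A100", gpu_count=1, memory_gb=64, command=["python", "train.py"], output_uri="s3://example-bucket/run-1/").validate()` returns an empty list, while the same spec with `gpu_count=0` reports the problem.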
You can build that with a small script around cloud VMs, a managed service such as AWS Batch, Google Cloud Batch, or Azure Batch, or a higher-level GPU job runner.
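The core of such a script is a loop that provisions a machine, runs the job, and always releases the machine, retrying failed attempts. This is a minimal sketch, assuming hypothetical `provision`, `run`, and `teardown` callables that you would implement against your cloud's SDK; it is not the API of AWS Batch or any other service.

```python
from typing import Callable, TypeVar

M = TypeVar("M")  # whatever handle the provider returns for a machine


def run_job_with_retries(
    provision: Callable[[], M],
    run: Callable[[M], int],
    teardown: Callable[[M], None],
    max_retries: int = 2,
) -> int:
    """Provision a machine, run the job, and always tear it down.

    A non-zero exit code triggers a fresh attempt, up to max_retries extra
    attempts. Returns the exit code of the last attempt (0 means success).
    An exception raised by run() is not retried: it propagates after cleanup.
    """
    exit_code = 1
    for attempt in range(max_retries + 1):
        machine = provision()
        try:
            exit_code = run(machine)
        finally:
            # Runs even if run() raises, so no GPU is left running and billing.
            teardown(machine)
        if exit_code == 0:
            return 0
        print(f"attempt {attempt + 1} failed with exit code {exit_code}")
    return exit_code
```

Putting `teardown` in a `finally` block is the key design choice: a crashed run still releases the GPU. Each retry also gets a fresh machine, which avoids reusing a node that may be in a bad state.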
What Kubernetes adds
Kubernetes can run GPU workloads. Official Kubernetes docs describe GPU scheduling through device plugins, and Kubernetes Jobs are the native API for one-off tasks that run to completion.
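For comparison, a minimal GPU Job on Kubernetes looks roughly like the manifest below, built here as a Python dict and printed as JSON, which `kubectl apply -f -` accepts. It assumes the NVIDIA device plugin is installed, so the GPU resource is named `nvidia.com/gpu`; the Job name, image, and command are placeholders. This covers only the Job object: the cluster, a GPU node pool, and the device plugin still have to exist.

```python
import json
from typing import Any, Dict, List


def gpu_job_manifest(name: str, image: str, command: List[str], gpus: int = 1) -> Dict[str, Any]:
    """Build a batch/v1 Job that requests GPUs through a device plugin resource."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 2,                # retries before the Job is marked failed
            "ttlSecondsAfterFinished": 3600,  # let the cluster delete the finished Job
            "template": {
                "spec": {
                    "restartPolicy": "Never",  # Jobs require Never or OnFailure
                    "containers": [{
                        "name": name,
                        "image": image,
                        "command": command,
                        # GPUs are requested in limits; the resource name comes
                        # from the installed device plugin.
                        "resources": {"limits": {"nvidia.com/gpu": gpus}},
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = gpu_job_manifest("train-run-1", "example.com/train:latest", ["python", "train.py"])
    print(json.dumps(manifest, indent=2))  # e.g. pipe into: kubectl apply -f -
```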
The cluster also adds operational surface area: node pools, autoscaling, device plugin installation, driver compatibility, quotas, storage classes, image pulls, observability, cost controls, access policy, upgrades, and debugging failures. That work may be worth it for a platform team. For a small team that just wants to run portable GPU jobs, it may be mostly overhead.
Alternatives to Kubernetes
- Direct VM: launch a GPU VM, SSH in or use cloud-init, run Docker, upload results, and shut the VM down.
- Managed batch: use AWS Batch, Google Cloud Batch, or Azure Batch. These handle queues and compute pools, but each cloud has its own configuration model.
- Container services: Amazon ECS and similar services can run GPU containers without Kubernetes, though they still require provider-specific setup.
- Portable job runner: submit one Docker image and let a tool handle VM provisioning, file movement, status, and shutdown.
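The direct VM option can be as small as a startup script passed to the VM as cloud-init user data; cloud-init runs user data that begins with `#!` as a script on first boot. The sketch below renders one in Python. It assumes a VM image with Docker and the NVIDIA Container Toolkit (needed for `--gpus all`), the AWS CLI with credentials from an instance role, and an S3 output location; the image name and bucket are placeholders, and another cloud would swap in its own upload command.

```python
import shlex
from typing import List


def direct_vm_user_data(image: str, command: List[str], output_uri: str) -> str:
    """Render a cloud-init shell script: run one GPU container, upload, power off."""
    run_cmd = " ".join(shlex.quote(part) for part in command)
    return "\n".join([
        "#!/bin/bash",
        "set -euo pipefail",
        # finish() uploads whatever was written, then powers the VM off.
        # '|| true' keeps a failed upload from skipping the shutdown.
        "finish() {",
        f"  aws s3 cp --recursive /mnt/output {shlex.quote(output_uri)} || true",
        "  shutdown -h now",
        "}",
        # Run finish on every exit, including when docker fails under set -e.
        "trap finish EXIT",
        "mkdir -p /mnt/output",
        # --gpus requires the NVIDIA Container Toolkit on the VM image.
        f"docker run --rm --gpus all -v /mnt/output:/mnt/output {shlex.quote(image)} {run_cmd}",
        "",
    ])
```

Calling `direct_vm_user_data("example.com/train:latest", ["python", "train.py"], "s3://example-bucket/run-1/")` produces the script to pass as user data when launching the VM. Whether powering off stops or terminates the instance, and whether attached disks keep billing, depends on the provider's settings.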
Where anycloud fits
anycloud is a portable job runner. You submit a Docker image, GPU target, command, and optional buckets. anycloud provisions compute in your connected cloud account, runs the container, exposes bucket-backed folders such as /mnt/input and /mnt/output, reports job status, and shuts down compute when the job finishes.
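Inside the container, the job only needs to read from `/mnt/input` and write to `/mnt/output`. The sketch below assumes those paths; the `INPUT_DIR`/`OUTPUT_DIR` environment overrides are a hypothetical convenience for local testing, not an anycloud feature, and counting lines in `.txt` files stands in for the real GPU work.

```python
import json
import os
from pathlib import Path
from typing import Dict

# Default to the bucket-backed folders; env overrides are for local runs only.
INPUT_DIR = Path(os.environ.get("INPUT_DIR", "/mnt/input"))
OUTPUT_DIR = Path(os.environ.get("OUTPUT_DIR", "/mnt/output"))


def process(input_dir: Path, output_dir: Path) -> Dict[str, int]:
    """Count lines in every .txt input file and write a JSON summary."""
    output_dir.mkdir(parents=True, exist_ok=True)
    summary: Dict[str, int] = {}
    for path in sorted(input_dir.glob("*.txt")):
        with path.open() as f:
            summary[path.name] = sum(1 for _ in f)
    (output_dir / "summary.json").write_text(json.dumps(summary, indent=2))
    return summary


def main() -> None:
    """Entry point for the container's command."""
    process(INPUT_DIR, OUTPUT_DIR)
```

The container's command would call `main()`. Because `/mnt/output` is bucket-backed, anything written there should persist after the compute shuts down, and the same script runs locally by pointing the two environment variables at ordinary directories.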