Cloud GPU answers

Direct answers to the questions that come up when you run GPU jobs in the cloud: what a job really costs, spot interruptions, purchase models, Docker images, and how much infrastructure you actually need.

How do you compare GPU prices across cloud providers?

Compare providers by the cost of a completed run, not just the hourly price. Region, quota, available capacity, spot interruption risk, storage, and startup time all change the final figure.
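To make "completed-run cost" concrete, here is a minimal sketch of the arithmetic. The function name, its parameters, and every price below are illustrative assumptions, not real provider quotes. It assumes startup time is billed on every launch and that each interruption costs some amount of repeated work.

```python
def completed_run_cost(hourly_price, compute_hours, startup_hours=0.0,
                       interruptions=0, rework_hours_per_interruption=0.0,
                       storage_cost=0.0):
    """Estimate the total cost of one finished job (hypothetical model).

    Billed time = useful compute
                + startup time for the first launch and every relaunch
                + work repeated after each interruption.
    """
    launches = 1 + interruptions
    billed_hours = (compute_hours
                    + startup_hours * launches
                    + interruptions * rework_hours_per_interruption)
    return billed_hours * hourly_price + storage_cost


# Made-up numbers: an on-demand run versus a cheaper but interrupted spot run.
on_demand = completed_run_cost(hourly_price=4.00, compute_hours=10,
                               startup_hours=0.1)
spot = completed_run_cost(hourly_price=1.50, compute_hours=10,
                          startup_hours=0.1, interruptions=3,
                          rework_hours_per_interruption=0.5,
                          storage_cost=2.0)
print(f"on-demand: {on_demand:.2f}  spot: {spot:.2f}")
```

With these made-up inputs the spot run is still cheaper despite three interruptions. Raise the rework per interruption or the startup time and the gap narrows, which is why the hourly price alone is not enough.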

Can you train on spot GPUs without losing progress?

Spot GPUs can work for training if the job writes durable checkpoints and can restart cleanly after the VM is reclaimed.
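The checkpoint-and-resume pattern can be sketched with the standard library alone. This is an illustration under assumptions, not a training loop: the state is a small JSON dict, `train`, `save_checkpoint`, `load_checkpoint`, and the `stop_at` parameter (which simulates a reclaim) are hypothetical names, and a real job would save model and optimizer state with its framework's own save function to storage that outlives the VM.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write atomically so a reclaim mid-write cannot leave a corrupt file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic rename over the previous checkpoint


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if none exists."""
    if not os.path.exists(path):
        return {"step": 0, "loss": None}
    with open(path) as f:
        return json.load(f)


def train(path, total_steps, checkpoint_every=100, stop_at=None):
    """Run from the last checkpoint; stop_at simulates the VM being reclaimed."""
    state = load_checkpoint(path)
    for step in range(state["step"], total_steps):
        if stop_at is not None and step == stop_at:
            return state  # progress since the last checkpoint is lost
        # Placeholder for one real training step.
        state = {"step": step + 1, "loss": 1.0 / (step + 1)}
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

Calling `train` again with the same path after an interruption picks up from the last saved step, so the cost of a reclaim is bounded by `checkpoint_every` steps of repeated work.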

How do you run GPU jobs without Kubernetes?

For many batch workloads, a Docker image plus a GPU VM, managed batch service, or job runner is simpler than operating a GPU cluster.

Should I use spot, on-demand, or reserved GPUs?

Use spot for resumable work, on-demand while usage is uncertain, and reserved capacity (for example, AWS Capacity Blocks) when you need a predictable, paid-for GPU window.

Can one Docker image run across AWS, Azure, GCP, and Lambda?

One image can work across GPU clouds when it targets linux/amd64, keeps cloud-specific settings out of the image, and is compatible with each host's GPU driver and container runtime.
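One way to keep cloud-specific settings out of the image is to have the entrypoint read them from environment variables at start time. The sketch below assumes that approach. The variable names (`CHECKPOINT_URI`, `DATASET_URI`, `NUM_GPUS`) and the `JobConfig` class are hypothetical, chosen only for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class JobConfig:
    checkpoint_uri: str  # where checkpoints go; differs per cloud
    dataset_uri: str     # where training data lives; differs per cloud
    num_gpus: int


def load_config(env=None):
    """Build the job config from the environment, not from baked-in values."""
    env = os.environ if env is None else env
    required = ("CHECKPOINT_URI", "DATASET_URI")
    missing = [name for name in required if name not in env]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    return JobConfig(
        checkpoint_uri=env["CHECKPOINT_URI"],
        dataset_uri=env["DATASET_URI"],
        num_gpus=int(env.get("NUM_GPUS", "1")),
    )
```

The same image can then be launched on any provider, with each launch supplying its own bucket paths and GPU count. Failing fast on a missing setting keeps a misconfigured job from burning GPU time before it crashes.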

Do you need a custom Docker image for GPU training?

Often no. Public PyTorch and CUDA images cover most training jobs. Build your own only when you need dependencies those images do not include.

Have a question that isn't here?

Ask in the community Slack