Often no. If a public image such as pytorch/pytorch or nvidia/cuda already has the runtime your job needs, run it directly and bring your code at run time. Build and push a custom image only when the job needs dependencies no off-the-shelf image provides.
What the image is actually for
A training image has one job: provide the runtime. That means the operating system, the CUDA user-space libraries, and the ML framework, but not necessarily your code. Code can reach the container in several ways:
- Baked into the image with a Dockerfile COPY step
- Cloned from git when the container starts
- Installed as a package at startup
- Read from storage mounted into the container
Separating the runtime from the code is what makes stock images viable: the image changes only when dependencies change, not on every commit.
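As a rough illustration of the clone-at-start option, the sketch below composes the shell command a container could run on startup: clone a repository, check out a pinned commit, then launch the training script. The function name, repository URL, commit, and script path are hypothetical, and the sketch assumes git and python are already present in the image.

```python
import shlex


def clone_and_run_command(repo_url: str, commit: str, script: str,
                          workdir: str = "/workspace/src") -> str:
    """Compose a shell command that fetches code at container start and runs it.

    All arguments are illustrative placeholders, not values from any real project.
    """
    steps = [
        # Fetch the code into the container's filesystem.
        f"git clone {shlex.quote(repo_url)} {shlex.quote(workdir)}",
        f"cd {shlex.quote(workdir)}",
        # Pin the exact commit so the run is tied to known code.
        f"git checkout {shlex.quote(commit)}",
        # Launch the training entry point with the image's own Python.
        f"python {shlex.quote(script)}",
    ]
    return " && ".join(steps)


if __name__ == "__main__":
    print(clone_and_run_command(
        "https://github.com/example/train.git",  # hypothetical repository
        "abc1234",                               # hypothetical commit
        "train.py",                              # hypothetical entry point
    ))
```

Because the image itself never changes here, a new commit only changes the command's arguments, not the image reference.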
When a public image is enough
Most training and fine-tuning jobs import a framework, read data, and write checkpoints. A stock framework image already covers that, and skipping the build means no build-push loop on every change, no registry credentials to manage, and faster iteration. A public image is enough when:
- Your dependencies import cleanly inside a stock framework image
- The job is a script or module the image can run as a command
- The code is reachable at run time, from git or from storage
- You need no system packages beyond what the image ships
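One way to test the first condition is to run a small import check inside the candidate image before committing to it. The following is a minimal sketch, not a complete compatibility test: the function name and the module list are hypothetical, and a successful import does not guarantee the job will run correctly on a GPU.

```python
import importlib


def failed_imports(module_names):
    """Try to import each module and return {name: error} for those that fail."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:
            # Covers a missing package as well as a package that is installed
            # but breaks on import (for example, a missing shared library).
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


if __name__ == "__main__":
    required = ["torch", "numpy"]  # hypothetical dependency list for the job
    problems = failed_imports(required)
    if problems:
        for name, error in problems.items():
            print(f"cannot import {name}: {error}")
    else:
        print("all dependencies import cleanly")
```

If the check reports nothing missing when run inside the stock image, the image likely already covers the job's Python dependencies.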
When to build your own
Build and push a custom image when any of the following applies:
- System dependencies: the job needs OS packages, custom drivers for data loading, or tools no public image ships.
- Compiled extensions: CUDA extensions or native code that must be compiled into the environment ahead of time.
- Strict reproducibility: every dependency pinned and frozen, so the same image re-runs identically months later.
- No network at startup: the container cannot fetch code or packages at run time, so everything must be baked in.
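For the strict-reproducibility case, a quick check that every requirement is pinned can catch loose version ranges before they are baked into an image. The sketch below is a deliberately simple heuristic, assuming a pip-style requirements file; the function name and sample contents are hypothetical, and it does not handle every requirement syntax (URL requirements or wildcard versions, for instance).

```python
def unpinned_requirements(requirements_text: str) -> list[str]:
    """Return requirement lines that are not pinned to an exact version with '=='."""
    unpinned = []
    for raw in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    sample = """\
torch==2.3.0
numpy>=1.26   # a range, not a pin
requests
"""
    for line in unpinned_requirements(sample):
        print(f"not pinned: {line}")
```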
If you do build, build for the platform the GPU VM runs: linux/amd64. On Apple Silicon use docker buildx build --platform linux/amd64, since a plain build produces an arm64 image that pulls successfully but cannot start on an x86 VM. Start from a CUDA or framework base image, and prefer building in CI so each image is tied to the commit that built it.
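The build step can be sketched as a small helper that assembles the docker buildx build invocation with the target platform fixed to linux/amd64. The helper name and the image reference are hypothetical; tagging the image with a commit hash is one possible way to tie it to the commit that built it, not a requirement.

```python
TARGET_PLATFORM = "linux/amd64"  # the platform the GPU VM runs


def buildx_command(image_ref: str, context: str = ".") -> list[str]:
    """Assemble a `docker buildx build` command that builds for amd64 and pushes."""
    return [
        "docker", "buildx", "build",
        "--platform", TARGET_PLATFORM,  # cross-build even on an arm64 host
        "--tag", image_ref,
        "--push",                       # push the result to the registry
        context,
    ]


if __name__ == "__main__":
    # Hypothetical image reference, tagged with a commit hash.
    cmd = buildx_command("ghcr.io/example/trainer:abc1234")
    print(" ".join(cmd))
    # To actually run it, pass the list to subprocess.run(cmd, check=True).
```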
Where anycloud fits
anycloud runs any pullable image on a cloud GPU VM — public images from any registry, private images through GHCR. It does not build images: build and push first, then submit the image reference with a command.
With the Python SDK, the @anycloud.function() decorator takes this further: point it at a stock framework image and anycloud clones your code from git onto the VM at run time. You rebuild only when dependencies change, not on every code change.