Job Lifecycle

anycloud submit
     │
     ▼
┌─────────────┐
│   queued    │  ⏳ Waiting for a provisioning slot or for a spend cap to clear
└──────┬──────┘
       │
       ▼
┌──────────────┐
│ provisioning │  🏗️ Creating VMs
└──────┬───────┘
       │
       ▼
┌──────────────┐
│ initializing │  ⚙️ SSH + packages + setup
└──────┬───────┘
       │
       ▼
┌──────────────┐
│ downloading  │  📥 Docker image pull
└──────┬───────┘
       │
       ▼
┌─────────────┐
│   syncing   │  💾 Bucket download (if configured)
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  starting   │  🚀 Container start + health check (+ git clone for @function)
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   running   │  ⚡ Job is executing
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ finalizing  │  🏁 Final output/checkpoint upload (if configured)
└──────┬──────┘
       │
   ┌───┴───┐
   │       │
   │    exit ≠ 0
   │       ▼
   │  ┌─────────┐
   │  │ errored │  🪲
   │  └─────────┘
   │
exit = 0
   ▼
┌───────────┐
│ completed │  ✅
└───────────┘
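
The diagram above can be read as a small state machine. The sketch below is illustrative only: `SETUP_AND_RUN`, `TERMINAL`, and `next_state` are hypothetical names, not part of the anycloud API, and the model covers only the no-failure path plus the exit-code branch after finalizing.

```python
from typing import Optional

# Hypothetical model of the diagram; names are illustrative, not anycloud's API.
SETUP_AND_RUN = [
    "queued", "provisioning", "initializing", "downloading",
    "syncing", "starting", "running", "finalizing",
]
TERMINAL = {"completed", "errored"}


def next_state(state: str, exit_code: Optional[int] = None) -> str:
    """Return the state that follows `state` when no step fails."""
    if state in TERMINAL:
        raise ValueError(f"{state} is terminal")
    if state == "finalizing":
        # The branch at the bottom of the diagram depends on the exit code.
        if exit_code is None:
            raise ValueError("finalizing needs the workload's exit code")
        return "completed" if exit_code == 0 else "errored"
    # list.index raises ValueError for names that are not in the diagram.
    return SETUP_AND_RUN[SETUP_AND_RUN.index(state) + 1]
```

For example, `next_state("starting")` gives `"running"`, and `next_state("finalizing", exit_code=2)` gives `"errored"`.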

For jobs using the @anycloud.function() decorator, the container's startup command clones your GitHub repo at the submitted commit before running your function. This happens during the starting phase — your Docker image only needs dependencies, not your code. See Deployment Workflows.

Provisioning Concurrency

anycloud allows at most 50 deployments per local API instance to be setting up VMs at the same time. A deployment counts against this cap while it is in provisioning, initializing, downloading, syncing, or starting, that is, from dispatch until it reaches running.

Submit more than 50 at once and the excess sits in queued until a slot frees. Slots free as soon as a deployment reaches running (not when the job completes), so steady-state throughput is roughly 50 / pipeline_duration new VMs per minute, where pipeline_duration is the time in minutes a deployment takes to go from dispatch to running.
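
The throughput estimate is an instance of Little's law (slots = arrival rate × time in pipeline). The sketch below works through the arithmetic. The function names are hypothetical, and the wave estimate assumes every deployment takes the same time to reach running, which real deployments will not.

```python
import math

PIPELINE_SLOTS = 50  # provisioning concurrency cap stated above


def max_dispatch_rate(pipeline_minutes: float, slots: int = PIPELINE_SLOTS) -> float:
    """Steady-state VMs reaching running per minute: slots / pipeline duration."""
    if pipeline_minutes <= 0:
        raise ValueError("pipeline_minutes must be positive")
    return slots / pipeline_minutes


def minutes_until_all_running(jobs: int, pipeline_minutes: float,
                              slots: int = PIPELINE_SLOTS) -> float:
    """Rough estimate: jobs pass through the pipeline in waves of `slots`."""
    waves = math.ceil(jobs / slots)
    return waves * pipeline_minutes


rate = max_dispatch_rate(5)              # 5-minute pipeline -> 10 VMs per minute
eta = minutes_until_all_running(120, 5)  # 3 waves of up to 50 -> 15 minutes
```

With a 5-minute pipeline, 120 simultaneous submissions would need about three waves, so the last job reaches running roughly 15 minutes after submit under these assumptions.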

This cap throttles the deploy pipeline, not the running fleet. There is no upper bound on how many jobs can be in running simultaneously.

After a workload exits, bucket-backed jobs may enter finalizing while anycloud flushes final output/checkpoint artifacts. That post-run final sync does not count against the provisioning concurrency cap.

A deployment can also sit in queued because a spend control — throttle or budget — is at its cap. The block reason shows up in anycloud status and anycloud list; the deployment auto-dispatches on the next scheduler tick once the cap clears.

Container Image Pulls

anycloud resolves the requested container image to its registry digest, boots a provider base VM, and pulls that image before bucket sync and container start. Mutable tags remain safe: pushing new content behind the same tag produces a different digest, so the deployment pulls the new content.
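
To see why resolving a tag to a digest keeps mutable tags safe, the toy model below mimics content addressing. `ToyRegistry` is a hypothetical, simplified stand-in for a container registry: it hashes raw bytes with SHA-256 in the `sha256:<hex>` style registries use for manifests, and it is not how anycloud or any real registry is implemented.

```python
import hashlib
from typing import Dict


def content_digest(manifest: bytes) -> str:
    """Digest in the 'sha256:<hex>' form, derived only from the content."""
    return "sha256:" + hashlib.sha256(manifest).hexdigest()


class ToyRegistry:
    """Hypothetical registry: a tag is just a movable pointer to a digest."""

    def __init__(self) -> None:
        self._tags: Dict[str, str] = {}

    def push(self, tag: str, manifest: bytes) -> str:
        digest = content_digest(manifest)
        self._tags[tag] = digest  # re-pushing a tag moves the pointer
        return digest

    def resolve(self, tag: str) -> str:
        return self._tags[tag]


registry = ToyRegistry()
first = registry.push("app:latest", b"image contents v1")
second = registry.push("app:latest", b"image contents v2")
# Same tag, different content, therefore a different digest.
tag_moved = first != second and registry.resolve("app:latest") == second
```

Because the digest depends only on the content, a deployment that resolves the tag at submit time knows exactly which bytes it will pull.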

Error States

Any step can fail along the way:

❌ Failed — infrastructure/system error after retries, including setup failures and exhausted final artifact sync
🪲 Errored — your app exited non-zero after final artifacts were flushed
🚫 Invalid — bad config (wrong bucket name, etc.) — never retried, fix and resubmit
🪦 Terminated — user-initiated via anycloud terminate

Final Artifact Sync

For jobs with an output bucket, or spot jobs with a checkpoint bucket, anycloud runs one final rclone copy after the workload exits. completed and errored are published only after that final sync succeeds. If final sync fails, the deployment stays in finalizing and keeps the VM alive while anycloud retries. If those retries are exhausted, the deployment becomes failed because anycloud could not preserve the run artifacts.
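
The publishing rule above can be summarized as a small decision function. This is a sketch under stated assumptions: `state_after_exit` and its parameters are hypothetical names, and the size of the retry budget is not given in this page, so it is passed in rather than hard-coded.

```python
def state_after_exit(exit_code: int, sync_succeeded: bool, sync_retries_left: int) -> str:
    """State a deployment shows once the workload has exited.

    Hypothetical helper mirroring the rules in the text:
    - completed/errored are published only after final sync succeeds
    - a failed sync keeps the deployment in finalizing while retries remain
    - exhausted retries make the deployment failed
    """
    if sync_succeeded:
        return "completed" if exit_code == 0 else "errored"
    if sync_retries_left > 0:
        return "finalizing"  # VM stays alive while anycloud retries
    return "failed"  # artifacts could not be preserved
```

Note that a non-zero exit code only surfaces as errored once the artifacts are safe; until then the deployment is still finalizing.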

Auto-Retry

When an infrastructure error occurs during any setup step (provisioning through starting), anycloud automatically retries. The failed VM is cleaned up, and the job is re-queued for a fresh attempt:

(any setup step fails) → 🛠️ Retrying → ⏳ Queued → 🏗️ Provisioning → ... → ⚡ Running

After 100 failed setup attempts, the job is marked ❌ Failed. Final artifact sync has its own retry budget and stays in finalizing until it succeeds or exhausts.
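
The setup retry budget can be pictured as a bounded loop. The sketch below is not anycloud's scheduler: `run_setup_with_retries` and the `attempt_setup` callback are hypothetical, and the callback stands in for everything from provisioning through starting on a fresh VM.

```python
from typing import Callable, Tuple

MAX_SETUP_ATTEMPTS = 100  # limit stated above


def run_setup_with_retries(
    attempt_setup: Callable[[], bool],
    max_attempts: int = MAX_SETUP_ATTEMPTS,
) -> Tuple[str, int]:
    """Return (final state, attempts used) for the setup phase."""
    for attempt in range(1, max_attempts + 1):
        if attempt_setup():
            return "running", attempt
        # Setup failed: the VM is cleaned up and the job is re-queued,
        # which this sketch models as simply looping to a fresh attempt.
    return "failed", max_attempts
```

A setup that succeeds on its third try returns `("running", 3)`; one that never succeeds returns `("failed", 100)`.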

Quota Recovery

When the cloud denies capacity for the requested VM family (a quota or capacity error), the behavior depends on whether the region is pinned:

Multi-region config (optimizer picks the region) — anycloud blocks that (cloud, vmType, region) target for 30 minutes and keeps the deployment in 🛠️ Retrying. The optimizer skips the blocked target and tries a different region on the next attempt.
Pinned region (region set explicitly) — capacity errors still keep the deployment in 🛠️ Retrying, but anycloud can only retry the same region. It cannot fail over to healthier regions until the region pin is removed.

Regions where the quota is literally 0 or where the subscription can't deploy at all (AWS region not opted in, Azure sponsorship/student/policy-restricted locations) get blocked for 6 hours instead of 30 minutes. These don't come back without a support ticket, so the optimizer stays off them for longer.
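
For the multi-region case, the blocking behavior resembles a blocklist with expiry times. The sketch below is a minimal illustration, not anycloud's optimizer: `TargetBlocklist`, its methods, and the `hard` flag (standing in for the zero-quota or cannot-deploy case) are all hypothetical, and the clock is passed in explicitly so the logic is easy to follow.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Optional, Tuple

Target = Tuple[str, str, str]  # (cloud, vmType, region)

SHORT_BLOCK = timedelta(minutes=30)  # ordinary quota or capacity error
LONG_BLOCK = timedelta(hours=6)      # quota of 0 or region cannot deploy


class TargetBlocklist:
    """Hypothetical record of targets the optimizer should skip for now."""

    def __init__(self) -> None:
        self._blocked_until: Dict[Target, datetime] = {}

    def block(self, target: Target, now: datetime, hard: bool = False) -> None:
        self._blocked_until[target] = now + (LONG_BLOCK if hard else SHORT_BLOCK)

    def is_blocked(self, target: Target, now: datetime) -> bool:
        until = self._blocked_until.get(target)
        return until is not None and now < until

    def pick(self, candidates: Iterable[Target], now: datetime) -> Optional[Target]:
        """First candidate that is not currently blocked, or None."""
        for target in candidates:
            if not self.is_blocked(target, now):
                return target
        return None
```

With a pinned region the candidate list has a single entry, which is why the text notes that anycloud cannot fail over until the pin is removed.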

To raise the limit yourself, use the CLI:

anycloud quota request <vmType> --credential <name>
anycloud quota status --credential <name>

Dedup is built in: re-running quota request against a region with an existing open case returns SKIPPED along with the prior case's URL. Once the case is approved (or partially approved), the next retry — or a fresh resubmit — picks up the new limit.

Credential Recovery

When the cloud rejects a credential mid-provisioning (expired token, revoked key, auth-class error), the behavior depends on whether compute credentials are pinned:

Pinned credential — the deployment enters 🚫 Invalid immediately. Retrying with the same broken credential won't help.
Unpinned compute — anycloud blocks the failing credential for 30 minutes and re-runs target resolution against saved named credentials, keeping the deployment in 🛠️ Retrying when another candidate can run. Quota and capacity errors continue to rotate at the (cloud, vmType, region) level.

Spot Recovery

Spot instances can be preempted by the cloud provider at any time during initializing, downloading, syncing, starting, or running. anycloud detects preemption automatically and re-provisions from scratch:

(preempted) → 🔁 Recovering → ⏳ Queued → 🏗️ Provisioning → ... → ⚡ Running
