Bucket Sync

anycloud can sync cloud storage (S3, Azure Blob, GCS) with your containers automatically. Configure buckets in your profile and they're mounted directly into your container — no setup code needed.

Three Bucket Types

📥 Input Bucket

Downloaded once on startup, mounted read-only. Use for training data, models, datasets. You must create the bucket before deploying.

Python:

job = ac.submit(
    "ghcr.io/acme/my-app:latest",
    input=ac.bucket("my-training-data"),
)

CLI:

anycloud submit ghcr.io/acme/my-app:latest \
  --credentials my-aws \
  --input-bucket my-training-data

In your container:

# Read-only at /mnt/input
with open('/mnt/input/model.pkl', 'rb') as f:
    model_bytes = f.read()  # raw bytes; deserialize with whatever your framework uses

📤 Output Bucket

Uploads continuously every ~60 seconds, plus one final flush while the job is finalizing before it becomes completed or errored. Use for results, logs, artifacts. Output buckets are auto-created by the server if they don't exist.

Python:

job = ac.submit(
    "ghcr.io/acme/my-app:latest",
    output=ac.bucket("my-results"),
)

CLI:

anycloud submit ghcr.io/acme/my-app:latest \
  --credentials my-aws \
  --output-bucket my-results

In your container:

import json

# Write at /mnt/output; syncs to cloud every ~60s and once more at job exit
with open('/mnt/output/output.json', 'w') as f:
    json.dump(result, f)  # `result` is whatever your job computed
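Because the output directory is uploaded roughly every 60 seconds, an upload may coincide with a write that is still in progress. One common way to reduce the chance of a half-written file reaching the bucket is to write to a temporary file and rename it into place. The sketch below assumes /mnt/output behaves like a regular local directory (not confirmed here), and `write_json_atomic` is a hypothetical helper name, not part of anycloud. The temporary file may itself be uploaded briefly before the rename.

```python
import json
import os
import tempfile


def write_json_atomic(path, data):
    """Write JSON to `path` via a temporary file followed by a rename."""
    directory = os.path.dirname(path) or "."
    # Keep the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic on POSIX filesystems
    except BaseException:
        # Remove the partial temp file if anything went wrong, then re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

In a container this would be called as `write_json_atomic('/mnt/output/output.json', result)`.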

💾 Checkpoint Bucket

Downloaded once on startup, then uploaded continuously every ~60 seconds. Auto-created per deployment — you don't configure it. Always mounted at /mnt/checkpoint.

On startup, any checkpoint data already in the bucket (for example, from a previous preempted run) is downloaded. Local changes are then uploaded every ~60 seconds. At job exit, spot jobs enter finalizing and run one final checkpoint upload before reaching completed or errored. Your container can therefore read earlier checkpoints and write new ones.

Use for state that needs to survive preemption (see Spot Instances).

import json

# Read + write at /mnt/checkpoint (always this path)
with open('/mnt/checkpoint/state.json') as f:
    checkpoint = json.load(f)
checkpoint['epoch'] += 1
with open('/mnt/checkpoint/state.json', 'w') as f:
    json.dump(checkpoint, f)
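The snippet above assumes state.json already exists. On a first run nothing has been downloaded yet, so a job usually needs a "resume or start fresh" path. The following is a minimal sketch of that pattern; the function names, the `state.json` layout with a single `epoch` counter, and the `checkpoint_dir` parameter (added so the code can be tried outside a container) are illustrative assumptions, not anycloud APIs.

```python
import json
import os

CHECKPOINT_DIR = "/mnt/checkpoint"  # mount path described in this section


def load_state(checkpoint_dir=CHECKPOINT_DIR):
    """Return the saved state, or a fresh one when no checkpoint exists yet."""
    path = os.path.join(checkpoint_dir, "state.json")
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        # First run: nothing was downloaded from the checkpoint bucket.
        return {"epoch": 0}


def save_state(state, checkpoint_dir=CHECKPOINT_DIR):
    """Persist the state so the next periodic upload can pick it up."""
    path = os.path.join(checkpoint_dir, "state.json")
    with open(path, "w") as f:
        json.dump(state, f)


def run(total_epochs, checkpoint_dir=CHECKPOINT_DIR):
    """Continue from the last completed epoch and checkpoint after each one."""
    state = load_state(checkpoint_dir)
    for epoch in range(state["epoch"], total_epochs):
        # ... one epoch of real work would go here ...
        state["epoch"] = epoch + 1
        save_state(state, checkpoint_dir)
    return state
```

After a preemption, a restarted container calling `run(total_epochs)` picks up at the last epoch that was saved and uploaded, so at most about one sync interval of progress is repeated.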

Combining Buckets

Python:

job = ac.submit(
    "ghcr.io/acme/my-training:latest",
    gpu="a100:8",
    input=ac.bucket("training-data"),
    output=ac.bucket("results"),
)

CLI:

anycloud submit ghcr.io/acme/my-training:latest \
  --credentials my-aws \
  --gpu-type a100 \
  --spot \
  --gpus all \
  --input-bucket training-data \
  --output-bucket results

This gives your container:

Path	Access	Sync
`/mnt/input`	Read-only	Once on startup
`/mnt/output`	Read-write	Upload every ~60s
`/mnt/checkpoint`	Read-write	Download once on startup, upload every ~60s

Input buckets must exist before deploying. Output and checkpoint buckets are auto-created if they don't exist.

Once a bucket-backed job reaches completed, final output and checkpoint writes have been flushed. If that final flush cannot complete after retries, the deployment is marked failed instead of completed.
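The mount table above can be turned into a quick startup check, so a job fails early with a clear message rather than partway through. This is only a sketch: `check_mounts` is a hypothetical helper, and it checks directory presence and access bits, not whether the sync itself is healthy.

```python
import os

# Mount points and expected access, taken from the table in this section.
EXPECTED_MOUNTS = {
    "/mnt/input": "ro",
    "/mnt/output": "rw",
    "/mnt/checkpoint": "rw",
}


def check_mounts(expected=EXPECTED_MOUNTS):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for path, mode in expected.items():
        if not os.path.isdir(path):
            problems.append(f"{path}: not mounted")
        elif not os.access(path, os.R_OK):
            problems.append(f"{path}: not readable")
        elif mode == "rw" and not os.access(path, os.W_OK):
            problems.append(f"{path}: not writable")
    return problems
```

A job that only configured some buckets would pass a smaller dict, for example `check_mounts({"/mnt/output": "rw"})`.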

Which Bucket Type to Use

Input — data your job reads but never writes (training data, config, models)
Output — data your job produces (results, logs, artifacts)
Checkpoint — state that needs to survive preemption (training progress, recovery data)

Credentials

Bucket access uses cloud-native authentication (IAM roles, managed identities, service accounts), so no keys or secrets are placed in your container. If your storage is in a different cloud account, point each bucket at its own credentials: in Python, source the bucket from a second Client; on the CLI, use the per-bucket credential and region flags.

Python:

import anycloud

compute = anycloud.Client(credentials="compute-account")  # where the job runs
storage = anycloud.Client(credentials="storage-account")  # where the bucket lives (another cloud)

# Sourcing the bucket from the storage client wires its credentials + region automatically
job = compute.submit(
    "ghcr.io/acme/my-app:latest",
    input=storage.bucket("shared-data"),
)

CLI:

anycloud submit ghcr.io/acme/my-app:latest \
  --credentials compute-account \
  --input-storage-credentials storage-account \
  --input-storage-region us-east-1 \
  --input-bucket shared-data

Input and output storage credentials are independent. If your output bucket is also on a different cloud, set --output-storage-credentials / --output-storage-region too — otherwise the output bucket falls back to the compute credentials.

Cross-cloud storage currently supports AWS S3 only. anycloud mints scoped, short-lived STS credentials for the specific bucket and writes those credentials to the VM's rclone environment file; it does not copy your long-lived AWS key to the VM. The STS token lasts up to 36 hours and is not refreshed mid-run, so cross-cloud output or checkpoint sync can fail for jobs that keep running past that limit. Resubmitting the job mints fresh credentials.
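Since the token is not refreshed mid-run, a long job may want to track how much of the 36-hour window is left and wrap up (or plan a resubmit) before sync starts failing. The sketch below is an estimate under stated assumptions: the job records its own start time and treats it as the mint time, and the full 36 hours is available, although the text above only says "up to" 36 hours. The function names and the one-hour safety margin are illustrative, not anycloud behavior.

```python
from datetime import datetime, timedelta, timezone

STS_MAX_LIFETIME = timedelta(hours=36)  # upper bound stated in this section


def credential_time_left(job_started_at, now=None, lifetime=STS_MAX_LIFETIME):
    """Estimate remaining credential lifetime, treating job start as mint time."""
    now = now or datetime.now(timezone.utc)
    return job_started_at + lifetime - now


def should_wrap_up(job_started_at, now=None, safety_margin=timedelta(hours=1)):
    """True once no more than `safety_margin` of the assumed lifetime remains."""
    return credential_time_left(job_started_at, now) <= safety_margin
```

A training loop could call `should_wrap_up(started_at)` between epochs and exit cleanly when it returns True, leaving time for the final flush.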

Unpinned Compute Credentials

--credentials pins compute to one saved credential. If you omit it, anycloud leaves compute credentials unpinned and chooses the cheapest valid target across saved named cloud credentials. Unpinned compute also requires an unpinned region; pass --credentials when you need --region.

Input and output bucket storage does not become a credential pool. With unpinned compute, user-supplied buckets require explicit --input-storage-credentials or --output-storage-credentials so the bucket identity stays stable even when compute lands on another cloud.

A few things still bite even with this simpler model:

vmType aliasing. A vmType alias like h100 resolves to different concrete instance types per cloud. Unpinned compute can land a different SKU than the one you priced for.
Region selection. Omit --region for unpinned compute. If you need EU residency or a specific location, pin both --credentials and --region.
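The rules in this section can be checked before submitting, for example in a wrapper script. The sketch below encodes only what is stated above: unpinned compute needs an unpinned region, and user-supplied buckets need explicit storage credentials when compute is unpinned. `validate_submit_options` and its dict layout are hypothetical; the keys simply mirror the CLI flag names without the leading dashes.

```python
def validate_submit_options(opts):
    """Check a dict of CLI-style options against the unpinned-compute rules.

    Example keys: "credentials", "region", "input-bucket",
    "input-storage-credentials". Returns a list of error strings,
    which is empty when the combination is valid.
    """
    errors = []
    unpinned = not opts.get("credentials")

    if unpinned and opts.get("region"):
        errors.append("--region requires --credentials")

    if unpinned:
        for kind in ("input", "output"):
            has_bucket = opts.get(f"{kind}-bucket")
            has_creds = opts.get(f"{kind}-storage-credentials")
            if has_bucket and not has_creds:
                errors.append(
                    f"--{kind}-bucket needs --{kind}-storage-credentials "
                    "when compute is unpinned"
                )
    return errors
```

For instance, `validate_submit_options({"input-bucket": "shared-data"})` reports the missing `--input-storage-credentials`, while adding `"credentials": "my-aws"` to the same dict makes it pass.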
