Bucket Sync Internals

How anycloud syncs buckets to the files mounted in the container, under the hood.

Data Flow

Input Bucket (s3:/azureblob:/gcs:)
    ↓ rclone sync (download once on startup)
VM Bucket Directory ($HOST_ROOT/input)
    ↕ Docker bind mount :ro
Container (/mnt/input) - READ ONLY

Output Bucket (s3:/azureblob:/gcs:)
    ↑ rclone copy (upload only, continuous every ~60s)
VM Bucket Directory ($HOST_ROOT/output)
    ↕ Docker bind mount
Container (/mnt/output) - WRITE

Checkpoint Bucket (s3:/azureblob:/gcs:{deployment-id})
    ↓ rclone sync (download once on startup)
    ↑ rclone copy (upload only, continuous every ~60s)
VM Bucket Directory ($HOST_ROOT/checkpoint)
    ↕ Docker bind mount
Container (/mnt/checkpoint) - READ + WRITE

$HOST_ROOT is /mnt for existing deployments and non-Azure VMs. New Azure VMs use /var/lib/anycloud/deployments on the managed OS disk, while the paths visible inside the container stay unchanged.
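The rule above can be sketched as a small helper. The function name and its `provider` and `new_vm` parameters are hypothetical, chosen for illustration; only the two paths come from the text.

```python
def host_root(provider: str, new_vm: bool) -> str:
    """Return the VM directory that holds the bucket directories."""
    # Only newly created Azure VMs use the managed OS disk location.
    if provider == "azure" and new_vm:
        return "/var/lib/anycloud/deployments"
    # Existing deployments and all non-Azure VMs keep /mnt.
    return "/mnt"
```

Because the container-side paths never change, a workload does not need to know which host root is in use.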

Zero-Copy Access

Containers access synced files via Docker bind mounts — files exist once on the VM disk and containers see them directly. No duplication, no copying.

-v $HOST_ROOT/input:/mnt/input:ro         # input (read-only)
-v $HOST_ROOT/output:/mnt/output           # output
-v $HOST_ROOT/checkpoint:/mnt/checkpoint   # checkpoint
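As a rough illustration, the flags above could be assembled as follows. The helper name and its arguments are hypothetical; the sketch assumes that only the input mount is read-only, as in the flags shown.

```python
def bind_mount_args(host_root: str, buckets: list[str]) -> list[str]:
    """Build docker `-v` arguments for the configured bucket directories."""
    args: list[str] = []
    for name in buckets:
        spec = f"{host_root}/{name}:/mnt/{name}"
        if name == "input":
            # Input data is static, so the container gets a read-only view.
            spec += ":ro"
        args += ["-v", spec]
    return args
```

For example, `bind_mount_args("/mnt", ["input", "output"])` yields the first two flags listed above with `$HOST_ROOT` expanded to `/mnt`.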

Sync Strategies

Input: Download Once

rclone sync bucket → VM runs once on startup. No background service — the data is static. When an Azure VM restarts with its managed OS disk intact, rclone reconciles the existing directory and does not transfer unchanged files again.
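To see why a restart with an intact disk transfers little or nothing, here is a simplified model of the comparison. By default rclone decides whether a file needs transferring from its size and modification time; the function below is an illustrative sketch of that idea, not rclone's implementation, and it does not model the deletion of local files that are absent from the bucket.

```python
def files_to_transfer(remote: dict, local: dict) -> list:
    """Return the paths whose (size, mtime) differs from the local copy.

    Both arguments map a path to a (size, mtime) tuple.
    """
    # A file is transferred if it is missing locally or its metadata differs.
    return sorted(path for path, meta in remote.items() if local.get(path) != meta)
```

With an empty local directory every file is transferred; with an unchanged directory the list is empty.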

Output: Upload Only

A background service runs rclone copy VM → bucket every ~60 seconds. It uses copy rather than sync so that multiple jobs can safely write to the same bucket without deleting each other's files.
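The difference between the two commands can be modelled with plain dictionaries that map file paths to contents. This is a toy model for illustration only: `copy` adds or updates files and never deletes, while `sync` makes the destination identical to the source.

```python
def copy(src: dict, dst: dict) -> dict:
    """Model of `rclone copy`: add or update files, never delete."""
    merged = dict(dst)
    merged.update(src)
    return merged


def sync(src: dict, dst: dict) -> dict:
    """Model of `rclone sync`: destination ends up identical to source."""
    return dict(src)


# Two jobs write different files to the same (initially empty) bucket.
job_a = {"a/result.csv": "A"}
job_b = {"b/result.csv": "B"}

after_copy = copy(job_b, copy(job_a, {}))  # both results survive
after_sync = sync(job_b, sync(job_a, {}))  # job A's result is deleted
```

With `copy`, the bucket ends up holding both jobs' files; with `sync`, the second job's upload would remove the first job's output.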

After a same-VM restart, anycloud uploads local output that has not reached the bucket before it resumes the workload and the continuous uploader.

When the workload exits, bucket-backed jobs enter finalizing while anycloud runs one final rclone copy before it marks the job completed or errored. This closes the window where a result written near process exit could otherwise wait for the next background sync cycle.

Checkpoint: Download Once, Then Upload

On a fresh or replacement VM, rclone sync bucket → VM restores existing checkpoint data (e.g., from a previous preempted run). After a same-VM restart, anycloud keeps the managed-disk checkpoint and uploads any local progress before resuming the workload. A background service then runs rclone copy VM → bucket every ~60 seconds to upload changes.
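The startup decision described above might look like the following sketch. The function name, its parameters, and the example remote are hypothetical; the commands use the standard `rclone <verb> <source> <destination>` argument order.

```python
def checkpoint_startup_command(remote: str, local: str, same_vm_restart: bool) -> list[str]:
    """Choose the rclone command to run before the workload resumes."""
    if same_vm_restart:
        # The managed disk still holds the checkpoint: push unsynced progress up.
        return ["rclone", "copy", local, remote]
    # Fresh or replacement VM: restore the checkpoint from the bucket.
    return ["rclone", "sync", remote, local]
```

For example, `checkpoint_startup_command("s3:my-checkpoints", "/mnt/checkpoint", False)` restores from the bucket, while passing `True` uploads local progress instead.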

Spot checkpoint buckets get the same final upload guarantee as output buckets. If the final upload fails, the deployment stays in finalizing and the VM remains live while anycloud retries. If the retries are exhausted, the deployment is marked failed.
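The finalizing step can be sketched as a retry loop around the final upload. Everything here is an assumption made for illustration: the function name, the `max_attempts` default, and the mapping from the workload's exit code to completed or errored are not taken from anycloud.

```python
from typing import Callable


def finalize(upload: Callable[[], bool], exit_code: int, max_attempts: int = 3) -> str:
    """Run the final upload with retries and return the resulting status."""
    for _ in range(max_attempts):
        # The deployment stays in "finalizing" while attempts remain.
        if upload():
            # Assumption: a zero exit code means the job completed.
            return "completed" if exit_code == 0 else "errored"
    # Retries exhausted without a successful upload.
    return "failed"
```

The terminal status is decided only after the upload outcome is known, which is what closes the window for results written just before the process exits.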

VM Initialization Sequence

1. anycloud creates the VM with an SSH-only startup script.
2. Once SSH is reachable, anycloud installs Docker and the rclone sync service.
3. rclone is configured for each bucket type; setup then waits for cloud-storage connectivity, with retries that cover Azure managed-identity propagation delays.
4. The container starts with bind mounts for all configured buckets.
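The connectivity wait in this sequence can be pictured as a bounded retry loop. This is a minimal sketch: the function name, the attempt count, and the delay are hypothetical, and `probe` stands in for whatever check is actually used (for instance, a listing command against the remote that exits successfully).

```python
import time
from typing import Callable


def wait_for_storage(
    probe: Callable[[], bool],
    attempts: int = 10,
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        # Give identity propagation time to finish before the next try,
        # but do not sleep after the last attempt.
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

Injecting `sleep` keeps the helper easy to test without real delays.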

Cloud-Native Authentication

anycloud uses cloud-native auth — no keys or secrets in containers:

AWS — IAM instance roles with S3 bucket policy
Azure — Managed identities with Storage Blob Data Contributor role
GCP — Service accounts with Storage Object Admin role

Credentials are automatically available to rclone via cloud metadata services.
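For illustration, an rclone configuration that relies on ambient credentials could be generated as below. The source does not show anycloud's actual configuration, so treat this as an assumption: the remote names mirror the `s3:`, `azureblob:`, and `gcs:` prefixes used earlier, the `azure_account` parameter is hypothetical, and the options are rclone's standard settings for picking up credentials from the runtime environment.

```python
import configparser
import io


def rclone_config(azure_account: str) -> str:
    """Render an rclone config that stores no keys or secrets."""
    cfg = configparser.ConfigParser()
    # S3: read credentials from the environment or the instance role.
    cfg["s3"] = {"type": "s3", "provider": "AWS", "env_auth": "true"}
    # Azure Blob: authenticate with the VM's managed identity.
    cfg["azureblob"] = {"type": "azureblob", "account": azure_account, "use_msi": "true"}
    # GCS: use the credentials of the attached service account.
    cfg["gcs"] = {"type": "google cloud storage", "env_auth": "true"}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()
```

No section contains an access key, which matches the goal of keeping secrets out of both the VM configuration and the container.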

AWS S3

Buckets use the names you specify
Access is scoped per deployment via bucket tags, not a per-deployment IAM role
Bucket cleanup uses GetBucketLocation to find the bucket's region — this handles cases where a spot recovery moved the deployment to a different region than where the bucket was originally created
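One detail worth knowing when resolving a region this way: GetBucketLocation returns an empty location constraint for buckets in us-east-1, and can return the legacy value `EU` for buckets in eu-west-1. A minimal sketch of normalizing the response follows; the function name is hypothetical and this is not anycloud's code.

```python
from typing import Optional


def region_from_location_constraint(constraint: Optional[str]) -> str:
    """Map a GetBucketLocation LocationConstraint to a region name."""
    if not constraint:
        # us-east-1 buckets report a null or empty constraint.
        return "us-east-1"
    if constraint == "EU":
        # Legacy alias for eu-west-1.
        return "eu-west-1"
    return constraint
```

The normalized region can then be used to build a client in the bucket's own region rather than the deployment's current one.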

Azure Blob Storage

On Azure, your bucket is a blob container inside an anycloud-managed storage account that is shared across your subscription — so a job in any region under that subscription reads and writes the same data. Your bucket name is the container name.

GCP Cloud Storage

GCS buckets use the names you specify
A service account with the Storage Object Admin role provides access
