Bucket Sync Internals
How buckets are synced to files mounted in the container, under the hood.
Data Flow
Input Bucket (s3:/azureblob:/gcs:)
↓ rclone sync (download once on startup)
VM Directory (/mnt/{inputBucket})
↕ Docker bind mount :ro
Container (/mnt/{inputBucket}) - READ ONLY
Output Bucket (s3:/azureblob:/gcs:)
↑ rclone copy (upload only, continuous every ~60s)
VM Directory (/mnt/{outputBucket})
↕ Docker bind mount
Container (/mnt/{outputBucket}) - WRITE
Checkpoint Bucket (s3:/azureblob:/gcs:{deployment-id})
↓ rclone copy (download once on startup)
↑ rclone copy (upload only, continuous every ~60s)
VM Directory (/mnt/checkpoint)
↕ Docker bind mount
Container (/mnt/checkpoint) - READ + WRITE
Zero-Copy Access
Containers access synced files via Docker bind mounts — files exist once on the VM disk and containers see them directly. No duplication, no copying.
-v /mnt/input:/mnt/input:ro # input (read-only)
-v /mnt/output:/mnt/output # output
-v /mnt/checkpoint:/mnt/checkpoint # checkpoint
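As a rough illustration, the mount flags above could be assembled from a list of synced directories. This is a minimal sketch: `BucketMount` and `bind_mount_args` are hypothetical names, not part of anycloud.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketMount:
    """Hypothetical description of one synced directory on the VM."""
    path: str        # same path on the VM and inside the container
    read_only: bool  # True for input buckets


def bind_mount_args(mounts: list[BucketMount]) -> list[str]:
    """Build `docker run` -v arguments, one bind mount per synced directory."""
    args: list[str] = []
    for m in mounts:
        spec = f"{m.path}:{m.path}"
        if m.read_only:
            spec += ":ro"  # the container cannot modify input data
        args += ["-v", spec]
    return args


mounts = [
    BucketMount("/mnt/input", read_only=True),
    BucketMount("/mnt/output", read_only=False),
    BucketMount("/mnt/checkpoint", read_only=False),
]
print(" ".join(bind_mount_args(mounts)))
```

Because each flag maps a VM path to the identical container path, nothing is copied; the container reads and writes the VM's files directly.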
Sync Strategies
Input: Download Once
rclone sync bucket → VM runs once on startup. No supervisor service — the data is static.
Output: Upload Only
A supervisord service runs rclone copy VM → bucket every ~60 seconds. Uses copy (not sync) so multiple jobs can safely write to the same bucket without deleting each other's files.
Checkpoint: Download Once, Then Upload
On startup, rclone copy bucket → VM downloads existing checkpoint data (e.g., from a previous preempted run). Then a supervisord service runs rclone copy VM → bucket every ~60 seconds to upload changes. After the initial download, this is a one-way upload, the same strategy as output buckets.
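The three strategies can be summarized as a small lookup from bucket type to rclone command. The sketch below is illustrative only: the function names, the remote names such as `s3:my-data`, and the local directories are assumptions, and the real commands carry additional flags.

```python
from typing import Optional

INPUT, OUTPUT, CHECKPOINT = "input", "output", "checkpoint"


def startup_command(kind: str, remote: str, local_dir: str) -> Optional[list[str]]:
    """One-time download run at VM startup, or None if nothing is downloaded."""
    if kind == INPUT:
        # sync makes the VM directory mirror the bucket
        return ["rclone", "sync", remote, local_dir]
    if kind == CHECKPOINT:
        # copy restores checkpoints from an earlier run without deleting anything
        return ["rclone", "copy", remote, local_dir]
    return None  # output buckets are never downloaded


def periodic_command(kind: str, remote: str, local_dir: str) -> Optional[list[str]]:
    """Upload repeated every ~60 s, or None for static input data."""
    if kind in (OUTPUT, CHECKPOINT):
        # copy never deletes remote files, so several jobs can share a bucket
        return ["rclone", "copy", local_dir, remote]
    return None


for kind, remote, path in [
    (INPUT, "s3:my-data", "/mnt/input"),
    (OUTPUT, "s3:my-results", "/mnt/output"),
    (CHECKPOINT, "s3:my-checkpoints", "/mnt/checkpoint"),
]:
    print(kind, startup_command(kind, remote, path), periodic_command(kind, remote, path))
```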
Supervisord Configs
Output bucket:
[program:rclone-output]
command=/bin/bash -c "while true; do \
/usr/bin/rclone copy /mnt/output ${remotePath} --verbose \
--fast-list --no-update-modtime --modify-window 1s \
--log-file=/var/log/rclone-output.log && \
sleep 60; done"
autostart=true
autorestart=true
startsecs=5
startretries=999
Checkpoint bucket:
[program:rclone-checkpoint]
command=/bin/bash -c "while true; do \
/usr/bin/rclone copy /mnt/checkpoint ${remotePath} \
--verbose --fast-list --no-update-modtime --modify-window 1s && \
sleep 60; done"
autostart=true
autorestart=true
startsecs=5
startretries=999
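Since the two sections differ only in name, directory, and remote path, they could plausibly be rendered from one template. The following is a sketch, not anycloud's actual generator: `render_program` is a hypothetical helper, the remote `s3:my-results` is made up, and it assumes paths contain no spaces or quotes. It uses only the flags common to both configs above.

```python
import configparser

RCLONE_FLAGS = "--verbose --fast-list --no-update-modtime --modify-window 1s"


def render_program(name: str, local_dir: str, remote_path: str, interval_s: int = 60) -> str:
    """Render one [program:...] section that uploads local_dir in a loop."""
    loop = (
        f"while true; do /usr/bin/rclone copy {local_dir} {remote_path} "
        f"{RCLONE_FLAGS} && sleep {interval_s}; done"
    )
    lines = [
        f"[program:{name}]",
        f'command=/bin/bash -c "{loop}"',
        "autostart=true",
        "autorestart=true",
        "startsecs=5",
        "startretries=999",
    ]
    return "\n".join(lines) + "\n"


section = render_program("rclone-output", "/mnt/output", "s3:my-results")
print(section)

# Sanity check: the rendered text parses as INI.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(section)
print(parser["program:rclone-output"]["startretries"])
```

One detail of the loop worth knowing: because rclone and sleep are joined with `&&`, a failed copy skips the sleep and the next attempt starts immediately.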
VM Initialization Sequence
- API creates VM with SSH-only startup script
- After SSH is accessible, initializeVM() installs Docker and supervisord
- initRclone() installs rclone and configures each bucket type
- Connectivity check uses --retries 30 --retries-sleep 15s (covers Azure MSI propagation delays up to 7.5 min)
- Container starts with bind mounts for all configured buckets
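The 7.5 minute figure follows from the two connectivity-check flag values: 30 retries with a 15 second pause each. A quick check of the arithmetic, ignoring the time each attempt itself takes (`retry_budget_seconds` is a hypothetical helper for illustration):

```python
def retry_budget_seconds(retries: int, retries_sleep_s: int) -> int:
    """Approximate upper bound on time spent waiting between attempts."""
    return retries * retries_sleep_s


budget_s = retry_budget_seconds(30, 15)
print(f"{budget_s} s = {budget_s / 60} min")  # 450 s = 7.5 min
```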
Cloud-Native Authentication
anycloud uses cloud-native auth — no keys or secrets in containers:
- AWS — IAM instance roles with S3 bucket policy
- Azure — Managed identities with Storage Blob Data Contributor role
- GCP — Service accounts with Storage Object Admin role
Credentials are automatically available to rclone via cloud metadata services.
Continent-Based Storage
Storage is organized by continent to balance data sharing with latency and egress costs:
| Scenario | Latency | Egress Cost |
|---|---|---|
| Within continent | 10-30ms | ~$0.01-0.02/GB |
| Cross-continent | 70-150ms | ~$0.05-0.12/GB |
Continent Codes
| Code | Regions |
|---|---|
| us | US East, US West, US Central |
| eu | West Europe, North Europe, UK, France, Germany |
| apac | East Asia, Southeast Asia, Japan, Korea, India |
| aus | Australia East/Southeast |
| latam | Brazil, Mexico, Chile |
| mea | UAE, South Africa, Israel |
| ca | Canada Central/East |
First-wins location: The storage location is set by whichever region deploys first within that continent. Subsequent deploys in other regions of the same continent reuse the existing storage.
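The first-wins rule can be sketched as a dictionary that records a storage region per continent the first time one is needed. The region identifiers below are an illustrative subset using Azure-style names, and `storage_region` is a hypothetical helper; the actual mapping and persistence mechanism are not described here.

```python
# Illustrative subset only; the region identifiers are assumptions.
REGION_TO_CONTINENT = {
    "eastus": "us",
    "westus": "us",
    "westeurope": "eu",
    "northeurope": "eu",
    "japaneast": "apac",
    "australiaeast": "aus",
    "brazilsouth": "latam",
    "uaenorth": "mea",
    "canadacentral": "ca",
}


def storage_region(locations: dict[str, str], deploy_region: str) -> str:
    """Return the storage region for a deploy, creating it on first use."""
    continent = REGION_TO_CONTINENT[deploy_region]
    # setdefault keeps an existing entry, so the first deploy in a continent wins
    return locations.setdefault(continent, deploy_region)


locations: dict[str, str] = {}
print(storage_region(locations, "westus"))      # westus (first deploy in "us")
print(storage_region(locations, "eastus"))      # westus (reuses existing storage)
print(storage_region(locations, "westeurope"))  # westeurope (first deploy in "eu")
```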
AWS S3
- Buckets use user-specified names, tagged with continent metadata
- One IAM role per deployment: anycloud-{deploymentId}
- Bucket cleanup uses GetBucketLocation to find the bucket's region; this handles cases where a spot recovery moved the deployment to a different region than where the bucket was originally created
- Continent validation: deploying to a different continent than an existing bucket fails with an error
Error: Bucket 'my-results' already exists in continent 'us' but you are deploying to continent 'apac'.
Please use a different bucket name in anycloud.yaml.
Resolution: use a different bucket name or deploy to a region in the same continent.
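The continent check amounts to comparing the continent recorded on an existing bucket with the continent of the deploy region. A minimal sketch, assuming the existing continent has already been read from the bucket's tag (`ContinentMismatchError` and `validate_bucket_continent` are hypothetical names):

```python
from typing import Optional


class ContinentMismatchError(Exception):
    """Raised when a bucket already lives in a different continent."""


def validate_bucket_continent(
    bucket: str, existing_continent: Optional[str], deploy_continent: str
) -> None:
    # None means the bucket does not exist yet, so any continent is fine
    if existing_continent is not None and existing_continent != deploy_continent:
        raise ContinentMismatchError(
            f"Bucket '{bucket}' already exists in continent '{existing_continent}' "
            f"but you are deploying to continent '{deploy_continent}'.\n"
            "Please use a different bucket name in anycloud.yaml."
        )


validate_bucket_continent("my-results", "us", "us")  # same continent: passes
try:
    validate_bucket_continent("my-results", "us", "apac")
except ContinentMismatchError as err:
    print(f"Error: {err}")
```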
Azure Blob Storage
Azure uses storage accounts with containers inside them:
Azure Subscription
├── anycloud-eastus (resource group) ← VMs
├── anycloud-westus (resource group) ← VMs
└── anycloud-storage-us (resource group) ← Shared storage for all US regions
└── anycloudus12345678 (storage account)
└── my-bucket (container)
- Storage account: anycloud{continent}{subscriptionPrefix} (max 24 chars)
- Resource group: anycloud-storage-{continent}
- Storage accounts are accessible via URL from any region within the continent
- Race condition on simultaneous creates is handled (Azure rejects duplicate names globally)
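The naming scheme can be sketched as follows. How the subscription prefix is derived is not stated above; this example assumes it is the first 8 characters of the subscription ID, which matches the anycloudus12345678 example but is otherwise a guess. The validation reflects Azure's rule that storage account names are 3 to 24 lowercase letters and digits.

```python
import re

# Azure requires 3-24 characters, lowercase letters and digits only.
_ACCOUNT_NAME_RE = re.compile(r"[a-z0-9]{3,24}")


def storage_account_name(continent: str, subscription_id: str, prefix_len: int = 8) -> str:
    """Build anycloud{continent}{subscriptionPrefix} and validate it."""
    # Assumption: the prefix is the start of the subscription ID with
    # hyphens removed; the real derivation may differ.
    prefix = subscription_id.replace("-", "")[:prefix_len].lower()
    name = f"anycloud{continent}{prefix}"
    if not _ACCOUNT_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid storage account name: {name!r}")
    return name


def storage_resource_group(continent: str) -> str:
    """Resource group that holds the shared storage for one continent."""
    return f"anycloud-storage-{continent}"


print(storage_account_name("us", "12345678-0000-0000-0000-000000000000"))  # anycloudus12345678
print(storage_resource_group("us"))  # anycloud-storage-us
```

With an 8-character prefix, even the longest continent code (latam) yields a 21-character name, inside the 24-character limit.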
GCP Cloud Storage
- GCS bucket per continent: anycloud-{continent}-{project-prefix}
- Service account with Storage Object Admin role