Changelog

0.1.18

Auto-retry jobs whose container can't see the GPU

If a GPU job's container exits with CUDA_ERROR_NO_DEVICE in its logs — the host booted without visible GPUs — the deployment now retries on a fresh VM instead of being marked errored. The previous behavior required a manual anycloud resubmit for every occurrence. The retry uses the standard failed-deploy backoff and exhaustion limits.

Notifications digest skips the first-run backfill

The first time the digest monitor runs after enabling notifications on a host, it marks today as the starting point and posts nothing. Previously the next 15-minute tick would post a half-empty digest of whatever happened to be in the DB before notifications were configured. Your first real digest now lands the morning after your first full UTC day with notifications enabled.

Interactive prompts for budget, throttle, notifications

Running anycloud budget set, anycloud throttle set, or anycloud notifications enable slack with no arguments now prompts for the missing values (cap, window, webhook URL) when stdin is a terminal. Flag-only invocations behave the same as before, and piped or CI usage (no TTY, or CI=true) still exits with the previous error instead of hanging.
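The gating rule described above can be sketched in a few lines of Python. This illustrates the stated behavior only and is not anycloud's actual implementation; the helper name should_prompt is hypothetical.

```python
def should_prompt(stdin_is_tty: bool, env: dict) -> bool:
    """Decide whether it is safe to ask for missing values interactively.

    Hypothetical helper: prompting is allowed only when stdin is a
    terminal and the CI environment variable is not set to "true".
    """
    if env.get("CI") == "true":
        return False
    return stdin_is_tty


# A terminal session with no CI variable may prompt.
print(should_prompt(True, {}))
# A CI run must fail fast instead of hanging on a prompt.
print(should_prompt(True, {"CI": "true"}))
```

In a real CLI the first argument would typically come from sys.stdin.isatty() and the second from os.environ.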

0.1.17

status now returns the full VM history (breaking)

/v1/status and anycloud status --json now return vms (every VM ever associated with the deployment, terminated rows included) instead of vmStatuses (active only). Each entry has a new endedAt field — null for live VMs, a millisecond timestamp for terminated ones. Live VMs still carry the container and rclone health-check fields; terminated rows are provisioning metadata only. This lets you read prebakedImageId, region, and cloudId for VMs after they've been cleaned up — previously those were only visible during the deployment's running window.

The human-formatted anycloud status output is unchanged by default (terminated VMs are hidden). Pass --verbose to see them, marked (terminated). The Python SDK's status_response.vm_statuses is now status_response.vms and includes terminated rows; SDK helpers like Job.exec() and Job.logs() filter to live VMs automatically.

If you have scripts that grep for vmStatuses in status --json output, rename to vms and add select(.endedAt == null) if you want the old "active VMs only" semantics.
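For scripts that parse the JSON in Python rather than jq, the same "active VMs only" filter might look like the sketch below. Only the vms array and the endedAt field are taken from this changelog; the sample values are invented for illustration.

```python
import json

# Hypothetical payload: the field values are made up, only the
# `vms` / `endedAt` shape follows the changelog entry.
sample = (
    '{"vms":['
    '{"cloudId":"vm-a","region":"us-east-1","endedAt":null},'
    '{"cloudId":"vm-b","region":"us-west-2","endedAt":1700000000000}'
    ']}'
)


def live_vms(status_json: str) -> list:
    """Keep only VMs that have not terminated (endedAt is null)."""
    status = json.loads(status_json)
    return [vm for vm in status["vms"] if vm["endedAt"] is None]


print([vm["cloudId"] for vm in live_vms(sample)])
```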

Spend controls: anycloud throttle and anycloud budget

Two new commands cap spend by blocking new dispatches when usage exceeds a limit. Running jobs are never killed — only queued work pauses.

  • anycloud throttle set 20 — burn-rate cap ($/hr across in-flight VMs, plus the candidate). Set --agent-session to cap each agent run independently instead of account-wide.
  • anycloud budget set 100 --per day — calendar-window cap (day, week, or month). Same --agent-session toggle.

Blocked deployments stay queued with a reason shown inline in anycloud list and anycloud status, and auto-dispatch once running VMs end (throttle), the calendar resets (budget), or you raise the cap. See Spend Controls for the full surface.
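As a rough illustration of the throttle rule, the check below adds the candidate VM's hourly rate to the rates of in-flight VMs and compares the sum with the cap. The function name is hypothetical, and treating a burn rate exactly equal to the cap as allowed is an assumption; the changelog only says dispatch is blocked when usage exceeds the limit.

```python
def throttle_allows(in_flight_rates, candidate_rate, cap_per_hour):
    """Return True if dispatching the candidate keeps burn rate within the cap.

    in_flight_rates: $/hr of each VM currently running
    candidate_rate:  $/hr of the VM about to be dispatched
    cap_per_hour:    value passed to `anycloud throttle set`
    """
    burn_rate = sum(in_flight_rates) + candidate_rate
    return burn_rate <= cap_per_hour


# Two VMs at $8/hr plus a $3/hr candidate fits under a $20/hr cap.
print(throttle_allows([8.0, 8.0], 3.0, 20.0))
# A $5/hr candidate would push the total to $21/hr, so it stays queued.
print(throttle_allows([8.0, 8.0], 5.0, 20.0))
```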

anycloud db query / db schema for read-only DB inspection

Two new commands let you (or an agent) inspect the local API database without leaving the CLI. anycloud db query "<sql>" runs a read-only SELECT / WITH / EXPLAIN / PRAGMA — writes are refused at the SQLite engine level — and emits rows as JSON with --json. anycloud db schema [table] introspects tables, columns, foreign keys, and indexes; pair the JSON form with db query to answer "which deployments are stuck and why" without poking the SQLite file directly.

anycloud cost switches to real billed cost

anycloud cost <id> now shows the real cost pulled from your cloud provider's billing API once the bill settles (~24 hours after a deployment ends). Until then — or if your account doesn't have billing-read permissions — it falls back to the catalog estimate as before, marked (est.).

To enable this, the cloud credentials anycloud uses now need a small additional permission:

  • AWS: add ce:GetCostAndUsage to the IAM policy. Activate the anycloud_deployment cost-allocation tag once (Billing → Cost Allocation Tags → activate, or aws ce update-cost-allocation-tags-status --cost-allocation-tags-status TagKey=anycloud_deployment,Status=Active).
  • Azure: grant the service principal the built-in Cost Management Reader role on the subscription.
  • GCP: enable Cloud Billing BigQuery export with resource-level detail, grant the service account roles/bigquery.dataViewer on the billing dataset, and pass the dataset to anycloud setup gcp.

Without these, nothing breaks — anycloud cost keeps showing the estimate.

anycloud notifications enable slack — daily Slack digest of usage

The API container can now post a once-a-day summary to a Slack channel you choose, aggregated from your local deployment history (~/.anycloud/api.db) and sent directly from your machine to the webhook.

anycloud notifications enable slack --webhook https://hooks.slack.com/... posts a test message before saving so a typo'd webhook is caught immediately. The digest covers total spend, count by terminal state, preemption rate, median runtime, and active users. anycloud notifications status / test slack / disable slack round out the surface. Setup details in docs/guides/notifications.md.

0.1.16

MCP secrets tools

Three new MCP tools, secrets_list, secrets_create, and secrets_delete, let an agent manage named secrets (env-var bundles used via secrets=[...] on submit). They mirror the existing CLI commands secrets new/list/delete. secrets_delete accepts force=True to override the safeguard that blocks deletion while non-terminal deployments still reference the secret.

CLI catalog introspection

Four new commands surface the same cloud catalog the Python SDK already exposed, so you can pick a region or VM type before you submit:

  • anycloud regions <provider> [--vm-type <type>] [--spot] — list available regions. With --vm-type, narrows to regions that offer that VM type, sorted cheapest first.
  • anycloud vm-types <provider> <region> [--accelerator <gpu>] — list VM types in a region, optionally filtered to ones with a specific accelerator.
  • anycloud gpus <provider> [--type <gpu>] — list available GPUs, or available counts for a specific GPU when --type is given.
  • anycloud pricing <provider> <vm-type> [--region <r>] [--spot] — on-demand or spot price per region.

All four accept --json for scripting and accept the provider name case-insensitively (aws, AWS, Aws all work).

anycloud bucket upload / download

New CLI commands stream files in and out of an S3, GCS, or Azure Blob bucket using a credential you've already registered with anycloud credentials new:

anycloud bucket upload my-bucket ./data.bin data/in.bin --credentials prod-aws
anycloud bucket download my-bucket data/out.bin ./out.bin --credentials prod-aws --region us-east-1

The same routes back the Python SDK's Bucket.upload() / Bucket.download(), so the CLI and SDK now go through one code path. As part of that switch, the Python SDK no longer needs cloud filesystem libraries — the [aws], [gcp], and [azure] extras have been removed from anycloud-sdk because s3fs, gcsfs, and adlfs are no longer used. Bucket I/O now requires the credential to be registered with anycloud credentials new (it could previously work with inline CloudConfig credentials passed to Client(...)); construct your client with Client(credentials="<name>") instead.

anycloud cost --json for scripts and agents

anycloud cost now accepts --json. Without an id, it emits the aggregate report { periodMs, totalCost, totalDurationMs, totalJobs, deploymentsWithoutPrice, byProvider }; with an id, it emits the per-deployment fields (id, image, cloudProvider, region, vmType, spot, totalCost, durationMs, ratePerHour). Previously the only output was a formatted table, so anything calling anycloud cost from a script had to grep currency strings out of stdout.
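A script consuming the aggregate report could look roughly like this. The key names come from the list above; the sample values, the empty byProvider object, and the assumption that totalCost is a plain number are illustrative guesses.

```python
import json

# Hypothetical aggregate report; only the key names follow the changelog.
sample = (
    '{"periodMs":86400000,"totalCost":42.0,"totalDurationMs":7200000,'
    '"totalJobs":4,"deploymentsWithoutPrice":0,"byProvider":{}}'
)


def average_cost_per_job(report_json: str) -> float:
    """Divide totalCost by totalJobs, returning 0.0 when there are no jobs."""
    report = json.loads(report_json)
    jobs = report["totalJobs"]
    return report["totalCost"] / jobs if jobs else 0.0


print(average_cost_per_job(sample))
```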

anycloud list gains exact-match filters and machine-friendly output modes

anycloud list now supports --status <state>, --provider <cloud>, --csv, and --only-ids in addition to the existing substring -f, --filter and --json. Multiple predicates AND together. --csv emits a header row plus one row per deployment (id,state,cloudProvider,region,vmType,image,spot,startedAt,completedAt,totalCost), useful for piping into spreadsheets or awk. --only-ids emits one deployment ID per line and skips the table, useful for anycloud list --only-ids --status running | xargs anycloud terminate and similar batch operations.
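As an example of consuming the --csv output, the sketch below totals cost per provider with Python's csv module. The header row matches the column list above; the data rows, the state names other than running, and the numeric format of totalCost are assumptions.

```python
import csv
import io

# Hypothetical output of `anycloud list --csv`; the rows are invented.
sample = (
    "id,state,cloudProvider,region,vmType,image,spot,startedAt,completedAt,totalCost\n"
    "job-1,completed,aws,us-east-1,g6e.48xlarge,img:latest,true,1700000000000,1700003600000,12.50\n"
    "job-2,running,azure,eastus,example-vm,img:latest,false,1700000000000,,\n"
)


def cost_by_provider(csv_text: str) -> dict:
    """Sum totalCost per cloudProvider, skipping rows with an empty cost cell."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["totalCost"]:
            provider = row["cloudProvider"]
            totals[provider] = totals.get(provider, 0.0) + float(row["totalCost"])
    return totals


print(cost_by_provider(sample))
```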

--json output is now compact

Every --json flag in the CLI (list, status, cost, credentials list, config list, quota) now emits a single line of compact JSON instead of pretty-printed JSON with two-space indents. Equivalent payload, ~30% fewer bytes. jq . and any other JSON parser handle it identically.
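The difference between the two encodings can be reproduced with Python's json module. This only demonstrates that compact and indented JSON parse to the same value; the payload is a made-up example and says nothing about how the CLI serializes internally.

```python
import json

# Made-up payload for illustration.
payload = {"id": "job-1", "spot": True, "totalCost": 12.5}

pretty = json.dumps(payload, indent=2)                 # multi-line, two-space indent
compact = json.dumps(payload, separators=(",", ":"))   # single line, no padding

# Both forms decode to the same object; only the byte count differs.
same_value = json.loads(pretty) == json.loads(compact)
print(same_value, len(compact) < len(pretty))
```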

0.1.15

Failed provisioning attempts no longer orphan cloud resources or pollute anycloud list / cost

When a VM provisioning attempt failed partway through (for example, Azure created the NIC and Public IP but the VM create itself errored), the rows were marked "ended" in anycloud's bookkeeping even though the cloud-side resources were still there — slowly eating the 100-Public-IP-per-region Azure quota until new submits started failing on quota errors. The same rows also showed up as live VMs in anycloud list and accrued duration in anycloud cost. Cleanup now waits for the cloud to confirm each resource is gone before marking it ended (the cleanup monitor keeps retrying anything the cloud rejected), and failed-attempt rows are filtered out of the list and cost aggregates.

status always renders the deployment header

anycloud status <id> no longer falls back to ❓ unknown with a "deployment not found in deployments list" hint when the id is older than the picker window or otherwise missing from the recent list. The metadata (image, cloud, region, VM type, age) now comes back with the status response itself, so the header is correct on the first call and the second round-trip is gone.

Security: deployment IDs are no longer probeable across users

Knowing another user's deployment id no longer lets you fetch their SSH key (status), terminate their job (terminate), or resubmit it (resubmit). All three now reject cross-user ids with the same "deployment not found" response they return for ids that genuinely don't exist, so an authenticated caller can't enumerate other users' deployments either. Pre-existing pings against your own deployments are unchanged.

0.1.14

anycloud cost now works for jobs submitted without a pinned region

anycloud cost reported $0.00 for jobs submitted without an explicit region — the common case, since omitting a region enables multi-region failover. The hourly rate is now captured on each VM at provision time (when the region is finally known) and cost is summed across the deployment's VMs, instead of relying on a single rate stored at submit. Multi-region retries that land in different-priced regions are now reflected accurately in the total.
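The per-VM summation can be sketched as follows. The record shape is hypothetical: ratePerHour and durationMs are borrowed from the cost --json field names, but how anycloud stores per-VM rows internally is not documented here.

```python
MS_PER_HOUR = 3_600_000


def deployment_cost(vms) -> float:
    """Sum (hourly rate x hours run) over every VM of a deployment."""
    return sum(vm["ratePerHour"] * vm["durationMs"] / MS_PER_HOUR for vm in vms)


# A retry that moved from a $2/hr region (30 min) to a $4/hr region (1 hr).
vms = [
    {"region": "us-east-1", "ratePerHour": 2.0, "durationMs": 1_800_000},
    {"region": "us-west-2", "ratePerHour": 4.0, "durationMs": 3_600_000},
]
print(deployment_cost(vms))
```

A single rate stored at submit time cannot express this case, because the two VMs ran at different prices.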

0.1.13

Multi-region Azure submits stop retrying regions the subscription can't use

When Azure rejects a region with a "not permitted for subscription" or "not available for resource group" error (e.g. Jio India regions on most commercial subscriptions), the deployment now skips that region for the rest of its lifetime instead of cycling back to it every few minutes. Previously, multi-region jobs could exhaust their retry budget bouncing between forbidden regions and never reach the regions the subscription actually has access to.

0.1.12

--persist-bucket retains the spot checkpoint bucket without keeping the VM

anycloud submit --spot --persist-bucket skips the auto-deletion of the checkpoint bucket on job completion. The flag is independent of --persist: the common shape is an ephemeral VM (no --persist) plus a retained bucket, so a later anycloud resubmit <id> brings a fresh VM up against the same checkpoint contents without paying for an idle one in between. The flag is a no-op (with a warning) on non-spot deployments since they have no checkpoint bucket. Cleanup is manual: run aws s3 rb s3://<id> --force or the equivalent for your provider. The existing orphan reaper leaves retained buckets alone because they aren't tagged as ended.

0.1.11

Ordered fallback list for credentials

The CLI / API now accepts an ordered list of named credentials. Submit a job with --credentials aws-prod --credentials gcp-prod (repeatable; flag stays singular for single-cred users) or list them under credentialsName: [...] in anycloud.yaml. The provisioning scheduler walks the list on auth failure: cred N hits an auth error, gets blocked for 30 minutes, and the next attempt picks cred N+1 instead of the deployment terminating in Invalid. Quota / capacity errors continue to rotate at the (cloud, vmType, region) level — they don't trigger cred rotation. Each VM is tagged with the credential it actually booted with; the deployment row keeps the curated allow-list.
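The walk over the credential list can be pictured with the sketch below. It is a simplified model, not the scheduler's code: the function names are hypothetical, and both returning to an earlier credential once its 30-minute block expires and returning None when every credential is blocked are assumptions.

```python
BLOCK_SECONDS = 30 * 60  # auth failures block a credential for 30 minutes


def next_credential(ordered_creds, blocked_until, now):
    """Return the first credential in list order that is not currently blocked."""
    for name in ordered_creds:
        if blocked_until.get(name, 0) <= now:
            return name
    return None  # assumption: nothing usable right now


def record_auth_failure(name, blocked_until, now):
    """Block a credential after an auth error."""
    blocked_until[name] = now + BLOCK_SECONDS


creds = ["aws-prod", "gcp-prod"]
blocked = {}
first = next_credential(creds, blocked, now=0)    # aws-prod
record_auth_failure(first, blocked, now=0)
second = next_credential(creds, blocked, now=60)  # falls over to gcp-prod
print(first, second)
```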

Submit-time validation enforces the gotchas before a VM is launched:

  • Input bucket must exist on every listed cred's storage when no explicit inputStorageCredentials is set — otherwise the submit returns 400 naming the cred and cloud where the bucket is missing. With an explicit storage cred, the bucket is checked once on that cred's storage and any cross-cloud egress is the user's call.
  • Output bucket follows the same logic but with an all-or-none rule: it must exist on every listed cred's cloud or on none of them. Partial state returns 400 with the per-cloud breakdown so the user can decide whether to drop the cred or pre-create the bucket.
  • Per-cred capability check runs validateCloudCapabilities against every listed cred — catches things like spot: true against a Lambda cred up front.
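The all-or-none rule for the output bucket can be sketched as a small check over a per-credential existence map. The function name and return shape are hypothetical; only the rule itself (exists everywhere or nowhere, otherwise reject with a breakdown) comes from the list above.

```python
def check_output_bucket(exists_by_cred: dict):
    """Return (ok, breakdown) for the all-or-none output bucket rule.

    exists_by_cred maps a credential name to whether the bucket exists
    on that credential's cloud.
    """
    present = [name for name, exists in exists_by_cred.items() if exists]
    missing = [name for name, exists in exists_by_cred.items() if not exists]
    if present and missing:
        # Partial state: the submit would be rejected with this breakdown.
        return False, {"present": present, "missing": missing}
    return True, {}


print(check_output_bucket({"aws-prod": True, "gcp-prod": False}))
print(check_output_bucket({"aws-prod": False, "gcp-prod": False}))
```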

0.1.10

Ordered fallback pools for vmType and gpuType

Both fields now accept an array as well as a single string. The provisioning scheduler expands every entry into the candidate pool — for gpuType, each alias is resolved into all interchangeable instance types — and falls over in order when the primary is quota-blocked or unavailable. CLI flags --vm-type / --gpu-type are now repeatable on submit, serve, build, config new, and config edit; anycloud.yaml profiles accept a list under either field. The config new wizard now uses a multi-select picker (Space to mark, Enter to confirm — selection order is fallback order). The two fields are still mutually exclusive (only one of vmType / gpuType may be set, but each can itself be a list). The original list is preserved on the deployment row for display and retries.
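A minimal sketch of the "string or list, but only one field" rule is shown below. The helper is hypothetical and does not resolve GPU aliases into instance types; it only normalizes the input to an ordered list. The A100 alias in the example is an assumed name.

```python
def normalize_pool(vm_type=None, gpu_type=None) -> list:
    """Normalize vmType / gpuType to an ordered fallback list.

    Accepts a single string or a list for either field, and rejects
    setting both fields at once.
    """
    if vm_type is not None and gpu_type is not None:
        raise ValueError("only one of vmType / gpuType may be set")
    value = vm_type if vm_type is not None else gpu_type
    if value is None:
        return []
    # A bare string becomes a one-element list; list order is fallback order.
    return [value] if isinstance(value, str) else list(value)


print(normalize_pool(vm_type="g6e.48xlarge"))
print(normalize_pool(gpu_type=["L40S", "A100"]))
```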

quota request always asks for one more instance of headroom

anycloud quota <vmType> now files a ticket every time you run it (unless an in-flight request is already pending for the same family + region). The previous "only when exhausted" trigger missed the common fresh-subscription case — limit=0, usage=0 looked fine but the user couldn't launch a single VM. The new target is max(currentUsage + vCPUsPerVM, currentLimit × 2, vCPUsPerVM) — at minimum, one more instance's worth of capacity, with a doubling floor on top so re-running keeps growing the ceiling. AWS applies the same rule with currentUsage treated as 0 (Service Quotas doesn't expose live usage).
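The target formula is easy to check by hand. The sketch below evaluates it for three cases, using 192 vCPUs per VM as an example size; the function name is hypothetical.

```python
def quota_target(current_usage: int, current_limit: int, vcpus_per_vm: int) -> int:
    """max(currentUsage + vCPUsPerVM, currentLimit x 2, vCPUsPerVM)."""
    return max(current_usage + vcpus_per_vm, current_limit * 2, vcpus_per_vm)


# Fresh subscription: limit=0, usage=0 still requests one VM's worth.
print(quota_target(0, 0, 192))
# Fully used limit of 192: both the headroom and doubling terms give 384.
print(quota_target(192, 192, 192))
# AWS case: usage is treated as 0, so the doubling term drives the target.
print(quota_target(0, 192, 192))
```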

If every surveyed region already has a pending ticket for the family, the tail reads "No new tickets filed — every surveyed region already has a pending quota request for <vmType> on <cloud>." instead of "No actions taken."

quota picks credentials, not clouds

quota request and quota status now prompt for a credential rather than a cloud — credential implies cloud, so the second picker round-trip is gone. With a single AWS or Azure credential the picker is silent; with multiple, an autocomplete labelled <name> (<cloud>) appears. --credential <name> still bypasses the picker. Credentials for clouds without quota support (GCP, Lambda, Local) are filtered out and surface a clear error instead of running the picker only to fail later.

Breaking: --cloud removed from quota subcommands

The --cloud <aws|azure> flag is no longer accepted on either anycloud quota request or anycloud quota status. Cloud is now derived from the resolved credential or, for request, the vmType prefix. Pass --credential <name> to pin to a specific account.

0.1.9

Picker fixups

  • Restore -a/--all on status (regressed when the picker came back) so the picker can widen past the last 24h.
  • --json now refuses the picker rather than dumping its stdout chrome into the JSON pipe.

Multi-region quota request / quota status

--region is now optional on both subcommands. Omit it to fan out across every region in the cloud's catalog (AWS narrows to regions where the vmType is offered; Azure narrows to regions where the family is exhausted). Per-region calls run in parallel via Promise.allSettled, so wall-clock matches a single-region invocation. Re-runs stay idempotent thanks to the existing per-region dedup.
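For readers more familiar with Python than JavaScript, the fan-out pattern resembles asyncio.gather with return_exceptions=True, which, like Promise.allSettled, waits for every call and reports failures alongside successes. The per-region function below is a hypothetical stand-in, not anycloud code.

```python
import asyncio


async def request_quota(region: str) -> str:
    """Hypothetical per-region call; one region fails to show settling."""
    await asyncio.sleep(0)  # stands in for the network round-trip
    if region == "bad-region":
        raise RuntimeError("request rejected")
    return f"{region}: SUBMITTED"


async def fan_out(regions):
    """Run all per-region calls concurrently and keep every outcome."""
    results = await asyncio.gather(
        *(request_quota(r) for r in regions),
        return_exceptions=True,  # a failure in one region does not cancel the rest
    )
    return dict(zip(regions, results))


outcome = asyncio.run(fan_out(["us-east-1", "bad-region"]))
print(outcome)
```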

When the fan-out hits an AWS region that is disabled-by-default (e.g. me-central-1, ap-east-1, af-south-1), anycloud submits the AWS EnableRegion call for it. Opt-in is asynchronous — the quota request for that region is deferred and the entry comes back as SUBMITTED with optInAction: enabling. Pass --region if you'd rather not opt anything new in.

Interactive GPU instance picker

Run anycloud quota request with no vmType and no --gpu in a TTY to get an interactive cloud picker followed by an autocomplete list of GPU instance types from the catalog (e.g. g6e.48xlarge — 8× L40S (192 vCPUs)). Non-TTY / --json flows are unchanged: an explicit vmType or --gpu is still required.

0.1.8

Smarter blocks for disabled regions

When a provision fails because the region's quota is 0 or the subscription can't deploy there at all (AWS OptInRequired, Azure ProvisioningDisabled / LocationIsOfferRestricted), the optimizer blocks that region for 6 hours instead of the 30-minute generic quota block — so retries stop hammering regions that won't come back without a support ticket.
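The two block durations can be summarized in a small lookup. This is a sketch of the rule as described, with a hypothetical function name; the three structural error codes are the ones named above, and QuotaExceeded is an invented placeholder for a generic quota error.

```python
from datetime import timedelta

# Error codes the changelog names as "won't come back without a ticket".
STRUCTURAL_CODES = {"OptInRequired", "ProvisioningDisabled", "LocationIsOfferRestricted"}


def block_duration(error_code: str, region_quota=None) -> timedelta:
    """Pick how long to block a region after a failed provision."""
    if error_code in STRUCTURAL_CODES or region_quota == 0:
        return timedelta(hours=6)
    return timedelta(minutes=30)  # generic quota block


print(block_duration("OptInRequired"))
print(block_duration("QuotaExceeded", region_quota=10))
```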

Named Secrets

New primitive for env vars you don't want visible in API responses. anycloud secrets new|list|delete manages named bundles that are encrypted at rest, never returned by the API, and injected into containers at run time via --secret <name> (CLI) or secrets=[...] (Python SDK). Update a secret and resubmit to rotate — no need to edit every call site. See the Secrets guide.

Sensitivity warning on --env

When you pass a likely-sensitive key via --env (e.g. HF_TOKEN, API_KEY, anything matching token / secret / password / key), the CLI prints a one-line warning pointing at anycloud secrets new. The submit still proceeds; the warning is advisory.
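A key-name check of this kind might be approximated with a case-insensitive substring match, as in the sketch below. The exact pattern the CLI uses is not documented here, so the regex is an assumption built from the four words listed above.

```python
import re

# Assumed pattern: any key containing one of these words, in any case.
SENSITIVE = re.compile(r"token|secret|password|key", re.IGNORECASE)


def looks_sensitive(env_key: str) -> bool:
    """Return True if an env var name looks like it holds a credential."""
    return SENSITIVE.search(env_key) is not None


for key in ("HF_TOKEN", "API_KEY", "BATCH_SIZE"):
    print(key, looks_sensitive(key))
```

A substring match of this kind also flags harmless names such as MONKEY_PATCH, which fits a warning that is advisory rather than blocking.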

Stable-version warning

The CLI checks its own version against the stable dist-tag on npm and prints a warning when you're behind. Keeps users from debugging issues that were already fixed upstream.

anycloud quota status now surfaces structural console URLs (AWS Service Quotas, Azure portal) for each case, so you can jump straight to the provider's UI.

Smarter quota request

anycloud quota request detects partial grants and skips re-requesting quota that's already been submitted or granted, so re-runs are idempotent.

Broader Azure Region Coverage

Azure catalog now pulls the full list of physical regions via the subscription-wide /locations endpoint instead of per-SKU locationInfo, which was silently gated on per-SKU enrollment. Measured on one subscription: 43 → 58 regions, ~33k → ~55k SKU rows.

--json on read-shaped commands

status, list, config list, and credentials list now accept --json. Chrome output routes to stderr; stdout is the pure JSON payload, so anycloud list --json | jq just works. credentials list --json emits only {name, cloudProvider} — secret fields never cross the boundary. Empty results return [] instead of the prior "No X found" strings.

Interactive deployment picker (TTY only)

Deployment ID is now optional on status, logs, exec, ssh, terminate, and resubmit: omit it in a TTY for an interactive picker (multi-select on terminate and resubmit). This walks back part of the "Non-Interactive CLI" change from 0.1.7 — non-TTY/CI still requires an explicit ID or ANYCLOUD_DEPLOYMENT_ID, so scripts and agents are unaffected.

Azure-safe deployment ID cap

Max deployment ID length lowered 40 → 28 to leave room for the -${index} suffix Azure appends to VM/NIC/public-IP names. Auto-generated IDs were also tightened so they fit under the new cap.

0.1.7

Batch Submission in the Python SDK

Submit many jobs in parallel from a single call, with first-class partial-failure handling. The decorator path gets a map helper for fan-out workloads.

More Resilient Prebakes

Cached images survive upstream base-image rotation, and bake outcomes now show up in deployment status instead of disappearing into logs. CLI status also surfaces when a deploy hits a prebake cache.

Quota Requests From the CLI

Request and check cloud quota directly from the CLI, with AWS now supported alongside Azure. Deployments that exhaust retries on a quota error point you at the new commands.

Credentials Through the API

The CLI and MCP no longer touch plaintext credentials on disk. New MCP tools cover importing existing cloud credentials and generating least-privilege ones.

Non-Interactive CLI

Every CLI command is now flag- and env-driven — no prompts, no pickers, no wizards. Better for scripts, CI, and agents.

Steadier Region Rotation

Region or SKU mismatches from a cloud no longer fail a deployment outright; the optimizer rotates instead.

0.1.6

More Resilient Image Pulls

Faster retries on transient pull failures and gentler post-provisioning backoff.

Expanded MCP Server

Added list_gpus, list_gpu_counts, credentials_get, bucket_upload, bucket_download, and build tools.

Smarter Region Rotation

When every target is temporarily blocked, the optimizer rotates across regions instead of stalling on the cheapest one. Block TTLs are also preserved at their maximum, and InvalidTarget blocks are capped at 6 hours.

Strict CloudConfig Validation (Python SDK)

CloudConfig now rejects unknown keyword arguments instead of silently ignoring them.

0.1.5

Per-Bucket Storage Credentials

Input and output buckets can now use different cloud credentials and regions. The shared storageCredentials/storageRegion fields have been replaced with per-bucket equivalents (inputStorageCredentials, outputStorageCredentials, etc.).