Python SDK

Submit jobs and chain them together programmatically from Python.

Prerequisites

Before using the SDK, make sure you have:

Python >= 3.10
anycloud API running — start it with anycloud api start
Cloud credentials configured — create them with anycloud credentials new

See Getting Started for full setup instructions.

Install

Install the current public SDK release:

pip install anycloud-sdk

Bucket operations stream through the local anycloud api server using a credential you've already registered with anycloud credentials new — no extra installs needed.

Quick Start

If you have credentials configured via anycloud credentials new, the SDK picks them up automatically:

import anycloud

ac = anycloud.Client()

job = ac.submit("my-training:latest", gpu="h100:8")
job.wait()

print(job.logs())

Multiple credential sets

When you have more than one credential set, select one by name:

ac = anycloud.Client(credentials="aws-prod")
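
The selection rule can be sketched without the SDK. This is a minimal illustration, assuming that an unknown name or an ambiguous choice raises an error; `select_credentials` is a hypothetical helper, not part of the anycloud API, and the real client may report these cases differently.

```python
def select_credentials(saved_names, requested=None):
    """Pick a credential set name following the rule described above.

    Assumption: an unknown name, or zero/multiple saved sets without an
    explicit name, raises ValueError. The real SDK may use another error.
    """
    if requested is not None:
        if requested not in saved_names:
            raise ValueError(f"unknown credential set: {requested!r}")
        return requested
    if len(saved_names) == 1:
        # Exactly one saved set: select it automatically.
        return saved_names[0]
    raise ValueError(
        f"expected exactly one saved credential set, found {len(saved_names)}; "
        "pass credentials=<name>"
    )
```

With a single saved set the name can be omitted; with several, passing `credentials="aws-prod"` removes the ambiguity.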

Explicit cloud config

You can still pass credentials directly — this takes precedence over the credentials file:

from anycloud.types import CloudConfig, AWSCredentials

cc = CloudConfig(
    cloudProvider="AWS",
    credentials=AWSCredentials(
        accessKeyId="AKIA...",
        secretAccessKey="...",
    ),
)

ac = anycloud.Client(cloud_config=cc)

The client connects to your local anycloud API at http://localhost:8080 by default.

Function Decorator

The @anycloud.function() decorator lets you run any Python function remotely. You provide a Docker image, and your repo is cloned at the current commit on the remote VM via git-based code sync. See Deployment Workflows for when to use the decorator vs submitting a prebuilt image.

import anycloud
from anycloud.types import CloudConfig

@anycloud.function(
    image="ghcr.io/acme/csp-prod:latest",
    gpu="t4:1",
    cloud_config=CloudConfig(
        credentials="azure2",
        vm_type="Standard_NC16as_T4_v3",
        spot=True,
        disk_size_gb=400,
        input_bucket="eval-data",
        output_bucket="eval-results",
    ),
)
def relax(structure_index: str, fmax: float = 0.01):
    import os, shutil
    from ase.io import read
    from ase.optimize import BFGS
    from fairchem.core import OCPCalculator

    checkpoint_dir = f"/mnt/checkpoint/relax/{structure_index}"
    output_dir = f"/mnt/output/relax/{structure_index}"
    os.makedirs(checkpoint_dir, exist_ok=True)
    os.makedirs(output_dir, exist_ok=True)

    atoms = read("/mnt/input/structures.cif", index=structure_index)
    atoms.calc = OCPCalculator(model_name="uma-s-1p1:omc")
    opt = BFGS(atoms, trajectory=f"{checkpoint_dir}/relax.traj")
    opt.run(fmax=fmax)

    shutil.copytree(checkpoint_dir, output_dir, dirs_exist_ok=True)

Submitting jobs

submit() sends the function to run remotely and returns a Job handle immediately. Pass function arguments as positional/keyword args, plus optional id and env overrides:

# Single job
job = relax.submit("0:10", fmax=0.005, id="relax-0to10")
job.wait()
print(job.logs())

# Fan out — submit 27 jobs with custom IDs
args = [f"{i}:{i+10}" for i in range(0, 270, 10)]
ids  = [f"relax-{i}to{i+10}" for i in range(0, 270, 10)]
jobs = relax.map(args, ids=ids)
jobs.wait()

Function.map issues all submissions in parallel.

Fanning out with `map()`

Function.map() submits the decorated function once per item in parallel:

jobs = relax.map(range(0, 270, 10))     # auto-generated IDs
jobs.wait()

# With custom IDs (parallel list)
args = [f"{i}:{i+10}" for i in range(0, 270, 10)]
ids  = [f"relax-{i}to{i+10}" for i in range(0, 270, 10)]
jobs = relax.map(args, ids=ids)

Each item in args becomes a single positional argument. For multi-argument calls, pass tuples and unpack inside the function, or use submit() in a loop.
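
The unpacking pattern for multi-argument items might look like the sketch below. It runs locally and does not call the SDK; `relax_item` is a hypothetical function body, and the JSON round trip only illustrates that a tuple is encoded as a JSON array and therefore arrives as a list.

```python
import json


def relax_item(item):
    # Unpack the single positional argument into the real parameters.
    structure_index, fmax = item
    return f"relax {structure_index} with fmax={fmax}"


# One (structure_index, fmax) pair per job.
items = [(f"{i}:{i + 10}", 0.005) for i in range(0, 30, 10)]

# Tuples are encoded as JSON arrays, so they come back as lists.
received = [json.loads(json.dumps(item)) for item in items]
results = [relax_item(item) for item in received]
```

Because sequence unpacking works on both tuples and lists, the function body does not need to care which one it receives.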

Use Function.map when you're fanning out the same decorated function across varying arguments. For heterogeneous configs (different images, clouds, GPUs), use client.submit_many with a list of Submission objects.

Partial failures follow the same rules as submit_many: by default BatchSubmitError is raised with the successful jobs attached; pass return_exceptions=True to receive a mixed-entry JobGroup instead. See submit_many — Partial failures.

Per-submit environment variables

Pass extra env vars per-submit that merge on top of decorator-level env:

job = relax.submit("0:10", env={"WANDB_RUN_ID": "experiment-42"})
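
As a rough sketch of the merge semantics, assuming "merge on top" means that per-submit keys win on conflict and all other decorator-level keys are kept (`merge_env` is a hypothetical helper, not an SDK function):

```python
def merge_env(decorator_env, submit_env=None):
    """Return decorator-level env with per-submit overrides applied."""
    merged = dict(decorator_env or {})
    # Per-submit values replace decorator-level values for the same key.
    merged.update(submit_env or {})
    return merged


decorator_env = {"WANDB_PROJECT": "relax", "WANDB_RUN_ID": "default"}
effective = merge_env(decorator_env, {"WANDB_RUN_ID": "experiment-42"})
```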

Restart-safe function steps

Use get_or_submit() when a workflow may be restarted and each step should reattach to the matching prior deployment instead of creating a duplicate:

job = relax.get_or_submit(
    "relax-0to10",
    "0:10",
    _workflow_id="experiment-42",
    fmax=0.005,
)
job.wait()

Function.get_or_submit(step_id, ..., _workflow_id=None) generates a deployment ID from _workflow_id, step_id, function arguments, env, secrets, image, command, Docker options, and cloud config. The same workflow/step/spec reattaches on restart; changing the function args or submission config creates a different deterministic deployment ID.

Use _workflow_id for the workflow namespace so wrapped functions can still accept workflow_id or step_id as ordinary function arguments. Pass the workflow step name as the first argument; use _step_id= only when you need a keyword form for the wrapper's step name.
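
The reason for the underscore prefix can be seen in a stand-in wrapper. The function below is not the SDK implementation; it only mirrors the documented signature to show how Python separates the wrapper's own keyword from the keyword arguments forwarded to the wrapped function.

```python
def get_or_submit(step_id, *args, _workflow_id=None, **kwargs):
    """Stand-in wrapper: report what would be forwarded to the function."""
    return {
        "workflow": _workflow_id,  # consumed by the wrapper
        "step": step_id,           # consumed by the wrapper
        "fn_args": args,           # forwarded to the wrapped function
        "fn_kwargs": kwargs,       # forwarded to the wrapped function
    }


call = get_or_submit(
    "relax-0to10",
    "0:10",
    _workflow_id="experiment-42",
    workflow_id="passed-to-function",  # an ordinary function argument
)
```

Here `workflow_id` lands in `fn_kwargs` untouched, while `_workflow_id` is kept by the wrapper.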

Requirements for git-based code sync:

Your code must be in a GitHub repo, committed and pushed. Private repos require anycloud login or a GITHUB_TOKEN with repo read access.
The Docker image must have git installed.

Decorator Parameters

Parameter	Type	Description
`image`	`str`	Docker image reference (required). Must have `git` installed.
`gpu`	`str`	GPU type shorthand (e.g. `"h100:8"`).
`env`	`dict`	Extra environment variables passed to the container.
`secrets`	`list[str]`	Names of saved secrets to inject as env vars. See Secrets.
`cloud_config`	`CloudConfig`	Cloud config for this function's submissions.
`docker_options`	`dict`	Docker runtime options (`shmSize`, `gpus`, `ipc`, etc.).
`target_path`	`str`	Absolute path the repo is cloned into on the remote container. Default `"/app"`. Set this when your image already populates `/app` with build artifacts.

`submit()` Parameters

Parameter	Type	Description
`*args`		Positional arguments for the wrapped function.
`id`	`str`	Custom deployment ID (auto-generated if omitted). Maps to `--id`.
`env`	`dict`	Per-submit env vars, merged on top of decorator-level `env`.
`secrets`	`list[str]`	Per-submit secret names, merged on top of decorator-level `secrets`.
`**kwargs`		Keyword arguments for the wrapped function.

`get_or_submit()` Parameters

Parameter	Type	Description
`step_id`	`str`	Stable step name within the workflow. Required as the first argument.
`*args`		Positional arguments for the wrapped function.
`_workflow_id`	`str`	Optional workflow/run namespace.
`env`	`dict`	Per-submit env vars, merged on top of decorator-level `env`.
`secrets`	`list[str]`	Per-submit secret names, merged on top of decorator-level `secrets`.
`**kwargs`		Keyword arguments for the wrapped function; included in the deterministic ID.

Arguments must be JSON-serializable — only str, int, float, bool, None, list, and dict are supported. For complex or large data, use input_bucket instead.
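
A local pre-flight check can catch unsupported arguments before anything is submitted. This is a sketch built on the standard `json` module; `check_json_args` is a hypothetical helper, and the SDK's own validation may be stricter (for example, `json.dumps` also accepts tuples, which it encodes as arrays).

```python
import json


def check_json_args(*args, **kwargs):
    """Raise TypeError early if any argument cannot be encoded as JSON."""
    json.dumps({"args": list(args), "kwargs": kwargs})
    return True


check_json_args("0:10", fmax=0.005)      # fine: str and float

try:
    check_json_args({"a", "b"})          # a set is not JSON-serializable
except TypeError as exc:
    rejected = str(exc)
```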

No return values — the decorated function runs remotely and its return value is discarded. Write results to /mnt/output via output_bucket.

Serve Decorator

@anycloud.serve uses the same git-clone code sync as @anycloud.function, but the decorated function is a long-running daemon entrypoint. It should bind PORT and block. AnyCloud sets PORT=8088 by default; override it with env={"PORT": "<port>"}. Private repos work when authentication is available through anycloud login or a GITHUB_TOKEN with repo read access.

import os
import anycloud

@anycloud.serve(
    image="ghcr.io/acme/inference:latest",
    gpu="L40S:1",
    env={"NAMESPACE": "model-a"},
    secrets=["inference-node"],
    target_path="/opt/app",
)
def node():
    import uvicorn
    from myapp import app

    uvicorn.run(app, host="0.0.0.0", port=int(os.environ["PORT"]))

server = node.start(id="model-a-001", env={"REGION": "us"})
server.wait_running(timeout=600)
print(server.url)  # https://model-a-001.anycloud.sh
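
For images without a web framework, the same "bind PORT and block" contract can be met with the standard library alone. This is a dependency-free sketch of the entrypoint body only; `resolve_port` and `Health` are hypothetical names, and the decorator is omitted so the snippet runs without the SDK.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def resolve_port(environ=os.environ):
    # PORT is documented to default to 8088 for serve deployments.
    return int(environ.get("PORT", "8088"))


class Health(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def node():
    # Bind all interfaces on PORT and block until the process is stopped.
    HTTPServer(("0.0.0.0", resolve_port()), Health).serve_forever()
```

Calling `node()` blocks, which is the behavior expected of a serve entrypoint.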

Serve Decorator Parameters

Parameter	Type	Description
`image`	`str`	Docker image reference (required). Must have `git` installed.
`gpu`	`str`	GPU type shorthand (e.g. `"L40S:1"`).
`env`	`dict`	Environment variables passed to the server container.
`secrets`	`list[str]`	Names of saved secrets to inject as env vars.
`cloud_config`	`CloudConfig`	Cloud config for server launches.
`docker_options`	`dict`	Docker runtime options (`shmSize`, `gpus`, `ipc`, etc.).
`target_path`	`str`	Absolute path the repo is cloned into on the remote container. Default `"/app"`.

`start()` Parameters

Parameter	Type	Description
`id`	`str`	Custom deployment ID. The server URL is derived from this ID.
`env`	`dict`	Per-server env vars, merged on top of decorator-level `env`.
`secrets`	`list[str]`	Per-server secret names, merged on top of decorator-level `secrets`.

Testing Serve Decorators

The live Python SDK e2e suite includes a dedicated serve group for the decorator git-clone path, Server handle dispatch, and public URL response:

bash test/integration/e2e.sh sdk-python-serve

`CloudConfig` Parameters

CloudConfig accepts credentials as either a string name (e.g. "azure2", resolved lazily from saved credentials) or a credential object. See CloudConfig Internals for GPU vs VM type, credential resolution, and config variants.

Parameter	Type	Description
`credentials`	`str | CloudCredentials`	Credential name or object.
`cloud_provider`	`CloudType`	Cloud provider (inferred from credentials if omitted).
`vm_type`	`str`	VM type (e.g. `"Standard_NC16as_T4_v3"`).
`spot`	`bool`	Use spot/preemptible instances.
`region`	`str`	Cloud region.
`availability_zone`	`str`	Availability zone.
`disk_size_gb`	`int`	Root disk capacity in GB.
`disk_tier`	`DiskTier`	AWS root disk performance tier: `medium`, `high`, or `ultra`.
`input_bucket`	`str`	Input bucket name (mounted at `/mnt/input`).
`output_bucket`	`str`	Output bucket name (mounted at `/mnt/output`).
`input_storage_credentials`	`CloudCredentials`	Credentials for input bucket (cross-cloud).
`input_storage_region`	`str`	Region for input bucket storage.
`output_storage_credentials`	`CloudCredentials`	Credentials for output bucket (cross-cloud).
`output_storage_region`	`str`	Region for output bucket storage.
`checkpoint_storage_credentials`	`CloudCredentials`	Credentials for the checkpoint bucket (cross-cloud).
`checkpoint_storage_region`	`str`	Region for checkpoint bucket storage.

Runtime Environment

Inside the remote container, the following environment variables are available:

Variable	Description
`DEPLOYMENT_ID`	Unique deployment ID.
`PORT`	Server listen port for `serve` deployments. Defaults to `8088`; override with `env={"PORT": "<port>"}`.

For mount paths (/mnt/input, /mnt/output, /mnt/checkpoint), see Bucket Sync.

Buckets

Use Bucket objects to manage cloud buckets and wire data between jobs. See Bucket Sync for bucket types and usage patterns. See Deploying Jobs for data chaining and fan-in patterns.

Bucket Handles

ac.bucket() returns a lazy handle — no cloud calls are made until you use it:

data = ac.bucket("training-data")
results = ac.bucket("results")

Upload and Download

data = ac.bucket("training-data")

# Upload a local file to a key in the bucket (the bucket must already exist —
# create it first with `anycloud bucket create`)
data.upload("~/datasets/shard-0.bin", remote_path="shard-0.bin")

# Upload to a nested key (the remote path may contain a prefix)
data.upload("~/labels.csv", remote_path="metadata/labels.csv")

# Download a single object to a local file
data.download("~/shard-0.bin", remote_path="shard-0.bin")

Handles

Client.submit() returns a Job. Client.serve() and @anycloud.serve(...).start() return a Server. Both share state(), status(), logs(), exec(), and terminate().

Job Methods

Client.submit() returns a Job — a future-like handle to a deployment.

Method	Description
`job.wait(timeout=None, poll_interval=2.0)`	Block until `Completed`. Returns `self` on success, raises `DeploymentFailedError` on failure.
`job.state()`	Fetch current `DeploymentState` from the API.
`job.status()`	Full `StatusResponse` (events, VM health, SSH key).
`job.logs()`	Container stdout/stderr.
`job.exec(command, vm=False)`	Run a command in the container (or on the VM). Returns stdout.
`job.terminate()`	Terminate the deployment.
`job.resubmit()`	Resubmit a terminal deployment. Returns a new `Job`.

Properties:

Property	Description
`job.id`	Deployment ID.
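
The polling behavior behind `wait(timeout, poll_interval)` can be approximated as follows. This is a sketch under stated assumptions, not the SDK's implementation: `wait_for` is a hypothetical helper, `"Failed"` and `"Queued"` are assumed state names, and a generic `RuntimeError` stands in for `DeploymentFailedError`.

```python
import time


def wait_for(fetch_state, target="Completed", failed=("Failed",),
             timeout=None, poll_interval=2.0,
             sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_state() until it returns target, fails, or times out."""
    deadline = None if timeout is None else clock() + timeout
    while True:
        state = fetch_state()
        if state == target:
            return state
        if state in failed:
            raise RuntimeError(f"deployment reached terminal state {state!r}")
        if deadline is not None and clock() >= deadline:
            raise TimeoutError(f"still {state!r} after {timeout}s")
        sleep(poll_interval)


# Simulated state sequence; sleep is stubbed so the example returns at once.
states = iter(["Queued", "Running", "Completed"])
final = wait_for(lambda: next(states), sleep=lambda seconds: None)
```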

Server Methods

Client.serve() returns a Server — a handle to one long-running serve deployment.

Method	Description
`server.wait_running(timeout=None, poll_interval=2.0)`	Block until `Running`. Raises `DeploymentFailedError` on terminal failure.
`server.state()`	Fetch current `DeploymentState` from the API.
`server.status()`	Full `StatusResponse` (events, VM health, SSH key).
`server.logs()`	Container stdout/stderr.
`server.exec(command, vm=False)`	Run a command in the container (or on the VM). Returns stdout.
`server.terminate()`	Terminate the deployment.

Properties:

Property	Description
`server.id`	Deployment ID.
`server.url`	Stable public URL: `https://<id>.anycloud.sh`.
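
Since the URL is derived from the deployment ID, it can be computed before the server starts. The sketch below assumes that custom IDs follow the same rules stated for generated IDs (lowercase letters, digits, hyphens, at most 28 characters); `server_url` is a hypothetical helper.

```python
import re

# Assumed ID rule: lowercase letters, digits, hyphens, at most 28 characters.
ID_PATTERN = re.compile(r"[a-z0-9-]{1,28}")


def server_url(deployment_id):
    """Build the public URL for a serve deployment from its ID."""
    if not ID_PATTERN.fullmatch(deployment_id):
        raise ValueError(f"invalid deployment ID: {deployment_id!r}")
    return f"https://{deployment_id}.anycloud.sh"


url = server_url("model-a-001")
```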

JobGroup

JobGroup is returned by client.submit_many(...) and fn.map(...). It cannot be constructed directly.

Method / Property	Description
`group.wait(timeout=None, poll_interval=2.0)`	Block until all jobs finish. Raises `JobGroupError` if any failed.
`group.terminate()`	Terminate all jobs in parallel.
`group.ids`	List of deployment IDs (successful submissions only).
`group.errors`	`[(input_index, exception), ...]` — only populated with `return_exceptions=True`.
`len(group)`, `group[i]`, `for job in group`	Sequence protocol. With `return_exceptions=True`, entries may be `Job` or `Exception`.

Client Methods

Method	Description
`ac.submit(image, ...)`	Submit a job. See parameters below.
`ac.get_or_submit(step_id, image, ...)`	Submit a restart-safe workflow step or reattach to the matching deployment.
`ac.submit_many(submissions, ...)`	Submit N jobs in parallel from a list of `Submission`s. Returns a `JobGroup`.
`ac.serve(image, ...)`	Start a long-running server deployment. Returns a `Server`.
`ac.bucket(name)`	Get a lazy `Bucket` handle. No cloud calls until use.
`ac.list(limit=20)`	List recent deployments.
`ac.get(deployment_id)`	Get a `Job` or `Server` handle for an existing deployment.
`ac.close()`	Close the HTTP connection.

`Client()` Parameters

Parameter	Type	Description
`api_url`	`str`	API server URL. Falls back to `API_URL`, then `~/.anycloud/api-url`, then `http://localhost:8080`.
`cloud_config`	`CloudConfig`	Default cloud config for all submits. Takes precedence over credentials file.
`credentials`	`str`	Name of a credential set saved in the local anycloud API database. Auto-selects if only one exists.
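
The `api_url` fallback chain from the table can be sketched like this. `resolve_api_url` is a hypothetical helper, and the sketch assumes the `~/.anycloud/api-url` file holds a single URL and that empty values are skipped.

```python
import os
from pathlib import Path

DEFAULT_API_URL = "http://localhost:8080"


def resolve_api_url(api_url=None, environ=None, url_file=None):
    """Resolve the API URL: argument, then API_URL, then file, then default."""
    environ = os.environ if environ is None else environ
    if url_file is None:
        url_file = Path.home() / ".anycloud" / "api-url"
    url_file = Path(url_file)

    if api_url:
        return api_url
    if environ.get("API_URL"):
        return environ["API_URL"]
    if url_file.is_file():
        saved = url_file.read_text().strip()
        if saved:
            return saved
    return DEFAULT_API_URL
```

Passing `environ` and `url_file` explicitly keeps the function easy to test without touching the real environment or home directory.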

`submit()` Parameters

Parameter	Type	Description
`image`	`str`	Docker image reference (required).
`cloud_config`	`CloudConfig`	Cloud config (provider, credentials, region, etc.). Falls back to client default.
`gpu`	`str`	GPU type (e.g. `"h100:8"`, `"a100:4"`).
`env`	`dict`	Environment variables for the container.
`secrets`	`list[str]`	Names of saved secrets to inject as env vars. See Secrets.
`docker_options`	`dict`	Docker runtime options (`shmSize`, `gpus`, `ipc`, etc.).
`command`	`list[str]`	Override container CMD.
`persist`	`bool`	Keep VM alive after job completion.
`deployment_id`	`str`	Custom deployment ID.
`input`	`Bucket`	Bucket to mount at `/mnt/input` (read-only).
`output`	`Bucket`	Bucket to mount at `/mnt/output` (write, synced periodically).

submit() also accepts a Submission object directly when you've built one programmatically:

from anycloud import Submission

s = Submission(image="train:latest", gpu="h100:8")
job = ac.submit(s)

`get_or_submit()` Parameters

get_or_submit() accepts the same job options as submit(), except you pass a required step_id first and do not pass deployment_id:

job = ac.get_or_submit(
    "train",
    "train:latest",
    workflow_id="daily-retrain-2026-06-22",
    gpu="h100:8",
    env={"LR": "0.001"},
)
job.wait()

The SDK generates a deterministic deployment ID that satisfies the server ID rules (lowercase letters, digits, hyphens, max 28 chars). The ID is based on workflow_id, step_id, and a normalized submission spec including image, env, secrets, command, Docker options, and cloud config. If the API returns ConflictError for that generated ID, the SDK returns ac.get(generated_id).
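
One way to derive such an ID is to hash a canonical JSON encoding of the inputs. The sketch below is an illustration only: the SDK's actual normalization, hash function, and ID layout are not documented here, and `deterministic_id` is a hypothetical helper.

```python
import hashlib
import json
import re

MAX_ID_LEN = 28  # documented server limit


def deterministic_id(workflow_id, step_id, spec):
    """Derive a stable, server-safe ID from workflow, step, and spec."""
    # Canonical encoding: sorted keys make dict ordering irrelevant.
    canonical = json.dumps(
        {"workflow": workflow_id, "step": step_id, "spec": spec},
        sort_keys=True,
        separators=(",", ":"),
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    # Readable prefix from the step name: lowercase, other characters -> "-".
    slug = re.sub(r"[^a-z0-9]+", "-", step_id.lower()).strip("-") or "step"
    prefix = slug[: MAX_ID_LEN - len(digest) - 1].rstrip("-")
    return f"{prefix}-{digest}"


spec = {"image": "train:latest", "gpu": "h100:8", "env": {"LR": "0.001"}}
first = deterministic_id("daily-retrain-2026-06-22", "train", spec)
again = deterministic_id("daily-retrain-2026-06-22", "train", dict(spec))
changed = deterministic_id("daily-retrain-2026-06-22", "train",
                           {**spec, "env": {"LR": "0.01"}})
```

The same inputs always produce the same ID, so a restarted workflow hits the existing deployment, while any change to the spec yields a different ID.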

Use submit() when every call should create a new deployment. Use get_or_submit() when restarting the same workflow should reattach to completed, failed, running, or queued matching steps.

`submit_many()` Parameters

Parameter	Type	Description
`submissions`	`list[Submission]`	List of submissions to submit in parallel.
`max_workers`	`int`	Thread pool size. Defaults to `len(submissions)`.
`return_exceptions`	`bool`	If `True`, return a mixed `Job | Exception` group instead of raising. Default `False`.

Submission accepts the same fields as submit() (image, cloud_config, gpu, env, command, input, output, etc.).

from anycloud import Submission

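# Both jobs write to one shared output bucket (the bucket name is illustrative)
shared = ac.bucket("shared-artifacts")

jobs = ac.submit_many([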
    Submission(image="features:latest", output=shared),
    Submission(image="labels:latest",   output=shared),
])
jobs.wait()

Use Submission.replace(**kwargs) to vary one field off a shared base — useful when most of the config is constant across a fan-out:

base = Submission(image="worker:latest", gpu="h100:8", cloud_config=cc)
jobs = ac.submit_many([
    base.replace(env={"SHARD": str(i)}) for i in range(8)
])
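
The copy-with-overrides idea is the same one the standard library offers through `dataclasses.replace`. The sketch below uses a stand-in `Spec` dataclass rather than the real `Submission`, and assumes that `replace` returns a new object and leaves the base untouched.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass(frozen=True)
class Spec:
    """Stand-in for Submission with a few of its documented fields."""
    image: str
    gpu: Optional[str] = None
    env: dict = field(default_factory=dict)


base = Spec(image="worker:latest", gpu="h100:8")

# Each shard shares image and gpu with the base and differs only in env.
shards = [replace(base, env={"SHARD": str(i)}) for i in range(8)]
```

In this sketch the replaced field is swapped wholesale rather than merged, so a base `env` would need to be copied into each override explicitly.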

For fanning out a decorated function across varying arguments, use fn.map() instead.

Partial failures

By default, if any submission fails, submit_many raises BatchSubmitError. Successful submissions are attached to the error so you can still wait on or terminate them:

from anycloud import BatchSubmitError

try:
    jobs = ac.submit_many(subs)
except BatchSubmitError as e:
    print(f"{len(e.errors)} failed, {len(e.submitted)} running")
    for i, exc in e.errors:
        print(f"  submission {i}: {exc}")
    # Optionally keep the successful ones running, or clean up:
    for job in e.submitted:
        job.terminate()

Pass return_exceptions=True to never raise. You get a JobGroup whose entries are Job or Exception in input order; group.wait(), group.terminate(), and group.ids silently skip the exception entries, and group.errors lists them:

jobs = ac.submit_many(subs, return_exceptions=True)
jobs.wait()                           # only awaits the successful entries
for idx, exc in jobs.errors:
    print(f"submission {idx} failed: {exc}")

Function.map() accepts the same return_exceptions flag and raises the same BatchSubmitError on partial failure.
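
The two failure modes can be modeled with a thread pool and a small stand-in exception. This is a sketch of the described semantics, not the SDK's code: `submit_all` and `BatchError` are hypothetical, and the return value is a plain list rather than a `JobGroup`.

```python
from concurrent.futures import ThreadPoolExecutor


class BatchError(Exception):
    """Stand-in for BatchSubmitError: carries successes and failures."""

    def __init__(self, submitted, errors):
        total = len(submitted) + len(errors)
        super().__init__(f"{len(errors)} of {total} submissions failed")
        self.submitted = submitted  # results of successful submissions
        self.errors = errors        # [(input_index, exception), ...]


def submit_all(submit_one, items, max_workers=None, return_exceptions=False):
    """Submit every item in parallel, keeping results in input order."""
    items = list(items)
    if not items:
        return []

    def attempt(item):
        try:
            return submit_one(item)
        except Exception as exc:  # capture instead of aborting the batch
            return exc

    with ThreadPoolExecutor(max_workers=max_workers or len(items)) as pool:
        entries = list(pool.map(attempt, items))  # map preserves input order

    errors = [(i, e) for i, e in enumerate(entries) if isinstance(e, Exception)]
    if errors and not return_exceptions:
        submitted = [e for e in entries if not isinstance(e, Exception)]
        raise BatchError(submitted, errors)
    return entries
```

With `return_exceptions=True` the caller receives every entry and filters the exceptions; otherwise the successes travel on the raised error so they can still be waited on or terminated.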

Catalog browsing (regions, VM types, pricing, GPUs) is available via the anycloud CLI: anycloud regions, anycloud vm-types, anycloud pricing, anycloud gpus.

Credentials Methods

Manage cloud provider credentials stored on the API server.

Method	Description
`ac.list_credentials()`	List saved credentials (secrets redacted).
`ac.get_credential(name)`	Get a single credential by name (secrets redacted).
`ac.save_credential(name, credential)`	Save a cloud credential set. `credential` must include `cloudProvider` and provider-specific fields.
`ac.delete_credential(name)`	Delete a saved credential set.

creds = ac.list_credentials()
ac.save_credential("aws-prod", {
    "cloudProvider": "AWS",
    "accessKeyId": "AKIA...",
    "secretAccessKey": "...",
})
ac.delete_credential("aws-prod")

Secrets Methods

Manage named secrets. See the Secrets guide.

Method	Description
`ac.list_secrets()`	List saved secrets (names and timestamps only — values are never returned).
`ac.create_secret(name, values)`	Create or update a secret. `values` is a `dict[str, str]` of env-var names to values.
`ac.delete_secret(name, *, force=False)`	Delete a secret. Raises `ConflictError` when non-terminal deployments reference it; pass `force=True` to override.

ac.create_secret("hf", {"HF_TOKEN": "hf_xxxx"})
job = ac.submit("train:latest", gpu="h100:8", secrets=["hf"])
ac.delete_secret("hf", force=True)

Configuration

Environment Variables

Variable	Description	Default
`GITHUB_TOKEN`	GitHub token with repo read access, used to clone private repos when `anycloud login` is not available (for example in CI).	—
`API_URL`	API server URL override. If unset, `~/.anycloud/api-url` is used when present.	`http://localhost:8080`

Context Manager

The client can be used as a context manager to automatically close the HTTP connection:

with anycloud.Client() as ac:
    job = ac.submit("task:latest")
    job.wait()
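
The context-manager form is roughly a `try`/`finally` around `close()`. The stand-in class below shows the protocol involved; `FakeClient` is hypothetical and only assumes that leaving the block calls `close()` and does not swallow exceptions.

```python
class FakeClient:
    """Stand-in showing the context-manager protocol a client would follow."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # let any exception propagate to the caller


with FakeClient() as client:
    pass  # submit and wait would go here

# The connection is closed even when the body raises.
failing = FakeClient()
try:
    with failing:
        raise RuntimeError("job failed")
except RuntimeError:
    pass
```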

Error Handling

Exception	Description
`AnyCloudError`	Base exception for all SDK errors.
`APIError`	HTTP error from the conductor API.
`ConflictError`	Deployment ID already exists (409).
`NotFoundError`	Deployment not found (404).
`DeploymentFailedError`	Deployment reached a terminal failure state. `.logs` has last output.
`JobGroupError`	One or more jobs in a group failed during `wait()`. `.errors` has all failures.
`BatchSubmitError`	One or more submissions in `submit_many` / `fn.map` failed. `.submitted` and `.errors` have the split.
`TimeoutError`	Operation timed out.

from anycloud import AnyCloudError, DeploymentFailedError

try:
    job.wait(timeout=3600)
except DeploymentFailedError as e:
    print(f"Deployment {e.deployment_id} failed with state: {e.state}")
    if e.logs:
        print(e.logs)
except AnyCloudError as e:
    print(f"SDK error: {e}")
