Comparing changes
base repository: databricks/cli
base: main
head repository: databricks/cli
compare: air-cli
- 8 commits
- 69 files changed
- 2 contributors
Commits on Jun 17, 2026
AIR CLI Integration: Scaffold experimental AIR CLI command package (#5564)

## Changes
- New `cmd/experimental/air/` package containing an `air` parent command plus six stub subcommands: `run`, `status`, `list`, `logs`, `cancel`, `register-image`.
- Currently, every subcommand returns an `air <cmd> is not implemented yet` error, with representative flags mapped from the Python CLI. Registered under the hidden experimental group.
- `tools/list_embeds.py`: `text=True` was changed to `universal_newlines=True` so the acceptance harness runs on Python 3.6. General tooling fix.

## Why
The AI runtime CLI ships today as a separately installed Python wheel with its own auth, output, and packaging. Folding it into the main Go CLI gives users one `databricks` install with consistent profiles, authentication, and `-o json` output, and removes a parallel toolchain to maintain.

Landing the package scaffold first lets the individual commands be ported in small, reviewable PRs (`status` is next) instead of one large drop. Every stub is wired and navigable, so the command tree and registration are reviewable now without functional code.

## Tests
- Unit (`cmd/experimental/air/`): `New()` registers all six subcommands; each stub returns the not-implemented error.
- Acceptance (`acceptance/experimental/air/unimplemented/`): runs every stub end-to-end and asserts the message + non-zero exit.

Test with:
`go test ./cmd/experimental/air/...`
`go test ./acceptance -run 'TestAccept/experimental/air'`
Commit: 997f6bb
Commits on Jun 18, 2026
AIR CLI Integration: Implement the `air get` command (#5600)

## Changes
Implements `databricks experimental air get RUN_ID`, the Go port of the Python `air get` command. It fetches the run via `Jobs.GetRun` and renders:
- Core fields: run ID, status, submitted time, duration, retries, experiment, accelerators, creator (`User`), and the run's dashboard URL.
- An MLflow deep-link, built from `jobs/runs/get-output` (the `gen_ai_compute_output` field is not modeled by the typed SDK, so it's fetched via a direct REST call).
- For foreach/sweep runs, an iteration summary (counts + per-iteration table) instead of the single-run view.
- The run's training-config YAML, downloaded from the workspace and printed before the status (text mode only).

## Why
`get` is the first real command integrated from the AIR CLI, and it sets the conventions the rest of the CLI will follow. The `{v, ts, data}` envelope mirrors the Python CLI so existing machine consumers keep working.

The implementation is a faithful port of `handle_status` + the `cli_display` helpers, verified field-by-field against the Python source:
- The text view shows the foreach branch (`_display_foreach_sweep_status`) and the training-config panel (`_fetch_and_display_yaml_config`); JSON output omits both, exactly matching `air get <run> --json`.
- MLflow IDs live under an unmodeled `gen_ai_compute_output` field (direct REST call), and the MLflow link / YAML fetch are best-effort (the logic matches the Python CLI).

## Tests
- Unit tests cover every formatting/extraction helper, `buildGetData`, and all template branches (single-run minimal/all-fields, sweep, sweep-with-no-tasks).
- Mock-backed unit tests (mirroring the Python `unittest.mock` suite) cover `buildSweepInfo`, `printConfigYAML`, `mlflowURL` (over `httptest`, since it bypasses the typed SDK), and the `RunE` invalid-id / not-found branches.
- An acceptance test (`acceptance/experimental/air/get`) runs the command end-to-end against a stubbed Jobs API: text output, `-o json`, and an invalid run ID.
Manual verification outputs: screenshots of a successful run (4 images) and a failed run (4 images).
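
The `{v, ts, data}` envelope described under "Why" could be modeled roughly as below. This is a minimal sketch, not the PR's code: the type name `envelope`, the helper `wrapAt`, the integer version, the RFC 3339 timestamp format, and the sample payload key are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// envelope is a hypothetical model of the {v, ts, data} wrapper.
// The field types are assumptions; only the three key names come from the text.
type envelope struct {
	V    int    `json:"v"`
	TS   string `json:"ts"`
	Data any    `json:"data"`
}

// wrapAt wraps an arbitrary payload, stamping it with the given Unix time
// so the output is deterministic in tests.
func wrapAt(data any, unixSeconds int64) ([]byte, error) {
	ts := time.Unix(unixSeconds, 0).UTC().Format(time.RFC3339)
	return json.Marshal(envelope{V: 1, TS: ts, Data: data})
}

func main() {
	out, err := wrapAt(map[string]string{"run_id": "123"}, time.Now().Unix())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

Keeping the payload under a single `data` key means the envelope shape stays stable while each command decides what goes inside it.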
Commit: b952417
AIR CLI Integration: `air run` Command Pt. 1 - Add GPU accelerator type and compute config model (#5602)

## Changes
Adds `experimental/air/cmd/compute.go`, which holds the `gpuType` model and the `compute` block validation that the `air run` configuration layer depends on. Specifically:
- the training service accelerator types were added (`GPU_1xA10`, `GPU_8xH100`, `GPU_1xH100`)
- `parseGPUType` resolves a YAML accelerator type string
- `gpusPerNode` is the per-node partition count based on the type name
- `computeConfig` and `validate()` are the port of the Python `ComputeConfig` validators

## Why
This is the first, leaf-most piece of the `air run` port for the AIR CLI and the root of the config validation layer's dependencies. The compute piece does not depend on anything else, so it lands first as a small, fully unit-tested unit.

Parsing is exact and case-sensitive, since a typo in the user's YAML could misroute the run. Only `GPU_*` training service types are supported: legacy MAPI types (e.g. `h100_80gb`) are intentionally deprecated in this port, so no new runs can use the MAPI path. Rendering them in `get` is unaffected, since `format.go` keeps its own display map for historical runs.

## Tests
Table-driven unit tests in `compute_test.go`:
- `parseGPUType` for valid types and rejected inputs (wrong casing, legacy types, unknown, empty);
- `gpusPerNode` counts plus its invalid-type error;
- `computeConfig.validate` across valid configs and every failure mode (unknown/legacy type, non-positive count, non-multiple count, dual-pool conflict).

`go build`, `go test`, and `golangci-lint` are clean.
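
The compute rules described above can be sketched as follows. The function and type names (`parseGPUType`, `gpusPerNode`, `computeConfig`, `validate`) come from the description, but the bodies, struct fields, and most error strings here are assumptions, and the dual-pool check is omitted; this is an illustration of exact-match parsing and the multiple-of-per-node rule, not the actual `compute.go`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

type gpuType string

// The three training-service types named in the description.
var knownTypes = map[gpuType]bool{
	"GPU_1xA10":  true,
	"GPU_8xH100": true,
	"GPU_1xH100": true,
}

// parseGPUType matches exactly and case-sensitively: no normalization,
// so a typo or a legacy name such as "h100_80gb" is rejected.
func parseGPUType(s string) (gpuType, error) {
	if !knownTypes[gpuType(s)] {
		return "", fmt.Errorf("unknown accelerator type %q", s)
	}
	return gpuType(s), nil
}

// gpusPerNode reads the per-node count from the "GPU_<N>x<MODEL>" name.
func gpusPerNode(t gpuType) (int, error) {
	if !knownTypes[t] {
		return 0, fmt.Errorf("invalid accelerator type %q", string(t))
	}
	count, _, _ := strings.Cut(strings.TrimPrefix(string(t), "GPU_"), "x")
	return strconv.Atoi(count)
}

// computeConfig uses assumed field names for illustration.
type computeConfig struct {
	AcceleratorType string
	NumAccelerators int
}

func (c computeConfig) validate() error {
	t, err := parseGPUType(c.AcceleratorType)
	if err != nil {
		return err
	}
	if c.NumAccelerators <= 0 {
		return fmt.Errorf("compute.num_accelerators must be positive, got %d", c.NumAccelerators)
	}
	n, err := gpusPerNode(t)
	if err != nil {
		return err
	}
	if c.NumAccelerators%n != 0 {
		return fmt.Errorf("compute.num_accelerators for %s must be a multiple of %d, got %d",
			t, n, c.NumAccelerators)
	}
	return nil
}

func main() {
	fmt.Println(computeConfig{AcceleratorType: "GPU_8xH100", NumAccelerators: 16}.validate())
	fmt.Println(computeConfig{AcceleratorType: "GPU_8xH100", NumAccelerators: 3}.validate())
	fmt.Println(computeConfig{AcceleratorType: "gpu_8xh100", NumAccelerators: 8}.validate())
}
```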
Commit: f1601b2
Commits on Jun 23, 2026
AIR CLI Integration: render `air get run` as styled boxes (#5654)

## What
Replaces the plain-text view of `air get run <id>` with a one-shot, styled terminal renderer built on **lipgloss** (layout/styling) and **termenv** (hyperlinks + color-profile detection). It builds the full string and writes it once: no streaming, spinner, or redraw.

The view is two boxes:
- **Configuration**: the resolved run config YAML (inline `yaml_parameters`, the downloaded `yaml_parameters_file_path`, or a synthesized fallback), colorized line by line.
- **Metadata**: Run ID, Status, Submitted, Retries, Max Retries, Duration, Experiment, MLflow Run, User, Accelerators, Environment. Run ID and MLflow Run are OSC 8 hyperlinks.

## Look & feel
- Boxes share a light-purple border/title, warm Oat neutrals, and a restrained accent palette (blue for keys/links; green/amber/red reserved for the status dot).
- Honors `--no-color` / `NO_COLOR` / non-TTY via `termenv.Ascii`: no escape codes, and links degrade to the bare label (the URLs remain available in `-o json` as `dashboard_url` / `mlflow_url`).

## Scope
- Sweep (foreach) runs and JSON output are unchanged.
- `termenv` becomes a direct dependency (annotated `// MIT` in `go.mod`, added to `NOTICE`).

## Testing
- Unit tests in `render_test.go` / `mlflow_test.go` cover the box, field list, link fallback, config sourcing, and the MLflow run-name fetch.
- Acceptance output regenerated (`acceptance/experimental/air/get`).
- `go build ./...`, `./task lint-q` (0 issues), and the air + acceptance suites pass.

This pull request and its description were written by Isaac.

---------

Co-authored-by: Maggie Wang <141875985+maggiewang-db@users.noreply.github.com>
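
The link fallback described under "Look & feel" can be illustrated with a hand-rolled OSC 8 helper. The PR itself relies on `termenv` for this; the sketch below deliberately uses only the standard library, and the names `hyperlink` and `plainOutput` are hypothetical. TTY detection is left out.

```go
package main

import (
	"fmt"
	"os"
)

// hyperlink emits an OSC 8 terminal hyperlink, or just the bare label
// when plain output is requested or there is no URL to link to.
func hyperlink(label, url string, plain bool) string {
	if plain || url == "" {
		return label
	}
	// OSC 8 ; params ; URI ST  label  OSC 8 ; ; ST   (ST is ESC followed by a backslash)
	return "\x1b]8;;" + url + "\x1b\\" + label + "\x1b]8;;\x1b\\"
}

// plainOutput reports whether styling should be suppressed: either the
// caller passed a no-color flag or NO_COLOR is set to a non-empty value.
func plainOutput(noColorFlag bool) bool {
	return noColorFlag || os.Getenv("NO_COLOR") != ""
}

func main() {
	plain := plainOutput(false)
	// example.test is a placeholder host, not a real workspace URL.
	fmt.Println(hyperlink("12345", "https://example.test/jobs/runs/12345", plain))
}
```

Because the plain path returns only the label, scripts that capture text output never see escape codes, which is why the description points to `-o json` for the actual URLs.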
Commit: e118e67
AIR CLI Integration: `air list` Functionality & UI (Interfacing with Training Service) (#5684)

## Changes
Add `air list` as a browsable view of the caller's recent AIR training runs.
- Data source: the `AiWorkflowService.ListTrainingWorkflows` RPC (`GET /api/2.0/ai-training/workflows`), called directly via `client.Do` since the endpoint is `PUBLIC_UNDOCUMENTED` and not modeled by the SDK. The server does the AIR filtering, creator scoping, MLflow-ID resolution, and pagination, so no Jobs-API logic lives in the CLI.
- Interactive table: in a terminal, `air list` renders an inline, navigable table (Bubble Tea + Lip Gloss + termenv): `↑/↓` move a row, `←/→` page (20 rows/page), `Enter` opens the run's MLflow page, `q` quits. Status is colored by state and the MLflow column is a short clickable hyperlink.
- Non-interactive: piped output, an explicit `--limit`, and empty results print the table once; `-o json` emits the air `{v,ts,data}` envelope unchanged.
- Flags: `--limit` (default: all), `--active`, `--all-users`, and client-side `--filter` keys (`experiment`, `accelerator_type`, `num_accelerators`). Gateway timeouts (e.g. HTTP 504 on `--all-users`) return an actionable message.
- Adds `cmdio.IsPagerSupported`; promotes `termenv` to a direct dependency.

## Why
The `ai-training` service now owns the AIR-specific run logic server-side, so `air list` should call its RPC rather than reimplementing run discovery against the Jobs API. The interactive table gives a browsable run list on par with the Python `air` CLI and `databricks jobs list-runs`.

## Tests
- Unit: RPC transport, `TrainingWorkflow`→row mapping, `--filter` matching, status/accelerator/timestamp helpers, and the TUI model (navigation, paging, 20-row page cap, window scroll, quit, static render).
- Acceptance: `acceptance/experimental/air/list` (text + JSON) plus `help` updates; `unimplemented` no longer covers `air list`.

Manual verification output: one screenshot of the interactive table.
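
Client-side filtering and the 20-row page cap could look roughly like this. The three filter keys and the page size come from the description; everything else is an assumption made for illustration, in particular the `key=value` filter syntax, exact-match semantics, and the names `row`, `applyFilter`, and `pageBounds`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// row is a hypothetical, trimmed-down table row.
type row struct {
	Experiment      string
	AcceleratorType string
	NumAccelerators int
}

// applyFilter keeps rows matching a single "key=value" expression.
// The syntax and exact-match behavior are assumptions.
func applyFilter(rows []row, expr string) ([]row, error) {
	key, value, ok := strings.Cut(expr, "=")
	if !ok {
		return nil, fmt.Errorf("invalid filter %q: want key=value", expr)
	}
	var match func(r row) bool
	switch key {
	case "experiment":
		match = func(r row) bool { return r.Experiment == value }
	case "accelerator_type":
		match = func(r row) bool { return r.AcceleratorType == value }
	case "num_accelerators":
		n, err := strconv.Atoi(value)
		if err != nil {
			return nil, fmt.Errorf("num_accelerators must be an integer, got %q", value)
		}
		match = func(r row) bool { return r.NumAccelerators == n }
	default:
		return nil, fmt.Errorf("unsupported filter key %q", key)
	}
	var out []row
	for _, r := range rows {
		if match(r) {
			out = append(out, r)
		}
	}
	return out, nil
}

const pageSize = 20 // rows per page, as stated in the description

// pageBounds returns the half-open [start, end) slice range for a zero-based page.
func pageBounds(total, page int) (start, end int) {
	start = min(page*pageSize, total)
	end = min(start+pageSize, total)
	return start, end
}

func min(a, b int) int {
	if a < b {
		return a
	}
	return b
}

func main() {
	rows := []row{{"a", "GPU_1xH100", 1}, {"b", "GPU_8xH100", 8}}
	got, err := applyFilter(rows, "accelerator_type=GPU_8xH100")
	fmt.Println(got, err)
	fmt.Println(pageBounds(45, 2))
}
```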
Commit: bd3f934
Commits on Jun 24, 2026
AIR CLI Integration: collapse `air get run` back to `air get JOB_RUN_ID` (#5685)

## Why
We decided to cut the `get run` sub-resource. The run-status command is now just `air get <id>`: flat, with no `run` subcommand.

## Changes
- Removed the `get` parent group and its `run` subcommand; `newGetCommand` is the run-status command itself (`Use: "get JOB_RUN_ID"`, `ExactArgs(1)`).
- No change to output behavior: the styled config box, `JOB_RUN_ID` naming, `Job Link` header, status table, and sweep view are all unchanged.
- Regenerated the `experimental/air/get` and `experimental/air/help` acceptance outputs; updated doc comments and tests that referenced `air get run`.

## Tests
- Added `TestGetCommandShape`: asserts `Use == "get JOB_RUN_ID"`, no registered subcommands, and exactly one arg required.
- Updated the existing `get` unit tests (invalid id, not-found text/JSON, templates, `buildGetData`) to the new entry point.
- `experimental/air/{get,help}` acceptance regenerated; full air unit + acceptance suites pass.

This pull request and its description were written by Isaac.
Commit: ca7c0f3
Commits on Jun 30, 2026
AIR CLI Integration: Adding support for air run configuration (#5657)

## Changes
Ports the `air run` YAML config schema and its structural validation from the Python CLI (`cli/sdk/config.py`) to Go, under `experimental/air/cmd/`.
- Schema (`runconfig.go`): the top-level `runConfig` plus the nested `environment` (with `docker_image`), `code_source`/`snapshot`/`git`, and permission blocks. Reuses the compute model from the parent branch. Includes custom YAML unmarshalers for the three polymorphic fields that don't map to a single Go type: `environment.dependencies` (string path or inline list), `environment.version` (string or int), and `git.remote` (bool or remote-name string).
- Loader (`runconfig_load.go`): `loadRunConfig` decodes a YAML file with `KnownFields(true)` (mirroring pydantic's `extra="forbid"`, so unknown keys are rejected), then runs the validation pass.
- Validation: every structural rule from the Python schema, namely required fields, the `experiment_name`/`mlflow_run_name` task-key regex and length caps, secret-ref scope/key format, the environment docker-image/dependencies/version exclusivity rules, git branch-xor-commit and remote-requires-branch rules, `code_source` snapshot requirements, and `include_paths` relative/no-traversal checks.

Two deliberate divergences from the Python schema, both following from the training-service-only port:
- The `compute.node_pool_id` / `compute.pool_name` fields were already dropped on the parent branch.
- The top-level `priority` field is dropped here: it's a node-pool queue-ordering knob (it requires a pool in Python) with no meaning for serverless workloads.

## Why
"Structural" validation (types, required fields, format/cross-field rules) needs no workspace access, so it's a self-contained, fully unit-testable unit that's worth landing on its own ahead of the launch logic. Splitting it out keeps the upcoming `handle_run` PR focused on orchestration rather than mixing in ~900 lines of schema.

The `extra="forbid"` / `KnownFields` behavior is load-bearing: it's what turns a typo'd or stale config key into an actionable error instead of a silently ignored field, so it's preserved faithfully.

This is stacked on `air-integration-m2-1` (the compute model).

## Tests
New unit tests in `runconfig_test.go` (62 subtests, table-driven), covering:
- Loading a minimal config and a full-featured config (all blocks populated).
- Each polymorphic union decoding both of its forms (dependencies string vs list, `git.remote` bool vs string, default-unset).
- Unknown-field rejection at top level and nested, including explicit cases asserting that the dropped `priority` field and the not-yet-ported `_bases_` key surface as errors.
- Every validation rule's failure mode, plus file-level errors (missing file, empty file).

`go test ./experimental/air/...` passes; `./task lint-q` reports 0 issues.
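
To show why strict decoding matters, here is a standard-library analogue. It swaps in `encoding/json` with `DisallowUnknownFields` in place of the YAML decoder's `KnownFields(true)` that the PR uses, and adds a string-or-int field decoded through a custom unmarshaler. The struct layout, the `version` type, and `loadStrict` are hypothetical and only a small slice of the real schema.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strconv"
)

// version accepts either an integer or a string and stores it as a string.
type version string

func (v *version) UnmarshalJSON(b []byte) error {
	var n int
	if err := json.Unmarshal(b, &n); err == nil {
		*v = version(strconv.Itoa(n))
		return nil
	}
	var s string
	if err := json.Unmarshal(b, &s); err != nil {
		return fmt.Errorf("environment.version must be a string or an integer")
	}
	*v = version(s)
	return nil
}

// runConfig is an illustrative subset, not the PR's schema.
type runConfig struct {
	ExperimentName string `json:"experiment_name"`
	Command        string `json:"command"`
	Environment    struct {
		Version version `json:"version"`
	} `json:"environment"`
}

// loadStrict rejects any key the struct does not declare, at any nesting level.
func loadStrict(data []byte) (runConfig, error) {
	var cfg runConfig
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&cfg); err != nil {
		return runConfig{}, err
	}
	return cfg, nil
}

func main() {
	_, err := loadStrict([]byte(`{"experiment_name":"t","command":"x","priority":1}`))
	fmt.Println(err) // the dropped "priority" key is reported, not ignored

	cfg, err := loadStrict([]byte(`{"experiment_name":"t","command":"x","environment":{"version":5}}`))
	fmt.Println(cfg.Environment.Version, err)
}
```

Without the strict flag, the first config would decode cleanly and the stray key would vanish silently, which is the failure mode the description calls out.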
Commit: 60adcaa
AIR CLI Integration: `air run` end to end command (#5710)

## Changes
Implements the `air run` happy path on top of the config schema (#5657), submitting a one-time training run through the Jobs API. Five commits, one per phase:
1. run config launch accessors: flatten the validated config into launch values (timeout seconds, retry default, requirements file-vs-inline, runtime version).
2. wire run command (load, validate, dry-run): `air run -f <config>` loads + structurally validates the YAML; `--dry-run` validates offline (no workspace/auth) and returns; `--override`/`--watch` are rejected for now with clear errors (to be ported in a future PR).
3. pre-submit resolution: resolve current user / workspace home / a unique `cli_launch` dir, and ensure a custom `experiment_directory` exists.
4. upload launch artifacts: write `training_config.yaml` (1 MB cap), `command.sh`, `requirements.yaml` (file or synthesized from inline deps), `env_vars.json` / `secret_env_vars.json`, and `hyperparameters.yaml` into the launch dir via a workspace filer.
5. assemble + submit: build the native `ai_runtime_task` payload and `POST /api/2.2/jobs/runs/submit` directly, then print the run id + dashboard URL (or a JSON envelope).

Submission uses the **native `ai_runtime_task`** task (BYOT task type). It talks only to the Jobs API (which internally routes to the training service endpoint) and has no genai-mapi forwarding (the MAPI path is deprecated). The task isn't modeled by the typed Go SDK, so the payload is a custom struct posted to the raw endpoint. The proto is lean: env vars and secrets ship as co-located `env_vars.json` / `secret_env_vars.json` files rather than inline, and `requirements.yaml` / `hyperparameters.yaml` are derived server-side from the command directory.

**Deferred, with explicit "not yet supported" errors (no silent drops):** `code_source` snapshot packaging, `--watch` log streaming, and `usage_policy_name`.

`environment.docker_image` is accepted by the schema as scaffolding but not conveyed in the payload (the native path has no docker field). `node_pool_id` / `pool_name` / `priority` remain dropped (the new AIR CLI does not support pool placement).

## Why
`air run` is the core of the migration for the AIR CLI. Splitting it into per-phase commits keeps each reviewable in isolation, and stacking on the schema PR keeps that PR focused. Regarding some specific decisions:
- We keep the native `ai_runtime_task` (and not the `gen_ai_compute_task` that interfaces with MAPI) as a hand-built struct posted to the raw endpoint. This lets us talk to Jobs directly and stay off the deprecated genai-mapi forwarding path: `jobs.SubmitTask` only knows `gen_ai_compute_task`, and that typed struct also omits the env-vars/secrets/requirements fields the run needs.
- `--dry-run` is decoupled from auth. It validates the config locally and returns before any workspace call, so config validation works fully offline (matching the Python CLI). Only actual submission requires an authenticated workspace client.

## Tests
- Unit tests for every phase: launch accessors, pre-submit resolution (incl. `ensureExperimentDirectory` create/exists/not-a-directory), artifact assembly + upload, payload assembly, and `submitWorkload` end-to-end against a fake workspace.
- New `acceptance/experimental/air/run` test covering `--dry-run` (text + JSON), the `--override`/`--watch` guards, an invalid config, and missing `--file`.
- Updated the `unimplemented` acceptance test (removed `run`, now implemented).

`go test ./experimental/air/...`, `go test ./acceptance -run TestAccept/experimental/air`, and `./task lint-q` all pass.
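
The "dry-run is decoupled from auth" decision comes down to ordering: validate first, return on `--dry-run`, and only then construct a client. A minimal sketch of that control flow follows; `submitter`, `fakeSubmitter`, `run`, and the returned messages are hypothetical stand-ins, not the PR's types or output.

```go
package main

import "fmt"

// submitter stands in for an authenticated workspace client.
type submitter interface {
	Submit(cfg string) (int64, error)
}

// fakeSubmitter is a test double that returns a fixed run ID.
type fakeSubmitter struct{ id int64 }

func (f fakeSubmitter) Submit(string) (int64, error) { return f.id, nil }

// run validates locally, and only calls connect (the step that would need
// credentials) when this is not a dry run.
func run(cfg string, dryRun bool, validate func(string) error,
	connect func() (submitter, error)) (string, error) {
	if err := validate(cfg); err != nil {
		return "", err
	}
	if dryRun {
		return "config is valid (dry run)", nil
	}
	client, err := connect()
	if err != nil {
		return "", err
	}
	id, err := client.Submit(cfg)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("submitted run %d", id), nil
}

func main() {
	validate := func(string) error { return nil }
	connect := func() (submitter, error) { return fakeSubmitter{id: 42}, nil }

	fmt.Println(run("cfg", true, validate, connect))
	fmt.Println(run("cfg", false, validate, connect))
}
```

Passing `connect` as a function rather than a ready client is what keeps the dry-run path free of any credential lookup.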
**Manual verification tests (all pass):**
- Dry run (offline, no auth)
  - command only
  - full run config
  - JSON output
- Actual run submission
  - throws an error when the profile is not set
  - submission loop: submitted, can see the run in `air list` and `air get`, and the MLflow environment was created
  - the same run id is output when the run is submitted with the SAME idempotency key
  - a new run is created when the run is submitted with the SAME config but a DIFFERENT idempotency key
- `--watch` and `--override` return an informative error message (they are valid flags but not supported yet)
- `usage_policy_name` set in config throws error: usage_policy_name is not yet supported
- `code_source` set in config throws error: code_source is not yet supported
- missing `--file` throws informative error: required flag(s) "file" not set
- invalid config (e.g. `experiment_name: bad.name`, or `num_accelerators` not a multiple of the per-node count) throws a field-specific validation error

**How to test locally for manual verification:**

Checkout & build:
```bash
git fetch origin
git checkout air-integration-m2-3   # this PR (stacked on air-integration-m2-2)
./task build
```

Sample configs:
```bash
cat > /tmp/min.yaml <<'YAML'
experiment_name: air-cuj
command: python train.py
compute: {accelerator_type: GPU_1xH100, num_accelerators: 1}
YAML
```
```bash
cat > /tmp/full.yaml <<'YAML'
experiment_name: full-run
command: |
  pip install -r requirements.txt
  python train.py
compute: {accelerator_type: GPU_8xH100, num_accelerators: 16}
environment: {dependencies: [torch==2.3.0], version: 5}
env_variables: {WANDB_PROJECT: demo}
secrets: {HF_TOKEN: my_scope/hf_token}
parameters: {lr: 0.001, epochs: 3}
mlflow_run_name: full-run-v2
max_retries: 2
timeout_minutes: 120
YAML
```

Automated tests:
```bash
go test ./experimental/air/...                          # unit (incl. submitWorkload vs a fake workspace)
go test ./acceptance -run TestAccept/experimental/air   # acceptance (run + unimplemented)
./task lint-q                                           # lint changed files
```

Dry run:
```bash
# note that this command will, in the final version, be `databricks experimental air run`
./cli experimental air run -f /tmp/min.yaml --dry-run
./cli experimental air run -f /tmp/full.yaml --dry-run
./cli experimental air run -f /tmp/min.yaml --dry-run -o json
```

Actual run submission:
```bash
PROFILE=<your-dev-profile>

# no auth configured → fails fast (exit 1)
env -u DATABRICKS_HOST -u DATABRICKS_TOKEN ./cli experimental air run -f /tmp/min.yaml
#> Error: ... (cannot configure default credentials / auth)

# submit → prints run_id + dashboard URL
./cli experimental air run -f /tmp/min.yaml -p $PROFILE -o json
#> { "data": { "status":"SUBMITTED", "run_id":"<id>", "dashboard_url":"<host>/jobs/runs/<id>" } }

# verify in the workspace: open dashboard_url (run exists), and the MLflow experiment was created.
./cli experimental air get <run_id> -p $PROFILE   # run state
./cli experimental air list -p $PROFILE           # run appears in the list

# idempotency: SAME key returns the SAME run_id (no new run)
./cli experimental air run -f /tmp/min.yaml -p $PROFILE --idempotency-key demo-key-1 -o json   # run_id = X
./cli experimental air run -f /tmp/min.yaml -p $PROFILE --idempotency-key demo-key-1 -o json   # run_id = X (same)

# idempotency: DIFFERENT key creates a NEW run
./cli experimental air run -f /tmp/min.yaml -p $PROFILE --idempotency-key demo-key-2 -o json   # run_id = Y (new)
```

Unsupported flags (asserting that an error is thrown):
```bash
./cli experimental air run -f /tmp/min.yaml --dry-run --watch
#> Error: --watch is not yet supported
./cli experimental air run -f /tmp/min.yaml --dry-run --override compute.num_accelerators=8
#> Error: --override is not yet supported

# usage_policy_name (needs a workspace to reach the submit guard)
printf 'experiment_name: t\ncommand: x\ncompute: {accelerator_type: GPU_1xH100, num_accelerators: 1}\nusage_policy_name: my-policy\n' > /tmp/policy.yaml
./cli experimental air run -f /tmp/policy.yaml -p $PROFILE
#> Error: usage_policy_name is not yet supported

# code_source
printf 'experiment_name: t\ncommand: x\ncompute: {accelerator_type: GPU_1xH100, num_accelerators: 1}\ncode_source: {type: snapshot, snapshot: {root_path: .}}\n' > /tmp/code.yaml
./cli experimental air run -f /tmp/code.yaml -p $PROFILE
#> Error: code_source is not yet supported
```

Validation errors with field-specific messages (exit 1, offline):
```bash
# missing --file
./cli experimental air run --dry-run
#> Error: required flag(s) "file" not set

# invalid experiment_name + num_accelerators not a multiple of the per-node count
printf 'experiment_name: bad.name\ncommand: x\ncompute: {accelerator_type: GPU_8xH100, num_accelerators: 3}\n' > /tmp/bad.yaml
./cli experimental air run -f /tmp/bad.yaml --dry-run
#> Error: invalid experiment_name "bad.name": only alphanumeric characters, hyphens (-), and underscores (_) are allowed
# (and, once the name is fixed: compute.num_accelerators for GPU_8xH100 must be a multiple of 8, got 3)
```
Commit: fc0ba3e
This comparison is taking too long to generate.
Unfortunately it looks like we can’t render this comparison for you right now. It might be too big, or there might be something weird with your repository.
You can try running this command locally to see the comparison on your machine:
git diff main...air-cli