fs cp: show an upload progress bar for a single large-file copy#5758

Open
renaudhartert-db wants to merge 1 commit into multipart/05-fs-cp from multipart/06-fs-cp-progress


@renaudhartert-db renaudhartert-db commented Jun 28, 2026

Contributor

Context

databricks fs cp and bundle library uploads to Unity Catalog Volumes go through a single PUT /api/2.0/fs/files, which caps a file at the single-request size limit and pushes it over one connection. This stack adds chunked upload (multipart on AWS/Azure, resumable on GCP) so large files upload reliably and in parallel. The whole feature is gated behind the DATABRICKS_EXPERIMENTAL_MULTIPART_UPLOAD environment variable and is off by default, so merging the stack changes no behavior until the flag is set.

Stack

  1. libs/upload/cloudstorage: add cloud-storage transfer client (#5753): cloud-storage data-plane client
  2. libs/upload/files: add Files API control-plane client (#5754): Files API control-plane client
  3. libs/upload: add the chunked large-file upload engine (#5755): chunked upload engine
  4. filer: route large Volumes writes through the multipart engine (off by default) (#5756): route large Volumes writes through the engine
  5. fs cp: share one multipart transfer budget across a recursive copy (#5757): fs cp shared transfer budget
  6. fs cp: show an upload progress bar for a single large-file copy (#5758): fs cp progress bar (this PR)

This PR

Adds an upload progress bar to fs cp for a single large file copied to a Volume (when multipart is enabled). It reuses the bar from the experimental files-upload command: a spinner that animates a "Preparing upload" label during session setup, then a single-line bar with percentage, transfer rate, and ETA. The engine reports progress through a callback that the command threads down via the context (filer.WithUploadProgress), so the Filer.Write interface is unchanged. Recursive copies, non-Volumes targets, and other writers (bundle) are unaffected; a non-interactive terminal logs coarse progress instead of drawing the bar.

Testing

Unit tests cover the fixed-width bar invariant, the rolling-window rate meter (including burst smoothing), and the speed and ETA formatters. The interactive bar is exercised with a live fs cp of a multi-GB file.

This pull request and its description were written by Isaac.

@github-actions

Contributor

Approval status: pending

/cmd/fs/ - needs approval

Files: cmd/fs/cp.go, cmd/fs/upload_progress.go, cmd/fs/upload_progress_test.go
Suggested: @Divyansh-db
Also eligible: @simonfaltum, @hectorcast-db, @parthban-db, @tanmay-db, @tejaskochar-db, @mihaimitrea-db, @chrisst, @rauchy

/libs/filer/ - needs approval

Files: libs/filer/files_client.go
Suggested: @Divyansh-db
Also eligible: @simonfaltum, @hectorcast-db, @parthban-db, @tanmay-db, @tejaskochar-db, @mihaimitrea-db, @chrisst, @rauchy

/libs/upload/ - needs approval

Files: libs/upload/upload.go
Suggested: @Divyansh-db
Also eligible: @simonfaltum, @hectorcast-db, @parthban-db, @tanmay-db, @tejaskochar-db, @mihaimitrea-db, @chrisst, @rauchy

Any maintainer (@andrewnester, @anton-107, @denik, @pietern, @shreyas-goenka, @simonfaltum) can approve all areas.
See OWNERS for ownership rules.

@eng-dev-ecosystem-bot

eng-dev-ecosystem-bot commented Jun 28, 2026

Collaborator

Integration test report

Commit: 0314fdb

Run: 28328654587

| Env | 🟨 KNOWN | 💚 RECOVERED | 🙈 SKIP | ✅ pass | 🙈 skip | Time |
|---|---|---|---|---|---|---|
| 🟨 aws linux | 7 | 1 | 13 | 235 | 1035 | 7:14 |
| 🟨 aws windows | 7 | 1 | 13 | 237 | 1033 | 11:19 |
| 💚 aws-ucws linux | | 8 | 13 | 322 | 952 | 6:41 |
| 💚 aws-ucws windows | | 8 | 13 | 324 | 950 | 8:06 |
| 💚 azure linux | | 2 | 15 | 235 | 1034 | 6:25 |
| 💚 azure windows | | 2 | 15 | 237 | 1032 | 6:43 |
| 💚 azure-ucws linux | | 2 | 15 | 324 | 949 | 6:59 |
| 💚 azure-ucws windows | | 2 | 15 | 326 | 947 | 7:58 |
| 💚 gcp linux | | 2 | 15 | 234 | 1036 | 6:13 |
| 💚 gcp windows | | 2 | 15 | 236 | 1034 | 7:53 |

21 interesting tests: 13 SKIP, 7 KNOWN, 1 RECOVERED

| Test Name | aws linux | aws windows | aws-ucws linux | aws-ucws windows | azure linux | azure windows | azure-ucws linux | azure-ucws windows | gcp linux | gcp windows |
|---|---|---|---|---|---|---|---|---|---|---|
| 🟨 TestAccept | 🟨 K | 🟨 K | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R |
| 🙈 TestAccept/bundle/invariant/no_drift | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/permissions | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions | 🟨 K | 🟨 K | 💚 R | 💚 R | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct | 🟨 K | 🟨 K | 💚 R | 💚 R | | | | | | |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform | 🟨 K | 🟨 K | 💚 R | 💚 R | | | | | | |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions | 🟨 K | 🟨 K | 💚 R | 💚 R | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct | 🟨 K | 🟨 K | 💚 R | 💚 R | | | | | | |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform | 🟨 K | 🟨 K | 💚 R | 💚 R | | | | | | |
| 🙈 TestAccept/bundle/resources/postgres_branches/basic | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/recreate | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/replace_existing | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/update_protected | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/without_branch_id | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_endpoints/basic | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_projects/update_display_name | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/synced_database_tables/basic | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/vector_search_indexes/recreate/embedding_dimension | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/ssh/connection | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 💚 TestFetchRepositoryInfoAPI_FromRepo | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R |

Top 8 slowest tests (at least 2 minutes):

| duration | env | testname |
|---|---|---|
| 6:51 | azure-ucws windows | TestAccept |
| 6:51 | gcp windows | TestAccept |
| 5:55 | aws-ucws windows | TestAccept |
| 5:42 | azure windows | TestAccept |
| 2:50 | azure-ucws linux | TestAccept |
| 2:47 | aws-ucws linux | TestAccept |
| 2:44 | gcp linux | TestAccept |
| 2:44 | azure linux | TestAccept |

Renders a live progress bar (the same one the experimental files-upload command
used) when copying a single file to a UC Volume with
DATABRICKS_EXPERIMENTAL_MULTIPART_UPLOAD enabled. The upload engine's progress
callback is threaded to the command through the context (filer.WithUploadProgress),
so the Filer.Write interface is unchanged; recursive copies, non-Volumes targets,
and other writers (e.g. bundle) are unaffected. A non-interactive terminal logs
coarse progress instead of drawing the bar.

Co-authored-by: Isaac