fs cp: share one multipart transfer budget across a recursive copy #5757

Open
renaudhartert-db wants to merge 1 commit into multipart/04-filer from multipart/05-fs-cp


renaudhartert-db (Contributor) commented Jun 28, 2026

Context

databricks fs cp and bundle library uploads to Unity Catalog Volumes go through a single PUT /api/2.0/fs/files, which caps a file at the single-request size limit and pushes it over one connection. This stack adds chunked upload (multipart on AWS/Azure, resumable on GCP) so large files upload reliably and in parallel. The whole feature is gated behind the DATABRICKS_EXPERIMENTAL_MULTIPART_UPLOAD environment variable and is off by default, so merging the stack changes no behavior until the flag is set.

Stack

  1. #5753 libs/upload/cloudstorage: add cloud-storage transfer client (cloud-storage data-plane client)
  2. #5754 libs/upload/files: add Files API control-plane client (Files API control-plane client)
  3. #5755 libs/upload: add the chunked large-file upload engine (chunked upload engine)
  4. #5756 filer: route large Volumes writes through the multipart engine (off by default)
  5. #5757 fs cp: share one multipart transfer budget across a recursive copy (this PR)
  6. #5758 fs cp: show an upload progress bar for a single large-file copy (fs cp progress bar)

This PR

Decouples the two concurrency knobs in fs cp. --concurrency keeps governing how many files copy at once and stays at the Files-API-safe default; the multipart cloud-transfer budget is sized independently and, when the flag is on, set wider for part uploads, which go to cloud storage rather than the rate-limited Files API. Keeping them separate means a recursive copy of many small files does not burst the Files API, while a single large file can still fan its parts out wide. The budget is applied only to the Volumes target filer.

Testing

Covered by the existing cmd/fs tests; the decoupling and Volumes-only budget are validated with a live recursive fs cp to a Volume.

This pull request and its description were written by Isaac.

@github-actions

Contributor

Waiting for approval

Based on git history, these people are best suited to review:

  • @pietern -- recent work in cmd/fs/

Eligible reviewers: @Divyansh-db, @chrisst, @hectorcast-db, @mihaimitrea-db, @parthban-db, @rauchy, @simonfaltum, @tanmay-db, @tejaskochar-db

Suggestions based on git history. See OWNERS for ownership rules.

@eng-dev-ecosystem-bot

eng-dev-ecosystem-bot commented Jun 28, 2026

Collaborator

Integration test report

Commit: 99fd543

Run: 28328652518

| Env | 🟨 KNOWN | 💚 RECOVERED | 🙈 SKIP | ✅ pass | 🙈 skip | Time |
|---|---|---|---|---|---|---|
| 🟨 aws linux | 7 | 1 | 13 | 235 | 1035 | 7:15 |
| 🟨 aws windows | 7 | 1 | 13 | 237 | 1033 | 10:45 |
| 💚 aws-ucws linux | | 8 | 13 | 322 | 952 | 6:42 |
| 💚 aws-ucws windows | | 8 | 13 | 324 | 950 | 7:35 |
| 💚 azure linux | | 2 | 15 | 235 | 1034 | 6:16 |
| 💚 azure windows | | 2 | 15 | 237 | 1032 | 7:09 |
| 💚 azure-ucws linux | | 2 | 15 | 324 | 949 | 6:58 |
| 💚 azure-ucws windows | | 2 | 15 | 326 | 947 | 7:14 |
| 💚 gcp linux | | 2 | 15 | 234 | 1036 | 6:05 |
| 💚 gcp windows | | 2 | 15 | 236 | 1034 | 7:04 |

21 interesting tests: 13 SKIP, 7 KNOWN, 1 RECOVERED

| Test Name | aws linux | aws windows | aws-ucws linux | aws-ucws windows | azure linux | azure windows | azure-ucws linux | azure-ucws windows | gcp linux | gcp windows |
|---|---|---|---|---|---|---|---|---|---|---|
| 🟨 TestAccept | 🟨K | 🟨K | 💚R | 💚R | 💚R | 💚R | 💚R | 💚R | 💚R | 💚R |
| 🙈 TestAccept/bundle/invariant/no_drift | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/permissions | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions | 🟨K | 🟨K | 💚R | 💚R | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct | 🟨K | 🟨K | 💚R | 💚R | | | | | | |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform | 🟨K | 🟨K | 💚R | 💚R | | | | | | |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions | 🟨K | 🟨K | 💚R | 💚R | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct | 🟨K | 🟨K | 💚R | 💚R | | | | | | |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform | 🟨K | 🟨K | 💚R | 💚R | | | | | | |
| 🙈 TestAccept/bundle/resources/postgres_branches/basic | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_branches/recreate | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_branches/replace_existing | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_branches/update_protected | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_branches/without_branch_id | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_endpoints/basic | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_projects/update_display_name | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/synced_database_tables/basic | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/vector_search_indexes/recreate/embedding_dimension | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/ssh/connection | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 💚 TestFetchRepositoryInfoAPI_FromRepo | 💚R | 💚R | 💚R | 💚R | 💚R | 💚R | 💚R | 💚R | 💚R | 💚R |

Top 8 slowest tests (at least 2 minutes):

| duration | env | testname |
|---|---|---|
| 6:02 | aws-ucws windows | TestAccept |
| 5:59 | gcp windows | TestAccept |
| 5:55 | azure-ucws windows | TestAccept |
| 5:54 | azure windows | TestAccept |
| 2:48 | azure-ucws linux | TestAccept |
| 2:47 | azure linux | TestAccept |
| 2:47 | aws-ucws linux | TestAccept |
| 2:46 | gcp linux | TestAccept |

Routes the fs cp Volumes target filer through the large-file upload engine when
DATABRICKS_EXPERIMENTAL_MULTIPART_UPLOAD is set. The file-level copy concurrency
(--concurrency) stays at the Files-API-safe default; the multipart cloud-transfer
budget is sized independently, shared across all files in a recursive copy, and
applied only to the Volumes target filer.

Co-authored-by: Isaac