Skip to content

harden: bound remaining catalog reads + lock in no-unbounded-read invariant#3145

Draft
PascalThuet wants to merge 3 commits into
github:mainfrom
PascalThuet:split/catalog-bounded-reads
Draft

harden: bound remaining catalog reads + lock in no-unbounded-read invariant#3145
PascalThuet wants to merge 3 commits into
github:mainfrom
PascalThuet:split/catalog-bounded-reads

Conversation

@PascalThuet

Copy link
Copy Markdown
Contributor

Part of splitting #2442 into smaller, dedicated PRs. Re-derived against current main. Final PR of the runtime-hardening stack.

Stacked on #3141 (which is itself stacked on #3140). Its diff includes those commits — review/merge #3140#3141 → this, in order.

What

  • Bound the remaining catalog JSON reads via read_response_limited(max_bytes=MAX_JSON_CATALOG_BYTES) and pass strict_redirects=True:
    • workflows/catalog.py, integrations/catalog.py, bundler/services/adapters.py, commands/bundle/__init__.py.
  • __init__.py workflow_add / workflow_step_add: bounded reads + strict_redirects=True + shared is_https_or_localhost_http URL validation.
  • extensions/_commands.py: bound the extension add --from <url> ZIP download — the last unbounded network read (surfaced by the gate below).

The invariant

Adds tests/test_download_security.py::test_remote_downloads_do_not_use_unbounded_response_reads: an AST scan over the whole src/specify_cli tree that fails CI if any response.read() is unbounded, with a 3-entry allowlist for the local-file get_hash reads (extensions/presets/integrations). This only goes green once #3140 + #3141 + this have converted every remote read — it's the lock that keeps the bound from silently regressing.

Validation

  • test_download_security (incl. the AST gate), test_workflows, test_integration_catalog, test_bundler_adapters, test_upgrade, test_extensions792 passed.
  • Bundle integration tests — 10 passed. ruff check clean.
  • Existing catalog/workflow tests pass with stream mocks made cursor-based (so they exercise the bounded reads) and strict_redirects threaded through the fake openers.

This completes the runtime hardening. The Bandit + detect-secrets CI gates land in a separate PR on the #2/#3 track.

Add a shared _download_security module (read_response_limited,
is_https_or_localhost_http, size constants) and route the GitHub release
and Azure DevOps token network reads through bounded reads so an oversized
response can't exhaust memory.

Add a strict_redirects mode to authentication.open_url: the redirect handler
now rejects any redirect whose target isn't HTTPS (or HTTP to localhost),
composing with the existing per-hop redirect_validator and auth-stripping.
The Azure DevOps token POST is routed through that handler so a 307/308
cannot forward the client_secret body to a non-HTTPS host.
…on/preset downloads

Extend _download_security with safe_extract_zip (path + symlink + per-member
and total size + entry-count bounds), read_zip_member_limited, and
verify_sha256, and adopt them across the extension and preset install paths:

- install_from_zip extracts via safe_extract_zip (zip-slip + symlink + zip-bomb
  defense) instead of the previous path-only containment check.
- catalog JSON fetches and package downloads use bounded reads; the inline
  manifest pre-read in extension update uses read_zip_member_limited.
- downloaded packages are verified against their catalog sha256 when present.
- CatalogStackBase and the preset catalog validator reject hostless URLs
  before the scheme check.

The command-registration path-traversal guard the original change carried is
already on main (github#3088) and is intentionally omitted.
…ad invariant

Route the remaining network JSON reads through read_response_limited and pass
strict_redirects=True on the workflow/integration/bundler catalog readers and
the workflow add / workflow step add download paths:

- workflows/catalog.py, integrations/catalog.py, bundler/services/adapters.py,
  commands/bundle/__init__.py: bounded catalog reads.
- __init__.py workflow_add / workflow_step_add: bounded reads + strict redirects
  + shared is_https_or_localhost_http validation.
- extensions/_commands.py: bound the `extension add --from <url>` ZIP download
  (the last unbounded network read).

Add test_remote_downloads_do_not_use_unbounded_response_reads: an AST gate that
scans the whole src/specify_cli tree and fails if any response.read() is
unbounded, except the three local-file get_hash reads. With this, every remote
read in the CLI is bounded.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR completes the runtime download-hardening stack by enforcing bounded HTTP response reads across remaining catalog/workflow/bundler paths, tightening redirect handling (strict HTTPS w/ loopback HTTP exception), and adding a repo-wide AST gate to prevent reintroducing unbounded response.read() calls.

Changes:

  • Introduces src/specify_cli/_download_security.py (bounded reads, URL predicate, ZIP extraction + member read bounds, optional sha256 verification) and threads it through remaining network read sites.
  • Enforces strict redirect validation via open_url(..., strict_redirects=True) (and the shared redirect handler) across workflows/integrations/bundler/versioning, with tests updated to use cursor-advancing response fakes.
  • Adds tests/test_download_security.py with both helper-level tests and an AST-based invariant check blocking unbounded .read() usage under src/specify_cli.
Show a summary per file
File Description
tests/unit/test_bundler_adapters.py Makes the response fake stream-like (read(size) cursor) to exercise bounded-read loops.
tests/test_workflows.py Updates workflow download/open_url fakes to accept strict_redirects and provide stream semantics.
tests/test_upgrade.py Adds regression coverage to pin _fetch_latest_release_tag to read_response_limited(max_bytes=...) and adapts mocks for bounded reads.
tests/test_self_upgrade_verification.py Imports shared fixture to route build_opener().open through patched urlopen in strict-redirect paths.
tests/test_self_upgrade_execution.py Same fixture import for strict-redirect opener routing.
tests/test_self_upgrade_detection.py Same fixture import for strict-redirect opener routing.
tests/test_presets.py Updates mocks to use stream-backed .read() and accommodates strict-redirect openers in preset flows.
tests/test_github_http.py Updates response mocks for bounded reads and adds redirect/auth preservation tests.
tests/test_extensions.py Updates mocks to stream semantics and adjusts URL validation expectation text.
tests/test_download_security.py Adds bounded-read/zip-safety tests plus AST gate forbidding unbounded .read() across src/specify_cli.
tests/test_authentication.py Updates redirect/strict-opener expectations and patches opener-based code paths.
tests/self_upgrade_helpers.py Re-exports the opener-routing fixture for reuse across self-upgrade tests.
tests/integrations/test_integration_catalog.py Updates catalog-fetch tests to cover both urlopen and opener.open paths and adds bounded-read assertion.
tests/http_helpers.py Changes JSON response mocks to stream semantics and adds an autouse fixture to route opener.open through urlopen.
src/specify_cli/workflows/catalog.py Validates catalog URLs via shared predicate, enforces strict redirects, and bounds catalog JSON reads.
src/specify_cli/presets/_commands.py Uses shared URL predicate + bounded reads for preset add --from, and routes GitHub release resolution via strict openers.
src/specify_cli/presets/init.py Uses safe_extract_zip, bounds catalog JSON reads, and verifies optional sha256 for downloaded packs.
src/specify_cli/integrations/catalog.py Enforces strict redirects and bounds integration catalog JSON reads.
src/specify_cli/extensions/_commands.py Bounds extension add --from <url> ZIP downloads and bounds pre-extract manifest reads.
src/specify_cli/extensions/init.py Uses safe_extract_zip, bounds catalog JSON reads, and verifies optional sha256 for extension downloads.
src/specify_cli/commands/bundle/init.py Bounds bundle catalog reads.
src/specify_cli/catalogs.py Tightens catalog URL validation order (host before scheme) and clarifies loopback HTTP messaging.
src/specify_cli/bundler/services/adapters.py Bounds bundler catalog JSON reads and decodes via the bounded-read helper.
src/specify_cli/authentication/http.py Adds strict_redirects option and enforces shared strict redirect validation inside the redirect handler.
src/specify_cli/authentication/azure_devops.py Routes token POST through strict redirect handler and bounds token JSON reads.
src/specify_cli/_version.py Uses bounded reads + strict redirects for GitHub latest-release metadata.
src/specify_cli/_github_http.py Bounds GitHub release metadata reads when resolving asset API URLs.
src/specify_cli/_download_security.py New central hardening helpers (bounded reads, strict URL predicate, safe ZIP extraction, sha256 verification).
src/specify_cli/init.py Applies bounded reads + strict redirects + shared URL predicate to workflow add/step add remote fetch paths.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 29/29 changed files
  • Comments generated: 2
Comment on lines +159 to +165
zip_path.write_bytes(
read_response_limited(
response,
error_type=PresetError,
label=f"preset {from_url}",
)
)
Comment on lines +243 to +253
def _is_unbounded_read(call: ast.Call) -> bool:
if call.args:
size = _constant_int(call.args[0])
return size is not None and size < 0

for keyword in call.keywords:
if keyword.arg == "size":
size = _constant_int(keyword.value)
return size is not None and size < 0

return True
@mnriem mnriem marked this pull request as draft June 24, 2026 18:49
@mnriem

mnriem commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

Converted to a draft as it is waiting for #3141

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

3 participants