Fully ignore private IP literals as outbound connections (early return) #310
Open
Mishenevd wants to merge 1 commit into
Follow-up to #308. The agent records every getAllByName() argument as an outbound connection, including raw private/internal IP literals. These come from infrastructure rather than real outbound domains: the Reactor Netty resolver bootstrap resolving the any-address/nameservers, service discovery connecting by IP, a library building a private-IP matcher at startup, etc. They flooded the "new outbound connection" feature with private IPs on port 0.

#308 skipped recording them but still fell through to the outbound-domain blocking check, so in lockdown mode (blockNewOutgoingRequests) these internal resolutions would be blocked and break the app. This returns early for private IP literals, skipping both the record and the block, consistent with the other Zen agents.

Real domains that resolve to private IPs are not literals, so they fall through and are still tracked, blocked by lockdown, and SSRF-checked. SSRF is unaffected: it never fires on a literal (hostname == ip is treated as safe).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Mishenevd
pushed a commit
that referenced
this pull request
Jul 1, 2026
…gression

Follow-up to the reverted #308/#310. Customer flood was InetAddress.getAllByName() picking up Reactor Netty's own DNS-resolver bootstrap noise (0.0.0.0, ::, /etc/resolv.conf nameservers) as "new outbound connections" on port 0, and blocking them in lockdown mode. #310 fixed the flood with an early return in DNSRecordCollector.report() that also skipped the SSRF check below it - verified with a regression test that this let an attacker-supplied private-IP literal (e.g. a webhook field pointing straight at 169.254.169.254) through undetected.

Investigating further found the actual root cause is bigger: Spring's WebClient was never instrumented at all, and Reactor Netty's default HTTP client bypasses InetAddress.getAllByName() entirely (it uses its own async DNS resolver). So even after wrapping WebClient to register pending ports, DNSRecordCollector was never invoked for real WebClient targets - confirmed empirically via trace logs against a live running app, with distinct markers proving InetAddressWrapper never fires for WebClient/Reactor Netty traffic in this configuration. WebClient had zero outbound-domain visibility and zero SSRF protection, independent of the original bug.

- DNSRecordCollector: narrow the private-IP-literal gate to only skip recording + outbound blocking when there's no pending port (genuine infra noise). SSRF checks are unconditional again, fixing the bypass above.
- SpringWebClientWrapper: register pending host+port for every WebClient request by hooking ExchangeFunction.exchange(), the interface every WebClient call goes through, same pattern as the existing OkHttp/Apache/JDK HttpClient wrappers. Uses string-based ByteBuddy matchers (hasSuperType(named(...))) instead of .class literals, since spring-webflux is compileOnly and only present on the target app's classloader - a .class reference in the matcher crashes the agent at premain with NoClassDefFoundError.
- SocketChannelWrapper: hook java.nio.channels.SocketChannel.connect(), the JDK-level call every NIO-based client (including Reactor Netty) makes once it has a resolved address, regardless of which DNS resolver produced it. This is what actually closes the gap for WebClient, and it also catches literal IP targets that never go through any resolver at all. Not Netty-specific instrumentation - it's a generic JDK hook with no references to io.netty.* types.
- DNSRecordCollector.reportConnect(): entry point for the new hook. Peeks the pending port instead of consuming it (report()'s getAndRemove), because a single request can trigger multiple connect() calls to the same hostname (e.g. the IPv4 then IPv6 address of a dual-stack host like localhost). Consuming on the first attempt let a blocked SSRF target succeed on the second attempt via the other address family - found live, fixed, covered by a regression test.
- PendingHostnamesStore: peeking instead of consuming means entries rely on WebRequestCollector's per-incoming-request clear() for cleanup, which never fires for WebClient calls made outside any incoming-request context (e.g. a @Scheduled background task). Capped the store at 1000 entries per thread, evicting the least-recently-used one once exceeded - the same bounded-LRU pattern (LinkedHashMap with accessOrder=true + removeEldestEntry()) already used by Hostnames.java for the same class of problem. Deliberately not a time-based TTL, to avoid a timing-dependent race reopening the dual-stack gap under load.
- RequestController (SpringWebfluxSampleApp): new /api/request endpoint used to validate all of the above against a real running app end to end.

Known limitation, not fixed here: Spring WebFlux has no request-body taint tracking at all (SpringWebfluxContextObject never populates ContextObject.body), so SSRF via JSON body can't be detected for WebFlux apps regardless of this change - flagged separately, doesn't regress anything.
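The narrowed gate described in the DNSRecordCollector bullet can be sketched as a small decision function. This is only an illustration of the rule as stated in the commit message, not the agent's actual code: the class name OutboundGateSketch, the Decision type, and the decide() signature are all hypothetical.

```java
/**
 * Illustrative sketch only (hypothetical names, not the agent's real classes).
 * Encodes the rule from the commit message: a private IP literal with no
 * pending port is treated as infrastructure noise, so recording and lockdown
 * blocking are skipped, while the SSRF check always runs.
 */
public class OutboundGateSketch {

    /** What the collector should do for one reported lookup or connect. */
    public static final class Decision {
        public final boolean record;             // add to the outbound hostname list
        public final boolean applyLockdownBlock; // subject to blockNewOutgoingRequests
        public final boolean runSsrfCheck;       // never skipped

        Decision(boolean record, boolean applyLockdownBlock, boolean runSsrfCheck) {
            this.record = record;
            this.applyLockdownBlock = applyLockdownBlock;
            this.runSsrfCheck = runSsrfCheck;
        }
    }

    /**
     * @param privateIpLiteral true when the host string is itself a private IP literal
     * @param hasPendingPort   true when an HTTP client wrapper registered a port for this host
     */
    public static Decision decide(boolean privateIpLiteral, boolean hasPendingPort) {
        // Only literals that no instrumented client asked for count as noise.
        boolean infraNoise = privateIpLiteral && !hasPendingPort;
        return new Decision(!infraNoise, !infraNoise, true);
    }
}
```

Under this reading, a private literal that arrives with a pending port (for example a request to a literal target registered by a client wrapper) is handled like any other outbound target, which is the difference from the earlier unconditional early return.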
Mishenevd
pushed a commit
that referenced
this pull request
Jul 1, 2026
…gression

Follow-up to the reverted #308/#310. Customer flood was InetAddress.getAllByName() picking up Reactor Netty's own DNS-resolver bootstrap noise (0.0.0.0, ::, /etc/resolv.conf nameservers) as "new outbound connections" on port 0, and blocking them in lockdown mode. #310 fixed the flood with an early return in DNSRecordCollector.report() that also skipped the SSRF check below it - verified with a regression test that this let an attacker-supplied private-IP literal (e.g. a webhook field pointing straight at 169.254.169.254) through undetected.

Investigating further found the actual root cause is bigger: Spring's WebClient was never instrumented at all, and Reactor Netty's default HTTP client bypasses InetAddress.getAllByName() entirely (it uses its own async DNS resolver). So even after wrapping WebClient to register pending ports, DNSRecordCollector was never invoked for real WebClient targets - confirmed empirically via trace logs against a live running app, with distinct markers proving InetAddressWrapper never fires for WebClient/Reactor Netty traffic in this configuration. WebClient had zero outbound-domain visibility and zero SSRF protection, independent of the original bug.

- DNSRecordCollector: narrow the private-IP-literal gate to only skip recording + outbound blocking when there's no pending port (genuine infra noise). SSRF checks are unconditional again, fixing the bypass above.
- SpringWebClientWrapper: register pending host+port for every WebClient request by hooking ExchangeFunction.exchange(), the interface every WebClient call goes through, same pattern as the existing OkHttp/Apache/JDK HttpClient wrappers. Uses string-based ByteBuddy matchers (hasSuperType(named(...))) instead of .class literals, since spring-webflux is compileOnly and only present on the target app's classloader - a .class reference in the matcher crashes the agent at premain with NoClassDefFoundError.
- SocketChannelWrapper: hook java.nio.channels.SocketChannel.connect(), the JDK-level call every NIO-based client (including Reactor Netty) makes once it has a resolved address, regardless of which DNS resolver produced it. This is what actually closes the gap for WebClient, and it also catches literal IP targets that never go through any resolver at all. Not Netty-specific instrumentation - it's a generic JDK hook with no references to io.netty.* types.
- DNSRecordCollector.reportConnect(): entry point for the new hook. Peeks the pending port instead of consuming it (report()'s getAndRemove), because a single request can trigger multiple connect() calls to the same hostname (e.g. the IPv4 then IPv6 address of a dual-stack host like localhost). Consuming on the first attempt let a blocked SSRF target succeed on the second attempt via the other address family - found live, fixed, covered by a regression test.
- PendingHostnamesStore: peeking instead of consuming means entries rely on WebRequestCollector's per-incoming-request clear() for cleanup, which never fires for WebClient calls made outside any incoming-request context (e.g. a @Scheduled background task). Capped the store at 1000 entries per thread, evicting the least-recently-used one once exceeded - the same bounded-LRU pattern (LinkedHashMap with accessOrder=true + removeEldestEntry()) already used by Hostnames.java for the same class of problem. Deliberately not a time-based TTL, to avoid a timing-dependent race reopening the dual-stack gap under load.
- SpringWebClientRedirectWrapper: WebClient calls with followRedirect(true) never re-invoke Spring's request-adaptation layer for redirect hops (Reactor Netty resends bodiless requests internally), so a redirect to a private IP was invisible to both tracking and SSRF - same failure mode as the DNS gap above, just one layer up. Hooks HttpClientConnect$HttpClientHandler.redirect() (package-private, mirroring the same tradeoff HttpConnectionRedirectWrapper already makes for the JDK's equally-internal followRedirect0) and feeds the chain into the existing RedirectCollector/PrivateIPRedirectFinder mechanism, the same one already used for JDK HttpURLConnection redirects.
- RequestController (SpringWebfluxSampleApp): /api/request endpoint (plus a followRedirect(true) variant) used to validate all of the above against a real running app end to end, and now wired into end2end/spring_webflux_postgres.py as an automated "ssrf" e2e payload.

Known limitation, not fixed here: Spring WebFlux has no request-body taint tracking at all (SpringWebfluxContextObject never populates ContextObject.body), so SSRF via JSON body can't be detected for WebFlux apps regardless of this change - flagged separately, doesn't regress anything.
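The bounded-LRU pattern named in the PendingHostnamesStore bullet (LinkedHashMap with accessOrder=true plus removeEldestEntry()), together with the peek-versus-consume distinction from the reportConnect() bullet, might look roughly like the sketch below. The class and method names are hypothetical, the cap is a constructor parameter rather than the 1000 mentioned above, and the per-thread aspect is left out; this is not the agent's actual store.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustrative sketch only (hypothetical class, not the agent's PendingHostnamesStore).
 * A hostname -> pending port map capped at maxEntries, evicting the
 * least-recently-used entry. Not thread-safe; a per-thread instance is assumed.
 */
public class BoundedPendingStoreSketch {
    private final int maxEntries;
    private final LinkedHashMap<String, Integer> pendingPorts;

    public BoundedPendingStoreSketch(int maxEntries) {
        this.maxEntries = maxEntries;
        // accessOrder=true: get() and put() move the entry to the most-recent end.
        this.pendingPorts = new LinkedHashMap<String, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
                // Called after each insertion; true drops the least-recently-used entry.
                return size() > BoundedPendingStoreSketch.this.maxEntries;
            }
        };
    }

    public void add(String hostname, int port) {
        pendingPorts.put(hostname, port);
    }

    /** Reads the port without removing it, so a second connect() attempt still sees it. */
    public Integer peek(String hostname) {
        return pendingPorts.get(hostname);
    }

    /** Reads the port and removes it, so any later lookup for the same host finds nothing. */
    public Integer getAndRemove(String hostname) {
        return pendingPorts.remove(hostname);
    }

    public int size() {
        return pendingPorts.size();
    }
}
```

With peek(), both connect() attempts of a dual-stack host find the same pending port. With getAndRemove(), the second attempt would find nothing, which is the gap the commit message describes. The size cap bounds growth when no per-request clear() ever runs.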
Summary
Follow-up to #308. The agent records every getAllByName() argument as an outbound connection, including raw private/internal IP literals. Those come from infrastructure, not real outbound domains, and flooded the "new outbound connection" feature with private IPs on port 0.

Sources we confirmed:
- Reactor Netty resolver bootstrap: 0.0.0.0 and :: from io.netty.channel.epoll.Native.<clinit>, and each /etc/resolv.conf nameserver via UnixResolverDnsServerAddressStreamProvider (private on ECS: VPC resolver 10.x, 169.254.169.253, 127.0.0.53).
- Service discovery connecting by IP (10.20.x.x), and a startup library resolving RFC1918 base addresses (10.0.0.0, 172.16.0.0, ...).

The fix
DNSRecordCollector.report() returns early when the looked-up host is a private IP literal.

#308 only skipped the HostnamesStore record but still fell through to the outbound-domain blocking check. In lockdown mode (blockNewOutgoingRequests) that would block these internal resolutions and break the app. The early return skips both the record and the block, consistent with the other Zen agents.

Behaviour
- Private IP literal looked up directly (getAllByName("10.20.11.143"), or Netty bootstrap resolving 0.0.0.0 / nameservers): returns early. Nothing recorded, not blocked in lockdown.
- Private IP literal via URL (http://10.0.0.1:8080): URLCollector registers the pending port, then getAllByName("10.0.0.1") returns early. Nothing recorded, not blocked in lockdown, and the pending port is still consumed.
- Real domain that resolves to a private IP (keycloak.internal...): not a literal, so it does not hit the early return and is still tracked, blocked by lockdown, and SSRF-checked.

SSRF is unaffected: it never fires on an IP literal (hostname == ip is treated as "no resolution, safe"), and real domains do not hit the early return.

Tests
- testPrivateIpLiteralNotRecordedAsOutboundHostname, testPrivateIpLiteralWithPendingPortStillConsumedButNotRecorded (from "Don't record private IP literals as outbound hostnames (Zen alert flood)" #308) still pass.
- testPrivateIpLiteralNotBlockedInLockdownMode: a private IP literal is not blocked in lockdown mode.
- testPrivateIpLiteralViaUrlInLockdownNotBlockedNorRecorded: private IP via URL is not recorded, not blocked, and the pending port is consumed.
- testHostnameResolvingToPrivateIpStillRecorded, testPublicIpLiteralStillRecorded confirm domains and public IPs are unaffected.

🤖 Generated with Claude Code
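As a rough illustration of what "the looked-up host is a private IP literal" could mean in practice, the sketch below classifies a host string without performing a DNS lookup for ordinary hostnames. It is an assumption-laden sketch, not the check used in DNSRecordCollector: the class name, the method names, and the exact set of ranges treated as private (any-local, loopback, link-local, RFC1918 site-local, IPv6 unique-local fc00::/7) are all choices made here for illustration.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Illustrative sketch only (hypothetical helper, not the agent's implementation). */
public class PrivateIpLiteralSketch {

    /** True only when the host string is itself a private or internal IP literal. */
    public static boolean isPrivateIpLiteral(String host) {
        InetAddress addr = parseLiteral(host);
        if (addr == null) {
            return false; // a hostname (or garbage), not a literal: normal tracking applies
        }
        if (addr.isAnyLocalAddress() || addr.isLoopbackAddress()
                || addr.isLinkLocalAddress() || addr.isSiteLocalAddress()) {
            return true; // 0.0.0.0, ::, 127/8, ::1, 169.254/16, fe80::/10, RFC1918
        }
        if (addr instanceof Inet6Address) {
            // IPv6 unique-local fc00::/7 is not covered by isSiteLocalAddress().
            return (addr.getAddress()[0] & 0xfe) == 0xfc;
        }
        return false;
    }

    /** Returns the parsed address for an IP literal, or null for anything else. */
    static InetAddress parseLiteral(String host) {
        if (host == null || host.isEmpty()) {
            return null;
        }
        try {
            if (host.matches("\\d{1,3}(\\.\\d{1,3}){3}")) {
                String[] parts = host.split("\\.");
                byte[] bytes = new byte[4];
                for (int i = 0; i < 4; i++) {
                    int octet = Integer.parseInt(parts[i]);
                    if (octet > 255) {
                        return null;
                    }
                    bytes[i] = (byte) octet;
                }
                return InetAddress.getByAddress(bytes); // never triggers a lookup
            }
            // Only strings shaped like IPv6 text reach getByName(); hostnames never do.
            if (host.indexOf(':') >= 0 && host.matches("[0-9a-fA-F:.]+")) {
                return InetAddress.getByName(host);
            }
        } catch (UnknownHostException e) {
            return null; // malformed literal
        }
        return null;
    }
}
```

A name such as keycloak.internal is rejected by shape alone, so it would continue down the normal record, lockdown, and SSRF path even if it later resolves to a private address.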