Skip to content

Conversation

@corsonknowles
Copy link

@corsonknowles corsonknowles commented Jan 25, 2026

What are you trying to accomplish?

  • Use a non-backtracing "negated character class" to avoid any backtracking vulnerability.
  • Most of what this PR does is remove ? (The Lazy aka Non-Greedy Modifier) from regex. By providing a terminal character to look for, we remove the need to compare match lengths.

The problem:

This is an instruction to scan for .*?: anything, zero or more times, non-greedy (as few as possible). The \) then matches a literal closing paren. Together, .*?\) starts looking to find the shortest string ending in ).

The vulnerable pattern predicted here would be something like:
!(!(!(!(!(!(

From context, I think the purpose of the original regex is to process Yarn/npm workspace glob patterns from package.json, converting glob negation patterns like !(node_modules) to wildcards * for directory matching.
If that's incorrect, this deserves even more scrutiny.

The solution

The revised matcher can solve this in linear time:

  • !( - matches literalyl !(
  • [^)]* - matches any characters except )
  • ) - matches literally )

The [^)]* in the middle there is a zero or more matcher that cannot match a ) character, so it will immediately stop at the first ) it encounters. It avoids the use of ? There's no ambiguity and therefore no backtracking. As a single string scanner, it should always complete in in O(n) linear time.

To put that another way, it sees right past repeated openers, because it encodes the logic that it is only returning 1 parenthetical pattern.

The alert:

See CI alert:

Quoted in full:

Check failure on line 333 in bun/lib/dependabot/bun/file_fetcher.rb

Code scanning
/ CodeQL

Polynomial regular expression used on uncontrolled data
High

This regular expression
 that depends on a library input
 may run slow on strings starting with '!(' and with many repetitions of '!('.
 [Check failure on line 335 in bun/lib/dependabot/bun/file_fetcher.rb](https://github.com/dependabot/dependabot-core/commit/66539d246a1a9f3c194b444c7574fbd007ed74fd#annotation_44642197238) 

Code scanning
/ CodeQL

Polynomial regular expression used on uncontrolled data
High

This regular expression
 that depends on a library input
 may run slow on strings with many repetitions of '.'.
 [Check failure on line 349 in bun/lib/dependabot/bun/file_fetcher.rb](https://github.com/dependabot/dependabot-core/commit/66539d246a1a9f3c194b444c7574fbd007ed74fd#annotation_44642197244) 

Code scanning
/ CodeQL

Polynomial regular expression used on uncontrolled data
High

This regular expression
 that depends on a library input
 may run slow on strings starting with '!(' and with many repetitions of '!('.
 [Check failure on line 33 in helm/lib/dependabot/helm/file_fetcher.rb](https://github.com/dependabot/dependabot-core/commit/66539d246a1a9f3c194b444c7574fbd007ed74fd#annotation_44642197246) 

Code scanning
/ CodeQL

Polynomial regular expression used on uncontrolled data
High

This regular expression
 that depends on a library input
 may run slow on strings with many repetitions of 'a'.
 [Check failure on line 473 in npm_and_yarn/lib/dependabot/npm_and_yarn/file_fetcher.rb](https://github.com/dependabot/dependabot-core/commit/66539d246a1a9f3c194b444c7574fbd007ed74fd#annotation_44642197251) 

Code scanning
/ CodeQL

Polynomial regular expression used on uncontrolled data
High

This regular expression
 that depends on a library input
 may run slow on strings starting with '!(' and with many repetitions of '!('.
 [Check failure on line 566 in npm_and_yarn/lib/dependabot/npm_and_yarn/file_fetcher.rb](https://github.com/dependabot/dependabot-core/commit/66539d246a1a9f3c194b444c7574fbd007ed74fd#annotation_44642197255) 

Code scanning
/ CodeQL

Polynomial regular expression used on uncontrolled data
High

This regular expression
 that depends on a library input
 may run slow on strings starting with '!(' and with many repetitions of '!('.
 [Check failure on line 568 in npm_and_yarn/lib/dependabot/npm_and_yarn/file_fetcher.rb](https://github.com/dependabot/dependabot-core/commit/66539d246a1a9f3c194b444c7574fbd007ed74fd#annotation_44642197261) 

Code scanning
/ CodeQL

Polynomial regular expression used on uncontrolled data
High

This regular expression
 that depends on a library input
 may run slow on strings with many repetitions of '.'.
 [Check failure on line 637 in npm_and_yarn/lib/dependabot/npm_and_yarn/file_fetcher.rb](https://github.com/dependabot/dependabot-core/commit/66539d246a1a9f3c194b444c7574fbd007ed74fd#annotation_44642197269) 

Code scanning
/ CodeQL

Polynomial regular expression used on uncontrolled data
High

This regular expression
 that depends on a library input
 may run slow on strings starting with '!(' and with many repetitions of '!('.
 [Check failure on line 637 in npm_and_yarn/lib/dependabot/npm_and_yarn/file_fetcher.rb](https://github.com/dependabot/dependabot-core/commit/66539d246a1a9f3c194b444c7574fbd007ed74fd#annotation_44642197279) 

Code scanning
/ CodeQL

Polynomial regular expression used on uncontrolled data
High

This regular expression
 that depends on a library input
 may run slow on strings starting with '!(' and with many repetitions of '!(a'.

Anything you want to highlight for special attention from reviewers?

NOTE: Limiting length of input might be a viable alternative solution (which probably does not satisfy the CodeQL alert, as I assume its not that smart of a scanner). See:

I guess while I'm here, I'll add that ideally this would not only be solved at the periphery by callers, but in the core search pattern:

As far as I can tell, Ruby's Onigmo engine does solve for this case now.

Fixing here at the periphery provides "defense in depth" and mitigates against regressions. It removes theoretical risk, and should be a more friendly example for any copy and paste in the future. The most practical impact is that the specs pass, and the code scanner will no longer complain.

In fact CodeQL docs acknowledge this:
https://codeql.github.com/codeql-query-help/ruby/rb-redos/

Note that Ruby 3.2 and later have implemented a caching mechanism that completely eliminates the worst-case time complexity for the regular expressions flagged by this query. The regular expressions flagged by this query are therefore only problematic for Ruby versions prior to 3.2.

If CodeQL were able to check the dependency versions, then this would not flag, but they confirmed in an issue that they cannot.

How will you know you've accomplished your goal?

I won't get this input on an unreleted PR:
Screenshot 2026-01-26 at 5 14 15 AM

Checklist

  • I have run the complete test suite to ensure all tests and linters pass.
  • I have thoroughly tested my code changes to ensure they work as expected, including adding additional tests for new functionality.
  • I have written clear and descriptive commit messages.
  • I have provided a detailed description of the changes in the pull request, including the problem it addresses, how it fixes the problem, and any relevant details about the implementation.
  • I have ensured that the code is well-documented and easy to understand.
@corsonknowles corsonknowles marked this pull request as ready for review January 26, 2026 13:15
@corsonknowles corsonknowles requested a review from a team as a code owner January 26, 2026 13:15
@corsonknowles corsonknowles changed the title Address CodeQL regex alert Jan 26, 2026
@JamieMagee
Copy link
Member

Check failure on line 33 in helm/lib/dependabot/helm/file_fetcher.rb

Can you address this one as well please?

…d casecmp in Helm for yaml and chart matching
@github-actions github-actions bot added L: elixir:hex Elixir packages via hex L: helm labels Jan 29, 2026
class FileFetcher < Dependabot::Shared::SharedFileFetcher
FILENAME_REGEX = /.*\.ya?ml$/i
CHART_LOCK_REGEXP = /Chart\.lock/i
FILENAME_REGEX = /\.ya?ml\z/i
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is used by the parent class

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the leading .* isn't necessary with match?

the ending \z is security recommended approach to begin and end matchers

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. prevents a match on bad.yaml\n.exe which probably doesn't matter here on it's own, but is good to discard

FILENAME_REGEX = /.*\.ya?ml$/i
CHART_LOCK_REGEXP = /Chart\.lock/i
FILENAME_REGEX = /\.ya?ml\z/i
CHART_LOCK_FILENAME = "Chart.lock"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just match on the string here, no regex needed

T.let(
repo_contents(raise_errors: false)
.select { |f| f.type == "file" && f.name.match?(CHART_LOCK_REGEXP) }
.select { |f| f.type == "file" && f.name.casecmp(CHART_LOCK_FILENAME).zero? }
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I profiled this a little bit and this seemed ~30% faster

@corsonknowles
Copy link
Author

corsonknowles commented Jan 30, 2026

@JamieMagee

Thanks Jamie, ready for your input again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

2 participants