-
Notifications
You must be signed in to change notification settings - Fork 1.3k
[SECURITY] Address CodeQL regex alert #14012
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[SECURITY] Address CodeQL regex alert #14012
Conversation
Can you address this one as well please? |
…d casecmp in Helm for yaml and chart matching
| class FileFetcher < Dependabot::Shared::SharedFileFetcher | ||
| FILENAME_REGEX = /.*\.ya?ml$/i | ||
| CHART_LOCK_REGEXP = /Chart\.lock/i | ||
| FILENAME_REGEX = /\.ya?ml\z/i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is used by the parent class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the leading .* isn't necessary with match?
the ending \z is security recommended approach to begin and end matchers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g. prevents a match on bad.yaml\n.exe which probably doesn't matter here on it's own, but is good to discard
| FILENAME_REGEX = /.*\.ya?ml$/i | ||
| CHART_LOCK_REGEXP = /Chart\.lock/i | ||
| FILENAME_REGEX = /\.ya?ml\z/i | ||
| CHART_LOCK_FILENAME = "Chart.lock" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can just match on the string here, no regex needed
| T.let( | ||
| repo_contents(raise_errors: false) | ||
| .select { |f| f.type == "file" && f.name.match?(CHART_LOCK_REGEXP) } | ||
| .select { |f| f.type == "file" && f.name.casecmp(CHART_LOCK_FILENAME).zero? } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I profiled this a little bit and this seemed ~30% faster
|
Thanks Jamie, ready for your input again! |
What are you trying to accomplish?
?(The Lazy aka Non-Greedy Modifier) from regex. By providing a terminal character to look for, we remove the need to compare match lengths.The problem:
This is an instruction to scan for
.*?: anything, zero or more times, non-greedy (as few as possible). The\)then matches a literal closing paren. Together, .*?\)starts looking to find the shortest string ending in).The vulnerable pattern predicted here would be something like:
!(!(!(!(!(!(From context, I think the purpose of the original regex is to process Yarn/npm workspace glob patterns from
package.json, converting glob negation patterns like!(node_modules)to wildcards*for directory matching.If that's incorrect, this deserves even more scrutiny.
The solution
The revised matcher can solve this in linear time:
!())The
[^)]*in the middle there is a zero or more matcher that cannot match a)character, so it will immediately stop at the first)it encounters. It avoids the use of?There's no ambiguity and therefore no backtracking. As a single string scanner, it should always complete in in O(n) linear time.To put that another way, it sees right past repeated openers, because it encodes the logic that it is only returning 1 parenthetical pattern.
The alert:
See CI alert:
Quoted in full:
Anything you want to highlight for special attention from reviewers?
NOTE: Limiting length of input might be a viable alternative solution (which probably does not satisfy the CodeQL alert, as I assume its not that smart of a scanner). See:
I guess while I'm here, I'll add that ideally this would not only be solved at the periphery by callers, but in the core search pattern:
As far as I can tell, Ruby's Onigmo engine does solve for this case now.
Fixing here at the periphery provides "defense in depth" and mitigates against regressions. It removes theoretical risk, and should be a more friendly example for any copy and paste in the future. The most practical impact is that the specs pass, and the code scanner will no longer complain.
In fact CodeQL docs acknowledge this:
https://codeql.github.com/codeql-query-help/ruby/rb-redos/
If CodeQL were able to check the dependency versions, then this would not flag, but they confirmed in an issue that they cannot.
How will you know you've accomplished your goal?
I won't get this input on an unreleted PR:

Checklist