fix(extract): guard against large SQL file stack overflow (#691) #698
Open
lg320531124 wants to merge 1 commit into
Branch updated: 3875a72 to 0b83cfb
Owner:
Huge thanks for opening this PR and for the work you put into it. The maintainer shop is currently full, so this may sit for a bit before it gets a proper review. We will come back to this as soon as possible with real feedback; I wanted to make sure it did not sit unacknowledged in the meantime.
…ter parser

The tree-sitter SQL grammar (39 MB parser.c) uses deeply recursive non-terminals that overflow the C stack on files with many statements (e.g. schema dumps >10K lines, stored procedures). The stack overflow kills the thread before any timeout callback can fire, so we must reject the file *before* calling ts_parser_parse_with_options().

Add a pre-parse line-count guard (CBM_SQL_MAX_LINES = 5000) in cbm_extract_file() that returns has_error for SQL files exceeding the threshold, allowing the indexing pipeline to skip gracefully rather than crash.

Fixes DeusData#691

Signed-off-by: lg320531124 <lg320531124@users.noreply.github.com>
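A minimal C sketch of such a guard follows. It assumes the SQL source is already in memory as a (pointer, length) buffer. The real signature of cbm_extract_file() and its result type are not shown in this PR, so extract_result_t, count_lines_capped() and sql_guard_rejects() are hypothetical names. Only CBM_SQL_MAX_LINES = 5000 and the has_error flag come from the description. The check runs before any tree-sitter call, so an oversized file never reaches ts_parser_parse_with_options().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Threshold from the PR description. */
#define CBM_SQL_MAX_LINES 5000

/* Hypothetical result type: the PR only says the extractor reports has_error. */
typedef struct {
    bool has_error;
    const char *error_msg;
} extract_result_t;

/* Count lines in src[0..len), stopping early once the count exceeds cap.
 * A final line without a trailing '\n' still counts; an empty buffer has 0 lines. */
static size_t count_lines_capped(const char *src, size_t len, size_t cap) {
    size_t lines = 0;
    const char *p = src;
    const char *end = src + len;
    while (p < end) {
        lines++;                           /* a new line starts at p */
        if (lines > cap) return lines;     /* already over the limit: stop scanning */
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        if (nl == NULL) break;             /* last line has no trailing newline */
        p = nl + 1;
    }
    return lines;
}

/* Hypothetical pre-parse guard: returns true (and sets has_error) when the
 * SQL buffer is too long to hand to the tree-sitter parser safely. */
static bool sql_guard_rejects(const char *src, size_t len, extract_result_t *out) {
    if (count_lines_capped(src, len, CBM_SQL_MAX_LINES) > CBM_SQL_MAX_LINES) {
        out->has_error = true;
        out->error_msg = "SQL file exceeds CBM_SQL_MAX_LINES; skipped before parsing";
        return true;
    }
    return false;
}
```

Counting stops as soon as the limit is passed, so the pre-check scans at most the first CBM_SQL_MAX_LINES lines rather than the whole dump. A line count is only a proxy for statement count: a file that packs many statements onto few lines would still pass this check.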
Branch updated: 0b83cfb to e8db9e5
Summary
The tree-sitter SQL grammar (39 MB parser.c) uses deeply recursive non-terminals that overflow the C stack on files with many statements (e.g. schema dumps >10K lines, stored procedures). The stack overflow kills the thread before any timeout callback can fire, so a pre-parse guard is the only safe mitigation.

Changes
- Add a pre-parse line-count guard (CBM_SQL_MAX_LINES = 5000) in cbm_extract_file() that returns has_error for SQL files exceeding the threshold.

Why line-count, not byte-size?
…CREATE TABLE lines overflows the default 512 KB macOS thread stack.

Why 5000 lines?
Why in cbm_extract_file(), not the discover layer?

Testing
…has_error=true

Fixes #691
Related: #668 (original report)