Streaming filter decompression #3429
Explanation
I've recently been experimenting with stacked stream filters and realized that you can build enormous zip bombs by nesting FlateDecode filters, as in this file: bomb.pdf
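To illustrate why nesting matters, here is a rough sketch (not the actual bomb.pdf) of how stacked FlateDecode layers compound. Deflate alone tops out at roughly a 1000:1 ratio on zeros, but the compressed output of a run of zeros is itself highly repetitive, so each additional layer shrinks it again. The helper name `nest_flate` is hypothetical and only mimics a stream whose `/Filter` array repeats `/FlateDecode` several times.

```python
import zlib


def nest_flate(payload: bytes, layers: int) -> bytes:
    """Hypothetical helper: apply zlib compression `layers` times,
    mimicking a stream with /FlateDecode repeated `layers` times."""
    data = payload
    for _ in range(layers):
        data = zlib.compress(data, 9)
    return data


if __name__ == "__main__":
    raw = bytes(10_000_000)  # 10 MB of zero bytes
    for n in range(1, 4):
        # Print how large the encoded stream is after n layers.
        print(n, "layer(s):", len(nest_flate(raw, n)), "bytes")
```

Decoding such a stream naively means materializing every intermediate layer in full, which is where the memory blow-up comes from.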
The file isn't actually a valid PDF file, and most PDF parsers I tested recognize that without decompressing the whole content first. pypdf, however, tries to decompress the whole stream before parsing it. This normally works, but the file I provided unpacks to over 1 PB of zero bytes.
One fix would be to decompress streams incrementally, since zlib lets you process decompressed output before the whole stream has been inflated.
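A minimal sketch of what incremental decompression could look like, using only the standard library's `zlib.decompressobj()` with its `max_length` argument and `unconsumed_tail` attribute. The function names (`inflate_stream`, `inflate_nested`, `read_limited`) and the 64 KiB chunk size are hypothetical and are not part of pypdf's API. Each FlateDecode layer becomes a lazy generator feeding the next, so a consumer can stop (here, once an assumed output cap is exceeded) without ever inflating the whole stream.

```python
import zlib
from typing import Iterable, Iterator


def inflate_stream(chunks: Iterable[bytes], chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Hypothetical helper: lazily inflate a zlib stream given as byte chunks.

    Each call to decompress() returns at most chunk_size bytes; input that
    could not be processed because of the cap is kept in unconsumed_tail.
    """
    d = zlib.decompressobj()
    for chunk in chunks:
        buf = chunk
        while buf and not d.eof:
            out = d.decompress(buf, chunk_size)  # cap output per call
            if out:
                yield out
            buf = d.unconsumed_tail  # leftover input due to the cap
        if d.eof:
            break
    tail = d.flush()  # any small amount of output still buffered
    if tail:
        yield tail


def inflate_nested(data: bytes, layers: int, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Hypothetical helper: chain one lazy inflater per FlateDecode layer."""
    stream: Iterable[bytes] = [data]
    for _ in range(layers):
        stream = inflate_stream(stream, chunk_size)
    return iter(stream)


def read_limited(chunks: Iterable[bytes], max_output: int) -> bytes:
    """Hypothetical helper: collect output, aborting once it exceeds max_output."""
    out = bytearray()
    for piece in chunks:
        out += piece
        if len(out) > max_output:
            raise ValueError(f"decompressed stream exceeds {max_output} bytes")
    return bytes(out)


if __name__ == "__main__":
    # Two FlateDecode layers around 10 MB of zeros; stop after 1 MB of output.
    bomb = zlib.compress(zlib.compress(bytes(10_000_000), 9), 9)
    try:
        read_limited(inflate_nested(bomb, layers=2), max_output=1_000_000)
    except ValueError as exc:
        print(exc)  # prints "decompressed stream exceeds 1000000 bytes"
```

Because every layer only holds about one chunk at a time, memory stays bounded regardless of how large the fully decoded stream would be; where to set the cap, and whether to raise or truncate, would be a policy decision for the library.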
I'm pretty sure this would require significant changes to the decompression logic, but it could also be seen as a security flaw, since parsing a small untrusted PDF file could lead to a DoS.
I'm not sure what policy you have for these kinds of issues, but I wanted to report it in case someone wants to fix it.