Feature request regarding compression options. #322

@throwaway-60789

  1. Like xz, zstd allows tuning custom compression parameters (see the "Advanced compression options" section in the zstd manual). These can compress a file better at the cost of compression time and/or memory usage:
0 ~:time zstd --single-thread --ultra -22 -o usr.22.tar.zst usr.tar 
usr.tar              : 23.95%   (  1.74 GiB =>    427 MiB, usr.22.tar.zst)     

real	13m13.600s
user	13m13.540s
sys	0m1.286s
0 ~:time zstd --single-thread -d -o out.tar usr.22.tar.zst 
usr.22.tar.zst      : 1871155200 bytes                                         

real	0m1.777s
user	0m1.767s
sys	0m0.417s
0 ~:time zstd --single-thread --ultra -22 --long=31 --zstd=strat=9,wlog=31,hlog=30,clog=30,slog=30,mml=3,tlen=128KiB,lhlog=30,lblog=8 -o usr.cust1.tar.zst usr.tar
usr.tar              : 23.05%   (  1.74 GiB =>    411 MiB, usr.cust1.tar.zst)   

real	22m44.702s
user	22m43.275s
sys	0m2.534s
0 ~:time zstd --single-thread --long=31 -d -o out1.tar usr.cust1.tar.zst
usr.cust1.tar.zst    : 1871155200 bytes                                         

real	0m1.932s
user	0m1.780s
sys	0m0.575s

Note that the compressed file is smaller (411 MiB vs. 427 MiB) and the decompression time does not increase significantly (1.93 s vs. 1.78 s), although the decompressor must also be given --long=31 to accept the larger window. Since most of the time you compress only once in the background and decompress many times in the foreground, the increase in compression time can be justified.

0 ~:time zstd --single-thread -22 --ultra --long=31 --zstd=strat=9,wlog=31,hlog=30 -o usr.cust2.tar.zst usr.tar
usr.tar              : 23.19%   (  1.74 GiB =>    414 MiB, usr.cust2.tar.zst)   

real	12m48.598s
user	12m47.975s
sys	0m1.871s
0 ~:time zstd --single-thread --long=31 -d -o out2.tar usr.cust2.tar.zst
usr.cust2.tar.zst    : 1871155200 bytes                                         

real	0m1.883s
user	0m1.747s
sys	0m0.571s

With the options that slow down compression removed, compression is even slightly faster than the plain --ultra -22 run (12m48s vs. 13m13s), at the cost of increased memory usage, and the resulting file is still smaller (414 MiB vs. 427 MiB).
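To put the three runs side by side, the figures reported above can be checked with a few lines of arithmetic. This is only a convenience sketch: the dictionary re-enters the sizes and wall-clock compression times from the transcripts, and the names (runs, compare) are made up for illustration.

```python
# Sizes (MiB) and wall-clock compression times (seconds) copied from the
# transcripts above; nothing here is measured by this script.
runs = {
    "ultra-22": {"size_mib": 427, "seconds": 13 * 60 + 13.600},
    "cust1": {"size_mib": 411, "seconds": 22 * 60 + 44.702},
    "cust2": {"size_mib": 414, "seconds": 12 * 60 + 48.598},
}


def compare(name, baseline="ultra-22"):
    """Return (MiB saved, percent saved, compression time ratio) vs. the baseline."""
    base, run = runs[baseline], runs[name]
    saved = base["size_mib"] - run["size_mib"]
    percent = 100.0 * saved / base["size_mib"]
    time_ratio = run["seconds"] / base["seconds"]
    return saved, percent, time_ratio


if __name__ == "__main__":
    for name in ("cust1", "cust2"):
        saved, percent, time_ratio = compare(name)
        print(f"{name}: {saved} MiB smaller ({percent:.1f}%), "
              f"{time_ratio:.2f}x the compression time")
```

On these numbers, the full custom parameter set saves 16 MiB for roughly 1.7 times the compression time, while the reduced set saves 13 MiB and is slightly faster than the baseline.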

  2. For ZSTD and LZMA compression, use single-threaded mode (--single-thread in ZSTD, -T 1 in LZMA) to compress each block. mkdwarfs already compresses multiple blocks at once, so using multi-threaded mode for each block wastes memory without improving the compression. (I noticed this when I tried to recompress an already deduplicated filesystem; the total memory usage was significantly higher than running xz -T 1 with the same parameters in parallel.)

  3. Allow skipping the dwarfsck phase when recompressing a filesystem. With multi-terabyte filesystems, the check lasts hours with constant disk reads, even when the original filesystem is "known good". (I do it in 2 steps: first deduplicate with no compression, using all the memory for a higher number of lookback blocks, then recompress the filesystem.)
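The suggestion above to use a single-threaded compressor per block can be illustrated with a small sketch. This is not mkdwarfs code: it uses Python's standard lzma module, whose one-shot compress call does not spawn worker threads of its own, and a thread pool that compresses independent blocks concurrently. The block size, worker count, and function names are assumptions chosen for the demo.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 1 << 20  # hypothetical 1 MiB block size, chosen only for the demo


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the input into independent fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def compress_block(block: bytes) -> bytes:
    # One single-threaded LZMA stream per block; parallelism comes from the pool.
    return lzma.compress(block, preset=6)


def compress_blocks(data: bytes, workers: int = 4) -> list[bytes]:
    """Compress all blocks concurrently, one block per worker at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so blocks can be reassembled directly.
        return list(pool.map(compress_block, split_blocks(data)))


def decompress_blocks(blocks: list[bytes]) -> bytes:
    """Decompress each block and concatenate the results in order."""
    return b"".join(lzma.decompress(b) for b in blocks)


if __name__ == "__main__":
    payload = bytes(range(256)) * 16384  # 4 MiB of sample data
    packed = compress_blocks(payload)
    assert decompress_blocks(packed) == payload
    print(len(packed), "blocks,", sum(map(len, packed)), "bytes compressed")
```

With this layout the number of concurrent encoders equals the number of workers, so peak memory grows roughly with workers times the per-encoder footprint. If every encoder also ran in its own multi-threaded mode, that footprint would be multiplied again without any more blocks being processed at once, which matches the memory increase described in the issue.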

Labels: enhancement (New feature or request)
