- Similar to `xz`, `zstd` allows tuning custom compression parameters (see the "Advanced compression options" section in the `zstd` manual), which can compress the file better at the cost of compression time and/or memory usage:
```
0 ~:time zstd --single-thread --ultra -22 -o usr.22.tar.zst usr.tar
usr.tar : 23.95% ( 1.74 GiB => 427 MiB, usr.22.tar.zst)

real    13m13.600s
user    13m13.540s
sys     0m1.286s

0 ~:time zstd --single-thread -d -o out.tar usr.22.tar.zst
usr.22.tar.zst : 1871155200 bytes

real    0m1.777s
user    0m1.767s
sys     0m0.417s

0 ~:time zstd --single-thread --ultra -22 --long=31 --zstd=strat=9,wlog=31,hlog=30,clog=30,slog=30,mml=3,tlen=128KiB,lhlog=30,lblog=8 -o usr.cust1.tar.zst usr.tar
usr.tar : 23.05% ( 1.74 GiB => 411 MiB, usr.cust1.tar.zst)

real    22m44.702s
user    22m43.275s
sys     0m2.534s

0 ~:time zstd --single-thread --long=31 -d -o out1.tar usr.cust1.tar.zst
usr.cust1.tar.zst : 1871155200 bytes

real    0m1.932s
user    0m1.780s
sys     0m0.575s
```
Note that the compressed file is smaller and the decompression time does not increase significantly. Since you typically compress only once, in the background, and decompress many times, in the foreground, the longer compression time can be justified.
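One memory-side detail of the parameters above: the match window implied by `wlog` is `2**wlog` bytes, and the decompressor must be permitted the same window (hence `--long=31` on the `-d` run). A small sketch of the window sizes involved (only the window itself; any per-context overhead beyond it is ignored here):

```python
# Window size implied by the zstd window log: 2**wlog bytes.
# The decompressor must allow the same window (zstd's --long=N),
# which is why wlog=31 needs --long=31 at decompression time.
def window_bytes(wlog: int) -> int:
    return 1 << wlog

for wlog in (27, 30, 31):
    print(f"wlog={wlog}: {window_bytes(wlog) / 2**30:.2f} GiB window")
```

This is why `wlog=31` trades roughly 2 GiB of window memory, on both sides, for the better ratio seen above.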
```
0 ~:time zstd --single-thread -22 --ultra --long=31 --zstd=strat=9,wlog=31,hlog=30 -o usr.cust2.tar.zst usr.tar
usr.tar : 23.19% ( 1.74 GiB => 414 MiB, usr.cust2.tar.zst)

real    12m48.598s
user    12m47.975s
sys     0m1.871s

0 ~:time zstd --single-thread --long=31 -d -o out2.tar usr.cust2.tar.zst
usr.cust2.tar.zst : 1871155200 bytes

real    0m1.883s
user    0m1.747s
sys     0m0.571s
```
With the options that slow down compression removed, compression is even slightly faster than the default (at the cost of increased memory usage), and the resulting file is still smaller.
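The three runs can be compared directly from the numbers in the transcripts above (ratios and wall-clock times copied from the output; the labels are mine):

```python
# Ratios and wall-clock times taken from the zstd transcripts above
# (input: usr.tar, 1.74 GiB). Labels are illustrative, not zstd's.
runs = {
    "default -22":         {"ratio": 23.95, "minutes": 13 + 13.600 / 60},
    "full custom (cust1)": {"ratio": 23.05, "minutes": 22 + 44.702 / 60},
    "trimmed (cust2)":     {"ratio": 23.19, "minutes": 12 + 48.598 / 60},
}

input_mib = 1.74 * 1024
for name, r in runs.items():
    out_mib = input_mib * r["ratio"] / 100
    print(f"{name:>20}: {out_mib:6.0f} MiB in {r['minutes']:5.1f} min")
```

The trimmed parameter set beats the default on both axes; the full custom set buys a further ~3 MiB for roughly 10 extra minutes of compression time.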
- For zstd and LZMA compression, use single-threaded mode (`--single-thread` in zstd, `-T 1` in LZMA) to compress each block. Since `mkdwarfs` already compresses multiple blocks at once, using multi-threaded mode wastes memory without improving compression. (I noticed this when I tried to recompress an already deduplicated filesystem; the total memory usage was significantly higher than running `xz -T 1` with the same parameters in parallel.)
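The pattern described above, many blocks in flight at once, each handled by a single-threaded compressor, can be sketched with the stdlib `lzma` module standing in for mkdwarfs's internal per-block compression (block sizes and worker count are arbitrary):

```python
# Sketch: parallelism across blocks, single-threaded compression within
# each block. lzma.compress itself is single-threaded; throughput comes
# from compressing several blocks concurrently in the pool.
import lzma
from concurrent.futures import ThreadPoolExecutor

def compress_block(block: bytes) -> bytes:
    return lzma.compress(block, preset=6)

# Toy 1 MiB blocks standing in for filesystem blocks.
blocks = [bytes([i]) * 1_000_000 for i in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    compressed = list(pool.map(compress_block, blocks))

# Round-trip check.
for raw, comp in zip(blocks, compressed):
    assert lzma.decompress(comp) == raw
```

Each worker only pays for one single-threaded compression context, rather than every worker allocating a multi-threaded encoder of its own.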
- Allow skipping the `dwarfsck` phase when recompressing a filesystem. With multi-terabyte filesystems, the checking can last hours of constant disk reads even when the original filesystem is known good. (I do it in two steps: first deduplicate with no compression, using all available memory for a larger lookback, then recompress the filesystem.)
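A sketch of that two-step workflow; the option names follow the `mkdwarfs` manual (`-C`/`--compression`, `--recompress`, `-B` for lookback blocks), but the paths, lookback count, and zstd level here are illustrative placeholders, not a tested recipe:

```shell
# Step 1: deduplicate only, no compression, spending memory on lookback.
mkdwarfs -i /data -o dedup.dwarfs -C null -B 64

# Step 2: recompress the already-deduplicated image with tuned zstd.
mkdwarfs -i dedup.dwarfs -o final.dwarfs --recompress -C zstd:level=22
```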