When Coolo looked into how to get rid of (Another) UnionFS for Live CDs and came up with the DoenerFS (now clicfs) idea, I remembered that my friend Arnd had worked on fake write support for cramfs. So I took his code and ported it to SquashFS to see how it compares. My expectation was that it might be faster than Coolo’s FUSE-based clicfs. Here are some results using openSUSE-KDE4-LiveCD-i586-i686-Build0098 booting into runlevel 3:
- clicfs: 637MB ISO image booting in 1:28 min (0:24 min from RAM)
- squashfs-rw: 751MB ISO image booting in 1:50 min (0:28 min from RAM)
The difference in ISO image size is due to the fact that clicfs uses LZMA compression while SquashFS still uses the in-kernel GZIP implementation. Surprisingly, the clicfs image isn’t only smaller, it also boots faster both from real media and from RAM (using KVM). So even if we ignore the fact that clicfs is optimized to limit the number of seeks on disk, the SquashFS implementation is still slower. It would be interesting to see whether it is just the LZMA compression that makes the difference or something else entirely.
The patches for the SquashFS fake write support are here: http://git.infradead.org/users/jblunck/linux-2.6.git?a=shortlog;h=refs/heads/squashfs-rw-v1.
I once did some pretty exhaustive tests with cramfs and squashfs in terms of space efficiency. It turned out that lzma compression in squashfs did not gain a single byte of extra space when using the same block size. The default block size for squashfs with lzma was significantly larger than for gz (128kb vs. 4kb or something). What gave squashfs the biggest advantage over cramfs was actually just the tail merging; all the other tricks it plays to reduce the size gained close to nothing.
Maybe you should just try squashfs-rw with gz compression but a larger block size.
The default block size used by squashfs-tools 4.0 is 128K, so this doesn’t seem to help that much.
A couple of points:
Clicfs vs Squashfs:
Time taken to boot is a function of the amount of I/O and the amount of seeking. If clicfs was both significantly smaller through using lzma and optimised for seeking (I assume by ordering the files accessed at start-up together on disk), then it is hardly surprising that it was faster. Use lzma in Squashfs, and use the -sort option to optimise the layout of files on disk, for a more meaningful benchmark.
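For reference, mksquashfs’ -sort option takes a plain text file listing filenames and priorities, with higher-priority files stored earlier in the image. Below is a minimal, hedged Python sketch of how such a sort file could be generated from a list of files read during boot; the input and output file names are assumptions, and the "filename priority" line format follows the mksquashfs documentation.

```python
# Hedged sketch: turn a list of files accessed at boot into a mksquashfs
# sort file so those files end up grouped at the front of the image.
# "boot-file-list.txt" and "squashfs.sort" are hypothetical names; the
# documented priority range is -32768..32767, higher = stored earlier.
boot_files = [line.strip() for line in open("boot-file-list.txt") if line.strip()]

with open("squashfs.sort", "w") as out:
    for rank, name in enumerate(boot_files):
        priority = max(32767 - rank, 1)  # earlier-accessed files get higher priority
        out.write(f"{name} {priority}\n")
```

The resulting file would then be passed to mksquashfs via -sort squashfs.sort.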
Cramfs vs Squashfs:
Squashfs uses two techniques to get better data compression than cramfs: larger blocks and tail-end packing. The maximum data block size in cramfs is 4K; the maximum block size in Squashfs is 1M. If you used the same block size in cramfs and squashfs, then it isn’t surprising that tail-end packing was what gave better compression.
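To make the tail-end packing point concrete, here is a back-of-the-envelope Python sketch with made-up file sizes (and ignoring compression entirely): without packing, each file’s final partial block is rounded up to a whole block, while with packing the tails are stored back to back.

```python
# Rough illustration only: hypothetical file sizes, no compression.
BLOCK = 4096  # cramfs-style 4K data blocks

file_sizes = [130, 5000, 9000, 300, 4097]  # bytes

# Without tail-end packing: every partial tail still occupies a full block.
without_packing = sum(BLOCK for size in file_sizes if size % BLOCK)

# With tail-end packing: the tails are concatenated and stored back to back.
with_packing = sum(size % BLOCK for size in file_sizes)

print("tail space without packing:", without_packing)  # 5 * 4096 = 20480
print("tail space with packing:   ", with_packing)     # 130+904+808+300+1 = 2143
```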
Gzip vs lzma:
1. LZMA is effectively gzip with a large window size (1 Mbyte). To get better compression with lzma you have to use large blocks (128K or larger).
2. The gzip window size is 32K. Compressing in blocks larger than 32K with gzip gains you very little.
So with 32K or smaller blocks in Squashfs, replacing gzip with lzma won’t make much difference. With 128K blocks or larger, lzma should make a difference. The Slax LiveCD author reported that lzma with 128K blocks produced significantly better compression than gzip.
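The window-size argument above is easy to sanity-check with a quick Python sketch on any large, reasonably compressible file (the input file here is just an assumption, and Squashfs metadata is ignored): gzip’s ratio should flatten out once the block size passes its 32K window, while lzma should keep improving with larger blocks.

```python
# Hedged sketch: compress the same data in independent blocks of varying size
# with gzip (zlib) and LZMA and compare the resulting compression ratios.
import lzma
import zlib

def blockwise_ratio(data, block_size, compress):
    """Compress data block by block; return compressed size / original size."""
    out = sum(len(compress(data[i:i + block_size]))
              for i in range(0, len(data), block_size))
    return out / len(data)

data = open("vmlinux", "rb").read()  # hypothetical test input

for kb in (4, 32, 128, 512):
    bs = kb * 1024
    gz = blockwise_ratio(data, bs, lambda b: zlib.compress(b, 9))
    xz = blockwise_ratio(data, bs, lambda b: lzma.compress(b, preset=6))
    print(f"{kb:>4}K blocks: gzip={gz:.3f}  lzma={xz:.3f}")
```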
The default block size in Squashfs is 128K irrespective of what compression scheme is being used.
I wasn’t aware of the -sort option. Although it requires some adjustments to the way we generate the ISO, I’ll probably give it a try next week.
I adapted the mksquashfs tool to use LZMA instead of GZIP today. This reduces the size of the SquashFS ISO image to 664M. I used a preset of 2, whereas Coolo uses 6 in clicfs. I’ll post the patches later … have to get the kernel part right first.
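For anyone curious about the preset trade-off, here is a small, hedged Python sketch (the sample block is a made-up input; real Squashfs data blocks here would be 128K each) comparing LZMA preset 2 with preset 6 in terms of output size and compression time.

```python
# Hedged sketch: compare LZMA presets 2 and 6 on a sample data block.
import lzma
import time

data = open("sample.block", "rb").read()  # hypothetical 128K data block

for preset in (2, 6):
    start = time.perf_counter()
    out = lzma.compress(data, preset=preset)
    elapsed = time.perf_counter() - start
    print(f"preset {preset}: {len(out)} bytes in {elapsed * 1000:.1f} ms")
```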
There are patches on squashfs-lzma.org; IMO they’re the best lzma patches available for Squashfs. Unfortunately they’re not available for Squashfs 4.0, but they could be adapted with a bit of work if anyone were so inclined. Even updated to Squashfs 4.0, however, they’ll never be suitable for mainline because they use their own lzma implementation. A mainlineable lzma implementation has to be done differently, which is probably why the author of the patches says there’s no longer any interest in developing separate patches against Squashfs 4.0.