Stephan Kulow, Michael Matz and others have been working on reducing the size of factory updates (see feature #303532), so that fewer packages need to be downloaded each time. After Gerald pointed out two problems, I talked a bit with Stephan today about the current state.
The openSUSE Build Service rebuilds all dependent packages of a freshly checked-in package. This means that if e.g. the gcc compiler is checked in, all packages get rebuilt. Many packages do not change at all after such a rebuild, but in the past we published them nevertheless, and if you followed the factory distribution, you had to download and install the same package again and again. Not publishing unchanged packages also reduces the number of packages we sync out to our mirrors. Overall, the less data we move around, the faster we can move it around.
Gerald noticed that two large packages still got downloaded last time: texlive and xorg-x11-fonts. He asked whether those were really changed.
Looking at texlive, I noticed that /etc/texmf/ls-R contained an entry for a temporary file that has a different name each time. The mktexlsr script should have filtered the temporary file out of the list, but due to a patch that the openSUSE tex maintainer added, this did not work. Fixing the patch removes this problem, and I hope we don’t encounter further ones.
With xorg-x11-fonts the problem was different: the build-compare script, which compares the previous and the current build in order to throw away equal packages, reported a difference but gave no details. Stephan looked into this and noticed that the support for gzipped files was broken. He fixed it and also implemented support for jar and zip files. This means that some more packages can now be handled properly and get filtered out.
So, what does the build-compare script do? It basically checks for each package that, apart from timestamps, build numbers and temporary filenames, there is no difference in any of the files. For example, since gcc puts the temporary filename and a timestamp into each object file, the comparison needs to filter this information out.
Let’s look at one successful example as well:
The package aspell-dictionaries, which contains e.g. aspell-de, has been built several times, but if you look at the live build log (note that you need to log in to the openSUSE Build Service first for the link to work), you see at the end:
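The idea of comparing two builds while ignoring volatile data can be sketched in a few lines of Python. This is only an illustration, not the actual build-compare script: the patterns, the placeholder names and the function names `normalize` and `same_after_filtering` are assumptions made for this example, and the real script works on unpacked rpm contents rather than on plain strings.

```python
import re

# Hypothetical patterns for data that changes on every rebuild.
# The real build-compare script has its own, more extensive rules.
VOLATILE_PATTERNS = [
    # Timestamps such as "Thu Jun 18 05:03:57 2009"
    (re.compile(r"[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}"),
     "<TIMESTAMP>"),
    # Temporary file names such as "/tmp/ccA1b2C3.o"
    (re.compile(r"/tmp/[A-Za-z0-9_.-]+"), "<TMPFILE>"),
    # Build numbers, here assumed to appear as "Release: 8.1"
    (re.compile(r"(Release\s*:\s*)\S+"), r"\1<RELEASE>"),
]


def normalize(text: str) -> str:
    """Replace volatile parts of a file's content with fixed placeholders."""
    for pattern, placeholder in VOLATILE_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


def same_after_filtering(old: dict, new: dict) -> bool:
    """Compare two builds given as {file name: text content} mappings.

    The builds count as equal if they contain the same files and every
    file is identical once the volatile parts are filtered out.
    """
    if old.keys() != new.keys():
        return False
    return all(normalize(old[name]) == normalize(new[name]) for name in old)
```

If `same_after_filtering` returns true for the previous and the current build, the new build would be thrown away and the old package stays published.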
Retried build at Thu Jun 18 05:03:57 2009 returned same result, skipped
Retried build at Sat Jun 20 09:51:32 2009 returned same result, skipped
Retried build at Mon Jun 29 22:05:44 2009 returned same result, skipped
Retried build at Sat Jul 4 00:56:27 2009 returned same result, skipped
Retried build at Sat Jul 11 07:13:37 2009 returned same result, skipped
And if you look into the oss factory repository, you see that the file aspell-de-0.60.20030222.1-8.1.x86_64.rpm was last changed at “17-Jun-2009 15:54”. The file has a size of 3.6 MB, so everybody who updates factory and has this package installed has since been spared the download of 5 rebuilt versions of the file that did not contain any changes at all.
With the first version of the script, Stephan checked that out of 7345 binary rpms (including all subpackages) of openSUSE 11.1, 5666 are considered equal. So the script filtered out 77 per cent of the packages, and only the remaining 23 per cent would be published as new packages.
If you notice that packages get published but are convinced there are no changes, it could be that the package uses temporary data or timestamps somehow and we do not filter those out, or that the build-compare script does not handle the package properly. In either case, patches (preferred) or bug reports are very welcome. I suggest concentrating on large packages first. texlive is 266 MB, so every fix for this package will help a lot. By the way, xorg-x11-fonts is 24 MB.
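To put a number on this example, here is a rough Python sketch that counts the skipped rebuilds in the log excerpt above and multiplies them by the package size. The names `LOG_TAIL`, `SKIPPED` and `count_skipped` are made up for this illustration, and 3.6 MB is the rounded size mentioned above, so the result is only an estimate.

```python
import re

# The end of the build log as quoted above.
LOG_TAIL = """\
Retried build at Thu Jun 18 05:03:57 2009 returned same result, skipped
Retried build at Sat Jun 20 09:51:32 2009 returned same result, skipped
Retried build at Mon Jun 29 22:05:44 2009 returned same result, skipped
Retried build at Sat Jul 4 00:56:27 2009 returned same result, skipped
Retried build at Sat Jul 11 07:13:37 2009 returned same result, skipped
"""

# Matches the log lines that report an unchanged rebuild.
SKIPPED = re.compile(r"^Retried build at (.+) returned same result, skipped$")


def count_skipped(log: str) -> int:
    """Count the rebuilds that were thrown away as unchanged."""
    return sum(1 for line in log.splitlines() if SKIPPED.match(line.strip()))


PACKAGE_SIZE_MB = 3.6  # rounded size of the aspell-de rpm

skipped = count_skipped(LOG_TAIL)
saved_mb = skipped * PACKAGE_SIZE_MB
print(f"{skipped} skipped rebuilds, about {saved_mb:.0f} MB not downloaded per user")
```

So for this single small package, each factory user avoided roughly 18 MB of pointless downloads.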
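The percentages follow directly from the two counts. A quick check in Python (the variable names are just for this example):

```python
total_rpms = 7345   # binary rpms that were checked
equal_rpms = 5666   # rpms considered equal to the previous build

filtered_share = equal_rpms / total_rpms
published_share = 1 - filtered_share
changed_rpms = total_rpms - equal_rpms

print(f"filtered out:    {filtered_share:.0%}")
print(f"still published: {published_share:.0%} ({changed_rpms} rpms)")
```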
I’d propose creating delta packages for the factory repo :)
There would be even less to download 🙂
Yes, that’s discussed in the feature as well but not done yet. It reduces download but AFAIK it increases installation time. Depending on the size of your download pipe and the delta, it might be a significant improvement.
I can’t comment on the HackWeek, please recheck your spam protection!
I’ll put it here:
Very nice news.
This is a very good idea, to let people do what they want for a week.
But it needs more info about the event inside SUSE labs, to see how people think and work.
Launching a website specially for hackweeks, where you post photos and some videos, looks like a good idea to me.