I had a side project for the last two weeks: make the build service more fun to use.
No matter how much fun you have creating packages, if they don’t build, there is little point in using a Service that has Build in its name, no? So one of the major goals of the service is actually to help those who want to build packages as well as possible. But there is a problem:
Let me quote from the landing page of build.opensuse.org: “The openSUSE Build Service hosts 16,414 projects, with 107,691 packages, in 26,259 repositories and is used by 25,967 confirmed users.” Those are quite high numbers, especially in relation to the ~25 servers we have for actually building.
If you look at the build statistics of the last month (and this is just i586, x86_64 looks about the same), you notice that there is not much purple in the “Busy workers / Idle workers” graphic:
Every second weekend or so we have a pause where the servers actually idle; the rest of the time they are usually under full load, and it’s not exceptional that we have over 20,000 build jobs for said 25 servers at the same time. So if Sue comes along and wants to build her packages at that time, she competes with quite a few other packages and gets frustrated when she still sees “scheduled” as she leaves. So we use some algorithms in the so-called dispatcher to distribute the power to the right packages.
For years the dispatcher used an algorithm I would dub “Randomness with exceptions”: it would check if the job’s filename matches a regexp and, if so, prefer it; otherwise it picked a random job. Such algorithms create some fairness if you have 28,000 users all active at the same time, because there is usually not a really good balance between those.
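To make the old behaviour concrete, here is a minimal sketch of “Randomness with exceptions”. It is not the dispatcher’s code: the pattern, the job names and the `pick_job` helper are all made up for illustration, and I assume that when several jobs match, one of them is picked at random.

```python
import random
import re

# Hypothetical pattern: which regexp the dispatcher really used is not stated here.
PREFERRED = re.compile(r"^openSUSE:Factory")


def pick_job(jobs, preferred=PREFERRED, rng=random):
    """Pick a job matching `preferred` if there is one, otherwise any job at random."""
    if not jobs:
        return None
    favoured = [job for job in jobs if preferred.search(job)]
    # Fall back to the full list when nothing matches the exception pattern.
    return rng.choice(favoured or jobs)
```

Note that nothing in here knows who is waiting for a job or how long it has been waiting, which is exactly the weakness of the approach.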
But with 2.0 this changed: we got load and priorities. A script of mine parses the logs of download.opensuse.org and counts how many users each repository has. From that I calculate priorities, so that repositories of interest to people get more build power than others. E.g. KDE:Release:45 for 11.3 is downloaded 65 more often than for 11.1, so the 11.3 packages should see more attention. For that the build service calculates how many workers were busy for each repository and then applies a factor that lowers that load when picking the next repository to build for. This is much more complex than “pick a random one”, but it led to faster return times for those projects that see actual downloads (and for new projects, as they have no load registered). To give some fun to those actually working on their packages, we also lower the registered load if we see commits.
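The load and priority idea can be sketched roughly like this. The formulas are my own placeholders, not what the build service computes: `priority_factor` (a logarithm of the download count) and `commit_discount` (halving the load per recent commit) are assumptions chosen only to show the mechanism.

```python
import math
from dataclasses import dataclass


@dataclass
class Repo:
    name: str
    busy_workers: float      # registered load: workers recently busy for this repository
    downloads: int           # hits counted in the download logs
    recent_commits: int = 0  # source commits seen recently


def priority_factor(repo):
    # Assumed formula: grows slowly with downloads and is never below 1.
    return 1.0 + math.log10(1 + repo.downloads)


def effective_load(repo, commit_discount=0.5):
    # Popular repositories get their registered load divided by the priority factor.
    load = repo.busy_workers / priority_factor(repo)
    # Assumed: every recent commit lowers the registered load further.
    return load * (commit_discount ** repo.recent_commits)


def pick_repo(repos):
    """The repository with the lowest effective load is served next."""
    return min(repos, key=effective_load)
```

With equal registered load, a repository with many downloads wins over one without any, and a repository with no registered load at all (a new one, or a tiny one) wins over both.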
But there was one problem left: with so many projects registered you also have quite a few that aren’t interesting at all. They are not downloaded and very often not even their maintainers care, e.g. for some testing subproject they created in 2009 and then forgot about. But those repositories build against the often changing openSUSE:Factory and see rebuilds because of that. Those repositories had a very low load because they have few packages, so they were often preferred over projects that have a lot of packages.
To free up more power for recent changes, we experimented with various approaches. It turned out to be useful to look at the relation between the time since the last source change in the project and the time the jobs have been “scheduled”. From this we calculate a staleness penalty: when the job is freshly scheduled, it’s basically one chance for a worker for 2 months since the last commit to the project. But this chance rises quickly; the penalty gets smaller the longer the job stays scheduled. Even those projects that have no source commits are valid “customers” and deserve to be rebuilt against the latest gcc from openSUSE:Factory.
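A sketch of what such a staleness penalty could look like. I don’t give the real formula in this post, so treat the ratio below, the units and the `grace_hours` parameter as assumptions; the only properties it is meant to show are that long-abandoned projects start with a big penalty and that the penalty shrinks the longer a job waits.

```python
def staleness_penalty(days_since_commit, hours_scheduled, grace_hours=1.0):
    """Assumed shape: time since the last commit divided by the time spent waiting."""
    if days_since_commit < 0 or hours_scheduled < 0:
        raise ValueError("durations must be non-negative")
    # grace_hours avoids a division by zero for a freshly scheduled job.
    return 1.0 + days_since_commit / (grace_hours + hours_scheduled)


def penalised_load(load, days_since_commit, hours_scheduled):
    # A higher penalised load means the job is less likely to be picked next.
    return load * staleness_penalty(days_since_commit, hours_scheduled)
```

A project that was committed to today gets a penalty of 1, so nothing changes for it; a project untouched for two months starts far behind but catches up while its job sits in “scheduled”.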
So what does this mean for you as a user of the openSUSE Build Service?
- Don’t add repositories unless you really plan to use them. I know that clicking the checkbox to also add all SLE versions is easily done, but remember this is build power and disk space on various mirrors you’ll be using
- If you really care for your project, it’s a good idea to fix build failures from time to time. You’ll get more build power in return
- If you plan to do a larger update of some package in your project and want to test the resulting packages building against it, it’s a good idea to disable all other repositories while you do so. The fewer build jobs you create, the lower your load and the higher your chances to get more build power
- Make your repository popular by telling the world about it. More users means more build power
- And last but not least: don’t get frustrated, remember there are almost 26,000 other users