openSUSE 11.1 Beta3 is a bit later than expected (it should go out later today). Of course, this raised a couple of questions about why. So let me explain how the build of a Beta release works in general from the release manager's perspective, and what the reasons for the slip are.
The usual process we use can be summarized as follows:
- Friday afternoon our build team (it is actually called the “autobuild team”, with autobuild standing for automatic builds, and I’ll use that name here) reviews and checks in all the packages in the review queue.
Note: Since our autobuild team is located in Germany, all times are Nuernberg local time: Friday afternoon translates to Friday 15:00 CET/CEST (depending on season) and Monday evening to Monday 18:00 CET/CEST.
- Over the weekend the submitted, reviewed and checked-in packages are built, along with the packages that depend on them.
- On Monday morning a first test build is created to assess the state of the build. This is just a basic installation test to see that the installation workflow is OK and that the packages themselves install.
- If we encounter any bugs during this, they are reported via bugzilla.
- Monday evening we check in fixes for the problems found over the weekend (e.g. packages not building) and by the test installation. Unless we have a serious problem in the base system, the autobuild team only checks in packages that trigger a rebuild of just a few other packages. We have 4000 packages in our factory distribution and try to trigger fewer than 2000 rebuilds in total on Mondays (fewer if possible). Besides the critical fixes, we also take in packages with bug fixes that trigger a rebuild of only a few packages. For example, a leaf package (one that is not required by any other package) will always be reviewed and checked in on Monday.
- We limit the number of rebuilds on Monday so that the autobuild team can create the next build images on Tuesday morning. These images contain the changes from the day before, and therefore the release manager and some QA engineers do a first test installation on them. If the first test is OK, we start the full QA pre-release test runs.
- If the first test is not OK and we encounter blockers (bugs that block our testers) or ship stoppers (bugs that are so bad that no user can use the system), these get filed in bugzilla and the engineers work on fixing them with highest priority. Once all known bugs are fixed, we create a new build and start testing again, until the quality of the build is good enough.
- In general the Tuesday build turns up two or three bugs that get fixed rather quickly, so later on Tuesday we have the fixes and a new build that then goes to QA and passes the pre-release tests.
- After the pre-release testing has passed, we release the media internally for wider testing. If nobody notices any real obstacles, the release process continues: the media are uploaded to our staging server and find their way to the mirrors, and we then release the Beta on Thursday for public testing.
During the pre-release testing and the internal testing, many bugs are found. We do not fix all of them directly. Instead they are reported via bugzilla and also noted as “Most annoying bugs” so that other testers know about them. Only if we hit real blockers or ship stoppers will we fix them, and even then we might not fix the “annoying” ones, so as not to introduce further bugs. If we fixed every bug we find internally, we would never release. We’re really concentrating on fixing bugs that block further development or testing on a majority of machines (or for a majority of users).
Stabilizing the Distribution Build
The first beta, or often the first two betas, have more integration issues since everybody is finishing their projects and getting them in. We do continuous integration with the Alphas and try to get critical stuff in early, but some development work only finishes with Beta1 and then causes trouble.
Together with the switch from our internal autobuild system to the new openSUSE build service, this meant it has taken us three betas so far to get everything running smoothly again. I hope the worst is over now, and that we do not lose power again for so long at a critical time.
Building in the openSUSE Build Service
Since openSUSE Beta 1 we’re using the openSUSE build service to build the openSUSE distribution and the images, and no longer the internal legacy autobuild system. This even brings us some speedups compared to our old autobuild system. We still have a few issues that only a live test of this scale can find. Also, for the installation of the distribution, we develop changes (e.g. to our metadata) that we would like to do in the build service to make all our lives easier. This kind of work on the build system during the first betas is normal in general; this time the challenge is that we want to do more with the new technology.
In the past the images were built by scripts that were invoked by the autobuild team. The openSUSE build service is now being enhanced so that it builds images automatically whenever all required packages for an image are ready. This will allow us to check in packages and later get the images back automatically, without any further user input.
Each build is uniquely identified by a build number, which is also used in bugzilla. For example, we’re looking at build number 76 for Beta3. No, this is not the 76th build for that beta: we start with 1 at some point in the pre-beta phase and increase the number so that build 76 is unique over the complete build cycle. Once Build 76 is declared as Beta 3, we only speak about Beta 3.
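The "build an image once all of its required packages are ready" rule can be pictured with a small sketch. This is only an illustration of the idea, not the build service's actual code or API; the function name and the image and package names are made up.

```python
from typing import Dict, Set


def images_ready(image_requirements: Dict[str, Set[str]],
                 built_packages: Set[str]) -> Set[str]:
    """Return the images whose required packages have all been built."""
    return {
        image
        for image, required in image_requirements.items()
        if required <= built_packages  # subset test: nothing is missing
    }


# Hypothetical image and package names, for illustration only.
requirements = {
    "dvd-image": {"kernel", "glibc", "yast2"},
    "live-image": {"kernel", "glibc", "desktop"},
}
built = {"kernel", "glibc", "yast2"}

# Only the image with no missing package is reported as ready.
print(sorted(images_ready(requirements, built)))
```

A scheduler would run such a check each time a package finishes building and start an image build for every image that newly appears in the result.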
openSUSE 11.1 Beta3 Delay
So, where does this leave us with Beta2 and Beta3? Why were those not on time?
A couple of things hit us for these betas in particular. Here’s an unsorted list of issues:
- After the power outage, one build host came up in a broken state and never finished a build, thus blocking the rest.
- Due to the power outage, we were only able to do the full checkin of packages for Beta3 on Monday – and a full rebuild takes roughly 48 hours. All in all the power outage of more than a day cost us four important days.
- Some planned changes of metadata, which needed changes in both the YaST installer and the build system, were buggy and therefore we needed to fix them.
- Some packages that are crucial for building did not build at all. We needed fixed packages and therefore had to restart the distribution build.
- Some packages broke other packages and we had to fix these.
- Some bugs take longer to fix than a day…
- The building of the whole distribution showed some errors in packages (dependency loops) which led to a longer build time.
All in all, for beta 3 we needed a couple more test builds than usual until we had a build that could be given to QA for pre-release testing.
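To make the "dependency loops" item more concrete, here is a minimal sketch of how a loop in build dependencies can be found with a depth-first search. It is an illustration under assumptions, not the check the build service performs; the function name and the package graph are hypothetical.

```python
from typing import Dict, List, Optional


def find_dependency_loop(build_requires: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency loop as a list of package names, or None."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []  # packages on the current search path

    def visit(pkg: str) -> Optional[List[str]]:
        state[pkg] = IN_PROGRESS
        path.append(pkg)
        for dep in build_requires.get(pkg, []):
            dep_state = state.get(dep, UNSEEN)
            if dep_state == IN_PROGRESS:
                # dep is already on the current path, so we closed a loop.
                return path[path.index(dep):] + [dep]
            if dep_state == UNSEEN:
                loop = visit(dep)
                if loop:
                    return loop
        path.pop()
        state[pkg] = DONE
        return None

    for pkg in build_requires:
        if state.get(pkg, UNSEEN) == UNSEEN:
            loop = visit(pkg)
            if loop:
                return loop
    return None


# Hypothetical packages: a needs b, b needs c, c needs a again.
graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": []}
print(find_dependency_loop(graph))
```

The recursive form keeps the sketch short. For a graph with thousands of packages an iterative search would avoid Python's recursion limit.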
If you do the maths, you see that if everything works out, we have a working build on Tuesday and could release it Wednesday morning. Experience shows that we need the extra day since we’re not that perfect… We really want to make use of the weekend for building, and therefore you end up with Thursday as “openSUSE release day”. We could add more buffer and release later, but our goal is to get it out to testers as soon as possible. With the release candidates, we do not have a Monday check-in deadline and will fix more bugs before the actual release.
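The arithmetic can be sketched in a few lines. The 48-hour rebuild time and the Friday 15:00 check-in come from the text above. The calendar dates are an arbitrary week chosen for illustration, and the 15:00 time for the late Monday check-in is an assumption, since the actual time is not stated.

```python
from datetime import datetime, timedelta

FULL_REBUILD = timedelta(hours=48)  # "roughly 48 hours" for a full rebuild


def rebuild_finished(checkin: datetime) -> datetime:
    """Estimate when a full rebuild started at check-in time is done."""
    return checkin + FULL_REBUILD


# Arbitrary example week, not the real Beta3 dates.
normal_checkin = datetime(2008, 1, 4, 15, 0)  # a Friday, 15:00
late_checkin = datetime(2008, 1, 7, 15, 0)    # the Monday after (time assumed)

for label, checkin in [("normal", normal_checkin), ("late", late_checkin)]:
    done = rebuild_finished(checkin)
    print(label, "check-in:", checkin.strftime("%a %H:%M"),
          "-> rebuild done:", done.strftime("%a %H:%M"))
```

With these assumed inputs the normal rebuild is done on Sunday afternoon, in time for the Monday morning test build, while a full check-in on Monday is only done on Wednesday afternoon, after the usual Tuesday build.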
Why Rebuild all Packages?
We trigger a rebuild of all packages that depend on a package that has been rebuilt, and do this recursively until no more packages have to be rebuilt. This means that if, for example, a new GCC compiler package is checked in, all packages in our distribution are rebuilt.
The advantage of this is that we know that packages work together, since we run the test suite of each package during the package build. If one package changes its API or ABI, the rebuild of the dependent packages will reveal many problems. We also find bugs in the packages when they run their test suites. This allows us to do some basic testing of packages already at build time and to have a distribution that should be consistent in itself.
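The recursive triggering described here amounts to walking the dependency graph backwards from the changed package. The following is a minimal sketch of that idea, assuming a simple mapping from each package to the packages it needs for building. The function name and the tiny package graph are hypothetical, and this is not the scheduler code of the build service.

```python
from collections import deque
from typing import Dict, Set


def packages_to_rebuild(build_requires: Dict[str, Set[str]], changed: str) -> Set[str]:
    """Return the changed package plus everything that depends on it, recursively."""
    # Invert the graph: for each package, which packages use it for building?
    used_by: Dict[str, Set[str]] = {}
    for pkg, deps in build_requires.items():
        for dep in deps:
            used_by.setdefault(dep, set()).add(pkg)

    triggered = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in used_by.get(current, ()):
            if dependent not in triggered:
                triggered.add(dependent)
                queue.append(dependent)
    return triggered


# Hypothetical miniature distribution, for illustration only.
distro = {
    "gcc": set(),
    "glibc": {"gcc"},
    "bash": {"gcc", "glibc"},
    "leafpkg": {"glibc", "bash"},
}

print(sorted(packages_to_rebuild(distro, "gcc")))      # everything is triggered
print(sorted(packages_to_rebuild(distro, "leafpkg")))  # only the leaf itself
```

In this toy graph a compiler change triggers every package, while a leaf package triggers nothing but itself, which is why leaf packages are cheap to accept late on a Monday.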