openSUSE Lizards

Why do we Release openSUSE on Thursdays – or why do we Slip?

October 22nd, 2008 by Andreas Jaeger

openSUSE 11.1 Beta3 is a bit later than expected (it should go out later today). Of course, this raised a couple of questions about why. So let me explain, from the release manager's perspective, how the build of a Beta release works in general and what the reasons for the slip are.

The usual process we use can be summarized as follows:

On Friday afternoon our build team (it’s actually called the “autobuild team”, autobuild standing for automatic builds, and I’ll use that name here) reviews and checks in all the packages in the review queue.
Note: Since our autobuild team is located in Germany, all times are Nuernberg local time: Friday afternoon translates to Friday 15:00 CET/CEST (depending on the season) and Monday evening to Monday 18:00 CET/CEST.
Over the weekend the submitted, reviewed and checked-in packages are built, together with the packages that use them.
On Monday morning a first test build is created to assess the state of the build. This is just a basic installation test to check that the installation workflow is OK and that the packages themselves install.
Any bugs we encounter during this test are reported in Bugzilla.
On Monday evening we check in fixes for the problems found during the weekend build (e.g. packages that do not build) and by the test installation. Unless we have a serious problem in the base system, the autobuild team only checks in packages that trigger a rebuild of just a few other packages. Our Factory distribution has 4000 packages, and on Mondays we try to trigger rebuilds of fewer than 2000 packages in total (fewer if possible). Besides the critical fixes, we also take in bug-fix packages that trigger a rebuild of only a few packages. For example, a leaf package (one that is not required by any other package) will always be reviewed and checked in on Monday.
We limit the number of rebuilds on Monday so that the autobuild team can create the next build images on Tuesday morning. These images contain the changes checked in the day before, so the release manager and some QA engineers first do a test installation with them. If this first test is OK, we start the full QA pre-release test runs.
If the first test is not OK and we encounter blockers (bugs that block our testers) or ship stoppers (bugs so bad that no user can use the system), these are filed in Bugzilla and the engineers work on fixing them with the highest priority. Once these bugs are fixed, we create a new build and start testing again, until the quality of the build is good enough.
Typically, testing the Tuesday build turns up two or three bugs that get fixed rather quickly, so later on Tuesday we have the fixes and a new build, which then goes to QA and passes the pre-release tests.
After the pre-release tests have passed, we release the media internally for wider testing. If nobody notices any real obstacles, the release process continues: the media are uploaded to our staging server and find their way to the mirrors, and we then release the Beta on Thursday for public testing.
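The Monday rule of thumb described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the autobuild team's actual tooling: the `Submission` class, the `monday_checkins` function and the package names are hypothetical, and the sketch assumes the number of packages each checkin would trigger for rebuild is already known. Critical fixes and leaf packages (which trigger only themselves) always go in; other bug fixes are added cheapest-first while the total stays below about 2000 triggered rebuilds.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    name: str              # hypothetical package name
    triggered: int         # packages this checkin would trigger for rebuild (assumed known)
    critical: bool = False


def monday_checkins(candidates: list[Submission], budget: int = 2000) -> tuple[list[str], int]:
    """Pick the Monday checkins (hypothetical policy sketch).

    Critical fixes and leaf packages (triggered == 1, i.e. only the package
    itself) are always accepted; other bug fixes are accepted cheapest-first
    while the running total of triggered rebuilds stays below the budget.
    """
    accepted: list[str] = []
    total = 0
    optional: list[Submission] = []
    for sub in candidates:
        if sub.critical or sub.triggered == 1:
            accepted.append(sub.name)
            total += sub.triggered
        else:
            optional.append(sub)
    # Smallest rebuild footprint first, so more fixes fit under the limit.
    for sub in sorted(optional, key=lambda s: s.triggered):
        if total + sub.triggered < budget:
            accepted.append(sub.name)
            total += sub.triggered
    return accepted, total


subs = [
    Submission("base-fix", 1500, critical=True),
    Submission("leaf-tool", 1),
    Submission("big-lib", 800),
    Submission("small-lib", 20),
]
print(monday_checkins(subs))  # (['base-fix', 'leaf-tool', 'small-lib'], 1521)
```

Adding up the per-checkin counts is a simplification: two checkins that trigger overlapping sets of packages are counted twice, so a real tool would rather count the union of the triggered packages.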

During the pre-release testing and the internal testing, many bugs are found. We do not fix all of them right away; instead they are reported in Bugzilla and also listed as “Most annoying bugs” so that other testers know about them. Only real blockers or ship stoppers get fixed at this stage, and we might not even fix the “annoying” ones, so as not to introduce further bugs. If we fixed every bug we find internally, we would never release ;). We really concentrate on fixing bugs that block further development or testing on a majority of machines (or for a majority of users).

Stabilizing the Distribution Build

The first beta, and often the first two betas, have more integration issues since everybody is finishing their project and getting it in. We do continuous integration with the alphas and try to get critical changes in early, but some development work is only finished in time for Beta1 and then causes trouble.

Combined with the switch from our internal autobuild system to the new openSUSE Build Service, it has taken us three betas so far to get everything running smoothly again. I hope the worst is over now and that we do not lose power again for so long at such a critical time.

Building in the openSUSE Build Service

Since Beta1 we have been using the openSUSE Build Service, instead of the legacy internal autobuild system, to build the openSUSE distribution and the images. This even gives us some speedups compared to the old autobuild system. We still have a few issues that only a live test at this scale can uncover. Also, for installing the distribution, we are developing changes, e.g. to our metadata, that we would like to make in the build service to make all our lives easier. This kind of work on the build system during the first betas is normal; the challenge this time is that we want to do more with the new technology ;).

In the past, the images were built by scripts that the autobuild team invoked. The openSUSE Build Service is now being enhanced so that it builds images automatically whenever all packages required for an image are ready. This will allow us to check in packages and later get the images back automatically, without any further manual input.
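A minimal Python sketch of the “build the image once all required packages are ready” idea, assuming each image simply lists the packages it needs. The function `on_package_built` and the image names are made up for illustration; this is not how the openSUSE Build Service is implemented internally.

```python
def on_package_built(
    package: str,
    built: set[str],
    pending_images: set[str],
    requirements: dict[str, set[str]],
) -> list[str]:
    """Record a finished package build and return the images that became buildable.

    An image counts as ready once every package it requires has been built.
    Ready images are removed from pending_images so each is triggered only once.
    """
    built.add(package)
    ready = sorted(img for img in pending_images if requirements[img] <= built)
    for img in ready:
        pending_images.remove(img)
    return ready


# Hypothetical image names and requirements.
requirements = {
    "DVD-i586": {"kernel-default", "yast2", "kde4"},
    "NET-i586": {"kernel-default", "yast2"},
}
built = {"kernel-default"}
pending = set(requirements)
print(on_package_built("yast2", built, pending, requirements))  # ['NET-i586']
print(on_package_built("kde4", built, pending, requirements))   # ['DVD-i586']
```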

Build Numbers

Each build is uniquely identified by a build number, which is also used in Bugzilla. For example, we are looking at build number 76 for Beta3. (No, this is not the 76th build for that beta: we start with 1 at some point in the pre-beta phase and increment it, so that build 76 is unique over the complete build cycle.) Once build 76 is declared Beta3, we only speak about Beta3.

openSUSE 11.1 Beta3 Delay

So, where does this leave us with Beta2 and Beta3? Why were they not on time?

A couple of things hit us with these betas in particular; here is an unsorted list of the issues:

After the power outage, one build host came up in a broken state and never finished a build, thus blocking the rest.
Due to the power outage, we were only able to do the full checkin of packages for Beta3 on Monday, and a full rebuild takes roughly 48 hours. All in all, the power outage of more than a day cost us four important days.
Some planned metadata changes, which needed changes in both the YaST installer and the build system, were buggy, and we had to fix them.
Some packages that are crucial for the build did not build at all; we needed fixed packages and therefore had to restart the distribution build.
Some packages broke other packages and we had to fix these.
Some bugs take longer than a day to fix…
Building the whole distribution uncovered errors in some packages (dependency loops), which led to a longer build time.
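To illustrate why a dependency loop is a problem: a package can only be built after everything it requires, and a loop means there is no order that satisfies this for every package. The sketch below uses Python's standard `graphlib` module (Python 3.9+) to either produce such a build order or report the loop. The package names are hypothetical, and the real build system deals with loops in its own way.

```python
from graphlib import CycleError, TopologicalSorter


def build_order(build_requires: dict[str, set[str]]):
    """Return (order, None) if the packages can be ordered, else (None, cycle).

    build_requires maps each package to the packages it needs at build time,
    i.e. its predecessors: a package is only built after all of them.
    """
    try:
        return list(TopologicalSorter(build_requires).static_order()), None
    except CycleError as err:
        # err.args[1] holds the detected cycle; its first and last entries are equal.
        return None, err.args[1]


print(build_order({"app": {"libfoo"}, "libfoo": {"glibc"}, "glibc": set()}))
# (['glibc', 'libfoo', 'app'], None)
print(build_order({"liba": {"libb"}, "libb": {"liba"}}))  # (None, <the loop>)
```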

All in all, for Beta3 we needed a couple more test builds than usual before we had a build that could be given to QA for pre-release testing.

Why Thursday?

If you do the maths, you see that if everything works out, we have a working build on Tuesday and could release it on Wednesday morning. Experience shows that we need the extra day, since we are not that perfect… We really want to use the weekend for building, and therefore you end up with Thursday as the “openSUSE release day”. We could add more buffer and release later, but our goal is to get the beta out to testers as soon as possible. For the release candidates, we do not have a Monday checkin deadline and will fix more bugs before the actual release.
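The calculation above, written out as a small Python sketch. The schedule steps are taken from the post (Friday checkin, a working build on Tuesday in the best case, release the next morning, plus one day of buffer); the function name `planned_release` is hypothetical and the date used is an arbitrary Friday.

```python
from datetime import date, timedelta


def planned_release(friday_checkin: date, buffer_days: int = 1) -> date:
    """Release date under the weekly schedule described in the post (sketch).

    Friday: checkin; weekend: build; Monday: test build and checkin;
    Tuesday: working build in the best case; Wednesday: earliest release.
    buffer_days models the extra day that experience says is needed.
    """
    if friday_checkin.weekday() != 4:  # Monday == 0, ..., Friday == 4
        raise ValueError("checkin day must be a Friday")
    working_build = friday_checkin + timedelta(days=4)  # Tuesday
    earliest = working_build + timedelta(days=1)        # Wednesday morning
    return earliest + timedelta(days=buffer_days)


release = planned_release(date(2008, 10, 10))
print(release, release.strftime("%A"))
```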

Why Rebuild all Packages?

We trigger a rebuild of all packages that depend on a package that has been rebuilt, and do this recursively until no more packages need to be rebuilt. This means that if, e.g., a new GCC compiler package is checked in, all packages in our distribution are rebuilt.

The advantage of this is that we know that the packages work together, since we run the testsuite of each package during the package build. If one package changes the API or ABI, the rebuild of the packages will notice many problems. We also find bugs in the packages when their testsuites run; this lets us do some basic testing of packages already at build time and gives us a distribution that should be consistent in itself.
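The recursive triggering described in this section amounts to walking the reverse build dependencies. Below is a minimal Python sketch, assuming build dependencies are given as a plain dictionary; `rebuild_set` and the toy package graph are hypothetical and greatly simplified compared to a real distribution. As in the GCC example, every toy package build-requires `gcc`, so a new compiler triggers everything, while a leaf package only rebuilds itself.

```python
from collections import deque


def rebuild_set(changed: str, build_requires: dict[str, set[str]]) -> set[str]:
    """All packages to rebuild after `changed` is checked in (sketch).

    build_requires maps each package to the packages it needs at build time.
    Every package that requires a rebuilt package is rebuilt too, recursively,
    until no new packages are added.
    """
    # Invert the graph: package -> packages that build-require it.
    required_by: dict[str, set[str]] = {}
    for pkg, deps in build_requires.items():
        for dep in deps:
            required_by.setdefault(dep, set()).add(pkg)

    to_rebuild = {changed}
    queue = deque([changed])
    while queue:  # breadth-first walk over the reverse dependencies
        pkg = queue.popleft()
        for user in required_by.get(pkg, ()):
            if user not in to_rebuild:  # also stops endless walks around dependency loops
                to_rebuild.add(user)
                queue.append(user)
    return to_rebuild


deps = {
    "gcc": set(),
    "glibc": {"gcc"},
    "zlib": {"gcc", "glibc"},
    "rpm": {"zlib", "gcc"},
    "leaf-app": {"rpm", "gcc"},
}
print(sorted(rebuild_set("gcc", deps)))       # ['gcc', 'glibc', 'leaf-app', 'rpm', 'zlib']
print(sorted(rebuild_set("leaf-app", deps)))  # ['leaf-app']
```

The size of the returned set is the number of packages a single checkin triggers, which is the figure the Monday rebuild limit is about.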


Category: Distribution · Packaging

Posted: 2008-10-22 - 05:12
Author: Andreas Jaeger

12 Responses to “Why do we Release openSUSE on Thursdays – or why do we Slip?”

Daniel

October 22, 2008 at 06:30

Very interesting post, thanks!
Fernando Maior

October 23, 2008 at 03:39

Andreas, I agree with Daniel. And I would add that some people
who complained about the delay and other things should be
reading it and realizing that working on OpenSUSE is fun, but
it is hard work too!

Many thanks for the insight.
Fernando
Stano

October 23, 2008 at 12:26

Thanks a lot for the post!
Nick

October 23, 2008 at 18:28

Better later and in a better shape than rushed out through the door and buggy as … other oS-es 🙂 Can’t do much without power…
Bertbeau

October 23, 2008 at 18:55

I have been a great fan of OpenSuse since 8.
I also have many of my customers using either 10.3 or 11.
We love it for its solidity and its flexibility in all sorts of installations, from 300 MHz machines to 64-bit ones, etc…

Keep up the good work!
rod

October 23, 2008 at 19:05

OK, in December I want to see the best Linux desktop in the world!!!
Gerald Pfeifer

October 23, 2008 at 21:15

> If one package changes the API or ABI, the rebuild of the
> packages will notice many problems.

Doesn’t rebuilding all packages actually _hide_ ABI changes?
- Andreas Jaeger
  
  October 24, 2008 at 08:41
  
  > Doesn’t rebuilding all packages actually _hide_ ABI changes?
  
  The rebuild will notice the API change at compile time – and not an obscure runtime error. I don’t understand your reference to ABI changes here.
Cinq-Marquis

October 27, 2008 at 15:17

Thanks for the clear ‘explain’ 😉
Have been using (open)SuSE since 2001 and appreciate all the work that you and your team do!
Thanks again.

__________________________
openSuSE 11.0 – KDE 4.1.64
F2

October 29, 2008 at 00:43

I dig this! 🙂
felipe alvarez

November 2, 2008 at 09:41

I like reading things like these. I am not a developer, but I do like testing Late Betas and RC’s. This was an interesting educational lesson. Thanks for sharing. Also sounds very very technical — means I have even more learning to do.
OO user whit 1440x900 resolution

November 2, 2008 at 12:40

Interesting and good to know how this kind of process works. I tried out the Kubuntu 8.10 release and I never got it to work on my two PCs (Intel and AMD). I took OpenSuse 11.1 Beta4 and both PCs run like a charm. Only OpenOffice won’t work due to a fatal error. I just don’t know where the error comes from. But anyhow, you can see that Suse has a very good process to minimize fatal bugs, and that’s why Suse’s betas work much better than other Linux distros’ final releases! The biggest issue is hardware compatibility and auto-recognition of HW. Suse still has problems getting my LG 19LS4D-ZB monitor to work correctly at 1440×900 resolution. Now I have to use the “LCD” monitor and 1400x1050 resolution to get it to work. This seems to be an issue on all distros.

Anyhow, keep working as hard as you have; this will be a nice release someday!