I’ll start with some history. After 11.0 I became the maintainer of perl-Bootloader, taking over from Alexander. (I had never written anything in Perl before, but I know some other scripting languages, so learning another one was not so hard.) The problem was that Alexander did not have enough time to maintain it (he is also the leader of the arch team). This meant I inherited many unresolved bugs (around 150), because the lack of resources had prevented fixing them. I also got some features to implement, plus some enhancements I found useful enough to implement myself (some ideas started in bug reports or on the factory mailing list, so thanks to the community). In the rest of this blog entry I describe what succeeded and what did not.
Features can be divided into official ones, which come through the FATE process (you can read more about this process in another blog), and enhancements, which simply sounded reasonable to me.
Official features
First I’ll look at the official features. I think the most important one is the automatic test suite. It is a set of automated tests that run against the interface of the library (black-box testing). Before the release of 11.1 the suite contained 232 tests. I hope it increases the quality of each perl-Bootloader release, because it catches many problems, and whenever a new one occurs I add a test for it (this should prevent regressions). The only problem with the test suite is that it does not test a whole kernel upgrade. That is hard to do, because the kernel upgrade has to gather hardware information itself, and to test it correctly I would have to simulate that with the actual utilities it uses, on many types of hardware. So this part should be improved, but I need some idea of how to test many hardware configurations (different RAIDs (Linux and BIOS), LVM, multipath, different hardware architectures (like macos, efika or chrp on PowerPC)) on one machine.
Another feature is consistent device names. This was the most problematic feature, because it is hard to resolve the many different device names provided by udev (and udev was broken during some parts of development). I experimented with many different solutions, and in the end (after the last beta) I decided to create a function that translates everything to the kernel device and then compares it with the other device, translated the same way. The previous solution, based on filling in all symlinks, also worked, but after the device mapper problem with udev (described further below) I changed to this more efficient and more reliable solution.
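The translate-then-compare idea can be sketched in a few lines of shell. This is only an illustration, not the actual perl-Bootloader code (which is Perl): the function names `canonical_device` and `same_device` are made up for this example, and it assumes GNU `readlink -f` is enough to follow udev symlinks down to the kernel device node.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of perl-Bootloader.

# Follow every symlink in the path and print the final target,
# e.g. a /dev/disk/by-id/... link resolves to its kernel device node.
canonical_device() {
    readlink -f -- "$1"
}

# Succeeds when both names resolve to the same canonical path.
same_device() {
    local a b
    a=$(canonical_device "$1") || return 2
    b=$(canonical_device "$2") || return 2
    [ "$a" = "$b" ]
}
```

With this, something like `same_device /dev/disk/by-id/EXAMPLE-part1 /dev/sda1` (the by-id name is a placeholder) compares two names without having to enumerate every symlink that udev created.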
The next feature is one you can use if you have a machine with a trusted computing chip (some notebooks have it). More about trusted computing you can read here. Most of the work was done by Thorsten Duwe (the maintainer of GRUB); in the end I only made sure that, for security reasons, no splash screen is loaded (I remove the message line even if the user wants it).
If you have a PC with a GPT partition table and an x86_64 processor, on 11.0 you had to use legacy booting. Now you can use the ELILO bootloader (which supports EFI) on the x86 architecture as well. This was also quite easy to implement, but harder to verify, because getting hardware with that configuration is not so easy.
Just short notes on the other features: support for old disks that use C/H/S addressing has been dropped and LBA is forced, the kernel append line during an upgrade is taken from sysconfig, and support for disk remapping has been added for Windows entries.
Enhancements
The difference between a feature and a bug is really small, and maybe some of the enhancements look more like bugs and vice versa. For me an interesting enhancement is the check whether /boot is mounted. I use it at home quite often: if you have a separate /boot you do not need to keep it mounted, but when an update also updates the kernel, I often forgot to mount it. The first implementation was not ideal, and I learned from experience that some users have really exotic entries in fstab, which of course matched the check pattern badly. But after three iterations and one workaround it works quite well and I am satisfied with it.
Another enhancement (really close to a bug fix) is the “none” bootloader. It is a special bootloader setting which ensures that nothing happens during a kernel upgrade. It is quite useful if a kernel update messes up your configuration (which is a bug, but you want to prevent it until I fix it) or takes too much time (which is also a bug). A more important use is netboot, where you do not need any bootloader because you boot via PXE.
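The /boot check described above could look roughly like the following sketch. It is not the real implementation: `boot_needs_mount` is a hypothetical name, and the two file arguments (defaulting to /etc/fstab and /proc/mounts) exist only so the logic can be tried on sample files. It handles a trailing slash on the mount point, but certainly not every exotic fstab entry.

```bash
#!/usr/bin/env bash
# boot_needs_mount [FSTAB] [MOUNTS]
# Hypothetical helper: succeeds (exit 0) when FSTAB lists a separate /boot
# that MOUNTS does not show as mounted.
boot_needs_mount() {
    local fstab=${1:-/etc/fstab} mounts=${2:-/proc/mounts}

    # Look for a non-comment fstab entry whose mount point (field 2) is /boot.
    awk '$1 !~ /^#/ {
             mp = $2
             sub(/\/+$/, "", mp)          # tolerate "/boot/"
             if (mp == "/boot") found = 1
         }
         END { exit !found }' "$fstab" || return 1

    # A separate /boot is configured; if it is already mounted, all is well.
    if awk '$2 == "/boot" { found = 1 } END { exit !found }' "$mounts"; then
        return 1
    fi
    return 0
}
```

A caller could then warn or abort before touching the bootloader configuration, for example `boot_needs_mount && echo "/boot is in fstab but not mounted" >&2`.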
An important enhancement is that the kernel install now stops if the bootloader update fails (the problem was the use of tee in a pipe, which caused the script’s return value to be ignored). This is good, because you know that something went wrong and can restore a backup. Also, because the new kernel is usually installed first, you still have the old kernel and can boot it.
The last enhancement is for anyone who wants to look at how perl-Bootloader works inside. You can now run make doc in the sources and it generates HTML pages from the comments (something like javadoc). Besides this documentation, Jozo Uhliarik and I created a wiki page about the interface of the library, and anyone who wants to use perl-Bootloader should use it.
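The tee problem mentioned above is a general shell pitfall: the exit status of a pipeline is normally that of its last command, so `failing_script | tee logfile` reports the status of tee. The sketch below shows one way around it in bash. It is not the actual fix that went into the kernel scripts, and `run_logged` is a name invented for this example.

```bash
#!/usr/bin/env bash
# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical helper: shows the command's output, appends it to LOGFILE,
# and returns the command's own exit status instead of tee's.
run_logged() {
    local log=$1
    shift
    "$@" 2>&1 | tee -a "$log"
    # PIPESTATUS must be read right after the pipeline,
    # before any other command overwrites it.
    return "${PIPESTATUS[0]}"
}
```

A caller can then stop on failure, for example `run_logged "$logfile" some_update_step || exit 1` (both names are placeholders). Setting `set -o pipefail` at the top of the script is another bash option that makes the whole pipeline fail when any part of it fails.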
Bugs
Solving bugs in perl-Bootloader is not as interesting as it looks ;). You need to analyze many long logs (if perl-Bootloader is used by YaST, they are saved to y2log; if it is used by a kernel upgrade, they are saved to the perl-BL-standalone log) and find what went wrong and when. You find what is wrong, but not who broke it. Sometimes it is a bug in parsing the configuration, sometimes in parsing the output of external commands, and sometimes the bug is outside (like a broken udev). The source code (including the test suite) has 15k LOC, and when I tried to see how many lines I had changed (the one-liner for it: `find . -type f 2>/dev/null | grep -v '\.svn' | xargs svn blame | grep jreidinger | wc -l`) I found that I had changed 3k LOC. So the code base is good, but it needs some improvement. Here are some interesting bugs that I solved.
One is a long-standing problem with chainloading, where root was set to the currently mounted root while the chainloader key had the right prefix (the root of the chainloaded partition). On some hardware this caused problems; now root is set correctly, and I have not received any negative responses (so I hope it works).
Another one is also a fairly long-standing problem: when you change the kernel flavor, sometimes the previous kernel is uninstalled first and the new one is installed afterwards (the normal workflow is to install the new one and then uninstall the old one). Of course this switches the default entry, which is quite annoying. Fixing it is tricky. The trick is that when the last image section is removed and it was the default, I add a comment noting that, and when a new kernel is added, it is set as the default. That fix works quite well, but after the ISO had been sent to the factory (for the boxes) I got bug reports that it does not work if you update via YaST. The problem was some deprecated code which overwrote my comment with another one. The fix was easy: I just removed the deprecated code that caused it. So if you see an update of perl-Bootloader after installing openSUSE 11.1, you know why.
Also solved as a late update is the problem with hanging kernel updates. It took too much time because, while reading logs, I had added useful logging lines to the code (I hope they help me find problems faster next time), and once the number of log records reached a certain level, logging became very slow (due to array copying of the records). I reduced the debug logging and also improved the overall performance of logging, so I hope this significantly improves the kernel update time (at least it should no longer take minutes to update the bootloader configuration).
Not every bug is a problem in perl-Bootloader, yet it still has to be solved in its code. An example is the inconsistency between device mapper and udev. Udev points to /dev/dm-0, but device mapper to /dev/mapper/something. This became a real problem when the persistent names feature was implemented. The solution is again a little tricky: udev defines variables for the links, among them DM_NAME for the name of the device and DM_PART for the partition number, and with this information I can construct the whole /dev/mapper/name. I find this solution working, but not very robust. If someone knows a better one, (s)he is welcome to write it in a comment. Also, many Perl warnings have been fixed; they usually do not break anything, but they look really unprofessional.
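To make the DM_NAME / DM_PART idea concrete, here is a small sketch that reads udev properties as KEY=value lines and builds the /dev/mapper path from them. It rests on assumptions that are not confirmed above: the function name `mapper_path_from_udev` is invented, DM_NAME is assumed to hold the name without the partition suffix, and the `_part` separator between name and partition number is a guess that may differ between setups (it can be overridden with the first argument).

```bash
#!/usr/bin/env bash
# mapper_path_from_udev [SEPARATOR]
# Hypothetical helper: reads KEY=value lines on stdin and prints the
# /dev/mapper path built from DM_NAME and, if present, DM_PART.
# The default separator "_part" is an assumption, not a documented rule.
mapper_path_from_udev() {
    local sep=${1:-_part} name= part= key value
    while IFS='=' read -r key value; do
        case $key in
            DM_NAME) name=$value ;;
            DM_PART) part=$value ;;
        esac
    done
    # Without DM_NAME there is nothing to construct.
    [ -n "$name" ] || return 1
    if [ -n "$part" ]; then
        printf '/dev/mapper/%s%s%s\n' "$name" "$sep" "$part"
    else
        printf '/dev/mapper/%s\n' "$name"
    fi
}
```

On current systems the input could come from `udevadm info --query=property --name=/dev/dm-0`, which prints properties in this KEY=value format.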
Now I have only 2 unresolved bugs (both are enhancements I am working on) and I am waiting for more, so report all problems in openSUSE 11.1; I will not throw them in the bin 😉
Wow! There is so much interesting detail in this post! You have actually restored my trust in the bootloader code.
You’ve made great progress with the bootloader – thanks for sharing this!
Also a big thank-you from me. Even if we had a heated debate about serial consoles and about “esoteric mount points in /boot” 😉 I really appreciate your and Jozef’s work on the bootloader code! Especially the test suite and the improved documentation are clearly a step in the right direction in terms of maintainability of the code base.
Those debates lead to better code and that’s really important 😉
Very interesting Jozef, I commented in factory list on the XFS boot thread, about some historical things.
Something I don’t understand is why you try to look in fstab(5) to see if /boot is mounted, when you could simply access a directory to see if it exists, /boot/grub say. If it’s not there, then you know the kernel rpm update will fail. That’s only 1 simple system call too.
When I filed a bug on the old 10.3 kernel update, it was the reporting back to the user angle that seemed like it could be tricky, as presumably the magic would be contained within the rpms.
There are a few reasons. First, this check is more high-level, at a point where I don’t know which bootloader I have (and this is a problem, because with elilo /boot/grub doesn’t exist either). Checking whether some kernel file is present doesn’t work either, as during installation there is nothing there yet. What I must find out is whether the user has a separate boot partition, because that partition masks the underlying /boot. Checking fstab is the best way for me, because if you have a separate boot partition that you want to use, fstab is the natural way to mark it, and you can then attach it with just mount /mnt. Also, the installer creates this entry (and if you don’t want automount, you simply add noauto). Of course, if you have more ideas on how to detect a separate boot partition that is not mounted, I welcome them.