Quality Assurance – openSUSE Lizards
https://lizards.opensuse.org (Blogs and Ramblings of the openSUSE Members)

Debugging jenkins
https://lizards.opensuse.org/2019/07/31/debugging-jenkins/ (Wed, 31 Jul 2019)

We had strange, near-daily outages of our busy internal jenkins for some weeks.

To get to the root cause of the issue, we enabled remote debugging with

-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9010 -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=ci.suse.de -Dcom.sun.management.jmxremote.password.file=/var/lib/jenkins/jmxremote.password

and attached visualvm to see what it was doing.
This showed the number of threads and memory usage in a sawtooth pattern. Every time the garbage collector ran, it dropped 500-1000 threads.
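
For reference, here is a sketch of the password file those flags expect, and of attaching a JMX client; the role name is the stock JMX monitorRole and the password is obviously just an example:

echo 'monitorRole s3cret-example' > /var/lib/jenkins/jmxremote.password
chown jenkins /var/lib/jenkins/jmxremote.password
chmod 600 /var/lib/jenkins/jmxremote.password   # the JVM refuses to start with a world-readable password file
jconsole ci.suse.de:9010                        # or point visualvm at the same host:port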

Today we noticed that every time it threw these java.lang.OutOfMemoryError: unable to create new native thread errors, the maximum number of threads was 2018… suspiciously close to 2048. Looking for the same time in journalctl showed
kernel: cgroup: fork rejected by pids controller in /system.slice/jenkins.service

So it was systemd refusing java’s request for a new thread and jenkins not handling that gracefully in all cases.
That was easily avoided with a
TasksMax=8192
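
For anyone hitting the same limit: a sketch of applying it via a systemd drop-in, so the packaged unit file stays untouched (the 8192 value is the one we used):

mkdir -p /etc/systemd/system/jenkins.service.d
cat > /etc/systemd/system/jenkins.service.d/override.conf <<'EOF'
[Service]
TasksMax=8192
EOF
systemctl daemon-reload
systemctl restart jenkins.service
systemctl show jenkins.service -p TasksMax   # verify the new limit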

Now the new peak was at 4890 live threads and jenkins served all Geekos happily ever after.

Highlights of YaST development sprint 29
https://lizards.opensuse.org/2016/12/22/highlights-of-yast-development-sprint-29/ (Thu, 22 Dec 2016)

It’s Christmas time and, since (open)SUSE users have been nice, the YaST team brings some gifts for them. This is the result of the last development sprint of 2016.

As you may have noticed, in the latest sprints we have been focusing more and more on making SUSE CASP possible. That’s even more obvious in this last sprint of the year. For those who have not been following this blog recently, it’s probably worth remembering that SUSE CASP will be a Kubernetes-based Container as a Service Platform.

But our daily work goes beyond CASP, so let’s take a look at all the highlights.

More improvements in the management of DHCLIENT_SET_HOSTNAME

In the previous report we presented the changes introduced in yast2-network to make the configuration parameter DHCLIENT_SET_HOSTNAME configurable on a per-interface basis.

One of the great things about working in an agile and iterative way, presenting and evaluating the result every three weeks, is that it allows us to detect room for improvement in our work. In this case we noticed some discrepancy between the expectations of Linuxrc and yast2-network, and also some room for improvement in the code documentation and in the help texts.

Thus, we used this sprint to refine the work done in the previous one and tackle those problems.

Improved error message

Ensure installation of needed packages

Another example of iterative development. In the report of the 26th development sprint we already presented a new mechanism to detect when the user deselects, during installation, some package that was previously pre-selected by YaST in order to install the bootloader. Since the new functionality proved to work nicely, we decided to extend it to cover other parts of the system beyond the bootloader.

The software proposal now contains an error message including a list of missing packages or patterns, in case the user deselects some needed items.

Warning about missing packages

After clicking the Install button the installation is blocked; the user must resolve the problem either by selecting the packages back or by adjusting the respective YaST configuration (e.g. not installing any bootloader and disabling the firewall).

Blocking an incomplete installation

Rethinking the expert partitioner

May we insist one more time on the topic of using Scrum to organize our work in an iterative way? 😉 As our usual readers already know, we structure the work into minimal units that produce a valuable outcome, called PBIs in Scrum jargon. That valuable outcome doesn’t always have to be a piece of software, an implemented feature or a fixed bug. Sometimes a document adds value to YaST, especially if it can be used as a base for collaborating with people outside the team.

Our readers also know that we are putting a lot of effort into rewriting the whole storage layer of YaST. That also implies rewriting the most powerful tool known to humanity for defining partitions, volumes, RAIDs and similar stuff – the YaST expert partitioner.

It would be great if we could use the opportunity to make it both more powerful and more usable. You can take the first part for granted, but we are not so sure about our UI design skills. That’s why we wanted to have a base for discussing possible changes and alternative approaches with UX (user experience) experts. And we decided it was worth investing some time to create a document collecting the state of the art and some ideas for the future, and to send it to the SUSE experts in UX and to anybody with some interest in the topic.

Here you can find that fine piece of documentation. Take a look at that document if you want to peek into YaST developers’ minds. That’s the kind of stuff we discuss when we are about to start rewriting something… especially something that needs to serve hundreds of different use cases.

And of course we would like to know your ideas and thoughts. We usually discuss this stuff in the public #yast IRC channel and on the yast-devel mailing list. But if you prefer, you can simply open an issue at the repository hosting the document. Whatever works for you.

Rethinking yast2-network

But that was not the only documentation PBI finished during this sprint. Inspired by the first fruits of the storage layer reimplementation, we decided yast2-network also deserves a reincarnation.

As we did in the past with yast2-storage and libstorage, the first step is to collect as much information as possible about what can currently be done with the module and how it behaves in several situations, especially in tricky or complex scenarios. The outcome was three documents: one about the behavior during installation (installation.md), a second one about AutoYaST (autoinstallation.md) and another collecting general features (features.md).

CASP: merged dialogs for root password and keyboard layout

CASP is a product targeted at a quite specific use case, with simplicity as a main priority. The installation process has been streamlined to a minimal set of dialogs to configure just the very basic stuff. Among other removed things, there is no step to configure the system language. That can be a problem when entering the root password (root being the only user created during installation), since the language settings screen is normally also used to select the keyboard layout.

The implemented solution is shown in the screenshot below. As you can see, the keyboard layout and root password selections are merged into a single step. As a bonus, we made both widgets more reusable, opening the possibility of placing the root password widget or the keyboard layout selection anywhere.

Keyboard layout and root password screen

Storage reimplementation: handling GPT disks in the installation proposal

After several sprints reporting small steps forward, in the 27th sprint we were happy to announce that our testing ISO for the new storage stack was fully installable under certain circumstances. As we reported, it worked on UEFI or legacy systems, with the only requirement being a pre-existing MBR partition table on the disk.

Now we can say it also works with GPT partition tables and even with systems with a mixture of both technologies.

Making the GPT scenario work was much harder than it sounds, due to several factors, like the strange way in which parted handles partition types in GPT or some peculiarities in the way space is distributed in such partition tables.

But now our test ISO can install a fully functional system in all four combinations of MBR/GPT partition table and UEFI/legacy boot, as can be seen in the next image.

Storage proposal in several scenarios

The storage reimplementation gets its own openQA instance

But there are better ways than screenshots to prove that something is working, even to prove it keeps working after future modifications. And in (open)SUSE we have one of the best tools for that – openQA.

We have always considered having the new stack tested in openQA the first big milestone in its development (and we are finally there!), but we are aware that openQA.opensuse.org is already quite busy testing a huge combination of products, architectures and scenarios… even testing releases of openQA itself. Fortunately, openQA is free software and can be installed anywhere, so we created our own instance of openQA to test YaST stuff, especially the new storage layer.

So far, that instance is hosted in the internal SUSE network, which is enough for us to get continuous feedback about the changes we introduce. In addition to installing the new instance and configuring it to continuously grab and check the latest testing ISO, we had to introduce several changes in the ISO itself with the goal of keeping our tests as aligned as possible with the tests performed in the current Tumbleweed version by openQA.opensuse.org.

For example, we made sure the ISO was properly signed to avoid the need to always pass the insecure=1 boot argument. We also included several packages that were missing in order to make sure the ISO included all the software checked during the so-called MinimalX test and to make sure it shared the look and feel with a regular Tumbleweed, since many openQA checks are screenshot-based.

From now on, we can back every new feature with the corresponding integration tests, something crucial to ensure the quality of a piece of software meant to handle storage hardware.

Making Snapper work without DBus

As you may know, some YaST team members are also the main developers and maintainers of Snapper, the ultimate file-system snapshot tool for GNU/Linux systems.

Normally the snapper command line tool uses DBus to connect to snapperd which does most of the actual work. This allows non-root users to work with snapper.

There are, however, situations in which using DBus is not possible, and not being able to work in those situations was limiting Snapper’s usefulness. Now, with the latest version, all snapper commands support the --no-dbus option. This evolution is worth a blog post by itself… and, of course, we have it. For all the details, check this post at Snapper’s blog.
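
For example (a quick sketch, with an arbitrary snapshot description), the usual commands now also work where no DBus is available, such as inside a chroot from a rescue system:

snapper --no-dbus list
snapper --no-dbus create --description "before manual repair"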

CASP (and beyond): improved roles

Do you remember the system roles feature introduced during development sprint 16 and improved in subsequent sprints? In case you don’t, let us remind you that system roles allow defining many settings of the installation just by choosing one of the offered roles. That’s only possible, of course, in products making use of that feature, like SLES.

For CASP we will have 3 different roles, as shown in the following screenshot.

CASP system roles

The main difference between these three roles is the selection of patterns to be installed. But apart from that, the Worker role will offer an extra step during installation allowing the user to specify the address of the so-called Administration Dashboard.

Configuration screen for the Worker role

That relatively small detail implied the development of a full new feature in the installer – the ability of a given role to define its own specific configuration, including the dialog to interact with the user. As expected from any other installation dialog, you can go back and forward without losing the entered information. If the user goes back and selects a different role, this additional dialog is not run again.

That new feature is, of course, not specific to CASP and could eventually be used in other products and roles. Just as a crazy example, openSUSE could decide to introduce a role called “NTP server”, running the YaST NTP server configuration right after the user selects the role.

Other CASP related features

As already said, we have been focusing quite a lot on introducing features that are needed for CASP. It’s worth mentioning, in case it’s still unclear, that CASP will NOT ship its own adapted version of YaST. All the features introduced in the installer are in fact configurable and available for all other products as well. There is only one YaST codebase to rule them all.

Let’s briefly describe some of the introduced CASP-specific (at least for the time being) features.

CASP always uses Btrfs as the filesystem for the root partition. At the end of the installation, the root btrfs subvolume becomes read-only. All the other subvolumes stay read-write, as shown in this screenshot taken right after rebooting at the end of the installation process.

CASP subvolumes
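
A quick way to inspect this on the running system is btrfs property; a sketch (the authoritative subvolume layout is the one in the screenshot above):

btrfs property get -ts / ro        # ro=true for the root subvolume
btrfs property get -ts /var/log ro # ro=false for the read-write subvolumes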

It makes no sense to update from any existing product to CASP. Thus, CASP media should not show an “update” option when booting, even if it’s still possible for advanced users to pass the UPDATE boot parameter. Since we needed to modify the installation-images package anyway, we took the opportunity to make the “update” option and other settings configurable on a per-product basis, and we unified the SLES and openSUSE packages, so now they share a single branch in the source code repository.

CASP is targeted at big deployments spread all over the world. To make synchronization of geographically distributed nodes possible, the UTC timezone is enforced in every CASP installation. Thus, we implemented support for products to enforce a given timezone in the installer. Note that this is different from a default timezone.

Last but not least, it has already been mentioned that the CASP installation workflow will have very few steps. That also affects the screen displaying the installation settings summary. In comparison to a regular SLES, some options must disappear because they are not configurable, and some other sections must be added because they are no longer presented as a separate previous step. So far, this is the appearance of the installation settings screen in the current CASP prototype.

Installation settings in CASP prototype

…and a surprise about the blog

We also prepared a Christmas gift related to the blog. The technical aspects are solved, but we are ironing out the administrative details. So you will have to wait until the next sprint report to see it in full glory. But, as the Spanish proverb says, “good things are worth waiting for”.

See you next year

That’s enough to report from our December sprint; we don’t want to bore you with every small bug fix. And talking about things that are worth waiting for, our next report will very likely be published at the beginning of February 2017.

That’s because we will put our Scrum process on hold during the Christmas season. We will restart it in the second week of the year, after the visit of the Three Wise Men. In several countries it’s a tradition that the Three Kings bring gifts to the kids that have been nice, so let’s hope they bring us some new members for the team!

This my code take it! Contributing to Open Source project
https://lizards.opensuse.org/2015/08/28/this-my-code-take-it-contributing-to-open-source-project/ (Fri, 28 Aug 2015)

You want to be an Open Source developer? Want to hack up some nasty code, make everyone obey your orders and take over the world? I was young back when I entered these shallow waters, and oh boy, how green I was back then!

My first app

I have been coding a long time, maybe too long. First I was using Pascal, but it was too high-level for me and not cool at all. When I started using Linux, KDE 1 was the koolest desktop environment on earth and CDE was the de-facto environment for the big boys. Soon after KDE 2 was released I started using the KDE PIM suite, because KMail is still a neat application and KOrganizer was way better than Evolution. I realized I liked to format my happenings as lists, which wasn’t supported the way I wanted. I thought: ‘Hey, what if I write a console application for that? I know how to code C and Java, so C++ can’t be that hard.’

It was possible. Qt 2 was a really great GUI library for writing applications. At that time the Qt licensing was insane, but today it’s much easier to understand. Writing applications with the KDE libraries wasn’t all that hard. The application was all one main function, and as soon as I got it working I mailed the KDE mailing list. I don’t have that mail any more and can’t find it on the net, but it was something like: ‘Hello, I’m the best Qt coder ever and I have this app called KonsoleKalendar‘. I got very friendly feedback and it got included into CVS. I thought: now I’m the greatest coder who ever lived!

Actually, I maintained KonsoleKalendar only a short time, and as I said I wasn’t happy about the licensing of Qt 2 (it didn’t help that it was a badly written application, like, ever). The most wonderful and bizarre thing is that KonsoleKalendar still exists in KDE 5, and it’s in much better shape than when I left it. Afterwards, this was the main learning point about collaboration in Open Source projects for me. In the early 2000s there was no Git, nor were there any fancy GUIs for sending patches. People mailed each other and tried to cope with CVS/Subversion, and KDE still is a very friendly community if you compare it to many others.

Getting along the communities

If you are ever going to enter the Open Source world, try to get along with the community. There are as many communities as there are projects, and they can be friendly, neutral, unknown or hostile. There are several nearly or outright hostile projects where bug reports and patches are rejected while fun is made of your body organs or your mom. Hostile projects seem to follow the same pattern: there is one master-of-the-universe mega alpha coder who dictates everything, and then the people who need the project, or who have contributed enough that coder number one accepts their existence. If you cross this kind of project you should have very good shielding, or some precious code gems to offer as bounty. Remember, many very successful projects also have that mega dictator who has some urge to make things happen; the problem is that most of these dictators only understand code and can’t speak anything else.

Unknown projects are the strangest ones. Commonly, plenty of people use them and actively file bug reports. Only a few people commit changes to version control, but they don’t pay any attention to bug reports or mailing lists. libSDL is this kind of project. People share their patches in the bug database, but it’s nearly impossible to get them into the code base, or I just don’t understand how SDL development goes.

Neutral communities are nice. They have good management and clear rules on how to contribute. You can get your patch or bug report in if it’s good enough. Neutral communities are somewhat uneasy to enter, but if you prove you can make good contributions, they let you do your thing. So what is the difference between neutral and friendly? In a friendly community you can ask stupid questions and someone answers you nicely, not with blank silence or some kind of RTFM answer.

How to contribute?

I tried to outline the situation a little bit above, but let me give you an example from the Mixxx community, where I’m most active these days. Most people pop into the mailing list with the best idea ever, but they haven’t looked at the Mixxx code, so they don’t know whether it’s possible or not. If their idea is reasonable and that human being is ready to do the work, it’s mostly greeted with some advice and notes on how it should be made. After a hiatus, that developer submits a Pull Request, or not.

If a Pull Request has been made, then the rough ride starts. Reviewing code is not a bad thing, and the people doing these code reviews in Mixxx know the application well and only want the best code in. For a green contributor it can be very frustrating. Basically, you have to have good code quality to get into the Mixxx code base, and you have to sign a contributor agreement.

How fast does this happen? It really depends on the size of your contribution and how badly it’s done. People do code reviews in their spare time, so it can be slow. If you just come out of the blue with a Pull Request on GitHub, you most probably won’t get anything in. In Mixxx everything gets in via Pull Request (if it’s trivial, then it’s just LGTM-and-merge style stuff).

If you didn’t read anything else, this is what I wanted to say

What I have learned comes down to three things: know the community you are going to work with (it takes time and motivation), know how to contribute (what the rules are), and try to cope with some level of frustration (they can be very hard on you if you ask stupid questions, because most questions in the Open Source world are stupid). Don’t just stop development because the project thinks your code is a pile of sh*t and you have to work on it more; understand that it’s a pile of sh*t until they are happy, and you have to bring it up to their standard. Every community is a dystopia of committers: they decide what goes in and what doesn’t. If it’s your project, you can choose; if you are not in enough, you just have to cope with it. You can fork the code and start a new project, but believe me, most of the time it’s more productive to stay in the same project and try to change it. If that’s not possible, then just fork it, but you can end up with an FFmpeg-and-AVConv situation.

Working on an Open Source project is about communication. So talk on the mailing list, work on bug reports, write documentation and review code. If you are silent, you don’t exist. And remember, if you can’t code but would like to do something, there is always plenty to do. If you’d like to contribute, learn: Git (GitHub), Mercurial (Bitbucket), Subversion, bug reporting (Mantis, Bugzilla), the code structure of the project, and debugging/reading other people’s code. Or if you are a sysadmin, web designer or something else, there is always something to do. Remember: if you think you are correct and everyone else is incorrect, you are the one who has to prove them incorrect. Flaming and trolling is nice and fun, but it is not going to move the project forward.

I’ll end here, and remember: these are my own notes.

Testing Android in openQA
https://lizards.opensuse.org/2015/01/06/testing-android-in-openqa/ (Tue, 06 Jan 2015)

The other day Richard described in his blog how he used openQA to test drive Fedora. Around the same time I read about Android x86 and saw that they offer iso images for download. So I wondered how hard it would be to get that one tested in openQA.

To find out, I installed a current Tumbleweed snapshot in qemu. Installing openQA in the VM is straightforward with the provided packages, following the instructions at GitHub.

Keep in mind that nested virtualization needs to be turned on to be able to run the openQA worker inside qemu (pass nested=1 to kvm_intel or kvm_amd, respectively). To conveniently access the web interface, vnc and ssh, I added “-net” “user,hostfwd=tcp::8888-:80,hostfwd=tcp::5091-:5091,hostfwd=tcp::2222-:22” to the qemu command line.
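
Putting that together, the full invocation might look like this sketch (the memory size and image name are assumptions; -cpu host passes the virtualization flags through to the guest):

modprobe -r kvm_intel && modprobe kvm_intel nested=1   # kvm_amd on AMD hosts
qemu-system-x86_64 -enable-kvm -cpu host -m 4096 \
  -drive file=tumbleweed.qcow2,if=virtio \
  -net nic -net user,hostfwd=tcp::8888-:80,hostfwd=tcp::5091-:5091,hostfwd=tcp::2222-:22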

As soon as openQA is up and running the remaining steps are easy:

  • add the sample Android test cases I created:
    # cd /var/lib/openqa/tests
    # git clone -b android-4.4 git://github.com/lnussel/os-autoinst-distri-android.git android-4.4
    # chown geekotest android-4.4/needles
    
  • import the job templates so openQA learns what to do with Android iso images
    # android-4.4/templates
    
  • Download android-x86-4.4-r2.iso and store it in /var/lib/openqa/factory/iso
  • register the iso image with openQA:
    # /usr/share/openqa/script/client isos post \
        ISO=android-x86-4.4-r2.iso \
        DISTRI=android VERSION=4.4 ARCH=i586 \
        FLAVOR=live BUILD=0002
    

Voilà! If everything went right openQA should now have created a job and the worker should start processing it.

Here are some screenshots and a video of my test run:
(screenshots: openqa-android-01 and openqa-android-02)

Looks like the emulator in the Android SDK is also qemu-based. So theoretically it shouldn’t be hard to integrate that one into openQA in order to actually test on emulated phones as well.

OpenStack Infra/QA Meetup
https://lizards.opensuse.org/2014/07/23/openstack-infraqa-meetup/ (Wed, 23 Jul 2014)

Last week, around 30 people from around the world met in Darmstadt, Germany to discuss various things about OpenStack and its automatic testing mechanisms (CI).
The meeting was well-organized by Marc Koderer from Deutsche Telekom.
We were shown plans of what the Telekom intends to do with virtualization in general and OpenStack in particular; the most interesting one to me was to run clouds in dozens of datacenters across Germany, but have a single API for users to access them.
There were some introductory sessions about the use of git review and gerrit that mostly covered things I (and I guess the majority of the others) had already learned over the years. They included some new parts, such as tracking “specs” – specifications (.rst files) – in gerrit with proper review by the core reviewers, so that proper processes can already be applied in the design phase to ensure the project is moving in the right direction.

On the second day we learned that the infra team manages its servers with puppet, and about jenkins-job-builder (jjb), which creates around 4000 jobs from yaml templates. We also learned about nodepool, which keeps some VMs ready so that jobs in need will not have to wait for them to boot. 180-800 instances is quite an impressive number.
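
To get a feeling for the jjb approach, here is a minimal sketch (the job and project names are made up, not from the real openstack-infra configs); jenkins-jobs test renders the resulting Jenkins XML without touching a live server:

cat > demo-jobs.yaml <<'EOF'
- job-template:
    name: 'gate-{name}-pep8'
    builders:
      - shell: 'tox -e pep8'

- project:
    name: demo
    jobs:
      - 'gate-{name}-pep8'
EOF
jenkins-jobs test demo-jobs.yaml -o output/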
And then we spent three days on discussing and hacking things, the topics and outcomes of which you can find in the etherpad linked from the wiki page.
I got my first infra patch merged and a SUSE Cloud CI account set up, so that in the future we can test devstack+tempest on openSUSE and have it comment in Gerrit. And maybe some day we can even have a test that deploys crowbar+openstack from git (including the patch from an open review) to provide useful feedback, but for that we might first want to move crowbar (which consists of dozens of repos – one for each module) to stackforge – which is the openstack-provided Gerrit hosting.

see also: pleia2’s post

Overall, for me it was a nice experience to work together with all these smart people, and we certainly had a lot of fun.

Code quality and Code guidelines
https://lizards.opensuse.org/2014/02/19/code-quality-and-code-guidelines/ (Wed, 19 Feb 2014)

Today I’d like to take some of your precious time to talk about something that gets far too little attention in the open source world. I’d like to talk a little bit about QA and coding guidelines.
Many readers who are in companies that take care of themselves, or who are involved in some major open source or free software project like KDE, GNOME or the Linux kernel, know that they/we have coding guidelines. KDE has them here and the kernel has them here. So they have them, and what’s the big deal?

Why have code guidelines and hold on to them?

I’ve been coding since I was a kid. I didn’t know a better way to waste my time than sitting in front of a Commodore 64 hacking together some BASIC code, and my code was horrible.
I didn’t have any clue how to organize it or how to make it more readable (BASIC ain’t that basic when you try to read it afterwards). Years later, after some major Assembly experiences on the 286 and 386, I found Pascal. Then I understood one thing: if you don’t understand the stuff you write, you won’t understand it after a while! So I tried to be more structured in what I did (Pascal/Delphi still have a soft spot in my heart.. always!) and then forgot about all that for almost 10 years.

Doing it professionally

If you code for a living, you have to be very sure that you can read your code and that others can too. You may stop developing your code; after that, someone else may come to maintain that particular code, and this is the situation in open source most of the time.
You never know how long your code will be used, forked or maintained by someone else brave after you have moved on to something more interesting.
Following code guidelines is not very difficult. You just forget your excuses about the beauty of your code and your ignorance about making the code look consistent with the rest of the application. Most horrible is reading code where you have to learn how different people write code and their personal definitions of good code guidelines.
If you are lazy like me, you can use tools like Artistic Style (more on that below) to do the formatting for you.

I do want my code to look like I wrote it

No, you don’t. If you contribute to someone else’s project, you make your code look like his or her code. Structured code always looks good.
We have a policy that the version control system decides how code should look. There is a tool and a version control hook script that restructures code to match our stated code guidelines, and believe me, it has worked better than I expected. People still have to make changes, but now there is a reply to what is wrong. No more rants about what correct-looking code is, and no more people giving me excuses why their code doesn’t follow the code guidelines. There is only one dictator here, and it’s the version control system, and 99.9% of the time it does a good job.

Code guidelines do not equal code quality

No, they are not the same. They have nothing to do with code quality control, or do they? The openSUSE Build Service sets a very good example here for everyone. To get your RPM package SPEC file compiled into a binary or source RPM, you have to follow machine-checked rules. If you correct at least those, it will be all okay. Scripts check that you don’t have a million-byte-long jargon in the Summary line, or that you have a Summary line at all. They run rpmlint and check C code for some well-known problems. If your package isn’t qualified enough to go through this, you won’t get your RPM/DEB/Arch package.
Now we can argue for a long time about whether static code checking has any value, but as a long-time user of static code checking tools like cppcheck, I can tell you that static analysis finds many issues that people just fail to find themselves, because they don’t have time to think through the whole logic or to understand that memory handling issue.
The best way to use these tools, again, is to run them in a version control system hook before the code gets in. It won’t make the code perfect, but it makes developers fear committing, and when developers fear, they make better code.
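
A minimal sketch of such a hook (the tool choice and flags are assumptions, not our actual setup): a git pre-commit hook that checks the staged C/C++ files with astyle and cppcheck and rejects the commit on findings.

#!/bin/sh
# .git/hooks/pre-commit
files=$(git diff --cached --name-only --diff-filter=ACM | grep -E '\.(c|cpp|h)$')
[ -z "$files" ] && exit 0
status=0
for f in $files; do
    # astyle acts as a filter on stdin; compare its output with the file as-is
    if ! astyle --style=kr < "$f" | cmp -s - "$f"; then
        echo "style: $f does not match the project formatting" >&2
        status=1
    fi
done
# fail the commit on cppcheck errors in the touched files
cppcheck --error-exitcode=1 --quiet $files || status=1
exit $status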
After you have structured code and a machine checks it statically, it’s much easier to get to the point and find problems. So you can talk about what is really wrong or right with the code. It’s easier to read a text that has passed the thesaurus in LibreOffice, and again, the openSUSE OBS policy is very, very efficient: it spots stuff that you wouldn’t care about otherwise. And yes, I miss a SPEC code structuring script in OBS, but you can’t have it all.
Now it’s time to add code guidelines to your project and use some tool to make the whole project look like them (my opinion: always have some tool, for example Artistic Style, as the example of how to structure code to your code guidelines). And that FFmpeg stuff is coming when I feel it’s ready to rock!

another way to access a cloud VM’s VNC console
https://lizards.opensuse.org/2014/02/08/another-way-to-access-a-cloud-vms-vnc-console/ (Sat, 08 Feb 2014)

If you have used a cloud that was based on OpenStack, you will have seen the dashboard, including web-based VNC access using noVNC + WebSockets.
However, it was not possible to access this VNC directly (e.g. with my favourite gvncviewer from the gtk-vnc-tools package), because the actual compute nodes are hidden and accessing them directly would circumvent authentication, too.

I want this for the option of adding an OpenStack backend to openQA, my OS-autotesting framework, which emulates a user by using a few primitives: grabbing screenshots and typing keys (can be done through VNC), powering up a machine (=nova boot), inserting/ejecting an installation medium (=nova volume-attach / volume-detach).

To allow for this, I wrote a small perl script that translates a TCP connection into a WebSocket connection.
It is installed like this:
git clone https://github.com/bmwiedemann/connectionproxy.git
sudo /sbin/OneClickInstallUI http://multiymp.zq1.de/perl-Protocol-WebSocket?base=http://download.opensuse.org/repositories/devel:languages:perl

and is used like this
nova get-vnc-console $YOURINSTANCE novnc
perl wsconnectionproxy.pl --port 5942 --to http://cloud.example.com:6080/vnc_auto.html?token=73a3e035-cc28-49b4-9013-a9692671788e
gvncviewer localhost:42

I hope this neat code will be useful for other people and tasks as well and wish you a lot of fun with it.

Some technical details:

  • The code is able to handle multiple connections in a single thread using select.
  • HTTPS is not supported in the code, but likely could be done with stunnel (see the sketch after this list).
  • WebSocket-code was written in 3h.
  • noVNC tokens expire after a few minutes.
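
For the HTTPS case mentioned above, a rough sketch of a client-mode stunnel setup (hostname and ports are assumptions) that terminates TLS locally, so the proxy can keep speaking plain TCP:

cat > stunnel-novnc.conf <<'EOF'
client = yes

[novnc-wss]
accept  = 127.0.0.1:6080
connect = cloud.example.com:6080
EOF
stunnel stunnel-novnc.conf
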
spec-cleaner: hide all your precious cruft!
https://lizards.opensuse.org/2014/01/31/spec-cleaner-hide-all-your-precious-cruft/ (Fri, 31 Jan 2014)

As we have stated in our communication over time, our team’s main focus for the foreseeable future is Factory and how to manage all those contributions. The goal is not to increase the number of SRs coming into Factory, but to make sure we can process more of them and that we see even well-hidden consequences, to make sure that Factory is “stable” and “usable”.


Not really part of our current sprints, but something that will hopefully help us, is spec-cleaner, which Tomáš Chvátal and Tomáš Čech have been working on lately in their free time/hackweek. What is it trying to address? Currently there are some packaging guidelines, but when you write a spec file for your software, you still have plenty of choices. How do you order all the information in the header? Do you use curly brackets around macros? Do you use macros at all? Which ones do you use and which not? Do you use binaries in dependencies? Package config? Perl symbols? Package names? There is the format_spec_file obs service that tries to unify the coding style a little bit, but it leaves quite a lot up to you. Not necessarily a bad thing, but if you have to compare changes and review packages that use completely different coding styles, the process becomes harder and slower.

spec-cleaner is format_spec_file taken to another level. It tries to unify the coding style as much as it can. It uses consistent conventions and makes most of the decisions mentioned previously for you, and if you already decided on one way in the past, it will try to convert your spec file to follow the conventions it specifies. It’s not enforcing anything; it’s a standalone script, and therefore you don’t have to worry that your spec file will be out of your control. You can run it, verify the result (actually, you should verify the result, as there might still be some bugs) and commit it to OBS. If we all do it, our packages will all look more alike and it will be easier to read and review them.

How to try it? How to help? Well, the code is on GitHub and the packages are in OBS. You may have a version of it in your distribution, but that one is heavily outdated (even the 13.1 version), so add the openSUSE:Tools repo and try the version from there.

zypper ar -f obs://openSUSE:Tools/openSUSE_13.1 openSUSE-Tools
zypper in spec-cleaner

You can then go to some local checkout and see what changes it proposes for your spec file. The easiest way is to just let it do its thing by calling it and taking a look at the changes afterwards.

spec-cleaner -p -i *.spec
osc diff

If it works, great: we will have more unified spec files. If it doesn’t, file a bug 😉

Keeping Factory in shape
https://lizards.opensuse.org/2013/06/13/keeping-factory-in-shape/ (Thu, 13 Jun 2013)

Michal Hrušecký has been helping out with keeping Factory in shape and shares his experiences.

Factory is the development version of openSUSE and it is where the next openSUSE is taking form. Hundreds of packagers send packages to Factory to be integrated as part of the new release, and many more use Factory for testing or for their daily work. Thus it is really important to keep Factory rolling and usable. Everybody knows that Coolo is the Factory master and that he does everything to make the next openSUSE the best ever. But keeping Factory in shape is a really complicated and stressful task. There are dozens of requests every day, and each one of them can potentially break something. So Factory can always use a pair of extra hands, and for some time I have been one of them. I’d like to give you some insight into what we do, working on keeping Factory building and working.


Keep it building

With a constant influx of newer and cooler versions of libraries and tools, it is easy to break existing applications with shiny new software. So we always have some build failures in Factory. Part of our job is to resolve them, because if it doesn’t build, you can’t test it or ship it. As a developer, you may have seen submit requests in various OBS projects fixing builds for Factory. Every day I take a look at a number of build failures, investigate why they are not building in Factory, and try to do something about it.

For example, a new version of the GNU C Compiler (GCC) is quite often stricter about includes, requiring developers to be more explicit about the exact internal and external libraries their applications require to be built. What used to build now doesn’t, because you are missing include files. The older GCC let you slip by, but now you have to fix it, build failure by build failure. Another example is GTK, which keeps deprecating old API functions, so you have to keep up and replace them with the correct counterparts from the new API. Sometimes even the kernel changes its API and third-party modules stop building. All these errors will eventually get resolved upstream (if upstream is alive), but as we follow upstream quite closely before the feature freeze, it may happen that we are the first ones facing these issues, because sometimes we are the first ones to try compiling a given piece of software with the new GTK or the new GCC. Of course, we attempt to get these fixes back upstream and share them with other distributions where appropriate!

Image depicting the Factory Workflow

Test it

Another important part of working on a better Factory is testing. If everything builds, we’re happy. You might have heard the saying:

if it builds, ship it!

(Un)fortunately, this isn’t Coolo’s idea of how the world works. In Factory, things not only have to build, but also to work. Oh, AND conform to some stringent requirements. Here comes into play what most of our team has been working on in the past months and what was described last week – openQA. openQA tests the latest builds of openSUSE Factory, tries to install them and runs tests on some applications as well. But still, not all failures from automatic testing are real failures. Sometimes an application has just changed too much to be covered by the test. So from time to time we go through failing openQA tests, try to reproduce them, figure out what went wrong and either fix it or report it via bugzilla to the corresponding maintainer.

Concluding

Each of these tasks is relatively small (check why this test failed, fix this package to build with the new gcc), but what makes it hard is the number of packages we’ve got and the constant inflow of changes. Also, because the issues can be all over the place, each and every one requires you to dive into an entirely new package, with new and interesting quirks, build systems, languages and more. It is a great way of getting to know a wide variety of packages and applications, finding a gem every now and then. But it also implies a lot of work.

Things will get slower and more stable after the feature freeze, when we will spend more time on the testing part, while today we mostly work on the build part. Still, Factory has to keep building; without that, it becomes impossible to keep developing openSUSE. What we do, fixing these build errors, might not be super visible, but it matters a lot.

Statistics

On a related note, we’d like to (re)introduce weekly Factory statistics! It has been a while since we had those, once upon a time courtesy of AJ and Guido. Now Alberto provides them. Here you go, the top fifteen contributors to openSUSE Factory last week (from Sunday June 2nd to Sunday June 9th):

  1. Dominique Leuenberger
  2. Tobias Klausmann
  3. Stefan Dirsch
  4. Peter Varkoly
  5. Stephan Kulow
  6. Marcus Meissner
  7. Michal Hrusecky
  8. Cristian Rodríguez
  9. Dirk Mueller
  10. Sascha Peilicke
  11. Kyrill Detinov
  12. Dr. Werner Fink
  13. Bjørn Lie
  14. Tomáš Chvátal
  15. Niels Abspoel

openQA in openSUSE
https://lizards.opensuse.org/2013/06/06/openqa-in-opensuse/ (Thu, 06 Jun 2013)

Today we’ve got for you an introduction to the team’s work on openQA, by Alberto Planas Domínguez.

The last 12.3 release was important for the openSUSE team for a number of reasons. One reason is that we wanted to integrate QA (Quality Assurance) into the release process at an early stage. You might remember that this release had UEFI and Secure Boot support coming, and everybody had read the scary reports about badly broken machines that could only be fixed by replacing the firmware. Obviously openSUSE can’t allow such things to happen to our user base, so we wanted to do more testing.

Testing is hard

Testing a distribution seems easy at first sight:

  1. Take the ISO of the last build and put it on a USB stick
  2. Boot from the USB
  3. Install and test
  4. Profit!
testing in progress!

But look a bit further and you will see that, actually, the installation process alone is already a combinatorial problem. In openSUSE we have different media (DVD, KDE and Gnome Live images, NET installation image and the new Rescue image), three official architectures (32-bit, 64-bit and ARMv7), a bunch of different file systems (Ext3/Ext4, Btrfs, LVM with or without encryption, etc.) and different boot loaders (GRUB 2, LILO, SHIM). Yeah… even without doing the math you see that for only this small subset of variables we have hundreds of possible combinations. And this is just the installation process; we are not even talking about the various desktops and applications, or hardware like network interfaces and graphics cards.

And we want continuous testing

And that is only the final testing round! If we want to be serious about QA and testing, we need to run the full test battery for every build that OBS generates for us, with extra attention to the Milestones, Betas and RCs scheduled in the release road-map.

We can of course attempt to optimize our testing approach. For example, if I am the maintainer of a package and I am sure that my latest version works perfectly in Factory (because I tested it on my system, of course), do I really need to test this application again and again when a new ISO build is released? Unfortunately, we can not take a shortcut here. As a distribution, our job is integration, and so we need to test the entire product again for every build. A single change in an external library or in any other package which I depend on can break my package. The interdependencies of an integration project the size of openSUSE are so intricate that it is faster to run the full test again. With this approach we avoid regressions in our distribution, which is important during development. But it is also a lot of work – who has time for all this testing?

OpenQA as a solution

For us, there’s no doubt about it: openQA is the correct tool for this. openQA is already used to test certain parts of openSUSE, and has shown itself to be a competent tool for testing other distributions like Fedora or Debian.

To experiment with openQA, the openSUSE team decided to launch a local instance of the tool and start feeding it with 12.3 builds. But we soon ran into some limitations in the way desired test outcomes can be expressed in openQA, and we got ideas on how to improve the detection of failed and succeeded tests. We also discovered that some tests in openQA had the bad habit of working in “monkey mode”: simply sending commands and events to the virtual machine without checking whether those interactions had the expected behavior or not, losing track of the test progress.

openQA has the great benefit of being open source, so we can improve its usefulness for testing Factory. Moreover, the original author of openQA, Bernhard M. Wiedemann, is a very talented developer and works for SUSE, so upstream is very close to us. So we decided to start hacking!

openQA work

After the 12.3 release we decided to spend some quality time improving openQA as a team project. This was managed using the public Chili (a Redmine fork) project management web application. We published all the milestones, tasks, goals and documentation in the “openQA improvement project”. The management side of this project perhaps deserves a separate post, but for now we can say that we tried to develop it as openly as possible. Of course you can get the full code from the openSUSE GitHub account.

Major changes

The main architectural changes implemented during our 10 weeks of coding on openQA can be summarized as follows:

  • Integration with openCV
  • Replacement of PPM graphic file format with PNG
  • Introduced needles: tests with better control of the state
  • A proper job dispatcher for new test configurations
  • Better internal scheduler, with snapshots and a way to skip tests
  • Improvement in the communication between webUI and the virtual machine
The Needle editor in action

openCV brings robust testing

The tests in openQA need to check what is happening in the virtual (or real) machine to verify results, and the main source of information is the output on the screen. This information is usually graphical: we can instruct QEMU (or VBox) to take screenshots on a periodic basis. To properly evaluate a test outcome we need to find certain information in those pictures, and for that we use the computer vision library openCV.

With this library we can implement different methods to find relevant sections of the image, like buttons, error messages or text. These are then used by the test to get information about the actual environment of the installation process and to find out whether the test passed or not. Previously, checksums of the images were used to determine outcomes. This led to many false positives (tests failing too often) due to simple theming and layout changes – a single changed pixel broke the test. openCV support was introduced earlier by Dominik Heidler to enable testing with noisy analogue VGA capture, and we extended the usage of openCV matching to be more versatile, powerful and easier to use (both for test-module writers and for maintainers).
Finding the needle

Introducing needles

openQA has been modified to use PNG instead of PPM files to store the images to test against, improving performance but also enabling openQA to store certain meta-data within the images. This brings us to the most important improvement in openQA: the introduction of the needle. A needle is a PNG image with some meta-data associated (a JSON document). This meta-data describes one or multiple ‘regions of interest’ (RoI) in the original image which can be used by the test to match the current screenshot. For example, if the installation is in the partition manager and we send the expected keystrokes to set Btrfs as our default file system, we can assert that this option is currently set using a needle where the RoI has the correct check box marked. In other words: we create a needle with an area covering the check box. The system will search for this area in the current screen to assert that there is, somewhere, a check box with this label correctly marked. And it will use openCV to make sure that slight changes in theming or layout do not result in a failed test.
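
As a rough sketch of what a needle looks like on disk (the tag and coordinates here are made up; the field names follow the openQA needle format), the PNG is accompanied by a JSON document such as:

{
    "tags": [ "partitioning-btrfs-selected" ],
    "area": [
        { "xpos": 320, "ypos": 240, "width": 18, "height": 18, "type": "match" }
    ]
}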

The needle concept is really powerful. When a test uses needles with multiple RoIs, the system will try to match every area in the current screenshot, in whatever position. There are areas that can be excluded, and areas that can be processed using OCR (Tesseract) to extract and match text from them.

Thanks to needles we can now create tests that are always in a known state and that can inform complex decisions about the next step to take. For example, we can have tests that detect and respond correctly when a sudo prompt appears suddenly, or when an error dialog appears where it is not expected. More importantly, we can detect errors more quickly, aborting the installation process and pointing the developer to the exact error.

Faster testing with snapshots

Test result overview

We also implemented a way to create snapshots of the virtual machine state. This is useful if we want to retry some tests, or start the test set from a specific test. For example, if we are testing the Firefox web browser, we want to skip all the installation tests, and maybe some of the tests related to other applications. With this snapshot feature, we can load the virtual machine in the state where Firefox can be tested immediately.

Improved web UI

The final major area of focus has been the web interface. We designed a set of dialogs to create and edit needles. Using this editor we can also see why the current tests are failing, comparing the current screenshot with the different expected needles from the tests.

Also, from the web interface we can control the execution of the virtual machine, signaling it to stop or continue the execution. This is a feature that is useful when we want to create needles in an interactive way.

Upstream

We’re very happy that Bernhard has helped us, both with work and advice, to get these changes implemented. Several improvements have been integrated into the current production version of openQA, and most of the more invasive ones are part of the V2 branch of openQA. We plan to sit together with Bernhard to see about deploying V2 to openqa.opensuse.org for testing Factory as soon as possible.

There is still work to be done. For example, for full integration testing we need to expand the current ability to run the tests on real hardware. This will, for example, allow testing graphics and network cards. Also, writing proper documentation is on the todo list. For those interested in helping to put openQA to work and keeping the quality of our distribution high, the openSUSE Conference will feature a workshop on creating tests for Factory.
