
bmwiedemann

coder, tester, poet

Author Archive

How we run our OpenStack cloud

January 23rd, 2017 by bmwiedemann

This post is to document how we set up cloud.suse.de, which is one of our many internal SUSE OpenStack Cloud deployments for use by R&D.

In June 2016 we started the deployment with SOC6 on 4 nodes: 1 controller and 3 compute nodes that also serve ceph (distributed storage) with their 2nd HDD. Since the nodes are from 2012, they only have 1 Gbit network and spinning disks, so ceph only delivers ~50 MB/s, which is sufficient for many use cases.
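
If you want to get a rough idea of what such a ceph setup delivers, a quick benchmark like this can help (a sketch; the pool name is just an example, and the test objects should be cleaned up afterwards):
rados bench -p rbd 10 write --no-cleanup
rados bench -p rbd 10 seq
rados -p rbd cleanup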

We did not deploy that cloud with HA, even though our product supports it. The two main reasons for that are

  • that it will use up two or three nodes instead of one for controller services, which is significant if you start out with only 4 (and grow to 6)
  • that it increases the complexity of setup, operations and debugging and thus might lead to decreased availability of the cloud service

Then we have a limited supply of vlans: even though technically they are just numbers between 1 and 4095, within SUSE we allocate them centrally to be able to switch together networks from further away. So we could not use vlan mode in neutron if we wanted to allow software-defined networking (SDN). (We did not allow SDN in old.cloud.suse.de and I did not hear complaints, but now I see a lot of people using it.)
So we went with ovs+vxlan+dvr (Open vSwitch + Virtual eXtensible LAN + Distributed Virtual Router), because that allows VMs to remain reachable even when the controller node reboots.
But then I found that they cannot use DNS during that time, because distributed virtual DNS was not yet implemented. And ovs has some annoying bugs that are hard to debug and fix. So I built ugly workarounds that mostly hide^Wsolve the problems from our users’ point of view.
For the next cloud deployment, I will try to use linuxbridge+vlan or linuxbridge+vxlan mode.
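
For reference, these are roughly the upstream neutron knobs behind the ovs+vxlan+dvr choice; a hedged sketch using crudini with illustrative values, not what crowbar actually writes:
crudini --set /etc/neutron/plugins/ml2/ml2_conf.ini ml2 tenant_network_types vxlan
crudini --set /etc/neutron/plugins/ml2/ml2_conf.ini ml2 mechanism_drivers openvswitch,l2population
crudini --set /etc/neutron/neutron.conf DEFAULT router_distributed True
crudini --set /etc/neutron/l3_agent.ini DEFAULT agent_mode dvr_snat   # dvr on the compute nodes
A linuxbridge+vxlan setup would mainly swap openvswitch for linuxbridge in the mechanism_drivers line and drop the DVR settings, since DVR requires the openvswitch driver.
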
And the uptime is pretty good. But it could be better with proper monitoring.

Because we needed to redeploy multiple times before we got all the details right, and to document the setup, we scripted most of the deployment with qa_crowbarsetup (which is part of our CI) plus extra files in https://github.com/SUSE-Cloud/automation/tree/production/scripts/productioncloud. The only parts not in there are the passwords.

We use proper SSL certs from our internal SUSE CA.
For that we needed to install that root CA on all involved nodes.
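
On SLE/openSUSE nodes, adding such a root CA to the system trust store boils down to something like this (the certificate file name here is a placeholder):
cp SUSE-Root-CA.crt /etc/pki/trust/anchors/
update-ca-certificates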

We use kvm, because it is the most advanced and stable of the supported hypervisors. Xen might be a possible 2nd choice. We use two custom kvm patches to fix nested virt on our G3 Opteron CPUs.

Overall we use 3 vlans: one each for the admin, public/floating and sdn/storage networks.
We increased the default /24 IP ranges because we needed more IPs in the fixed and public/floating networks.

For authentication, we use our internal R&D LDAP server, but since it does not have information about users’ groups, I wrote a perl script to pull that information from the Novell/innerweb LDAP server and export it as json for use by the hybrid_json assignment backend I wrote.
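
The idea behind the group export is roughly the following (a hypothetical shell sketch, not the actual perl script; server, base DN and attribute names are placeholders):
ldapsearch -x -H ldaps://innerweb-ldap.example.com -b "o=example" \
  "(member=uid=$USER,ou=people,o=example)" cn |
  awk 'BEGIN { printf "[" }
       /^cn: / { printf "%s\"%s\"", sep, $2; sep="," }
       END { print "]" }'
This prints a plain json list of group names; the real export format consumed by the hybrid_json backend may differ.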

In addition, I wrote a cloud-stats.sh to email weekly reports about the utilization of the cloud, and another script to tell users which instances they still have but might have forgotten about.
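
The weekly report can be as simple as piping usage statistics into a mail; a sketch of the idea (the recipient address is a placeholder, and the real cloud-stats.sh collects more data):
openstack usage list \
  --start "$(date -d '-7 days' +%Y-%m-%d)" --end "$(date +%Y-%m-%d)" \
  | mail -s "cloud.suse.de weekly utilization" cloud-users@example.com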

On the cloud user side, we and other people use one or more of

  • salt-cloud
  • nova boot
  • salt-ssh
  • terraform
  • heat

to script instance setup and administration.
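
For example, a scripted nova boot can be as short as this (image, flavor, key and network names are illustrative):
nova boot --image openSUSE-Leap-42.2 --flavor m1.small \
  --key-name mykey --nic net-id=$FIXED_NET_ID mytestvm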

Overall we are now hosting 70 instance VMs on 5 compute nodes that together have cost us less than 20000€.

How to build OS images without kiwi

December 26th, 2016 by bmwiedemann

kiwi has long been the one standard way of building images in openSUSE, but even though there exist extensive writings on how to use it, for many it is still an arcane thing better left to the Great Magicians.

Thus, I started to use a simpler alternative image-building method, named altimagebuild, when I built our first working Raspberry Pi images in 2013, and now I have re-used it to build x86_64 VM images at
https://build.opensuse.org/package/show/home:bmwiedemann/altimagebuild
after I found out that it even works in OBS, including publishing the result to our mirror infrastructure.
It is still in rpm format because of how it is produced, so you have to use unrpm to get to the image file.

This method consists of 3 parts:

  • a .spec file that lists packages to be pulled into the image
  • a mkrootfs.sh that converts the build system into the future root filesystem you want
  • a mkimage.sh that converts the rootfs into a filesystem image

The good thing about it is that you do not need specialized external tools, because everything is hard-coded in the scripts.
And the bad thing about it is that everything is hard-coded in the scripts, so it is hard to share general improvements over a wider range of images.

In the current version, it builds cloud-enabled partitionless images (which is nice for VMs, because you can just use resize2fs to get a larger filesystem, and if you later want to access your VM’s data from the outside, you simply use mount -o loop).
But it can build anything you want.
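
To give an idea, the partitionless case of mkimage.sh can boil down to something like this (a rough sketch with illustrative sizes and paths, not the exact script from the package; bootloader installation is omitted):
dd if=/dev/zero of=image.raw bs=1M count=0 seek=2048   # create a 2 GiB sparse file
mkfs.ext4 -F image.raw                                 # filesystem directly on the file, no partition table
mount -o loop image.raw /mnt
cp -a /path/to/rootfs/. /mnt/                          # the root filesystem prepared by mkrootfs.sh
umount /mnt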

To make your own build, do osc checkout home:bmwiedemann/altimagebuild && cd $_ && osc build openSUSE_Leap_42.2

So what images would you want to build?

targitter project – about OBS, tars and git

October 5th, 2015 by bmwiedemann

In OBS we use source tarballs everywhere to build rpms (and debs) from.

This has at least two major downsides:

  1. Storing all old tar files takes up a lot of disk space
  2. OBS workflows with .tar files and patches are rather different and somewhat disconnected from the git workflows we usually use everywhere else these days. E.g. for the SUSE OpenStack Cloud team we have a “trackupstream” jenkins job that pulls the latest git version into a tarball once every day.

Fedora already keeps its packaging metadata in git, but stores only a hash of the tarball.

So as a first step, I used two rather different projects to see how different the space usage would be: on the slow-moving side I used 20 gtk2 tarballs from the last 5 years, and on the fast-moving side I used 31 openstack-nova tarballs from the Cloud:OpenStack:Master project from the last 5 months.
I used scripts that uncompressed each tarball, added it to a git repo and used git gc to trigger git’s compression.
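
The scripts did roughly this (paths and names here are illustrative):
git init tarball-repo && cd tarball-repo
for t in ../tarballs/*.tar.*; do
    rm -rf ./*                        # replace the tree with the next tarball's content
    tar xf "$t" --strip-components=1
    git add -A && git commit -q -m "import $(basename $t)"
    git gc --quiet                    # let git delta-compress the history
    du -s .git                        # cumulative size after this import
done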

Here are the resulting cumulative size graphs:

gtk2 size graph
nova size graph

The raw numbers after 20 tarballs: for nova the ratio is 89772:7344 = 12.2 and for gtk2 it is 296836:53076 = 5.6.

What do you think: would it be worth the effort to use more git in our OBS workflows?

Do we care about being able to reproduce the original tarballs? While this is possible, it has some challenges with differing file ordering, timestamps, file ownership and compression levels.
Or would it be enough if OBS converted a tarball into a signed commit (so it cannot be forged without people being able to notice)?

Do you know of a tool that can uncompress tarballs in a way that allows tracking the content as single files and later re-creating the original verbatim tarball, such that upstream signatures would still match?

zypper install tab-completion

June 6th, 2015 by bmwiedemann

This is a follow-up on my earlier post on zypper tab-completion.

Completion for package names just made its way into git (and thus will soon appear in Factory aka Tumbleweed) after ~6 weeks of back and forth exploring different approaches.

And it is super-fast 🙂

If you do not want to wait, you can use OneClickInstallCLI http://multiymp.zq1.de/zypp:Head/zypper and allow some vendor changes for libzypp and libsolv.

zypper tab-completion and some thoughts

April 26th, 2015 by bmwiedemann

Today I spent some hours implementing nice tab-completion for zypper. There was already a lot done 6 years ago, but the part about installing/removing packages was missing.

Now the thinking part is about speed. For the tab-completion I needed a list of installed packages, and of course we have that in our RPM database (using Berkeley DB as a backend). However, querying the list with rpm -qa already took over a second on a modern and fast system. On my poor netbook with a cold cache, it took 25 seconds (5 secs on the second try with a hot cache)… And the point is that you probably do not want to wait 5 seconds for your tab-completion to react.

So to avoid this problem, I used caching via make to produce a better format (plain text). This is then post-processed with sed in a fraction of a second – a speedup factor somewhere between 15 and 150. This makes a big difference.
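
The underlying idea, sketched here with a plain timestamp check instead of the make-based caching the real code uses (the cache path and the $cur variable are placeholders):
cache=/var/cache/zypper-completion-packages.txt
# regenerate the plain-text package list only when the rpm database changed
if [ /var/lib/rpm/Packages -nt "$cache" ]; then
    rpm -qa --queryformat '%{NAME}\n' | sort > "$cache"
fi
# completing the currently typed word is then a cheap text filter
grep "^$cur" "$cache"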

In the end, I still wonder why plain text is so much faster than a DB. I guess one reason is that the DB is optimized for retrieval of single values, e.g. rpm -q bash, which is very fast (but even there, an egrep "^bash-[^-]+-[^-]+$" over the cached list is more than twice as fast).

I still want to optimize zypper for better speed, so that a search might some day return in under 2 seconds. One idea for that is to not parse all those config+repo files every time, but only when they change. It could use mmapped files under /var/cache/zypp* as memory to store the binary representations, though it might become complicated if dynamic structures such as linked lists are involved.

The future will be interesting…

OpenStack Infra/QA Meetup

July 23rd, 2014 by bmwiedemann

Last week, around 30 people from around the world met in Darmstadt, Germany to discuss various things about OpenStack and its automatic testing mechanisms (CI).
The meeting was well-organized by Marc Koderer from Deutsche Telekom.
We were shown plans of what the Telekom intends to do with virtualization in general and OpenStack in particular; the most interesting one to me was to run clouds in dozens of datacenters across Germany but have a single API for users to access.
There were some introductory sessions about the use of git review and gerrit that mostly covered things I (and I guess the majority of the others) had already learned over the years. They included some new parts, such as tracking “specs” (specifications as .rst files) in gerrit with proper review by the core reviewers, so that proper processes can already be applied in the design phase to ensure the project is moving in the right direction.

On the second day we learned that the infra team manages its servers with puppet, and about jenkins-job-builder (jjb), which creates around 4000 jobs from yaml templates. We also learned about nodepool, which keeps some VMs ready so that jobs in need will not have to wait for them to boot; 180-800 instances is quite an impressive number.
And then we spent three days discussing and hacking things; the topics and outcomes can be found in the etherpad linked from the wiki page.
I got my first infra patch merged and a SUSE Cloud CI account set up, so that in the future we can test devstack+tempest on openSUSE and have it comment in Gerrit. And maybe some day we can even have a test that deploys crowbar+openstack from git (including the patch from an open review) to provide useful feedback, but for that we might first want to move crowbar (which consists of dozens of repos, one for each module) to stackforge, the openstack-provided Gerrit hosting.

see also: pleia2’s post

Overall, it was a nice experience for me to work together with all these smart people, and we certainly had a lot of fun.

getting my DVB-T card to work

March 6th, 2014 by bmwiedemann

Today I tried to get a DVB-T card to work with a new antenna on a new 13.1 install.
I know it was working, because I ran it with 12.3 on this machine last year.

hwinfo --tv
showed
Model: “Hauppauge computer works WinTV HVR-1110”
Vendor: pci 0x1131 “Philips Semiconductors”
Device: pci 0x7133 “SAA7131/SAA7133/SAA7135 Video Broadcast Decoder”

So after plugging everything in, I started kaffeine, which still knew about all local channels, but could not tune.
http://www.linuxtv.org/wiki/index.php/Hauppauge_WinTV-HVR-1110 gave the important hint that one needs a firmware file. After that was in place as /lib/firmware/dvb-fe-tda10046.fw and after a reboot came the next try: kaffeine now showed 99% SNR, so a good signal, and even knew what is currently on air, but the picture remained black.
kaffeine hinted that it needs extra software, but could not find it, even though packman repos were available (annoying bug).
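
As an aside, if you are ever unsure whether such a firmware file was actually picked up, checks like these help (the exact kernel messages vary by driver):
ls -l /lib/firmware/dvb-fe-tda10046.fw
dmesg | grep -i -e firmware -e tda1004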

http://opensuse-community.org/ finally helped – I needed the libxine2-codecs package from the packman repo.
Now everything is working after less than an hour.

another way to access a cloud VM’s VNC console

February 8th, 2014 by bmwiedemann

If you have used a cloud that was based on OpenStack, you will have seen the dashboard, which includes web-based VNC access using noVNC + WebSockets.
However, it was not possible to access this VNC directly (e.g. with my favourite gvncviewer from the gtk-vnc-tools package), because the actual compute nodes are hidden and accessing them would circumvent authentication, too.

I want this to have the option of adding an OpenStack backend to openQA, my OS-autotesting framework, which emulates a user by using a few primitives: grabbing screenshots and typing keys (can be done through VNC), powering up a machine (= nova boot), and inserting/ejecting an installation medium (= nova volume-attach / volume-detach).

To allow for this, I wrote a small perl script that translates a TCP connection into a WebSocket connection.
It is installed like this:
git clone https://github.com/bmwiedemann/connectionproxy.git
sudo /sbin/OneClickInstallUI http://multiymp.zq1.de/perl-Protocol-WebSocket?base=http://download.opensuse.org/repositories/devel:languages:perl

and is used like this:
nova get-vnc-console $YOURINSTANCE novnc
perl wsconnectionproxy.pl --port 5942 --to http://cloud.example.com:6080/vnc_auto.html?token=73a3e035-cc28-49b4-9013-a9692671788e
gvncviewer localhost:42 # display 42 = port 5900+42 = 5942, where the proxy listens

I hope this neat code will be useful for other people and tasks as well and wish you a lot of fun with it.

Some technical details:

  • The code is able to handle multiple connections in a single thread using select.
  • HTTPS is not supported in the code, but likely could be done with stunnel (see the sketch after this list).
  • WebSocket-code was written in 3h.
  • noVNC tokens expire after a few minutes.
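
For the stunnel idea mentioned in the list above, an untested sketch of a client-mode wrapper could look like this (host name and ports are placeholders):
cat > stunnel-novnc.conf <<'EOF'
client = yes
[novnc]
accept  = 127.0.0.1:6080
connect = cloud.example.com:6080
EOF
stunnel stunnel-novnc.conf
# wsconnectionproxy would then talk plain HTTP/WS to 127.0.0.1:6080
# while stunnel handles the TLS towards the real endpoint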

Hongkong OpenStack Design Summit

November 13th, 2013 by bmwiedemann

So last week many OpenStack (cloud software) developers met in Hongkong’s world expo halls to discuss future development and show off what is already done.

Overall, I heard there were 3000 attendees, with around 800 of them being developers. That sounds like a large number of people, but luckily everything felt well-organized and the rooms were always big enough to have seats for everyone interested.

The design sessions were usually pretty low-level and focused on one component, so it was not easy for me to make useful contributions there. The sessions about read-only API access (e.g. for helpdesk workers and monitoring) and about HA were the most useful to me.

In the breakout rooms there were interesting sessions by many large OpenStack users (CERN, Ebay, Paypal, Dreamhost, Rackspace) giving valuable insights into what people expect from and do with a cloud. Many of them are using custom-built parts, because plain OpenStack is still not complete enough to run a cloud. SUSE Cloud ships with some of those missing parts (e.g. deployment and configuration management), but most organisations seem to run their own at the moment.

Cloudbase was there talking about their Hyper-V support, which we integrated into SUSE Cloud.
Apart from the 6 SUSE Cloud developers, several local (and one Australian) SUSE guys were manning the booth.

Overall it was quite some experience to be there (in such an exotic and yet nice place) and listen and talk to so many different people from very different backgrounds.

How I ran openSUSE on a Nexus 7

September 22nd, 2013 by bmwiedemann

The Nexus 7 (2012 version) is a 7 inch tablet by Google+Asus.
The nice thing about it is that it has an unlockable bootloader. Also, it has an armv7 CPU, and we have built openSUSE for this CPU for some years. I had one such device with a broken display, so doing some riskier things with it seemed appropriate.
I wanted to run my own software on it. Running openSUSE in a chroot (change-root) environment is usually a lot easier than replacing the whole system, so this is the route I took.

First, I needed two tools. One is “adb”, the Android DeBug tool from the official sdk, and the other is “fastboot”, which was hard to find, so I mirror it here.
I got me the stable ROM from http://wiki.cyanogenmod.org/w/Grouper_Info and followed their installation instructions. adb shell only seemed to work while in the bootloader (which you reach by holding Volume-Down+Power during boot).
The hardest part was to re-enable USB-debugging by going into Settings/About tablet and tapping Build-Number seven times.

Also, before zapping everything that was there, I ran cp -a /system/app /sdcard/ in adb shell and copied it back later.
So after following all the other installation steps, I had cyanogenmod booting. I attached a bluetooth keyboard so that I could type better. The ROM comes with a terminal app, which I opened. Type su - to become root after a security popup.
Now I downloaded my latest Raspberry Pi image from http://www.zq1.de/bernhard/linux/opensuse/. It ended up under /sdcard/Download, where I unpacked it with xz -d.
Then comes the tricky part. The image has a partition table, but here we just need the root filesystem. With fdisk -lu we can see that it starts at sector 309248. One could copy out that part with dd, or use a loop device with an offset like this:
#!/system/xbin/sh
mknod /dev/loop0 b 7 0
losetup -d /dev/loop0 # cleanup of previous try
losetup -o `expr 512 \* 309248` /dev/loop0 rasp*img
mkdir -p mnt
mount -t ext2 /dev/loop0 mnt

Now we have access to the openSUSE files under mnt.
In there I created a chroot.sh:
#!/system/xbin/sh
for m in proc sys dev ; do mount -o bind /$m $m ; done
HOME=/root PATH=/sbin:/usr/sbin:/usr/local/sbin:/root/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin /system/xbin/chroot . bin/bash
for m in proc sys dev ; do umount $m ; done

With that, the only remaining thing to do was to add a nameserver line to /etc/resolv.conf before I could use zypper to install software, e.g. zypper install yast2-network yast2-ncurses.
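
Inside the chroot, that is just (the nameserver address is only an example):
echo "nameserver 192.168.1.1" >> /etc/resolv.conf
zypper install yast2-network yast2-ncurses
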
Running yast lan on the Nexus 7 is a nice sight.

I guess one could also use the armv7 rootfs to have software built for armv7 instead of the compatible armv6. But for me it does not matter much.