Home Home
Sign up | Login

bmwiedemann

coder, tester, poet

Author Archive

Tricks with IPFS

August 7th, 2019 by

Since April I am using IPFS

Now I wanted to document some neat tricks and details.

When you have the hex-encoded sha256sum of a small file – for this example let’s use the GPLv3.txt on our media –
sha256sum /ipns/opensuse.zq1.de/tumbleweed/repo/oss/GPLv3.txt
8ceb4b9ee5adedde47b31e975c1d90c73ad27b6b165a1dcd80c7c545eb65b90

Then you can use the hash to address content directly by prefixing it with /ipfs/f01551220 so it becomes

/ipfs/f015512208ceb4b9ee5adedde47b31e975c1d90c73ad27b6b165a1dcd80c7c545eb65b903

In theory this also works with SHA1 and the /ipfs/f01551114 prefix, but then you risk experiencing non-unique content like
/ipfs/f0155111438762cf7f55934b34d179ae6a4c80cadccbb7f0a

And dont even think about using MD5.

For this trick to work, the file needs to be added with ipfs add --raw-leaves and it must be a single chunk – by default 256kB or smaller, but if you do the adding, you can also use larger chunks.

Here is a decoding of the different parts of the prefix:
/ipfs/ is the common path for IPFS-addressed content
f is the multibase prefix for hex-encoded data
01 is for the CID version 1
55 is for raw binary
12 is for sha2-256 (the default hash in IPFS)
20 is for 32 byte = 256 bit length of hash

And finally, you can also access this content via the various IPFS-web-gateways:
https://ipfs.io/ipfs/f015512208ceb4b9ee5adedde47b31e975c1d90c73ad27b6b165a1dcd80c7c545eb65b903

You can also do the same trick with other multibase encodings of the same data – e.g. base2

Base2 looks pretty geeky, but so far I have not found practical applications.

Debugging jenkins

July 31st, 2019 by

We had strange near-daily outages of our internal busy jenkins for some weeks.

To get to the root cause of the issue, we enabled remote debugging with

-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9010 -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=ci.suse.de -Dcom.sun.management.jmxremote.password.file=/var/lib/jenkins/jmxremote.password

and attached visualvm to see what it was doing.
This showed the number of threads and memory usage in a sawtooth pattern. Every time the garbage collector ran, it dropped 500-1000 threads.

Today we noticed that every time it threw these java.lang.OutOfMemoryError: unable to create new native thread errors, the maximum number of threads was 2018… suspiciously close to 2048. Looking for the same time in journalctl showed
kernel: cgroup: fork rejected by pids controller in /system.slice/jenkins.service

So it was systemd refusing java’s request for a new thread and jenkins not handling that gracefully in all cases.
That was easily avoided with a
TasksMax=8192

Now the new peak was at 4890 live threads and jenkins served all Geekos happily ever after.

experimental openSUSE mirror via IPFS

April 3rd, 2019 by

The InterPlanetary File System (IPFS) can be used to provide files in a more efficient and distributed way than HTTP.

Our filesystem repo already has the go-ipfs client.

You use it with
ipfs daemon --init

And then you can add my Tumbleweed mirror with
zypper ar http://127.0.0.1:8080/ipns/opensuse.zq1.de./tumbleweed/repo/oss/ ipfs-oss

You can also browse the content online at
http://opensuse.zq1.de./tumbleweed/repo/oss/ . During my testing I found that the results are sometimes inappropriately cached on the Cloudflare CDN, so if you used it under this URL without the ipfs client, this might throw signature errors in zypper.

On the server side, the mirror is updated using the syncopensuse script from
https://github.com/bmwiedemann/opensusearchive and consistency of the repo is verified with checkrepo

When a complete repo was synced, dynaname updates a DNS entry to point to the new head:

> host -t txt _dnslink.opensuse.zq1.de.
_dnslink.opensuse.zq1.de is an alias for tumbleweedipfs.d.zq1.de.
tumbleweedipfs.d.zq1.de descriptive text “Last update: 2019-04-03 12:23:43 UTC”
tumbleweedipfs.d.zq1.de descriptive text “dnslink=/ipfs/QmSXEVuU5z23rDxMyFYDhSAUaGRUPswuSXD3aVsBEzucjE”

If you got spare bandwidth and 300 GB disk on some public server, you could also host a mirror of today’s version, simply by doing ipfs pin add QmSXEVuU5z23rDxMyFYDhSAUaGRUPswuSXD3aVsBEzucjE

This is a permalink: http://127.0.0.1:8080/ipfs/QmSXEVuU5z23rDxMyFYDhSAUaGRUPswuSXD3aVsBEzucjE also browsable via any public IPFS gateway. This means, it will always remain on the 20190401 version of Tumbleweed and no changes in content are possible – similar to how a git commit ID always refers to the same data.

So why did I create this IPFS mirror? That is related to my work on reproducible builds for openSUSE. There it regularly happened that published Tumbleweed binaries were built with libraries, compilers and toolchains that were no longer available in current Tumbleweed. This prevented me from verifying that the published binaries were indeed built correctly without manipulation on the OBS build workers.

Now, with this archive of rpms easily available, it was possible to verify many more Tumbleweed packages than before. And most importantly, it remains possible to independently verify even after Tumbleweed moves on to newer versions. This data is going to stay available as long as anyone pins it on a reachable server. I’m going to pin it as long as it remains relevant to me, so probably a bit until after the next full Tumbleweed rebuild – maybe 6 to 12 months.

Thus, it now is even less easy to sneak in binary backdoors during our package build process.

Report from the reproducible builds summit 2018

December 17th, 2018 by

Last week I attended the reproducible builds world summit in Paris.
It was very well organized by Holger, Gunner and their hidden helpers in the background. Very similar to the last 2 summits I attended in Berlin.

Because we were around 50 participants, introductions and announcements were the only things done in the big group. All actual work happened in 5-10 smaller circles.

We had participants from large companies like Google (with bazel), MicroSoft and Huawei, but also from many distributions and open source projects. Even MirageOS as non-Linux OS.

We did knowledge-sharing, refine definitions of terms, evolve concepts like “rebuilders” for verifying builds and allow users to better trust software they install, and such.

I learned about the undocumented DB dump (153 MB) and DB schema

And we had some hacking time, too, so there is now
a jenkins job that renders the list of unreproducible openSUSE Factory packages.

Also, my maintainer tool now has added support for the Alpine Linux distribution, thanks to help by one of its maintainers: Natanael Copa.
This is meant to help all cross-distro collaboration, not just for reproducible builds.

There is still work to be done to make better use of Mitre CPE to map package names across distributions.

I think, one major benefit of the summit was all the networking and talking going on, so that we have an easier time working with each other over the internet in the future.

The issues with contributing to projects only once

June 4th, 2017 by

I work to improve the openSUSE Tumbleweed (GNU/)Linux distribution. Specifically I make sure that all packages can be built twice on different hosts and still produce identical results, which has multiple benefits. This generates a lot of patches in a single week.

OBS
Sometimes it is enough to adjust the .spec file – that is a small text file usually specific to us. Then it is straight-forward

  1. osc bco
  2. cd $PROJECT/$PACKAGE
  3. optional: spec_add_patch $MY.patch $SOME.spec
  4. edit *.spec
  5. osc build
  6. osc vc
  7. osc ci
  8. osc sr

And OBS will even auto-clean the branch when the submit-request is accepted. And it has a ‘tasks’ page to see and track SRs in various stages. For the spec_add_patch to work, you need to do once
ln -s /usr/lib/build/spec_add_patch /usr/local/bin/

When you want to contribute patches upstream, so that other distributions benefit from your improvements as well, then you first need to find out, where they collaborate. A good starting point is the URL field in the .spec file, but a google search for ‘contribute $PROJECT’ often is better.

github
Then there are those many projects hosted on github, where it is also pretty low effort, because I already have the account and it even remains signed in. But some repos on github are only read-only mirrors.

  1. check pull requests, if some have been merged recently
  2. fork the project
  3. git clone git@github.com:…
  4. cd $REPO
  5. edit $FILES
  6. git commit -a
  7. git push
  8. open pull request
  9. maybe have to sign a CLA for the project
  10. When change is accepted, delete fork to not clutter up repository list too much (on github under settings)

sourceforge
The older brother of github. They integrate various ways of contributing. The easiest one is to open a Ticket (Patch or Bug) and attach the .patch you want them to merge with a good description. While many developers do not have the time and energy to debug every bug you file, applying patches is much easier, so gets your issue fixed with a higher chance.

devel Mailinglist
Some projects collaborate mainly through their development MLs, then I need to

  1. subscribe
  2. confirm the subscription
  3. git format-patch origin/master
  4. git send-email –to $FOO-devel@… –from $MYSUBSCRIBEDEMAIL 000*.patch
  5. wait for replies
  6. if it is a high-volume ML, also add an IMAP folder and an entry to .procmailrc
  7. unsubscribe
  8. confirm

project bugtracker
Like https://bugzilla.gnome.org/ https://bugs.python.org/ https://bugs.ruby-lang.org/ https://bz.apache.org/bugzilla/

  1. create unique email addr
  2. sign up for account
  3. add info to my account list
  4. optional: search for existing bug (90% of the time there is none)
  5. file bug
  6. attach patch

So as you can see there is a wide range of ways. And most of them have some initial effort that you would only have once… But then I only contribute once per project, so I always pay that.

Thus, please make it easy for people to contribute one simple fix.

How we run our OpenStack cloud

January 23rd, 2017 by

This post it to document how we setup cloud.suse.de which is one of our many internal SUSE OpenStack Cloud deployments for use by R&D.

In 2016-06 we started the deployment with SOC6 on 4 nodes. 1 controller and 3 compute nodes that also served for ceph (distributed storage) with their 2nd HDD. Since the nodes are from 2012 they only have 1gbit network and spinning disks. Thus ceph only delivers ~50 MB/s which is sufficient for many use cases.

We did not deploy that cloud with HA, even though our product supports it. The two main reasons for that are

  • that it will use up two or three nodes instead of one for controller services, which is significant if you start out with only 4 (and grow to 6)
  • that it increases the complexity of setup, operations and debugging and thus might lead to decreased availability of the cloud service

Then we have a limited supply of vlans even though technically they are just numbers between 1 and 4095, in SUSE we do allocations to be able to switch together networks from further away. So we could not use vlan mode in neutron if we want to allow software defined network (=SDN) (we did not in old.cloud.suse.de and I did not hear complaints, but now I see a lot of people using SDN)
So we went with ovs+vxlan +dvr (open vSwitch + Virtual eXtensible LAN + Distributed Virtual Router) because that allows VMs to remain reachable even when the controller node reboots.
But then I found that they cannot use DNS during that time, because distributed virtual DNS was not yet implemented. And ovs has some annoying bugs are hard to debug and fix. So I built ugly workarounds that mostly hide^Wsolve the problems from our users’ point of view.
For the next cloud deployment, I will try to use linuxbridge+vlan or linuxbridge+vxlan mode.
And the uptime is pretty good. But it could be better with proper monitoring.

Because we needed to redeploy multiple times before we got all the details right and to document the setup, we scripted most of the deployment with qa_crowbarsetup (which is part of our CI) and extra files in https://github.com/SUSE-Cloud/automation/tree/production/scripts/productioncloud. The only part not in there are the passwords.

We use proper SSL certs from our internal SUSE CA.
For that we needed to install that root CA on all involved nodes.

We use kvm, because it is the most advanced and stable of the supported hypervisors. Xen might be a possible 2nd choice. We use two custom kvm patches to fix nested virt on our G3 Opteron CPUs.

Overall we use 3 vlans. One each for admin, public/floating, sdn/storage networks.
We increased the default /24 IP ranges because we needed more IPs in the fixed and public/floating networks

For authentication, we use our internal R&D LDAP server, but since it does not have information about user’s groups, I wrote a perl script to pull that information from the Novell/innerweb LDAP server and export it as json for use by the hybrid_json assignment backend I wrote.

In addition I wrote a cloud-stats.sh to email weekly reports about utilization of the cloud and another script to tell users about which instances they still have, but might have forgotten.

On the cloud user side, we and other people use one or more of

  • Salt-cloud
  • Nova boot
  • salt-ssh
  • terraform
  • heat

to script instance setup and administration.

Overall we are now hosting 70 instance VMs on 5 compute nodes that together have cost us below 20000€

How to build OS images without kiwi

December 26th, 2016 by

kiwi has long been the one standard way of building images in openSUSE, but even though there exist extensive writings on how to use it, for many it is still an arcane thing better left to the Great Magicians.

Thus, I started to use a simpler alternative image building method, named altimagebuild when I built our first working Raspberry Pi images in 2013 and now I re-used that to build x86_64 VM images at
https://build.opensuse.org/package/show/home:bmwiedemann/altimagebuild
after I found out that it even works in OBS, including publishing the result to our mirror infrastructure.
It is still in rpm format because of how it is produced, so you have to use unrpm to get to the image file.

This method uses 3 parts.

  • a .spec file that lists packages to be pulled into the image
  • a mkrootfs.sh that converts the build system into the future root filesystem you want
  • a mkimage.sh that converts the rootfs into a filesystem image

The good thing about it is that you do not need specialized external tools, because everything is hard-coded in the scripts.
And the bad thing about it is that everything is hard-coded in the scripts, so it is hard to share general improvements over a wider range of images.

In the current version, it builds cloud-enabled partitionless images (which is nice for VMs because you can just use resize2fs to get a larger filesystem and if you later want to access your VM’s data from outside, you simply use mount -o loop)
But it can build anything you want.

To make your own build, do osc checkout home:bmwiedemann/altimagebuild && cd $_ && osc build openSUSE_Leap_42.2

So what images would you want to build?

targitter project – about OBS, tars and git

October 5th, 2015 by

In OBS we use source tarballs everywhere to build rpms (and debs) from.

This has at least two major downsides:

  1. Storing all old tar files takes up a lot of disk space
  2. OBS workflows with .tar files and patches are rather different and somewhat disconnected from the git workflows we usually use everywhere else these days. E.g. for the SUSE OpenStack Cloud team we have a “trackupstream” jenkins job, that pulls the latest git version into a tarball once every day.

Fedora already keeps their metadata in git, but only a hash of the tarball.

So as one first step, I used two rather different projects to see how different the space usage would be. On the slow side I used 20 gtk2 tarballs from the last 5 years and on the fast side, I used 31 openstack-nova tarballs from Cloud:OpenStack:Master project from the last 5 months.
I used scripts that uncompressed each tarball, added it to a git repo and used git gc to trigger git’s compression.

Here are the resulting cumulative size graphs:

gtk2 size graph
nova size graph

The raw numbers after 20 tarballs: for nova the ratio is 89772:7344 = 12.2 and for gtk2 the ratio is 296836:53076 = 5.6

What do you think: would it be worth the effort to use more git in our OBS workflows?

Do we care about being able to reproduce the original tarballs? While this is possible, it has some challenges in differing file-ordering, timestamps, file-ownerships and compression levels.
Or would it be enough if OBS converted a tarball into a signed commit (so it cannot be forged without people being able to notice)?

Do you know of a tool that can uncompress tarballs in a way that allows to track the content as single files, and allows to later re-create the original verbatim tarball, such that upstream signatures would still match?

zypper install tab-completion

June 6th, 2015 by

This is a follow-up on my earlier post on zypper tab-completion.

Completion for package-names just made its way into git (thus soon will appear in Factory aka Tumbleweed) after ~6 weeks of back and forth exploring different approaches.

And it is super-fast 🙂

If you do not want to wait, you can use OneClickInstallCLI http://multiymp.zq1.de/zypp:Head/zypper
with allowing some vendor changes for libzypp and libsolv

zypper tab-completion and some thoughts

April 26th, 2015 by

Today I spent some hours implementing nice tab-completion for zypper. There was already a lot done 6 years ago, but the part about installing/removing packages was missing.

Now the thinking part is about the speed. For the tab-completion I needed a list of installed packages and of course we have that in our RPM database (using berkeley DB as a backend). However querying the list with rpm -qa already took over a second on a modern and fast system. On my poor netbook with a cold cache, it took 25 seconds (5 secs on second try with hot cache)… And the point is that you probably do not want to wait 5 seconds for your tab-completion to react.

So to avoid this problem, I used caching via make to produce a better format (plain text). This is then post-processed with sed in a fraction of a second – a speedup factor somewhere between 15 and 150. This makes a big difference.

In the end, I still wonder why plain text is so much faster than a DB. I guess, one reason is that the DB is optimized for retrieval of single values – e.g. rpm -q bash – this is very fast (but even there an egrep “^bash-[^-]+-[^-]+$” is more than twice as fast).

I still want to optimize zypper for better speed, so that a search might some day return in under 2 seconds. One idea for that is to not parse all those config+repo files every time, but only when they change. It could use mmaped files under /var/cache/zypp* as memory to store the binary representations. Though it might become complicated, if dynamic structures such as linked lists are involved.

The future will be interesting…