openSUSE Lizards

Posts Tagged ‘jenkins’

Debugging jenkins

July 31st, 2019 by bmwiedemann

For some weeks, our busy internal jenkins instance suffered strange near-daily outages.

To get to the root cause of the issue, we enabled remote debugging with

-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9010 -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=ci.suse.de -Dcom.sun.management.jmxremote.password.file=/var/lib/jenkins/jmxremote.password

and attached visualvm to see what it was doing.
This showed the number of threads and memory usage in a sawtooth pattern. Every time the garbage collector ran, it dropped 500-1000 threads.

Today we noticed that every time it threw these java.lang.OutOfMemoryError: unable to create new native thread errors, the maximum number of threads was 2018… suspiciously close to 2048. Looking for the same time in journalctl showed
kernel: cgroup: fork rejected by pids controller in /system.slice/jenkins.service

So it was systemd refusing java’s request for a new thread and jenkins not handling that gracefully in all cases.
That was easily avoided by raising the service’s systemd task limit with a
TasksMax=8192

Now the new peak was at 4890 live threads and jenkins served all Geekos happily ever after.
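The limit that triggered the outages can be watched directly: the cgroup pids controller exposes the current task count and the limit as the files pids.current and pids.max, where pids.max holds the word "max" when no limit is set. The following is a minimal sketch and not what we actually ran; the function names are made up, and the directory passed to read_headroom depends on whether the host uses cgroup v1 or v2.

```python
from pathlib import Path


def pids_headroom(current_text, max_text):
    """Return how many more tasks the cgroup may create, or None if unlimited.

    Both arguments are the raw contents of pids.current and pids.max.
    """
    limit = max_text.strip()
    if limit == "max":  # the kernel writes "max" when there is no limit
        return None
    return int(limit) - int(current_text.strip())


def read_headroom(cgroup_dir):
    """Read both files from a cgroup directory (path is an assumption).

    On a cgroup v1 host the directory would look something like
    /sys/fs/cgroup/pids/system.slice/jenkins.service
    """
    base = Path(cgroup_dir)
    return pids_headroom(
        (base / "pids.current").read_text(),
        (base / "pids.max").read_text(),
    )
```

Note that the controller counts every task in the service, so the number of live Java threads shown by visualvm will be somewhat below pids.current.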

Tags: CI, jenkins
Posted in Documentation, Infrastructure, Quality Assurance

YaST Squad Sprint 62

September 12th, 2018 by Yast Team

Jenkins commenting in GitHub pull requests
Intel Rapid Start Technology for better sleeping
Consistent storage proposal in SLE-12
Partitioner: designing the UI for its full potential
Partitioner: entire disks as members of a software MD RAID
Partitioner: better explanation of unusual conditions
A sample of bug fixes

Improved Jenkins Integration

Quite often, our Jenkins job failed for some reason after a pull request was merged at GitHub. And because Jenkins is supposed to submit the changes to the build service, the fixes from Git were not released in the RPM packages if nobody noticed the failure. That was a bit confusing because we closed a bug in Bugzilla but the fix was not available anywhere.

To avoid this we have added a wrapper script which runs the original Jenkins command (rake osc:sr in this case) and writes the result as a comment to the respective pull request at GitHub. If a submit request is created it additionally adds a link to it.
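The idea of such a wrapper can be sketched in a few lines. This is an illustration only, not the script we use: the function names, the comment wording and the 20-line log tail are assumptions, and the sketch does not try to extract the submit request link from the command output. The only external interface it relies on is the GitHub REST endpoint for creating a comment on an issue or pull request.

```python
import json
import subprocess
import sys
import urllib.request


def run_and_report(command):
    """Run the command and return (succeeded, comment_body)."""
    result = subprocess.run(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    ok = result.returncode == 0
    status = "succeeded" if ok else "failed (exit code %d)" % result.returncode
    # Keep only the end of the log so the comment stays readable.
    tail = "\n".join(result.stdout.splitlines()[-20:])
    body = "Jenkins: `%s` %s\n\n%s" % (" ".join(command), status, tail)
    return ok, body


def post_comment(repo, pr_number, body, token):
    """Post the body as a comment to a pull request, e.g. repo='owner/name'."""
    url = "https://api.github.com/repos/%s/issues/%s/comments" % (repo, pr_number)
    request = urllib.request.Request(
        url,
        data=json.dumps({"body": body}).encode("utf-8"),
        headers={
            "Authorization": "token " + token,
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def main(argv=None):
    """Run the wrapped command given on the command line and print the comment."""
    argv = sys.argv[1:] if argv is None else argv
    ok, body = run_and_report(argv)
    print(body)
    return 0 if ok else 1
```

A Jenkins job would call something like main(["rake", "osc:sr"]) and then pass the body to post_comment together with the repository, the pull request number and a token taken from the job configuration.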

From now on, you should keep the pull request page open after merging it and wait for the Jenkins status result. If it fails for some reason, try fixing it or ask for help on IRC or the YaST mailing list.

Note: This automation works only for the code branches which are in active development and for packages which have a Jenkins job assigned (most of the YaST packages do).

Intel Rapid Start Technology Support

The Intel Rapid Start Technology allows using a fast disk (SSD) to save energy during suspend-to-RAM. The idea is that after a given time the contents of RAM are moved to the SSD so that the system can power itself off. When it is powered on again, the contents of RAM are read back. So it’s something like dynamically changing suspend-to-RAM into suspend-to-fast-disk.

What does this technology need from the installer? It needs its own dedicated partition where it can store the contents of RAM. To support this technology we added in this sprint the ability to create and recognize such a partition. It looks like this:

Consistent Storage Proposal (SLE-12-SP4 / yast2-storage-old)

We fixed a bug where the behavior was inconsistent if you switched the storage proposal between partition-based and LVM-based / encrypted LVM-based: bsc#1085169.

The behavior was pretty irritating: Initially, it would propose to create a separate “/home” partition, but when you changed the proposal parameters and simply kept that checkbox “[x] separate /home” checked, it would complain that with the current settings a separate “/home” was not possible.

The two code paths did the calculations differently: One accounted for the other partitions that were also proposed like “swap” and “boot” (or “PrEP” or “/efi-boot”), the other did not. We unified that as much as reasonably possible without breaking things, but since calculating when and how to use any boot partitions is quite complex in that ~~old~~ legacy storage stack, we did not go all the way; boot partitions are pretty small, so their size matters only in very pathological fringe cases. We try not to overengineer things, in particular not with the 4th service pack for a business product.
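The effect of the two code paths disagreeing can be shown with a toy calculation. This is not the yast2-storage code; the function and all sizes are invented for illustration. The only point is that the same check gives different answers depending on whether the other proposed partitions are subtracted first.

```python
GIB = 1024 ** 3


def home_fits(disk_size, root_min, home_min, other_sizes=()):
    """Check whether a separate /home still fits on the disk.

    other_sizes holds the sizes of the other proposed partitions
    (swap, boot, ...). Passing an empty tuple mimics the code path
    that ignored them.
    """
    remaining = disk_size - root_min - sum(other_sizes)
    return remaining >= home_min


# Hypothetical numbers: a 40 GiB disk, 20 GiB for "/", at least 19 GiB for /home.
ignoring_others = home_fits(40 * GIB, 20 * GIB, 19 * GIB)
counting_swap = home_fits(40 * GIB, 20 * GIB, 19 * GIB, other_sizes=(2 * GIB,))
print(ignoring_others, counting_swap)
```

With these made-up numbers the first path proposes a separate /home and the second one rejects it, which is the kind of contradiction the user saw.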

More details in the pull request with the fix.

The Partitioner looks to the future

We have blogged a lot about Storage-NG and the possibilities and features it will bring to the users. But a significant part of its power remains dormant under the surface because we decided to clone the user interface and the functionality of the classical YaST Partitioner for the deadline marked by SLE 15 (and Leap 15.0). But now we are finally able to start exposing those long-awaited features and to bring new ones to the current Tumbleweed and to the future SLE-15-SP1 and Leap 15.1.

The user interface of the Partitioner is already rather packed with functionality, but we want to avoid too disruptive a redesign. So it was time for some pen and paper sessions, trying to find and draw a good way to add exciting new stuff to the Partitioner, including all of the following:

Allow entire disks (no partition table) to be added as members of a software MD RAID.
Manage (create, modify and delete) partitions within a software MD RAID device.
Make it possible to format an entire disk (no partition table) and/or define a mount point for it (just as we do with partitions or LVM logical volumes).
Manage Bcache so the user can set and configure which devices will be used to speed up others.

As usual, we consulted some UI experts in the process and the result is this first version of a document, which summarizes how to incorporate all that into the Partitioner, including some alternatives we are considering for the near future.

That document will become the cornerstone of future developments. Sometimes you need to spend a sprint doing other stuff (like researching and documenting) before you can go ahead with writing code.

Partitioner: full disks as members of a software MD RAID

The first of the features described in the previous document is already available for Tumbleweed users (or it will be as soon as the integration process concludes) and, thus, ready for the upcoming releases of SLE and Leap. Now the Partitioner offers full disks as “Available Devices” in the RAID screen, following the criteria and considerations explained in the document.

That brings even more ways of combining devices together (disks, partitions, software RAIDs, you name it) to create a storage setup. As a result, we decided it was important to explain the situation when some combinations are not possible right away, likely because they need some previous step. Which brings us to…

Partitioner: more specific errors when a device is in use

In general, most of the checks already present in the Partitioner were already able to correctly handle situations in which the disk was a direct member of an MD RAID or an LVM volume group. But the message about the device being in use was not informative enough.

Now the message includes the name of the device that is making the operation impossible (it’s usually one, but there are corner cases in which it can be more than one), so the user has some clue about how to fix it.

Partitioner: improved creation of partition table

One part of the Partitioner that was especially bad at explaining the current situation and the possible consequences was the workflow of “Create New Partition Table”, which also used to exhibit a behavior quite inconsistent with the rest of the Partitioner actions.

In SLE-15 and openSUSE Leap 15.0, the “Create New Partition Table” button immediately presents a form to select the partition table type in case the device supports more than one.

And after the user selects one type, it always shows a warning about all kinds of devices to be destroyed, no matter whether any device is really affected or not.

Even better, if only one partition table type is possible, it still shows the form but with no question. So creating a partition table in a completely empty DASD device results in a misleading warning (nothing is going to be destroyed) on top of an empty wizard.

So the whole action was reimplemented to display the warning only if some devices are indeed going to be affected (including the list of affected devices) and to display that warning as soon as the user clicks the button (like any other Partitioner action).
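The new rule boils down to "no affected devices, no warning". A minimal sketch of that rule follows; it is not the Partitioner code (which is written in Ruby), and the function name and message wording are made up.

```python
def partition_table_warning(device, affected_devices):
    """Return the warning text, or None when nothing would be destroyed.

    affected_devices is the list of device names that would be lost
    by creating a new partition table on the given device.
    """
    if not affected_devices:
        return None  # empty disk: go straight to the next step, no warning
    lines = [
        "Creating a new partition table on %s will delete these devices:" % device
    ]
    lines.extend("  " + name for name in affected_devices)
    return "\n".join(lines)
```

For an empty DASD the function returns None, so no warning is shown, while a disk that holds partitions yields a message listing exactly those partitions.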

As seen in the screenshot, the check correctly handles situations in which the disk as a whole (no partitions) is part of an MD RAID or an LVM setup. And, of course, there are no empty wizard steps in the case of DASD or anything like that. Now the workflow works in the expected way in each situation.

In short, the Storage-NG Partitioner is moving away, step by step, from being a 1:1 clone of the historical Partitioner to offer more functionality and usability. And there are more improvements to come in that area.

Partitioner: Unmounting devices in advance

The Partitioner allows you to extensively configure the storage devices of your system. You can perform many different kinds of actions, from changing the label of a file system to creating a complex configuration using LVM or RAIDs. Each modification you perform is stored in memory, so the real system is not altered at all until you confirm the changes as the last step.

But in some circumstances, the Partitioner might not be able to perform some of the required actions, and it would fail when trying to modify the real system. One action that sometimes fails is unmounting a device. This can happen for several reasons, the most common being that the file system is busy. Moreover, some actions require the device to be unmounted (for example, deleting a partition), so the Partitioner would try to unmount it automatically.

During this sprint, the Partitioner has recovered its ability to unmount devices on the fly to avoid possible failures when applying the changes. Now, if you want to delete a currently mounted device (e.g., an LVM Logical Volume) you will be asked in advance to unmount it. If you accept, the Partitioner will try to unmount the device on the fly without waiting to apply all the changes. If the unmount action fails, you will be informed about the problem and you can try to solve it manually before the Partitioner applies the changes to your system. Of course, you can also continue without unmounting the device and the Partitioner will try to unmount it automatically after you accept all the changes.

Another task that might require unmounting the device is resizing the filesystem. The Partitioner will ask you about unmounting the device when the filesystem does not support resizing while being mounted. And, even when the filesystem does support it, you still might be requested to unmount the device, for example when you want to extend a device by more than 50 GiB. That task might be quite slow and it is highly recommended to unmount the device to speed up the resizing; otherwise it could take hours.
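The resize rules just described can be summed up in a small decision function. This is a sketch for illustration and not the actual Partitioner implementation; the function name and the three return values are invented, only the 50 GiB threshold comes from the text above.

```python
GIB = 1024 ** 3
LARGE_GROW = 50 * GIB  # growing by more than this is slow on a mounted device


def unmount_advice(mounted, supports_mounted_resize, grow_by):
    """Decide whether the user should be asked to unmount before resizing.

    Returns "required", "recommended" or "none".
    grow_by is the size increase in bytes (0 or negative when shrinking).
    """
    if not mounted:
        return "none"
    if not supports_mounted_resize:
        return "required"
    if grow_by > LARGE_GROW:
        return "recommended"
    return "none"
```

For example, extending a mounted filesystem that supports online resizing by 100 GiB gives "recommended", while extending it by 10 GiB gives "none".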

Bug Fixes

Of course, we continue fighting against bugs. Thus, from this sprint on, alongside other minor stuff, the system

obeys the user and does not keep running the chrony service when they uncheck the “Run NTP as daemon” option in the timezone dialog.
does not crash when the user has no access to the journal logs, displaying a human-readable message, even with a hint!
copies the correct metadata file during the installation, and
limits the size of the registration code, to avoid showing you an ugly and unintelligible error in case you write a long paragraph there instead ;-P

Tags: github, jenkins, storage
Posted in Factory, Systems Management, YaST
