Our team blog this week is written by Stephan ‘coolo’ Kulow, who talks about the work the team has done on the Open Build Service.
For years one of the biggest complaints about the webUI was that it is impossible to find packages. Search has been part of the interface from the beginning, but with over 200,000 packages being built today it is crucial to get the right package.
Where is my kernel?
Developers new to openSUSE and the build service in particular often have to search for the package they need to fix for a specific bug. So you find yourself looking for kernel in the webUI and you are presented with tons of results displayed in a rather random order, plus a notice that your search produced more than 200 hits and is basically invalid. Huh? home:foobar:latest-experiments:kernel is surely not the openSUSE kernel to fix, but then what is?
Now if you ask Google about “kernel site:build.opensuse.org” you get closer to the problem at hand: “About 16,800 results” – that is a lot to pick the first 20 results to display from. The OBS webUI tried to find a good pick with an algorithm that might have been clever when build.opensuse.org had 100 projects. Today, it can only be called old and useless.
Ancor for world fame
So I tricked Ancor into looking at the problem by claiming he would get all the praise in the OBS world for implementing a sane search.
The problem is far from trivial, but there are good tools to get a better result than what we had, and Ancor has a lot of experience with these (and with Rails in general). So it seemed like he could strike a good balance between effort and outcome.
But as always the devil is in the details, so this post is also about getting feedback on the actual result.
What we did
Ancor integrated Thinking Sphinx into the OBS, so the name, title and description can be combined with other attributes into one big index that allows page ranking.
Additionally, there is no limit of 200 results anymore. The webUI now displays all results, but only 20 at a time, as you might have seen on larger sites offering search.
We collected the attributes that are most likely relevant for people searching for packages. For example, we gather the linkcount of a package into the database (so far only the backend knew what is a link and what is a plain package). The idea is to move links down in the search results.
Coming back to the kernel example, the kernel-source package is the real package, while kernel-default, kernel-desktop, kernel-xen, … exist too but are all links to kernel-source. So it is fair to present kernel-source first.
Problem is: there are still 228 kernel-source packages in the build service (yes, people like branching the kernel – a lot), so the number of links pointing to the package is another attribute. Packages that other packages branch go up in the list while the resulting branches move down. What also plays a role in the calculation: is a package the devel package of another? (which is the final punch to have Kernel:HEAD/kernel-source as first result displayed, as opposed to the old searching algorithm displaying a discontinued “linux-kernel-nutshell” as shown in the screen shots).
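To make the interplay of these attributes concrete, here is a minimal sketch in Python. It is not the OBS code (that is Rails with Thinking Sphinx), and the weights, the additive formula and the sample numbers are invented for illustration. Only the attribute names linked_count, is_devel and activity_index come from the search itself; is_link is a hypothetical name for "this package is a link".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Package:
    project: str
    name: str
    is_link: bool          # hypothetical flag: package is a link to another package
    linked_count: int      # how many other packages link to / branch this one
    is_devel: bool         # package is the devel package of another
    activity_index: float  # 0..100, tracked by the OBS


# Made-up weights, NOT the values used on build.opensuse.org.
LINKED_BONUS = 2.0
DEVEL_BONUS = 100.0
LINK_PENALTY = 50.0


def weight(pkg: Package) -> float:
    """Combine the attributes into one sortable score."""
    score = pkg.activity_index
    score += LINKED_BONUS * pkg.linked_count  # branched-from packages go up
    if pkg.is_devel:
        score += DEVEL_BONUS                  # devel packages get the final punch
    if pkg.is_link:
        score -= LINK_PENALTY                 # links and branches move down
    return score


def rank(packages: List[Package]) -> List[Package]:
    """Highest score first."""
    return sorted(packages, key=weight, reverse=True)


# Illustrative data only; the counts and activity values are invented.
sample = [
    Package("home:foobar:latest-experiments:kernel", "kernel-source", True, 0, False, 95.0),
    Package("Kernel:HEAD", "kernel-default", True, 0, False, 80.0),
    Package("Kernel:HEAD", "kernel-source", False, 30, True, 80.0),
]
ranked = rank(sample)
```

With these invented numbers the very active home branch still lands below Kernel:HEAD/kernel-source, because the link penalty outweighs its activity advantage while the devel and linked bonuses lift the real package.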
To sort the vast majority of results, which are all links, not linked to and not devel packages, we take the activity index. This is a number the OBS tracks for every package, but it is not displayed anywhere. It ranges from 0 to 100, goes down with time and goes up with regular commits. So if you look for kde, you will actually see KDE:Unstable:Playground as the first project to match. This is because of two things:
- kde is indeed a very bad search term
- the unstable playground sees a lot of commits, so your chance of getting something fresh there is the highest
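How the OBS actually computes the activity index is not described here, so the following Python sketch is only one way a number could behave as described: bounded between 0 and 100, fading over time and rising with commits. The half-life, the boost per commit and the function names are all assumptions.

```python
HALF_LIFE_DAYS = 30.0  # assumption: the index halves after 30 idle days
COMMIT_BOOST = 10.0    # assumption: each commit adds 10 points
MAX_INDEX = 100.0      # upper bound stated in the post


def decay(index: float, days_elapsed: float) -> float:
    """Let the index fade exponentially while nothing happens."""
    return index * 0.5 ** (days_elapsed / HALF_LIFE_DAYS)


def on_commit(index: float, days_since_last_update: float) -> float:
    """Age the index first, then reward the commit, capped at 100."""
    return min(MAX_INDEX, decay(index, days_since_last_update) + COMMIT_BOOST)
```

Under these assumptions a package that gets commits every few days stays near the top of the scale, while one left alone for months drifts towards 0, which fits a busy playground project outranking quieter ones.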
Your feedback wanted
Of course nobody is perfect, and while Ancor's code comes close, the weights given to the attributes were my choice, so all the problems you see in the sorting are my fault. Please take some time to redo some searches you might have done in the past and report whether the results look sane to your experienced eye. Within the HTML of the search results is a hidden span with the raw attributes used in the calculation, so if you find something strange, look for weight, linked_count, activity_index, is_devel and co. Possibly the package that looks bogus to you in the top results is just very active.
Depending on the feedback we get, we might need to change the weights or add yet more attributes in the search and ranking. Do your own experiments on build.opensuse.org today!
And as always, we finish with the top-ten contributors to openSUSE Factory of last week!
|6||Lars Müller, Ismail Donmez, Cristian Rodríguez, Bjørn Lie|
|7||Stefan Dirsch, Matthias Mailänder, Jan Engelhardt, Dmitry Roshchin|
|9||Ludwig Nussel, Dirk Mueller|
|10||Raymond Wooninck, Lukas Ocilka|