In OBS we use source tarballs everywhere to build rpms (and debs) from.
This has at least two major downsides:
- Storing all old tar files takes up a lot of disk space
- OBS workflows with .tar files and patches are rather different and somewhat disconnected from the git workflows we usually use everywhere else these days. E.g. for the SUSE OpenStack Cloud team we have a “trackupstream” jenkins job, that pulls the latest git version into a tarball once every day.
Fedora already keeps their metadata in git, but only a hash of the tarball.
So as one first step, I used two rather different projects to see how different the space usage would be. On the slow side I used 20 gtk2 tarballs from the last 5 years and on the fast side, I used 31 openstack-nova tarballs from Cloud:OpenStack:Master project from the last 5 months.
I used scripts that uncompressed each tarball, added it to a git repo and used git gc to trigger git’s compression.
Here are the resulting cumulative size graphs:
The raw numbers after 20 tarballs: for nova the ratio is 89772:7344 = 12.2 and for gtk2 the ratio is 296836:53076 = 5.6
What do you think: would it be worth the effort to use more git in our OBS workflows?
Do we care about being able to reproduce the original tarballs? While this is possible, it has some challenges in differing file-ordering, timestamps, file-ownerships and compression levels.
Or would it be enough if OBS converted a tarball into a signed commit (so it cannot be forged without people being able to notice)?
Do you know of a tool that can uncompress tarballs in a way that allows to track the content as single files, and allows to later re-create the original verbatim tarball, such that upstream signatures would still match?