Linux Software Packaging Systems - From Tarball to Portage

In The Old Days

Linux has by and large established itself as the successor of the more traditional Unices, whose various reincarnations still exist and operate today. While people today can very easily and transparently install any piece of available software for their Linux system with just a few clicks, things were not as easy for the Unix administrators of old.

Software packaging was developed to relieve system administrators of the burden of keeping a system up to date with newer versions of software, whether system software (basic system utilities and libraries, including security patches) or application software meant to run on the system (anything from compilers to CAD software).

The Unix administrator of old had to get the software up and running as soon as possible and, at the same time, make sure there were no conflicts with older versions of the software or libraries. Sometimes he even had to improvise and “trick” software that was not explicitly designed for the system into working properly, without breaking other software or the operating system itself.

How Things Worked

Software was installed in a variety of ways. Sometimes it came readily built and linked for the system in question; sometimes only the object code was provided and the software had to be linked with the libraries available on the system; and sometimes it had to be compiled and linked from scratch. In any case, it was distributed as a TAR file (Tape ARchive), which is basically a concatenation of all files and directories into a single file that could easily be distributed across multiple volumes (back in the day, actual reels of tape).

These TAR files fulfilled the same need as the more widely known ZIP files, and you may have come across them in their gzipped or bzipped forms (.tar.gz and .tar.bz or .tar.bz2 respectively). But there was no standard way of placing files inside a TAR, and Unices differed from place to place, time to time, and admin to admin. So software had to be installed under the careful supervision of an experienced admin who knew the system's quirks, where libraries resided, and what was buggy and what was not. More importantly, experienced admins could debug the software when it misbehaved or did not work at all.
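To make the idea of a tarball concrete, here is a minimal sketch using Python's standard tarfile module. It packs two files into a gzipped TAR held in memory and lists what is inside. The file names and contents are invented for illustration; as noted above, nothing in the format dictates any particular layout.

```python
import io
import tarfile


def build_tarball(files):
    """Pack a {path: bytes} mapping into an in-memory .tar.gz archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for path, data in files.items():
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def list_tarball(blob):
    """Return the member paths stored in a .tar.gz archive, in archive order."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as archive:
        return archive.getnames()


# Hypothetical package contents, made up for this example.
example_files = {
    "hello-1.0/README": b"Unpack, read, then build by hand.\n",
    "hello-1.0/src/hello.c": b"int main(void) { return 0; }\n",
}
blob = build_tarball(example_files)
print(list_tarball(blob))
```

Note what the archive does not contain: there is no record of which other software the package needs, which is exactly the gap the later packaging systems set out to fill.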

If one piece of software depended on another piece of software, the admins would track those pieces down and one by one install everything that the particular piece of software needed to work properly. So installing software was a non-trivial and quite technical task.

When Linux Came To Be

This is how things started off with Linux as well. Slackware, one of the earliest Linux distributions and the oldest still maintained, is (still) based on the tarball: software is divided into categories, and each category has a list of tarballs that can be installed on a Slackware system. So all one needs to do is pick the software one needs, and the system will be ready to use. With one pitfall: there is no way to check for dependencies. So inexperienced and novice users don’t have real control over the system, since they risk installing software that doesn’t work because of dependency errors, and they would only find out when they got to keep both pieces of what they broke.

This problem was already known, but probably no one had bothered to do anything about it, since people who maintained Unix systems were expected to know what they were doing. Debian, which came along pretty soon after Slackware, went on to introduce what is probably still the best packaging solution out there, and what has made Linux software so easy to install: APT (Advanced Package Tool).

APT and RPM

The aptly named APT (excuse the pun) is truly a marvel: software is organized into categories, and each software package has certain meta-information attached, most importantly the list of other software packages it depends on. Software package lists are called repositories, and are in essence directories of packages organized in a specific fashion. APT uses this information to cross-check all dependencies on the system and make sure nothing is missing, and all this is handled in a manner transparent to the user.
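The dependency cross-check described above can be sketched as a depth-first walk over the "depends on" metadata. This is only an illustration of the idea, not APT's actual resolver, which also has to deal with versions, conflicts and alternatives. The package names and the repo mapping are hypothetical.

```python
def install_order(package, depends, installed=()):
    """Return the packages to install, dependencies first.

    depends maps a package name to the names it depends on.
    Raises KeyError for an unknown package and ValueError on a cycle.
    """
    order = []
    done = set(installed)
    in_progress = set()

    def visit(name):
        if name in done:
            return
        if name in in_progress:
            raise ValueError(f"dependency cycle involving {name}")
        in_progress.add(name)
        for dep in depends[name]:
            visit(dep)
        in_progress.discard(name)
        done.add(name)
        order.append(name)

    visit(package)
    return order


# Hypothetical repository metadata: package -> packages it depends on.
repo = {
    "editor": ["libgui", "libc"],
    "libgui": ["libc", "libpng"],
    "libpng": ["libc"],
    "libc": [],
}
print(install_order("editor", repo))
print(install_order("editor", repo, installed=["libc"]))
```

Each package is appended only after everything it depends on, so installing in the returned order never leaves a package without its prerequisites. Packages that are already installed are skipped.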

There is another packaging system that tries to do the same, but with varying degrees of success: RPM (Red Hat Package Manager).

RPM is used by Red Hat and SUSE based distributions, while APT is used by Debian and Ubuntu based distributions (Ubuntu is itself Debian based, but it has spawned many other distributions of its own). The main difference between the two is how they solve dependency problems, when and if they arise.

APT is smart enough in most cases to inform the user and provide a way out of the problem. Even with multiple software repositories (listings of software packages), APT manages to keep different versions of the same software on the same system without them interfering with each other. Even in the worst case, removing an offending package will not break the system or make other software unusable.

RPM has a more troubled background. RPM is very sensitive to versions, and not very forgiving of bad packages and user mistakes. The commonly used term ‘RPM hell’ refers to a situation where RPM fails to find or propose a solution for solving a dependency, usually prompting the user to remove all interrelated software, and breaking the system in the process.

That is mainly because dependency solving is not built into the RPM system and its utilities, but is handled differently by each distro.

Portage

In both cases, APT and RPM, software packages are compiled binaries linked against certain versions of libraries, which are available in other packages. This limits how a piece of software works on a system, since the packager might not have compiled in every available option or piece of extra functionality. Some people who wanted more power and freedom for their systems created Portage.

Portage is the packaging system used by Gentoo, modeled on the BSD Ports system. What Portage does is give the user the capacity to custom fit each software package to his own system and liking, without having to manually compile and install each software package and dependency. This is accomplished by having the user set some options in a special file (make.conf), which is then consulted by Portage each time the user wants to install something (in Gentoo terminology, emerge a package).
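As a rough illustration of what consulting make.conf involves, the sketch below reads a few KEY="value" lines and splits the USE setting into enabled and disabled features. USE and CFLAGS are variable names Gentoo documents for this file, but the values here are made up, and the parser is a simplified assumption: real make.conf files allow more shell-like syntax than this handles, and this is not how Portage itself reads the file.

```python
import shlex


def parse_make_conf(text):
    """Parse simple KEY="value" lines; comments and blank lines are skipped."""
    settings = {}
    for line in text.splitlines():
        # shlex strips the quotes and drops anything after an unquoted '#'.
        for token in shlex.split(line, comments=True):
            key, sep, value = token.partition("=")
            if sep:
                settings[key] = value
    return settings


def split_use_flags(use_value):
    """Split a USE string into (enabled, disabled) sets; '-flag' disables."""
    enabled, disabled = set(), set()
    for flag in use_value.split():
        if flag.startswith("-"):
            disabled.add(flag[1:])
        else:
            enabled.add(flag)
    return enabled, disabled


# Hypothetical make.conf contents, invented for this example.
example_conf = '''
# Compiler options applied to every package that gets built
CFLAGS="-O2 -pipe"
# Optional features to build in (or leave out) across all packages
USE="gtk ssl -kde"
'''

settings = parse_make_conf(example_conf)
enabled, disabled = split_use_flags(settings["USE"])
print(settings["CFLAGS"])
print(sorted(enabled), sorted(disabled))
```

The point of the design is that the choice is made once, in one file, and then applied to every package that is built afterwards, rather than being repeated by hand for each compile.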

Portage works pretty well, but requires users to be somewhat advanced in their understanding of Linux and software compiling before they can tailor their software without breaking their packages. Portage also offers a huge selection of packages, even more than the official Debian repositories. The issue remains, though, that Portage packages are not as rigorously checked before becoming available as the ones in the (official) Debian repositories.

Autopackage

Another custom solution is the autopackage system. Certain software developers like their projects to be distributed as autopackages. Autopackages are executable files that unpack the project’s prebuilt files along with some scripts that handle the dependencies. Autopackages can be installed on a large variety of distributions, independent of the actual package system being used. They are the closest one can get in the Linux world to the setup files of Windows systems.

Autopackages check for dependencies by consulting certain files (most notably pkgconfig files), and if they are not met, they try to download and install the needed dependencies. The depth of this dependency check varies, but usually a certain common set of libraries and software is assumed to be available, so autopackages are not as fully capable as APT, and they cannot guarantee they will always install without hassle.
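A check against pkgconfig files can be sketched as follows. A .pc file carries fields such as Name and Version, so an installer can read the version and compare it with the minimum it needs. This is an illustrative assumption about how such a check might look, not Autopackage's actual code. The library names, versions and the purely numeric version comparison are all simplifications chosen for the example.

```python
def read_pc_fields(text):
    """Collect 'Key: value' fields from the text of a pkg-config .pc file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        # Variable definitions such as 'prefix=/usr' are not fields; skip them.
        if sep and "=" not in key:
            fields[key.strip()] = value.strip()
    return fields


def version_tuple(version):
    """Turn '1.6.37' into (1, 6, 37); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def missing_dependencies(required, pc_files):
    """Return names from required whose .pc file is absent or too old.

    required maps a library name to a minimum version string;
    pc_files maps a library name to the text of its .pc file.
    """
    missing = []
    for name, minimum in required.items():
        text = pc_files.get(name)
        if text is None:
            missing.append(name)
            continue
        found = read_pc_fields(text).get("Version", "0")
        if version_tuple(found) < version_tuple(minimum):
            missing.append(name)
    return missing


# Hypothetical .pc contents standing in for files found on the system.
pc_files = {
    "libpng": "prefix=/usr\nlibdir=${prefix}/lib\n\nName: libpng\n"
              "Version: 1.6.37\nLibs: -L${libdir} -lpng16\n",
    "zlib": "prefix=/usr\n\nName: zlib\nVersion: 1.1.4\n",
}
required = {"libpng": "1.6.0", "zlib": "1.2.0", "gtk": "2.0"}
print(missing_dependencies(required, pc_files))
```

Anything the function reports as missing is what the installer would then have to fetch, which is where a check like this falls short of a full package database: it only knows about libraries that ship a .pc file.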

The Common Ones

The packaging system is probably one of the most defining factors of a Linux distribution. Slackware has remained a distribution aimed at experts and experienced users, with most Slackware users maintaining their systems the old Unix way. A nice collection of Slackware packages can be found at linuxpackages, as well as slackware.

Red Hat and SUSE have largely remained business oriented distributions, where software changes and version updates are kept to a minimum. RPM fits well in such a role, where updating is done only for critical patches and between major version revisions of the distribution, so broken (faulty) packages should be rare and the package system remains relatively simple.

Debian with APT offers a solid, stable package system that can easily cope with many versions of installed software, and is not overwhelmed by multiple software interdependencies. APT still requires some advanced knowledge when things go bad, but things rarely go bad, as long as users don’t meddle with things they know nothing about. Another possible pitfall is bad repositories with mixed-up packages that might mess up a perfectly good installation. Still, if handled properly, especially through an interface like Synaptic, APT works like a charm, and is the ideal package system for a distribution aimed at the regular user.

That’s the main reason why Ubuntu has a Debian heritage.

The Niche Roles

Portage is the package system for the adventurous sort: people who know (or think they know) what they are doing and want to live on the edge, building their own packages for their own system, but with automated procedures and ease of use and configuration. The problem with Portage is that it has yet to find a way to validate each and every package for consistency and correctness, so that occasional build failures and erratic behavior become history. It shows a lot of promise, but it will always require that the user knows about libraries and the different options a software package can provide.

Autopackage offers a nice solution for software packages that are not widely offered by distributions and at the same time want to be easily available to a multitude of distributions and users, usually having few or very common dependencies.

The Importance

All in all, packaging is what has enabled the explosive expansion of Linux, since it offers a handy and safe way to install software and update the system without really worrying (at least in most cases). Without proper packaging systems like APT and RPM, Linux would still be limited to server administrators and geeks. The multitude of software that is readily available to a Linux user, and the ease with which it can be searched for and installed, is truly to the Linux user’s advantage.

Even though there is more than one package system, and they are not entirely compatible with one another (though conversion utilities exist), each one is suited to the character and philosophy of its distribution. After all, people use Linux not only because it is free (as in free speech, not free beer), powerful and stable. They also use it because of the many flavors and the unique character each distribution represents. And one of the few consistent things among distributions is that there’s always some way to manage software and keep your sanity intact at the same time.