I got a chance to sit down with Dave Malcolm at FUDCon to talk about the state of Python packaging in Fedora. From the start, Fedora's packages have been designed with only one Python stack in mind at any given time. By now the process for packaging Python modules is streamlined and easy to follow, but shortly after Fedora began shipping more than one Python runtime in parallel, it became clear that the solutions we have used in the past are no longer adequate.
The problem
Fedora's packages, for the most part, assume that the distribution has just one Python stack. They rely on RPM macros like __python, python_sitelib, and python_sitearch to define where files built against the current version of Python should go. When Fedora began shipping Python 2 and Python 3 stacks in parallel, the Python packaging guidelines had to be rewritten, and the rewrite duplicated a number of these macros so that packages could build against Python 3. So now we have:
- __python
- __python3
- python_sitelib
- python_sitearch
- python3_sitelib
- python3_sitearch
Packagers need to care about this because packaging a Python module that supports both Python 2 and 3 is now a lot harder. If upstream supports them both with the same tarball then the package-building process has to build the entire module twice - once for each Python stack - and put the resulting files in two different locations. If upstream supports them both with separate tarballs then the Python 3 version needs to go into a separate package altogether, requiring a new package review for each module as well as a non-trivial amount of work to coordinate bugfixes and updates between both packages. Both of these methods result in a large amount of copied-and-pasted code in RPMs' spec files.
In isolation this is not a serious problem. Over time, however, this approach will cause significant trouble for the distribution. As the world increasingly supports Python 3, the number of new package reviews for Python 3-compatible versions of existing modules will grow considerably. When Python 3 finally becomes the default in Fedora, every package with a Python 2 module will need to be edited to build against what will then be an alternate Python stack, and many of them will need to be renamed (e.g. from python-libfoo to python2-libfoo), and thus re-reviewed, at the same time.
Whether existing Python 3 modules will also need to be renamed remains to be seen.
Packages for other Python runtimes exist as well. EPEL 5 contains a Python 2.6 package that stands alongside the stock 2.4 package. Since modules must be explicitly built against this alternative stack to be of any use to 2.6 users, yet another duplicate set of Python module packages has appeared. What will happen when someone wants to package Python 3 for EPEL 5 or 6?
Fedora also ships completely separate Python runtimes, such as PyPy and Jython. The packager of a single pure-Python module has to build it four times for the module to work on every Python runtime that Fedora 14 ships. At worst this means maintaining and coordinating four independent packages. At best one source package can serve all four, but the amount of duplicated code and the number of RPM macros needed to do so grow even larger. The list of macros would now be something like this:
- __python
- __python3
- __pypy
- __jython
- python_sitelib
- python_sitearch
- python3_sitelib
- python3_sitearch
- pypy_sitelib
- jython_sitelib
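To make the growth concrete, here is a small Python sketch of how the macro list follows mechanically from the list of runtimes. It is only an illustration: the RUNTIMES table, the has_sitearch flag, and the macros_for function are names I made up for this post, and the flag values simply mirror the list above rather than any official guideline.

```python
# Hypothetical table of runtimes: (macro stem, has an arch-specific site directory).
# The flags mirror the macro list in this post; they are not taken from any guideline.
RUNTIMES = [
    ("python", True),
    ("python3", True),
    ("pypy", False),
    ("jython", False),
]


def macros_for(runtimes):
    """Return the macro names a spec file would need for the given runtimes."""
    # One interpreter macro per runtime, e.g. __python3.
    interpreters = ["__" + stem for stem, _ in runtimes]
    # One or two site-directory macros per runtime.
    site_dirs = []
    for stem, has_sitearch in runtimes:
        site_dirs.append(stem + "_sitelib")
        if has_sitearch:
            site_dirs.append(stem + "_sitearch")
    return interpreters + site_dirs


if __name__ == "__main__":
    for name in macros_for(RUNTIMES):
        print("- " + name)
```

Every runtime added to the table adds two or three macros, and every spec file that wants to support that runtime has to know about them.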
This method of packaging Python modules in a distribution that ships multiple Python stacks is unsustainable. Not only does it create a significant amount of additional work and red tape for packagers, it also piles more work onto Fedora's already overburdened package reviewers.
How can we do this better?
Dave came up with a proof-of-concept tool last year that attempts to mitigate the pain of having a single source package build for multiple Python runtimes. It lets the distribution define which Python runtimes it includes and then fills in what would otherwise have been repeated spec file code when RPMs are built. While it is arguably over-engineered, I feel that it is the right way to approach this sort of problem. By giving the distribution a way to define which Python runtimes packages ought to build against, it avoids the need to hard-code a distribution-specific list of them in every package's spec file. With direct support from RPM or RPM macros, packagers could apply this sort of solution to any number of program stacks where multiple versions are likely to appear, such as PHP or Drupal.
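The general idea can be sketched in a few lines of Python. To be clear, this is my own toy illustration and not Dave's tool: DISTRO_RUNTIMES, expand_section, and expand_spec are hypothetical names, the __pypy and __jython macros are the speculative ones from the list above, and real packaging would also have to deal with details this ignores, such as separate build directories per runtime.

```python
# Hypothetical distribution-wide setting: the interpreter macros this
# distribution builds against. In this sketch it lives in one place
# instead of being hard-coded in every spec file.
DISTRO_RUNTIMES = ["__python", "__python3", "__pypy", "__jython"]


def expand_section(section, command, runtimes=DISTRO_RUNTIMES):
    """Return a spec section that runs `command` once per runtime."""
    lines = ["%" + section]
    for macro in runtimes:
        # Produces a line such as: %{__python3} setup.py build
        lines.append("%{" + macro + "} " + command)
    return "\n".join(lines)


def expand_spec(runtimes=DISTRO_RUNTIMES):
    """Return %build and %install sections covering every runtime."""
    build = expand_section("build", "setup.py build", runtimes)
    install = expand_section(
        "install",
        "setup.py install --skip-build --root %{buildroot}",
        runtimes,
    )
    return build + "\n\n" + install


if __name__ == "__main__":
    print(expand_spec())
```

The packager writes each command once; adding or dropping a runtime means changing the distribution's list rather than touching every spec file.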
Dave's experiment received precious little feedback on Fedora's Python mailing list. I encourage Fedora's packagers to take a look (or another look) at it and see what they can learn from it. Perhaps we can use it as the basis for a way to make multi-stack packaging easier before the Python flood hits.
Disclaimer: this is my rant, but Dave's solution. Please direct any and all flames toward me (gholms), not him. ;-)