An Overview of Packaging for Python

As a general-purpose programming language, Python is designed to be used in many ways. You can build web sites or industrial robots or a game for your friends to play, and much more, all using the same core technology.

Python’s flexibility is why the first step in every Python project must be to think about the project’s audience and the corresponding environment where the project will run. It might seem strange to think about packaging before writing code, but this process does wonders for avoiding headaches later on.

This overview provides a general-purpose decision tree for reasoning about Python’s plethora of packaging options. Read on to choose the best technology for your next project.

Thinking about deployment

Packages exist to be installed (or deployed), so before you package anything, you’ll want to have some answers to the deployment questions below:

  • Who are your software’s users? Will your software be installed by other developers, operations people in a data center, or a less software-savvy group?
  • Is your software intended to run on servers, desktops, or embedded in dedicated devices?
  • Is your software installed individually, or in large deployment batches?

Packaging is all about target environment and deployment experience. There are many answers to the questions above and each combination of circumstances has its own solutions. With this information, the following overview will guide you to the packaging technologies best suited to your project.

Packaging libraries and tools

You may have heard about PyPI, setup.py, and wheel files. These are just a few of the tools Python’s ecosystem provides for distributing Python code to developers, which you can read about in Packaging and distributing projects.

The following approaches to packaging are meant for libraries and tools used by a technical audience in a development setting. If you’re looking for ways to package Python for a non-technical audience and/or a production setting, skip ahead to Packaging Applications.

Python modules

A Python file, provided it only relies on the standard library, can be redistributed and reused. You will also need to ensure it’s written for the right version of Python.

Python source distributions

If your code consists of multiple Python files, it’s usually organized into a directory structure. Any directory containing Python files, provided one of those files is named __init__.py, comprises an import package.
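As a minimal sketch of the rule above, the following builds a throwaway directory containing an __init__.py and one other module, then imports it. The names demo_pkg and greetings are hypothetical, chosen only for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Create root/demo_pkg/ with an __init__.py and one submodule."""
    pkg = root / "demo_pkg"
    pkg.mkdir()
    # The __init__.py file is what marks this directory as an import package.
    (pkg / "__init__.py").write_text("from .greetings import hello\n")
    (pkg / "greetings.py").write_text(
        "def hello(name):\n    return f'Hello, {name}!'\n"
    )


with tempfile.TemporaryDirectory() as tmp:
    build_demo_package(Path(tmp))
    sys.path.insert(0, tmp)  # make the parent directory importable
    importlib.invalidate_caches()
    try:
        demo_pkg = importlib.import_module("demo_pkg")
        message = demo_pkg.hello("packaging")
    finally:
        sys.path.remove(tmp)

print(message)
```

Note that the whole directory has to travel together for the import to work, which is the distribution problem the next paragraphs describe.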

Because packages consist of multiple files, they are harder to distribute. Most protocols support transferring only one file at a time (when was the last time you clicked a link and it downloaded multiple files?). It’s easier to get incomplete transfers, and harder to guarantee code integrity at the destination.

So long as your code contains nothing but pure Python code, and you know your deployment environment supports your version of Python, then you can use Python’s native packaging tools to create a source distribution package, or sdist for short.

Python’s sdists are compressed archives (.tar.gz files) containing one or more packages or modules. If your code is pure-Python, and you only depend on other Python packages, you can go here to learn more.
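Because an sdist is an ordinary gzip-compressed tar archive, the standard library can inspect one. The sketch below builds a tiny stand-in archive in memory and lists its members. The project name example_pkg-1.0 and its file layout are hypothetical; a real sdist would be produced by a build tool rather than assembled by hand like this.

```python
import io
import tarfile


def make_fake_sdist(files: dict[str, str]) -> bytes:
    """Return the bytes of a .tar.gz archive holding the given text files."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def list_sdist(raw: bytes) -> list[str]:
    """List the member paths of a gzip-compressed tar archive."""
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:gz") as archive:
        return sorted(archive.getnames())


# Hypothetical contents, standing in for what a build tool would generate.
sdist_bytes = make_fake_sdist({
    "example_pkg-1.0/PKG-INFO": "Metadata-Version: 2.1\nName: example-pkg\nVersion: 1.0\n",
    "example_pkg-1.0/pyproject.toml": "[project]\nname = 'example-pkg'\nversion = '1.0'\n",
    "example_pkg-1.0/example_pkg/__init__.py": "",
})
members = list_sdist(sdist_bytes)
for member in members:
    print(member)
```

The point is only that everything ships as a single file, which sidesteps the multi-file transfer problem described earlier.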

If you rely on any non-Python code, or non-Python packages (such as libxml2 in the case of lxml, or BLAS libraries in the case of numpy), you will want to read on.

Note

Python and PyPI support multiple distributions providing different implementations of the same package. For instance the unmaintained-but-seminal PIL distribution provides the PIL package, and so does Pillow, an actively-maintained fork of PIL!

This Python packaging superpower makes it possible for Pillow to be a drop-in replacement for PIL, just by changing your project’s install_requires or requirements.txt.

Python binary distributions

So much of Python’s practical power comes from its ability to integrate with the software ecosystem, in particular libraries written in C, C++, Fortran, Rust, and other languages.

Not all developers have the right tools or experience to build components written in these compiled languages, so Python created the wheel, a package format designed to ship libraries with compiled artifacts. In fact, Python’s package installer, pip, prefers wheels whenever a compatible one is available, because installing a wheel is generally faster than building from an sdist.
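A wheel’s filename carries compatibility tags, which is how an installer can choose a suitable wheel without unpacking it. The following is a minimal sketch of a parser for the filename layout described in the wheel specification (PEP 427). It is not pip’s actual selection logic, and both filenames are hypothetical.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split name-version[-build]-python-abi-platform.whl into its fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelName(dist, version, build, py, abi, plat)


def is_pure_python(wheel: WheelName) -> bool:
    """A wheel tagged none/any contains no platform-specific compiled code."""
    return wheel.abi_tag == "none" and wheel.platform_tag == "any"


pure = parse_wheel_filename("example_pkg-1.0-py3-none-any.whl")
native = parse_wheel_filename("example_ext-1.0-cp312-cp312-manylinux_2_17_x86_64.whl")
print(pure.python_tag, is_pure_python(pure))
print(native.platform_tag, is_pure_python(native))
```

A wheel with compiled artifacts is tied to an interpreter and platform, which is why you may need several wheels per release but only one sdist.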

Binary distributions are best when they come with source distributions to match. Even if you don’t upload wheels of your code for every operating system, by uploading the sdist, you’re enabling users of other platforms to still build it for themselves.

Python and PyPI make it easy to upload both wheels and sdists together. Just follow the Packaging Python Projects tutorial.

Figure: Python’s recommended built-in library and tool packaging technologies. Excerpted from The Packaging Gradient (2017).

Packaging Applications

So far we’ve only discussed Python’s native distribution tools. Based on our introduction, you would be correct to infer these built-in approaches only target environments which have Python, and an audience who knows how to install Python packages.

With the variety of operating systems, configurations, and people out there, this assumption is only safe when targeting a developer audience.

Python’s native packaging is mostly built for distributing reusable code, called libraries, between developers. We can piggyback tools, or basic applications for developers, on top of Python’s library packaging, using technologies like setuptools entry_points.
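As a rough sketch of how a tool can piggyback on library packaging, the block below shows a function and the setuptools entry_points declaration that would expose it as a command. The project name hello-tool, the module hello_tool, and the function main are all hypothetical, and the two pieces (module code and setup arguments) are shown in one block only for brevity.

```python
import sys


def main(argv=None):
    """Function the generated `hello-tool` command would call."""
    args = sys.argv[1:] if argv is None else argv
    name = args[0] if args else "world"
    print(f"Hello, {name}!")
    return 0


# Keyword arguments for setuptools.setup(); all values are illustrative.
SETUP_KWARGS = {
    "name": "hello-tool",
    "version": "0.1.0",
    "py_modules": ["hello_tool"],
    "entry_points": {
        # "command-name = module:function"
        "console_scripts": ["hello-tool = hello_tool:main"],
    },
}

# In a real setup.py you would finish with:
#   from setuptools import setup
#   setup(**SETUP_KWARGS)
```

Once such a project is installed, the installer creates a hello-tool command that calls main. That is convenient for developers, but it still assumes Python and an installer are already present.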

Libraries are building blocks, not complete applications. For distributing applications, there’s a whole new world of technologies out there.

The best way to organize these application packaging options is by the way they depend on the target environment. That’s how we’ll approach the coming sections.

Depending on a framework

Some types of Python applications, like web sites and services, are common enough that they have frameworks to enable their development and packaging. Other types of applications, like web and mobile clients, are advanced enough that a framework becomes more than a convenience.

In all these cases, it makes sense to work backwards, from the framework’s packaging and deployment story. Some frameworks include a deployment system which wraps the technologies outlined in the rest of the guide. In these cases, you’ll want to defer to your framework’s packaging guide for the easiest and most reliable production experience.

If you ever wonder how these platforms and frameworks work under the hood, you can always read the sections beyond.

Service platforms

If you’re developing for a “Platform-as-a-Service” or “PaaS” like Heroku or Google App Engine, you are going to want to follow their respective packaging guides.

In all these setups, the platform takes care of packaging and deployment, as long as you follow their patterns. Most software does not fit one of these templates, hence the existence of all the other options below.

If you’re developing software that will be deployed to machines you own, users’ personal computers, or any other arrangement, read on.

Web browsers and mobile applications

Python’s steady advances are leading it into new spaces. These days you can write a mobile app or web application frontend in Python. While the language may be familiar, the packaging and deployment practices are brand new.

If you’re planning on releasing to these new frontiers, you’ll want to check out the following frameworks, and refer to their packaging guides:

If you are not interested in using a framework or platform, or just wonder about some of the technologies and techniques utilized by the frameworks above, continue reading below.

Depending on a pre-installed Python

Pick an arbitrary computer and, depending on the context, there’s a very good chance Python is already installed. Python has been included by default in most Linux and Mac operating systems for many years now, so you can reasonably depend on it preexisting in your data centers or on the personal machines of developers and data scientists.

Technologies which support this model:

  • PEX (Python EXecutable)
  • zipapp (does not help manage dependencies, requires Python 3.5+)
  • shiv (requires Python 3)
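To make this model concrete, here is a minimal sketch using the standard library’s zipapp module, the simplest of the options listed. It bundles a one-file application into a single .pyz archive and runs it with the interpreter that is already installed. The directory name app and the printed message are placeholders.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path


def build_and_run_zipapp() -> str:
    """Bundle a tiny app into a .pyz file and run it with the current Python."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "app"
        source.mkdir()
        # A zipapp needs a __main__.py as its entry point.
        (source / "__main__.py").write_text('print("hello from a zipapp")\n')

        target = Path(tmp) / "app.pyz"
        zipapp.create_archive(source, target)

        # The archive is not standalone: it needs a pre-installed interpreter.
        result = subprocess.run(
            [sys.executable, str(target)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


output = build_and_run_zipapp()
print(output)
```

As the list notes, zipapp does not manage dependencies for you; anything the application imports must be copied into the source directory or already be available on the target machine.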

Note

Of all the approaches here, depending on a pre-installed Python relies the most on the target environment. Of course, this also makes for the smallest package, as small as single-digit megabytes, or even kilobytes.

In general, decreasing the dependency on the target system increases the size of our package, so the solutions here are roughly arranged by increasing size of output.

Depending on a new Python ecosystem

For a long time many operating systems, including Mac and Windows, lacked built-in package management. Only recently did these OSes gain so-called “app stores”, but even those focus on consumer applications and offer little for developers.

Developers long sought remedies, and in this struggle, emerged with new package management solutions of their own, with some notable benefits for Python developers in particular. The most prominent, an alternative package ecosystem called Anaconda, is built around Python and is increasingly common in academic, analytical, and other data-oriented environments, even making its way into server-oriented environments.

Instructions on building for the Anaconda ecosystem:

A similar model involves installing an alternative Python distribution, but does not support arbitrary operating system-level packages:

Bringing your own Python executable

Computing as we know it is defined by the ability to execute programs. Every operating system supports one or more program formats that it can execute natively.

There are many techniques and technologies which turn your Python program into one of these formats, most of which involve embedding the Python interpreter and any other dependencies into a single executable file.

This approach, called freezing, offers wide compatibility and a seamless user experience, though it often requires multiple technologies and a good amount of effort.

A selection of Python freezers:

Most of the above imply single-user deployments. For multi-component server applications, see Chef Omnibus.

Bringing your own userspace

An increasing number of operating systems – including Linux, Mac OS, and Windows – can be set up to run applications packaged as lightweight images, using a relatively modern arrangement often referred to as operating-system-level virtualization, or containerization for short.

Because this level packages whole OS filesystems, the techniques are mostly Python agnostic.

Adoption is most extensive among Linux servers, where the technology originated and where the technologies below work best:

Bringing your own kernel

Most operating systems support some form of classical virtualization, running applications packaged as images containing a full operating system of their own. Running these virtual machines, or VMs, is a mature approach, widespread in data center environments.

These techniques are mostly reserved for larger scale deployments in data centers, though certain complex applications can benefit from this packaging. Technologies are Python agnostic, and include:

Bringing your own hardware

The most all-encompassing way to ship your software would be to ship it already-installed on some hardware. This way, your software’s user would require only electricity.

Whereas the virtual machines described above are primarily reserved for the tech-savvy, hardware appliances are used by everyone from the most advanced data centers to the youngest children.

Embed your code on an Adafruit, MicroPython, or more-powerful hardware running Python, then ship it to the data center or your users’ homes. They plug and play, and you can call it a day.

Figure: The simplified gamut of technologies used to package Python applications.

What about…

The sections above can only summarize so much, and you might be wondering about some of the more conspicuous gaps.

Operating system packages

As mentioned in Depending on a new Python ecosystem above, some operating systems have package managers of their own. If you’re very sure of the operating system you’re targeting, you can depend directly on a format like deb (for Debian, Ubuntu, etc.) or RPM (for Red Hat, Fedora, etc.), and use that built-in package manager to take care of installation, and even deployment.

In most deployment pipelines, the OS package manager is just one piece of the puzzle.

virtualenv

Virtualenvs have been an indispensable tool for multiple generations of Python developers, but are slowly fading from view, as they are being wrapped by higher-level tools. With packaging in particular, virtualenvs are used as a primitive in the dh-virtualenv tool and osnap, both of which wrap virtualenvs in a self-contained way.

For production deployments, do not rely on running pip install from the Internet into a virtualenv, as one might do in a development environment. The overview above is full of much better solutions.
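For readers who have only met virtual environments through higher-level tools, the sketch below shows the primitive itself. It uses the standard library’s venv module, which is related to but distinct from the third-party virtualenv tool, and the directory name env is arbitrary. It creates a bare environment without pip, so nothing is downloaded from the Internet.

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtual_environment() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


def create_bare_environment(target: Path) -> Path:
    """Create an environment without pip and return the path to its config file."""
    venv.EnvBuilder(with_pip=False).create(target)
    return target / "pyvenv.cfg"


with tempfile.TemporaryDirectory() as tmp:
    cfg = create_bare_environment(Path(tmp) / "env")
    cfg_exists = cfg.is_file()
    cfg_text = cfg.read_text()

print("running inside a virtual environment:", in_virtual_environment())
print("pyvenv.cfg created:", cfg_exists)
```

The environment is little more than a directory with a small config file pointing back at a base interpreter, which is why tools can wrap it so easily.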

Security

The further down the gradient you come, the harder it gets to update components of your package. Everything is more tightly bound together.

For example, if a kernel security issue emerges, and you’re deploying containers, the host system’s kernel can be updated without requiring a new build on behalf of the application. If you deploy VM images, you’ll need a new build. Whether or not this dynamic makes one option more secure is an old debate, going back to the still-unsettled matter of static versus dynamic linking.

Wrap up

Packaging in Python has a bit of a reputation for being a bumpy ride. This impression is mostly a byproduct of Python’s versatility. Once you understand the natural boundaries between each packaging solution, you begin to realize that the varied landscape is a small price Python programmers pay for using the most balanced, flexible language available.