Packaging Step by Step

In this lesson, we will go through the packaging workflow step by step. This time, we will also cover the recommended structure of a Python package and the metadata required to make an installable “distribution package”. This will allow you to create a package that can be installed using pip and easily shared with others.

Packaging workflow

For reference, here is the high-level overview of the workflow that we will follow in this lesson:

[Figure: Simple packaging]

In the previous lesson, we went all the way to step 3. In this lesson, however, we will start from scratch and go through all the steps again, this time following the best practices for packaging.

Step 1: Write and run code in Jupyter notebooks

This is already done for us. We have a Jupyter notebook from the previous lesson, called sample.ipynb, with some code that we want to package.

Package goals

For simplicity, let’s define a few goals for our package:

  1. Install our package using pip

  2. Run our code from the terminal using a command eggsample

Step 2: Extract Python code and create modules

We will extract the Python code from the Jupyter notebook and create two Python modules:

  1. eggsellent_cook.py - contains the bulk of the code, especially the EggsellentCook class.

  2. command_line_interface.py - contains the command line interface (CLI) code since we’re going to create a command line tool eggsample.

These code files might look like the following:
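The original listings are not reproduced here, so the following is only a minimal sketch of how the two modules might be organised, shown in a single block for convenience. The class name EggsellentCook and the function main come from this lesson, and the ingredient and condiment values are copied from the sample output shown later. The method names (add_ingredients, add_condiments, serve_the_food) are assumptions made for illustration and may differ from the real code.

```python
import random


# --- eggsellent_cook.py (hypothetical sketch) ---
class EggsellentCook:
    """Collects ingredients and condiments, then serves them."""

    def __init__(self):
        self.ingredients = []
        self.condiments = []

    def add_ingredients(self):
        # Values copied from the sample output shown later in this lesson.
        self.ingredients = ["egg"] * 5 + [
            "wonderous spam",
            "magnificent spam",
            "splendiferous spam",
            "lovely spam",
            "salt",
            "pepper",
        ]
        # The sample output lists the ingredients in a different order on
        # each run, so this sketch shuffles them.
        random.shuffle(self.ingredients)

    def add_condiments(self):
        self.condiments = ["pickled walnuts", "steak sauce", "mushy peas", "mint sauce"]

    def serve_the_food(self):
        print(f"Your food. Enjoy some {', '.join(self.ingredients)}")
        print(f"Some condiments? We have {', '.join(self.condiments)}")
        print("Now this is what I call a condiments tray!")


# --- command_line_interface.py (hypothetical sketch) ---
# As a separate file, this module would start with:
#     from eggsellent_cook import EggsellentCook
def main():
    cook = EggsellentCook()
    cook.add_ingredients()
    cook.add_condiments()
    cook.serve_the_food()
```

Keeping the class in one module and the thin main function in another means the command line tool only needs to know about a single entry function.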

Once you have created the two modules, you can test them by running the following command in your terminal from the directory that contains them:

python -c "from command_line_interface import main; main()"

This should print out a message with the ingredients and condiments for the food. Essentially, we’ve imported the main function from the command_line_interface module and then run it, all from the terminal.

Note

If you’re unfamiliar with the python -c command, it allows you to run Python code from the terminal.

Step 3: Place Python modules to designated package directory

In the previous lesson, we simply placed the Python modules in a directory called eggsample. It works, but it’s not the best practice, and we saw that it was really hard to share the package with others, so let’s do it the right way this time.

There are two layouts that you will commonly see within the Python packaging ecosystem: the src layout and the flat layout. We will be using the src/ layout for our Python package. This layout is recommended in the PyPA packaging guide.

Okay, let’s create the following directory structure, placing the Python modules in the src/eggsample directory:

eggsample_repo
├── src                               ┐
│   └── eggsample                     │
│       ├── __init__.py               │ Package source code
│       ├── command_line_interface.py │
│       └── eggsellent_cook.py        ┘

At this point, we essentially have the same directory structure as before, but this time we have followed the best practice layout for Python packages. We now have a repository directory eggsample_repo with a src/eggsample directory containing the Python modules.
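If you prefer to script the layout rather than create it by hand, the structure above can be sketched with pathlib. The helper name create_src_layout is hypothetical, and the demonstration runs in a temporary directory so that it does not touch your real repository. In practice you would move your existing module files into src/eggsample rather than create empty ones.

```python
import tempfile
from pathlib import Path


def create_src_layout(root: Path, package: str, modules: list[str]) -> Path:
    """Create <root>/src/<package>/ with __init__.py and the given module files."""
    package_dir = root / "src" / package
    package_dir.mkdir(parents=True, exist_ok=True)
    for filename in ["__init__.py", *modules]:
        # touch() creates the file if it is missing and keeps existing content.
        (package_dir / filename).touch()
    return package_dir


# Demonstration in a scratch directory that is deleted afterwards.
with tempfile.TemporaryDirectory() as scratch:
    package_dir = create_src_layout(
        Path(scratch) / "eggsample_repo",
        "eggsample",
        ["command_line_interface.py", "eggsellent_cook.py"],
    )
    for path in sorted(package_dir.iterdir()):
        print(path.relative_to(scratch))
```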

We’re now ready to move on to the next step.

Step 4: Add Python package metadata

In order to make our package installable using pip, we need to add some metadata to our package.

Currently if you try to pip install the repository directory eggsample_repo, you will get an error:

pip install ./eggsample_repo

Error

ERROR: Directory './eggsample_repo' is not installable. Neither 'setup.py' nor 'pyproject.toml' found.

As you can see, pip is looking for either a setup.py or pyproject.toml file in the repository directory. We will be using the pyproject.toml file to store the metadata for our package, as it is the modern way to store metadata for Python packages. See below about the pyproject.toml file.
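To make the error message concrete, here is a rough sketch of the kind of check it describes. This is a simplification for illustration, not pip's actual implementation, and the function name looks_installable is hypothetical.

```python
from pathlib import Path


def looks_installable(project_dir: Path) -> bool:
    """Return True if the directory contains pyproject.toml or setup.py."""
    return any(
        (project_dir / name).is_file() for name in ("pyproject.toml", "setup.py")
    )


# Reports False until one of the two files exists in the directory.
print(looks_installable(Path("eggsample_repo")))
```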

About the pyproject.toml file

Every modern Python package should include a pyproject.toml file. If your project is pure Python and you’re using a setup.py or setup.cfg file to describe its metadata, you should consider migrating your metadata and build information to a pyproject.toml file.

If your project isn’t pure Python, you might still require a setup.py file to build the non-Python extensions. However, a pyproject.toml file should still be used to store your project’s metadata.

What happened to setup.py & how do I migrate to pyproject.toml?

Prior to August 2017, Python package metadata was stored either in the setup.py file or a setup.cfg file. In recent years, there has been a shift to storing Python package metadata in a much more user-readable pyproject.toml format. Having all metadata in a single file:

  • simplifies package management,

  • allows you to use a suite of different build backends such as (flit-core, hatchling, pdm-build), and

  • aligns with modern best practices.

Source: https://www.pyopensci.org/python-package-guide/package-structure-code/pyproject-toml-python-package-metadata.html#about-the-pyproject-toml-file

Create a pyproject.toml file

Let’s create a pyproject.toml file in the repository directory eggsample_repo with the following content:

File: pyproject.toml

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "eggsample"
version = "0.1.0"
description = "The eggsample from an eggsellent cook"
requires-python = ">= 3.10"

The above pyproject.toml file contains the following tables:

  • [build-system] table - It allows you to declare which build backend you use and which other dependencies are needed to build your project.

    • build-backend: In our case, we’ve set our build-backend to hatchling.build, the build backend provided by the hatchling package. Hatchling is a widely used build backend that is extensible, standards compliant, and easy to use.

    • requires: In order to use hatchling, we need to list it in the requires field. This ensures that the build frontend knows to install this dependency before building the package.

  • [project] table - It contains the metadata for your project, such as the name, version, and description.

    • name: The name of the package, in our case, eggsample.

    • version: The version of the package, in our case, 0.1.0.

    • description: A short description of the package.

    • requires-python: The Python version required to run the package. In our case, >= 3.10, which means that the package requires Python 3.10 or later.

    For an extensive list of fields that can be included in the [project] table, see the Project Core Metadata. We will be adding more metadata to this file in later steps and lessons.

At this point, you should have the following directory structure:

eggsample_repo
├── pyproject.toml              ] Package metadata and build configuration
└── src
    └── eggsample
        ├── __init__.py
        ├── command_line_interface.py
        └── eggsellent_cook.py

Step 5: Install the package

Now that we have added the metadata to our package, we can install it using pip:

pip install -e ./eggsample_repo

The -e flag is used to install the package in “editable” mode, which means that any changes you make to the package will be reflected in the installed package without having to reinstall it. This is useful during development when you are actively working on the package. For a regular installation, you can omit the -e flag.

Note

pip is an example of a build frontend that allows us to “install” our package using the build backend that we’ve specified in the pyproject.toml file.

Another build frontend that we will use for package distribution is build, which is a PEP 517 compatible Python package builder. It provides a CLI to build packages, as well as a Python API.

Once you’ve installed the package, you should be able to import the eggsample package in Python:

import eggsample

That seems to have worked! We’ve successfully installed our package using pip.

We can also see the installed package using the following command:

pip list | grep eggsample

This results in the following output: eggsample  0.1.0. This confirms that our package has been installed successfully and is available for use, with version 0.1.0.
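The same check can also be done from within Python using importlib.metadata from the standard library. The helper name installed_version is hypothetical. It returns None when no distribution with the given name is installed in the current environment.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# After a successful install, this should report the version from pyproject.toml.
print(installed_version("eggsample"))
```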

Step 6: Create eggsample command line tool

We have officially gone through the packaging workflow and have successfully installed our package using pip. Congratulations! 🎉

However, we still haven’t fully met our package goals: we cannot yet run our code from the terminal using the eggsample command.

Run the main function in the terminal

First, let’s try to run the main function again from the command_line_interface module, but this time, we will import from eggsample.command_line_interface:

python -c "from eggsample.command_line_interface import main; main()"

Error

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/.../.../eggsample/command_line_interface.py", line 1, in <module>
    from eggsellent_cook import EggsellentCook
ModuleNotFoundError: No module named 'eggsellent_cook'

Oh no! We’re getting a ModuleNotFoundError for eggsellent_cook, even though the package was installed successfully. This is a common issue when moving standalone modules into a package. In Step 2, the import worked because we ran Python from the directory containing eggsellent_cook.py, so that directory was on the Python path. Now the modules live inside the eggsample package, and there is no top-level eggsellent_cook package or module anywhere on the Python path.

There is, however, an eggsample.eggsellent_cook module, which is the correct path to the module. So let’s replace the import statement in command_line_interface.py with the correct path:

File: command_line_interface.py

# Old import statement
# from eggsellent_cook import EggsellentCook

# New import statement
from eggsample.eggsellent_cook import EggsellentCook
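The difference between the two import styles can be reproduced with a throwaway package. The sketch below builds a tiny package in a temporary directory and puts only its parent directory on the Python path, as happens for an installed package. The names demo_eggs_pkg and eggs_demo_cook are made up for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_imports() -> tuple[bool, bool]:
    """Return (top_level_import_worked, package_qualified_import_worked)."""
    with tempfile.TemporaryDirectory() as tmp:
        package_dir = Path(tmp) / "demo_eggs_pkg"
        package_dir.mkdir()
        (package_dir / "__init__.py").write_text("")
        (package_dir / "eggs_demo_cook.py").write_text("NAME = 'cook'\n")

        # Only the parent directory of the package is on the path.
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            try:
                importlib.import_module("eggs_demo_cook")
                top_level_ok = True
            except ModuleNotFoundError:
                top_level_ok = False
            module = importlib.import_module("demo_eggs_pkg.eggs_demo_cook")
            qualified_ok = module.NAME == "cook"
        finally:
            # Undo the changes so the demonstration leaves no traces behind.
            sys.path.remove(tmp)
            sys.modules.pop("demo_eggs_pkg.eggs_demo_cook", None)
            sys.modules.pop("demo_eggs_pkg", None)
    return top_level_ok, qualified_ok


print(demo_imports())  # (False, True)
```

The bare module name fails for the same reason as in the traceback above, while the package-qualified name succeeds.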

Now, let’s try running the main function again:

python -c "from eggsample.command_line_interface import main; main()"
Your food. Enjoy some egg, egg, wonderous spam, salt, magnificent spam, splendiferous spam, egg, lovely spam, pepper, egg, egg
Some condiments? We have pickled walnuts, steak sauce, mushy peas, mint sauce
Now this is what I call a condiments tray!

Create scripts sub-table in pyproject.toml

We have successfully run the main function from the terminal. However, we still haven’t met our second package goal, which was to run our code from the terminal using the command eggsample. Typing out a long python -c command is not a user-friendly way to run a command line tool.

Within the [project] table in the pyproject.toml file, we can add a [project.scripts] sub-table to specify the command line tools that we want to create.

Let’s add the following to the pyproject.toml file:

File: pyproject.toml

[project.scripts]
eggsample = "eggsample.command_line_interface:main"

Once you’ve added the [project.scripts] sub-table to the pyproject.toml file, you need to reinstall the package using pip, since we’ve made changes to the metadata. The rule of thumb is that whenever you make changes to the metadata, you need to reinstall the package:

pip install -e ./eggsample_repo

Now, you should be able to run the eggsample command from the terminal:

eggsample
Your food. Enjoy some magnificent spam, pepper, salt, egg, egg, egg, wonderous spam, splendiferous spam, lovely spam, egg, egg
Some condiments? We have pickled walnuts, steak sauce, mushy peas, mint sauce
Now this is what I call a condiments tray!
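To build some intuition for the "module:function" value in [project.scripts], here is a rough sketch of how such a reference can be resolved to a callable. It is a simplified illustration rather than the code that pip actually generates, and the helper name resolve_entry_point is hypothetical. The example resolves a standard library function so that it runs without eggsample installed.

```python
from importlib import import_module


def resolve_entry_point(reference: str):
    """Resolve a 'package.module:object' reference to the object it names."""
    module_name, _, attribute_path = reference.partition(":")
    target = import_module(module_name)
    if not attribute_path:
        # A reference without ':' names the module itself.
        return target
    for part in attribute_path.split("."):
        target = getattr(target, part)
    return target


# With eggsample installed, the reference would be
# "eggsample.command_line_interface:main". Here we use the standard library.
dumps = resolve_entry_point("json:dumps")
print(dumps({"eggs": 5}))  # {"eggs": 5}
```

The eggsample command created at install time is essentially a small script that looks up the function in this way and then calls it.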

Step 7: Share your package

Congratulations! You’ve successfully created a Python package that can be installed using pip and easily shared with others. To share your package at this stage, you can push your repository to GitHub (see the optional instructions below, titled Creating and Pushing to GitHub), after which others can install your package using the following command:

pip install git+https://github.com/lsetiawan/simple_eggsample.git

The above example points to a GitHub repository simple_eggsample that we’ve created from the steps above. You can replace this with your own GitHub repository URL or your neighbor’s repository URL if you’re feeling generous.

Notice that we’re using the git+ prefix to tell pip to install the package from a Git repository. This is a common way to install a package that is not available on the Python Package Index (PyPI). However, it does require git to be installed on your system. See this blog for a quick overview of the pip install git+ feature.
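pip also accepts an optional @ suffix on such URLs to pin the install to a branch, tag, or commit. The sketch below assembles the argument as a string. The helper name pip_git_requirement and the tag v0.1.0 are hypothetical and used only for illustration; the repository may not have such a tag.

```python
from typing import Optional


def pip_git_requirement(repo_url: str, ref: Optional[str] = None) -> str:
    """Build the argument for pip install from a Git URL and an optional ref."""
    requirement = f"git+{repo_url}"
    if ref:
        # The part after '@' can be a branch name, a tag, or a commit hash.
        requirement += f"@{ref}"
    return requirement


repo = "https://github.com/lsetiawan/simple_eggsample.git"
print(pip_git_requirement(repo))
print(pip_git_requirement(repo, ref="v0.1.0"))  # hypothetical tag
```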

Summary

Let’s summarize what we’ve done in this lesson:

  • We’ve extracted the Python code from the Jupyter notebook and created two Python modules: eggsellent_cook.py and command_line_interface.py.

  • We’ve placed the Python modules in the src/eggsample directory, following the best practice layout for Python packages.

  • We’ve added metadata to our package by creating a pyproject.toml file in the repository directory eggsample_repo.

  • We’ve installed the package using pip and verified that it was installed successfully.

  • We’ve created a command line tool eggsample that runs the main function from the command_line_interface module.

At its core, you have now learned how to turn your Python code into a package that can be installed using pip and easily shared with others. This is a significant milestone in your Python journey, and you should be proud of yourself for reaching this point. In the next lesson, we will go over adding supplemental files to your package, so that you have a more complete package that can be shared and published for community use.