Why should you care about creating packages?

- Others can install and use your code with a single command (pip install demo).
- You can install your package in editable mode while working on it (pip install -e . installs your package and keeps it up-to-date during development).
- Your code becomes importable from anywhere (from demo.main import say_hello, then test the function).
- Users can pin an exact version of your code (pip install demo==1.0.3).

Library vs Package vs Module:

- Module: a single .py file containing functions that belong together
- Package: a folder of such modules (with an __init__.py)
- Library: a collection of packages

Packaging Python code is quite easy: you require a single setup.py script to package your code in several distribution formats.
Let’s use the folder structure from an earlier post, and create a virtual environment in it:
➜ tree -a -L 2
.
├── .venv
│   └── ...
├── Pipfile
├── Pipfile.lock
├── src
│   └── demo
│       └── main.py
└── tests
    └── demo
        └── ...

9 directories, 3 files
Create a setup.py
file in the root directory that we will use to define the way we’d like to package our code, containing the following code:
"""Setup.py script for packaging project."""
from setuptools import setup, find_packages
import json
import os
def read_pipenv_dependencies(fname):
"""Get default dependencies from Pipfile.lock."""
filepath = os.path.join(os.path.dirname(__file__), fname)
with open(filepath) as lockfile:
lockjson = json.load(lockfile)
return [dependency for dependency in lockjson.get('default')]
if __name__ == '__main__':
setup(
name='demo',
version=os.getenv('PACKAGE_VERSION', '0.0.dev0'),
package_dir={'': 'src'},
packages=find_packages('src', include=[
'demo*'
]),
description='A demo package.',
install_requires=[
*read_pipenv_dependencies('Pipfile.lock'),
]
)
You can now call this script to package your code in several ways:
python setup.py develop # don't generate anything, just install locally
python setup.py bdist_egg # generate egg distribution, doesn't include dependencies
python setup.py bdist_wheel # generate versioned wheel, includes dependency metadata
python setup.py sdist --formats=zip,gztar,bztar,ztar,tar # source code
Run the first one in the list above. When it succeeds, you will be able to import your code as follows:
from demo.main import say_hello
Note: if you are receiving “No module named demo…”, you’ll need to add an empty __init__.py file in all folders you want to import from. In our example, that only includes the demo folder. You can read more about these __init__.py files here.
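For reference, a minimal sketch of what the package contents could look like; the say_hello function itself is an assumption here (it comes from the earlier post), only the import above is given in this article:

# src/demo/__init__.py (empty file that marks the folder as a package)

# src/demo/main.py
def say_hello(name: str = "world") -> str:
    """Return a friendly greeting (assumed implementation)."""
    return f"Hello, {name}!"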
Now that we were able to install the project, we should take a closer look at the arguments that we pass to the setuptools.setup
function:
- name: the name of your package
- version: every change to your code should yield a different package version, or else developers may install the same package version that’s suddenly behaving differently, breaking their code
- packages: a list of paths of all your python files
- install_requires: a list of package names and versions (just as in a requirements.txt file)

You can see that I wrote a simple function read_pipenv_dependencies to read the non-dev dependencies from the Pipfile.lock. Now I won’t have to specify dependencies manually. I also use os.getenv to read in an environment variable to determine the package version, which are nice segues to the next topics.
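Note that the list comprehension in read_pipenv_dependencies only yields package names. If you also want to pin the exact versions recorded in Pipfile.lock, a hedged variant could look like this (in the lock file, the 'version' key already contains the == specifier):

def read_pipenv_dependencies(fname):
    """Get default dependencies from Pipfile.lock, pinned to the locked versions."""
    filepath = os.path.join(os.path.dirname(__file__), fname)
    with open(filepath) as lockfile:
        lockjson = json.load(lockfile)
    # Each entry looks like {"requests": {"version": "==2.25.1", ...}, ...}
    return [
        name + spec.get('version', '')
        for name, spec in lockjson.get('default', {}).items()
    ]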
Just as I read in the Pipfile.lock
to specify my dependencies, I can also read in a README.md
file to display useful documentation as the long_description
. More information about this can be read on packaging.python.org.
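A minimal sketch of what that could look like, assuming a README.md sits next to setup.py:

with open(os.path.join(os.path.dirname(__file__), 'README.md')) as readme:
    long_description = readme.read()

setup(
    # ...the arguments shown earlier...
    long_description=long_description,
    long_description_content_type='text/markdown',
)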
In addition, you can create a proper documentation web page using readthedocs and sphinx. Create a folder for your documentation:
mkdir docs
Install sphinx:
pipenv install -d sphinx
Run the quickstart to generate the source directory for your documentation:
sphinx-quickstart
Now you can start populating the docs/index.rst
file with your documentation. Learn more about automating this process on sphinx-doc.org.
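Sphinx can generate a large part of that documentation from your docstrings (for example via the autodoc extension), and doctest examples in those docstrings are also picked up by the pytest --doctest-modules flag used in the Makefile further below. A small sketch, reusing the assumed say_hello function:

def say_hello(name: str = "world") -> str:
    """Return a friendly greeting.

    >>> say_hello("packaging")
    'Hello, packaging!'
    """
    return f"Hello, {name}!"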
As part of your packaging process, you’d want to apply some static code analyses, linting, and testing.
pipenv install -d mypy autopep8 \
flake8 pytest bandit pydocstyle
Preferably, you’d run a command that verifies the code style and runs tests and checks before you push your commits to the remote repository, and have the build pipeline fail if those checks don’t pass.
As we’re rapidly introducing new commands that are part of packaging our specific project, it is useful to record common commands. Most build automation tools (such as Gradle or npm) provide this feature by default.
Make is a tool to organize code compilation, traditionally used in C-oriented projects. But it can be used to run any other command.
By default, when you run make, it executes the first target in the Makefile; in the example below that means it will execute make help and print out the contents of the Makefile. If we run make test, it will first run make dev, as it’s stated as a dependency in the Makefile:
help:
	@echo "Tasks in \033[1;32mdemo\033[0m:"
	@cat Makefile

lint:
	mypy src --ignore-missing-imports
	flake8 src --ignore=$(shell cat .flakeignore)

dev:
	pip install -e .

test: dev
	pytest --doctest-modules --junitxml=junit/test-results.xml
	bandit -r src -f xml -o junit/security.xml || true

build: clean
	pip install wheel
	python setup.py bdist_wheel

clean:
	@rm -rf .pytest_cache/ .mypy_cache/ junit/ build/ dist/
	@find . -not -path './.venv*' -path '*/__pycache__*' -delete
	@find . -not -path './.venv*' -path '*/*.egg-info*' -delete
As you can see, it’s now quite easy for new developers to contribute to the project: they get a nice overview of common commands, for example make build to build a wheel.
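And if you want the lint and test targets to run automatically before every push (as suggested earlier), a minimal, hypothetical .git/hooks/pre-push hook could simply delegate to the Makefile:

#!/bin/sh
# .git/hooks/pre-push (make it executable with chmod +x)
# Abort the push if linting or tests fail.
make lint && make test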
When you run make build
, it will use the setup.py
file to create a wheel distribution. You’ll find a .whl
file in the dist/
folder, having 0.0.dev0
in the name. You can now specify an environment variable to change the version of the wheel:
export PACKAGE_VERSION='1.0.0'
make build
ls dist
Having built the wheel, you can create a new folder somewhere on your desktop, copy the wheel into it, and install it using:
mkdir test-whl && cd test-whl
pipenv shell
pip install *.whl
Print out the installed packages:
pip list
It’s also possible to add data to your package by adding the following lines to your setup.py
script:
Note: This may not work on distributed systems (such as Databricks).
if __name__ == '__main__':
    setup(
        data_files=[
            ('data', ['data/my-config.json'])
        ]
    )
You’ll then be able to read the file by using this function:
import json
import os
import sys


def get_cfg_file(filename: str, foldername: str) -> dict:
    """Get config file.

    Using 'data_files' property from setup.py script.
    """
    if not isinstance(foldername, str):
        raise ValueError('Foldername must be string.')
    if foldername[0] == '/':
        raise ValueError('Foldername must not start with \'/\'')
    if not isinstance(filename, str):
        raise ValueError('Filename must be string.')
    # Will first try to read the file from the installed location
    # (this only applies to .whl installations),
    # otherwise it will read the file directly from the source tree.
    try:
        filepath = os.path.join(sys.prefix, foldername, filename)
        with open(filepath) as f:
            return json.load(f)
    except FileNotFoundError:
        filepath = os.path.join(foldername, filename)
        with open(filepath) as f:
            return json.load(f)
If you create a wheel again, and install it in a virtual environment in a new folder, without copying the data file, you should be able to access the data by executing the function above.
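For example, assuming the my-config.json from the data_files example above:

cfg = get_cfg_file('my-config.json', 'data')
print(cfg)  # the parsed contents of data/my-config.json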
As part of our packaging process, we want to integrate changes from many contributors, and automate as much of the repetitive work required to successfully release a new version as possible.
For this example we’ll use Azure DevOps, where the following pipeline will be triggered on git tags
as well as the master
branch.
Have a look, and we’ll discuss the various stages and tasks afterwards:
resources:
- repo: self

trigger:
- master
- refs/tags/v*

variables:
  python.version: "3.7"
  project: demo
  feed: demo
  major_minor: $[format('{0:yy}.{0:MM}', pipeline.startTime)]
  counter_unique_key: $[format('{0}.demo', variables.major_minor)]
  patch: $[counter(variables.counter_unique_key, 0)]
  fallback_tag: $(major_minor).dev$(patch)

stages:
- stage: Test
  jobs:
  - job: Test
    displayName: Test
    steps:
    - task: UsePythonVersion@0
      displayName: "Use Python $(python.version)"
      inputs:
        versionSpec: "$(python.version)"
    - script: pip install pipenv && pipenv install -d --system --deploy --ignore-pipfile
      displayName: "Install dependencies"
    - script: pip install typed_ast && make lint
      displayName: Lint
    - script: pip install pathlib2 && make test
      displayName: Test
    - task: PublishTestResults@2
      displayName: "Publish Test Results junit/*"
      condition: always()
      inputs:
        testResultsFiles: "junit/*"
        testRunTitle: "Python $(python.version)"

- stage: Build
  dependsOn: Test
  jobs:
  - job: Build
    displayName: Build
    steps:
    - task: UsePythonVersion@0
      displayName: "Use Python $(python.version)"
      inputs:
        versionSpec: "$(python.version)"
    - script: "pip install wheel twine"
      displayName: "Wheel and Twine"
    - script: |
        # Get version from git tag (v1.0.0) -> (1.0.0)
        git_tag=`git describe --abbrev=0 --tags | cut -d'v' -f 2`
        echo "##vso[task.setvariable variable=git_tag]$git_tag"
      displayName: Set GIT_TAG variable if tag is pushed
      condition: contains(variables['Build.SourceBranch'], 'refs/tags/v')
    - script: |
        # Get variables that are shared across jobs
        GIT_TAG=$(git_tag)
        FALLBACK_TAG=$(fallback_tag)
        echo GIT TAG: $GIT_TAG, FALLBACK_TAG: $FALLBACK_TAG
        # Export variable so python can access it
        export PACKAGE_VERSION=${GIT_TAG:-${FALLBACK_TAG:-default}}
        echo Version used in setup.py: $PACKAGE_VERSION
        # Use PACKAGE_VERSION in setup()
        python setup.py bdist_wheel
      displayName: Build
    - task: CopyFiles@2
      displayName: Copy dist files
      inputs:
        sourceFolder: dist/
        contents: demo*.whl
        targetFolder: $(Build.ArtifactStagingDirectory)
        flattenFolders: true
    - task: PublishBuildArtifacts@1
      displayName: PublishArtifact
      inputs:
        pathtoPublish: $(Build.ArtifactStagingDirectory)
        ArtifactName: demo.whl
    - task: TwineAuthenticate@1
      inputs:
        artifactFeed: $(project)/$(feed)
    - script: |
        twine upload -r $(feed) --config-file $(PYPIRC_PATH) dist/*
      displayName: PublishFeed
In the Test
stage we install the project in the pipeline container, without creating a virtual environment. We then run the make lint
and make test
commands, just like you would on your machine.
In the Build
stage we will try to extract the package version from a git tag, and we construct a fallback package version. We run the python setup.py bdist_wheel
command to build a wheel, knowing that our package version environment variable is set. Finally, we publish the artifact to Azure DevOps artifacts, and (optionally) to our feed.
You’ll need a .pypirc file to publish your package to a feed; you can copy its contents after creating a feed in Azure DevOps. It looks something like this:
[distutils]
Index-servers =
    stefanschenk

[stefanschenk]
Repository = https://pkgs.dev.azure.com/stefanschenk/_packaging/stefanschenk/pypi/upload
For instructions on how to install packages from a private feed, have a look at this post.
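As a quick sketch (the simple-index URL mirrors the upload URL from the .pypirc above; authentication, for example via a personal access token or the artifacts-keyring package, is assumed to be configured):

pip install demo --index-url https://pkgs.dev.azure.com/stefanschenk/_packaging/stefanschenk/pypi/simple/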