Description of the miniproject

Introduction

The miniproject package is a tiny stupid package that demonstrates the build process for the mmgroup package. The mmgroup package is a python implementation of Conway’s construction [Con85] of the monster group \(\mathbb{M}\), which is the largest sporadic finite simple group.

That package contains C programs that have been generated automatically by a program code generator. Automatically generated low-level functions written in C are used to compute rather large tables, which will be used by high-level C programs. This means that the build process has to take place in several stages. Also, there are C subroutines that are used by both low-level and high-level functions, so it makes sense to place these subroutines into a shared library.

When porting the mmgroup package to another operating system, several aspects specific to the operating system have to be considered. This miniproject package can be considered as a model for the mmgroup package focussing on the os-specific aspects.

Installation

The current version of the miniproject package is a source distribution that has been tested on a 64-bit Windows platform only. It runs with python 3.6 or higher. The distribution contains a number of extensions written in C which have to be built before use. At present there is no binary distribution for Windows.

Before you can use this package or build its extensions you should install the following python packages:

External Python packages required

Package               Purpose
--------------------  ----------------------------------------------------------------
cython                Development: integrating C programs into the miniproject package
numpy, scipy          Recommendation: these packages are not really needed here, but they should be installed, since they are required for the mmgroup package
pytest                Testing: basic package used for testing
setuptools            Development: basic package used for setup and building extensions
sphinx                Documentation: basic package used for documentation
sphinxcontrib-bibtex  Documentation: bibliography in BibTeX style

Packages used for the purpose of documentation are required only if you want to rebuild the documentation. If you want to rebuild the documentation you should also install the following programs:

External programs required

Program  Purpose        Location
-------  -------------  -----------------------------
miktex   Documentation  https://miktex.org/
Perl     Documentation  https://www.perl.org/get.html

Building the extension

To build the required extension, go to the root directory of the distribution. This is the directory containing the files setup.py and README.rst. From there run the following two commands:

python setup.py build_ext
pytest src/miniproject/ -v

Installing a C compiler for cython in Windows

The bad news for Windows developers is that there is no pre-installed C compiler on a standard Windows system. However, the Cython package requires a C compiler. Here in principle, the user has the choice between the following two compilers:

  • MSVC

  • MinGW-w64

The user has to install a C compiler so that it cooperates with cython. That installation process is out of the scope of this document.

For installing MinGW, one might start looking at https://cython.readthedocs.io/en/latest/src/tutorial/appendix.html.

For installing MSVC, one might start looking at https://wiki.python.org/moin/WindowsCompilers

The current setup.py supports MinGW and MSVC for 64-bit Windows. According to the last URL the MinGW compiler works with all Python versions up to 3.4.

The author has installed the MSVC compiler with the Microsoft Build Tools for Visual Studio from:

https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=BuildTools&rel=16 ,

following the instructions in

https://www.scivision.dev/python-windows-visual-c-14-required/ .

Before typing python setup.py bdist_wheel in a Windows command line the author had to type:

"C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Auxiliary\Build\vcvars64.bat"

Here the path may be different on the user’s Windows system.

Application interface

miniproject.wrapper.wrap_double(k)

Python wrapper of a C function that doubles an integer k.

miniproject.wrapper.wrap_triple(k)

Python wrapper of a C function that triples an integer k.

The function returns 3 * k.

For small values k the function reads the result from a table which has been generated by a python script. For large values k it uses a function from a shared library.

Both the table and the shared library have been created in a prior step of the build process.
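The dispatch between table lookup and library call described for wrap_triple can be modelled in pure Python. The following is only an illustrative sketch, not the C code of the package: the names TABLE_SIZE, TRIPLE_TABLE, double_model and triple_model are hypothetical, the table size of 16 is an assumption, and the library function is assumed to be the doubling function.

```python
# Pure-Python model of the dispatch described above (illustration only).
# TABLE_SIZE and all names in this sketch are hypothetical.
TABLE_SIZE = 16

# Stands in for the table that a python script generates at build time.
TRIPLE_TABLE = [3 * k for k in range(TABLE_SIZE)]


def double_model(k):
    """Stand-in for the C function in the shared library that doubles k."""
    return 2 * k


def triple_model(k):
    """Return 3 * k, using the table for small k and doubling otherwise."""
    if 0 <= k < TABLE_SIZE:
        return TRIPLE_TABLE[k]      # small value: table lookup
    return double_model(k) + k      # other values: 2 * k + k


if __name__ == "__main__":
    for k in (0, 5, 15, 16, 1000):
        print(k, triple_model(k))
```

Both branches return the same value 3 * k; the split only mirrors the structure of the build, where the table and the library come from different build steps.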

The modified build process

The classes needed to modify the build process provided by the setuptools/distutils package are coded in the build_ext_steps.py script.

Module build_ext_steps provides a customized version of the ‘build_ext’ command for setup.py.

It should be placed in the same directory as the module setup.py.

Distributing python packages

The standard toolkit for distributing python packages is the setuptools package. Here the user types:

python setup.py build_ext

at the console for building the extensions to the python package, which are typically written in a language like C or C++ for the sake of speed. We may use e.g. the Cython package to write python wrappers for the functions written in C or C++. The setuptools package supports the integration of Cython extensions.

The setup.py script describes a list of extensions, where each extension is an instance of class Extension which is provided by setuptools. Then it calls the setup function which builds all these extensions:

from setuptools import setup
from setuptools.extension import Extension   

ext_modules = [
    Extension(
        ...  # description of first extension
    ),
    Extension(
        ...  # description of second extension
    ),
    ...
]

setup(
    ..., # Standard arguments for the setup function
    ext_modules = ext_modules,   # List of extensions to be built
)

We assume that the reader is familiar with the standard python setup process. For background, we refer to

https://setuptools.readthedocs.io/en/latest/

A new paradigm for building python packages

This build_ext_steps module supports a new paradigm for building a python extension:

  • A python program make_stage1.py creates a C program stage1.c.

  • We create a python extension stage1.so (or stage1.pyd in Windows) that makes the functionality of stage1.c available in python.

  • A python program make_stage2.py creates a C program stage2.c. Here make_stage2.py may import stage1.so (or stage1.pyd).

  • We create a python extension that makes the functionality of stage2.c available in python.

  • etc.

This paradigm is not supported by the setuptools package.

Using class BuildExtCmd for the new paradigm

For using the new building paradigm we have to replace the standard class build_ext by the class build_ext_steps.BuildExtCmd.

from setuptools import setup
from build_ext_steps import Extension   
from build_ext_steps import BuildExtCmd

ext_modules = [
    # description of extension as above
]

setup(
    ..., # Standard arguments for the setup function
    ext_modules = ext_modules,   # List of extensions to be built
    cmdclass={
       'build_ext': BuildExtCmd, # replace class for build_ext
    },
)

This change has a few consequences:

  • It is guaranteed that the extensions are built in the given order.

  • Extensions are always built in place (option build_ext --inplace). The current version does not support building the extensions in a special build directory.

  • The building of all extensions is now forced (option build_ext --force), regardless of any time stamps.

  • Each of the keyword arguments extra_compile_args and extra_link_args for an instance of class Extension may be a dictionary with entries

    ‘compiler’ : <List of arguments>

    instead of a list of arguments. Here ‘compiler’ is a string describing a compiler. For a list of compilers, run

    python setup.py build_ext --help-compiler .

Apart from these changes, an extension is created in the same way as with setuptools.
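As an illustration of compiler-dependent arguments, the sketch below shows how a dictionary keyed by compiler name might be resolved to a plain argument list. The helper resolve_args is hypothetical and is not part of build_ext_steps. Returning an empty list for a compiler without an entry is an assumption. The keys 'mingw32' and 'msvc' are compiler names as listed by --help-compiler, and the flags are ordinary GCC and MSVC options chosen as examples.

```python
def resolve_args(args, compiler):
    """Return the argument list that applies to the given compiler name.

    args is either a plain list (used for every compiler) or a dictionary
    mapping compiler names to lists. A missing key yields an empty list,
    which is an assumption made for this sketch.
    """
    if isinstance(args, dict):
        return list(args.get(compiler, []))
    return list(args or [])


# Example values only; not the settings used by the miniproject.
extra_compile_args = {
    "mingw32": ["-O3", "-Wall"],
    "msvc": ["/O2"],
}

if __name__ == "__main__":
    print(resolve_args(extra_compile_args, "mingw32"))
    print(resolve_args(["-O3"], "msvc"))
```

A dictionary like extra_compile_args above would be passed as the extra_compile_args keyword argument when constructing an Extension.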

For a documentation of the Extension class in the setuptools package, see

https://docs.python.org/3/distutils/apiref.html?highlight=extension#distutils.core.Extension

Inserting user-defined functions into the build process

Module build_ext_steps provides a class CustomBuildStep for adding user-defined functions to the build process.

In the list ext_modules of extensions, instances of class CustomBuildStep may be mixed with instances of class Extension. Class CustomBuildStep models an arbitrary sequence of functions to be executed.

The constructor for that class takes a string ‘name’ describing the action of these functions, followed by an arbitrary number of lists, where each list describes a function to be executed.

Here the first entry of each list is either a string or a callable python function. If the first entry is a string then a subprocess with that name is called. Otherwise the corresponding python function is executed. Subsequent entries in the list are arguments given to the subprocess or to the function.

Such a subprocess may be e.g. a step that generates C code to be used for building a subsequent python extension.

It is recommended to use the string sys.executable (provided by the sys module) instead of the string 'python' for starting a python subprocess.
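The behaviour described above can be modelled by a small stand-alone function. This is a sketch of the described semantics only, not the implementation in build_ext_steps.py: the names run_custom_step and write_note are hypothetical, and aborting on a failing subprocess is an assumption.

```python
import subprocess
import sys


def run_custom_step(name, *function_lists):
    """Execute each list in order, as described for class CustomBuildStep.

    Hypothetical model: a string as first entry starts a subprocess,
    a callable as first entry is called with the remaining entries.
    """
    print("Executing custom build step:", name)
    for entries in function_lists:
        head, args = entries[0], entries[1:]
        if isinstance(head, str):
            # check_call raises CalledProcessError on a nonzero exit code
            subprocess.check_call([head, *args])
        else:
            head(*args)


def write_note(log, text):
    """Example python function used as a build step."""
    log.append(text)


if __name__ == "__main__":
    log = []
    run_custom_step(
        "Demo step",
        [sys.executable, "-c", "print('generating C code')"],
        [write_note, log, "done"],
    )
    print(log)
```

The first list starts a python subprocess via sys.executable, as recommended above; the second list calls a python function directly in the build process.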

Building shared libraries

Module build_ext_steps provides another class SharedExtension for building a shared library (or a DLL in Windows).

In the list ext_modules of extensions, instances of class SharedExtension may be mixed with instances of other classes.

Arguments for the constructor of class SharedExtension are the same as for class Extension. Especially, the following keyword arguments are recognized:

  • name, sources, include_dirs, libraries, define_macros, undef_macros

Here name is a Python dotted name without any extension as in class Extension. The command uses the C compiler to build the shared library and stores it at the location given by name in the same way as a python extension is built. The appropriate extension (e.g. ‘.so’ for unix and ‘.dll’ for Windows) is automatically appended to the file name of the shared library.

The user should provide an additional keyword argument implib_dir specifying a directory where to store the import library for a Windows DLL. In Windows, the import library for foo.dll has the name libfoo.lib. If a program uses a Windows DLL then it should be linked to that import library. In unix operating systems there is no concept similar to an import library.
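The naming rules stated above can be summarized in a few lines of Python. The helper shared_library_paths is hypothetical and only restates those rules; it is not taken from build_ext_steps.py. Forward slashes are used as path separators for simplicity, and pkg.foo is an example name.

```python
import os


def shared_library_paths(name, implib_dir, os_name=os.name):
    """Return (library_path, import_library_path) for a dotted name.

    Hypothetical helper: import_library_path is None outside Windows,
    where no import library exists.
    """
    parts = name.split(".")
    base = parts[-1]
    suffix = ".dll" if os_name == "nt" else ".so"
    library_path = "/".join(parts) + suffix
    if os_name == "nt":
        # Import library for foo.dll is named libfoo.lib
        import_library = implib_dir.rstrip("/") + "/lib" + base + ".lib"
    else:
        import_library = None
    return library_path, import_library


if __name__ == "__main__":
    print(shared_library_paths("pkg.foo", "lib", os_name="nt"))
    print(shared_library_paths("pkg.foo", "lib", os_name="posix"))
```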

Caution!!

The current class SharedExtension supports Windows DLLs containing C programs (no C++) compiled with the mingw32 compiler only.

Using a shared library in an extension

The reason for building a shared library is that several python extensions may use the same shared library.

The way a shared library (or a Windows DLL) is linked to a program using that library depends on the operating system.

The user may have to perform some os-specific steps for making the library available to a python extension. For this purpose the user may read the variable os.name (which has value 'nt' for Windows and 'posix' for unix) and use class CustomBuildStep for performing the appropriate steps.

For Windows one has to add the directory of the import library (discussed in the previous section) to the search path for the library by specifying that directory in the 'library_dirs' keyword argument for class Extension. Also, one has to specify the name of the import library in the 'libraries' keyword argument for class Extension. This is sufficient if the python extension and the DLLs used by that extension are in the same directory.
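A minimal sketch of the Windows case, assuming the values are collected in a dictionary and passed on as keyword arguments to class Extension. The helper link_keywords is hypothetical, and the values "lib" and "libfoo" are examples. Whether the linker expects the import library name with or without the lib prefix depends on the compiler, so that name is left to the caller here.

```python
import os


def link_keywords(implib_dir, implib_name, os_name=os.name):
    """Return keyword arguments for class Extension that link a shared library.

    Hypothetical helper. Only the Windows case from the text is covered;
    for other systems an empty dictionary is returned as a placeholder.
    """
    if os_name == "nt":
        return {
            "library_dirs": [implib_dir],   # search path for the import library
            "libraries": [implib_name],     # name of the import library
        }
    return {}


if __name__ == "__main__":
    # "lib" and "libfoo" are example values, not names from the package.
    print(link_keywords("lib", "libfoo", os_name="nt"))
```

The resulting dictionary could be expanded with ** in the call that constructs an Extension.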

More involved cases of shared libraries and the procedures required for other operating systems are out of the scope of this documentation.

Using an extension in a subsequent build step

Once a python extension has been built, it can also be used in a subsequent step of the build process, e.g. for calculating large arrays of constants for C programs.

This approach works well on a Windows system, but it might not work on other operating systems. Here it is a good idea to write a pure-python substitute for any C extension to be used in a subsequent build step. This may slow down the build process considerably. But it is better to have a slow build process than no build process at all.
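One common way to provide such a substitute is an import with a fallback. The sketch below assumes that the build step needs only the doubling function; make_table is a hypothetical example of a build-time computation.

```python
try:
    # Fast path: the compiled extension built in an earlier step.
    from miniproject.wrapper import wrap_double as double
except ImportError:
    # Slow path: pure-python substitute with the same behaviour.
    def double(k):
        return 2 * k


def make_table(n):
    """Compute a table of doubled values for a later code generation step."""
    return [double(k) for k in range(n)]


if __name__ == "__main__":
    print(make_table(8))
```

The rest of the build step calls double without knowing which implementation is in use.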

Technical remarks about shared libraries

The functionality for building a Windows DLL with the mingw32 compiler is coded in function make_dll_win32_gcc. One could have used the functionality of class distutils.ccompiler instead, but the author has decided not to dive any deeper into the source code of the distutils package. It may be worth using class distutils.ccompiler for porting the functionality of classes BuildExtCmd and SharedExtension to other compilers and operating systems.

The mingw32 compiler is not the standard compiler for python on Windows. C++ files should be compiled with the standard compiler (which is msvc for Windows) in order to avoid trouble with name mangling.

The build process for the miniproject package

The miniproject package contains a function double_function written in C and stored in a shared library. That function doubles an integer value. There is a python wrapper (written in Cython) which makes that function available in python. The pytest package is used to test that function.

The miniproject package also contains a function triple_function written in C and stored in another shared library. Again, there is a python wrapper (written in Cython) for that function so that it can also be tested with pytest.

Function triple_function calls function double_function, which is in a different shared library. For small values it uses a precomputed table triple_table.c for tripling a number. That table is precomputed by the python script codegen.py. Of course, the C program triple_table.c must be integrated into the process that builds the python wrapper for function triple_function.
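To illustrate how a python script can generate a C table, here is a minimal sketch of a code generator. It is not the actual codegen.py: the function name make_triple_table_source, the C identifier TRIPLE_TABLE and the element type uint32_t are assumptions made for this example.

```python
def make_triple_table_source(size):
    """Return C source text defining a table with entries 3 * k, 0 <= k < size."""
    entries = ", ".join(str(3 * k) for k in range(size))
    return (
        "// Generated automatically, do not edit\n"
        "#include <stdint.h>\n"
        f"const uint32_t TRIPLE_TABLE[{size}] = {{ {entries} }};\n"
    )


if __name__ == "__main__":
    # A real generator would write this text to a .c file before the
    # extension that uses the table is compiled.
    print(make_triple_table_source(8))
```

In a multi-step build, such a generator would typically run as a separate step before the extension that compiles the generated file.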

So the functionality provided by this package is trivial, but the building process for this package is extremely involved. That build process is based on the standard setuptools/distutils package. As usual, there is a standard script setup.py in the root directory of the package that controls the build process.

In the current version the modifications of the setuptools/distutils package required for a multi-step build process are contained in the build_ext_steps.py script in the root directory.

Porting the miniproject and the mmgroup package

The current versions of the miniproject and mmgroup packages support the 64-bit Windows operating system with the mingw32 compiler only.

The author is aware of the fact that porting a package as complex as the mmgroup package to a different operating system, or even adjusting it to a different compiler, may be a highly frustrating job. Here the miniproject can be used as a model for the mmgroup package.

It is highly recommended to port the miniproject to any operating system before porting the mmgroup package.

After porting the miniproject the file build_ext_steps.py (and all new files created for the porting process) should be copied from the root directory of the miniproject to the corresponding directory of the mmgroup package.

References

[Con85] J. H. Conway. A simple construction of the Fischer-Griess monster group. Inventiones Mathematicae, 1985.