Developer Guide

This page includes information that are hidden from most users but useful if you want to contribute to LeanDojo. It requires additional dependencies installed via pip install "lean-dojo[all]".

Contribution Guideline

We welcome and appreciate contributions from the community. For bug fixes and relatively minor changes (comments, typos, etc.), feel free to submit a pull request directly. For anything more substantial, please first reach out to us before implementing. All contributions should conform to our code formatting standards (see Code Formatting).

Implementation Notes

Modified Lean 3

Lean 3’s AST export mechanism (lean --ast --tsast --tspp) provides rich syntactic information about *.lean files, as well as certain semantic information such as tactic states. However, it does not provide everything we need for retrieval-augmented theorem proving. For each constant (premise) in a tactic, we’d like to know:

  • Its start/end positions within the tactic.

  • Its fully qualified name.

  • Where to find its definition (*.lean file and the position within the file).

The first one is syntactic and is already provided by lean --ast --tsast --tspp. The remaining two are produced by Lean’s elaborator but not exported. Therefore, we modify Lean to export them. Our modification is implemented as a patch , which can be applied to Lean from v3.42.1 up to the most recent version by git apply (see lean_dojo.data_extraction.trace). Note that:

  • The modification only changes Lean’s frontend but not its core functions for proof checking. Therefore, we do not compromise Lean’s soundness.

  • The modification is only for data extraction. For interaction, we still use the original, unmodified Lean.


We use pytest for testing. You can run tests by:

VERBOSE=1 CACHE_DIR=~/.cache/lean_dojo_testing DISABLE_REMOTE_CACHE=1 CONTAINER=docker pytest -s tests
rm -rf ~/.cache/lean_dojo_testing

The environment variable CACHE_DIR makes sure the testing uses a temporary cache directory that does not interfere with the deployed code. The temporary cache directory is deleted after the testing completes. DISABLE_REMOTE_CACHE=1 instructs repos to be built locally. Note that running all tests can take almost 1 day on 32 CPUs.

Building the Documentation

This documentation is generated by Sphinx. You can build it by:

cd docs && make clean && make html && cd ..

Building the Package and Uploading to PyPI

You probably won’t need it, but this is how to build the LeanDojo package and upload it to PyPI:

hatch build
hatch publish

Static Type Checking

The source code of LeanDojo uses Python type annotations extensively. You can perform static type checking using mypy by

mypy src/lean_dojo

Currently there are still many type errors. Contributions to fix them are welcome.

Code Formatting

LeanDojo’s code is formatted by Black. We use Github Actions to ensure all modifications are formatted.