Limitations and Caveats

Limitations

LeanDojo has the following limitations. Addressing them won’t be our priority in the near future, but we welcome contributions:

  • LeanDojo cannot extract data from the lean4 repo itself nor interact with theorems in it.

  • Currently, LeanDojo cannot process Lean repos that use FFI, e.g., [LeanCopilot](https://github.com/lean-dojo/LeanCopilot).

  • LeanDojo does not support term-based proofs or proofs that mixes tactics and terms.

  • Entering all tactic-style proofs in mathlib 3 to LeanDojo, we found ~1.4% of them are misjudged as incorrect. The errors fall into a few categories documented in test_unexpected_errors.py. We didn’t perform this analysis on Lean 4.

  • Theorems extracted by LeanDojo are “syntactic theorems”, i.e., they are Lean constants defined using keywords theorem or lemma. First, they are not guaranteed to be real theorems (Lean constants of type Prop). Second, theorems defined in other ways (e.g., using def) are not extracted.

  • Tracing mathlib 3 produces buggy IdentNode. Their names start with user__. So you can search for name="user__ in the generated *.trace.xml files. We haven’t figured out the exact reason underlying this bug, but it’s likely to be related to Lean’s parser and mathlib’s alias command.

  • We require Lean 3 >= v3.42.1. Otherwise, the modification patch cannot be applied.

  • We have problems tracing a few premises related to quot in Lean 3. See the warnings in this Jupyter notebook.

  • Lean 3 ASTs generated by lean --ast --tsast --tspp have a small number of errors.

Caveats

  • When LeanDojo is killed, it may leave temporary files in the system’s temporary directory (or the directory specified by the TMP_DIR environment variable). It may be necessary to manually clean up these files occasionally.