Limitations and Caveats
Limitations
LeanDojo has the following limitations. Addressing them won’t be our priority in the near future, but we welcome contributions:
LeanDojo cannot extract data from the lean4 repo itself nor interact with theorems in it.
Currently, LeanDojo cannot process Lean repos that use FFI, e.g., [LeanCopilot](https://github.com/lean-dojo/LeanCopilot).
LeanDojo does not support term-based proofs or proofs that mixes tactics and terms.
Entering all tactic-style proofs in mathlib 3 to LeanDojo, we found ~1.4% of them are misjudged as incorrect. The errors fall into a few categories documented in test_unexpected_errors.py. We didn’t perform this analysis on Lean 4.
Theorems extracted by LeanDojo are “syntactic theorems”, i.e., they are Lean constants defined using keywords
theorem
orlemma
. First, they are not guaranteed to be real theorems (Lean constants of typeProp
). Second, theorems defined in other ways (e.g., usingdef
) are not extracted.Tracing mathlib 3 produces buggy
IdentNode
. Their names start withuser__
. So you can search forname="user__
in the generated*.trace.xml
files. We haven’t figured out the exact reason underlying this bug, but it’s likely to be related to Lean’s parser and mathlib’s alias command.We require Lean 3 >= v3.42.1. Otherwise, the modification patch cannot be applied.
We have problems tracing a few premises related to
quot
in Lean 3. See the warnings in this Jupyter notebook.Lean 3 ASTs generated by
lean --ast --tsast --tspp
have a small number of errors.
Caveats
When LeanDojo is killed, it may leave temporary files in the system’s temporary directory (or the directory specified by the
TMP_DIR
environment variable). It may be necessary to manually clean up these files occasionally.