lean_dojo.data_extraction.traced_data
This module defines traced repos/files/theorems.
- class lean_dojo.data_extraction.traced_data.Comment(start: Pos, end: Pos, text: str)[source]
Bases:
object
A comment in a Lean file.
- text: str
- class lean_dojo.data_extraction.traced_data.TracedFile(root_dir: Path, repo: LeanGitRepo, lean_file: LeanFile, ast: FileNode, comments: List[Comment], traced_repo: TracedRepo | None = None)[source]
Bases:
object
A traced file is a Lean source file annotated with syntactic/semantic information such as tactic states, Lean expressions, and abstract syntax trees (ASTs).
- property abs_path: Path
Absolute path of the
*.lean
file.
- ast: FileNode
Abstract syntax tree (AST) of the entire
*.lean
file.AST nodes are defined in lean_dojo.data_extraction.ast.
- check_sanity() None [source]
Perform some basic sanity checks.
The function raises exceptions in case of unsuccessful checks.
- classmethod from_traced_file(root_dir: str | Path, json_path: Path, repo: LeanGitRepo) TracedFile [source]
Construct a
TracedFile
object by parsing a*.ast.json
file produced bylean --ast --tsast --tspp
(Lean 3) orExtractData.lean
(Lean 4).- Parameters:
root_dir (Union[str, Path]) – Root directory of the traced repo.
json_path (Path) – Path of the
*.ast.json
file relative toroot_dir
.
- classmethod from_xml(root_dir: str | Path, path: str | Path, repo: LeanGitRepo) TracedFile [source]
Load a
TracedFile
object from its*.trace.xml
file.- Parameters:
root_dir (Union[str, Path]) – Root directory of the traced repo.
path (Union[str, Path]) – Path of the
*.trace.xml
file relative toroot_dir
.repo (LeanGitRepo) – The repo to which the traced file belongs.
- get_direct_dependencies(repo: LeanGitRepo) List[Tuple[str, Path]] [source]
Return the names and paths of all modules imported by the current
*.lean
file.
- get_premise_definitions() List[Dict[str, Any]] [source]
Return all theorems and definitions defined in the current file that can be potentially used as premises.
- Returns:
_description_
- Return type:
List[Dict[str, Any]]
- get_traced_theorem(thm_or_name: Theorem | str) TracedTheorem | None [source]
Return a
TracedTheorem
object given anTheorem
object or its fully-qualified name.
- get_traced_theorems() List[TracedTheorem] [source]
Return a list of traced theorem in this traced file.
- property has_prelude: bool
Check whether the file starts with :code:
prelude
.:code:
prelude
instructs Lean NOT to include its built-in library automatically.
- property path: Path
Path of the
*.lean
file relative to the root directory.
- repo: LeanGitRepo
The Lean repo this traced file belongs to.
- root_dir: Path
Root directory (in absolute path) of the corresponding traced repo.
- to_xml() str [source]
Serialize a
TracedFile
object to XML.
- traced_repo: TracedRepo | None = None
The traced repo this traced file belongs to.
Note that
traced_repo
will become None after the traced file is serialized/deserialized on its own.
- traverse_preorder(callback, node_cls: type | None = None)[source]
Traverse the AST in preorder.
- Parameters:
callback (function) – Callback function for visiting AST nodes.
node_cls (Optional[type], optional) – Restrict the application of
callback
to only nodes of typenode_cls
. Defaults to None, which means applyingcallback
to all.
- class lean_dojo.data_extraction.traced_data.TracedRepo(repo: LeanGitRepo, dependencies: Dict[str, LeanGitRepo], root_dir: Path, traced_files: List[TracedFile], traced_files_graph: DiGraph | None)[source]
Bases:
object
A traced repo is a Lean repo of traced files and additional information, such as other repos it depends on, as well as the dependency graph between files.
- check_sanity() None [source]
Perform some basic sanity checks.
The function raises exceptions in case of unsuccessful checks.
- dependencies: Dict[str, LeanGitRepo]
Dictionary mapping the name of each dependency to a
LeanGitRepo
object.
- classmethod from_traced_files(root_dir: str | Path, build_deps: bool = True) TracedRepo [source]
- Construct a
TracedRepo
object by parsing*.ast.json
and*.path
files produced by
lean --ast --tsast --tspp
(Lean 3) orExtractData.lean
(Lean 4).
- Parameters:
root_dir (Union[str, Path]) – Root directory of the traced repo.
build_deps (bool, optional) – Whether to build the dependency graph between files.
- Construct a
- get_traced_file(path: str | Path) TracedFile [source]
Return a traced file by its path.
- get_traced_theorem(thm: Theorem) TracedTheorem | None [source]
Return a
TracedTheorem
object corresponding tothm
- get_traced_theorems() List[TracedTheorem] [source]
Return all traced theorems in the repo.
- classmethod load_from_disk(root_dir: str | Path, build_deps: bool = True) TracedRepo [source]
Load a traced repo from
*.trace.xml
files.
- property name: str
Name of the repo.
- repo: LeanGitRepo
The corresponding Lean repo.
- root_dir: Path
Root directory of the traced repo.
- traced_files: List[TracedFile]
List of traced files in the repo.
- traced_files_graph: DiGraph | None
Dependency graph between files in the repo.
The graph is a DAG, and there is an edge from file
X
to fileY
if and only ifX
importsY
- class lean_dojo.data_extraction.traced_data.TracedTactic(ast: Node, traced_theorem: TracedTheorem | None = None)[source]
Bases:
object
A traced tactic is a tactic annotated with additional information including its AST and the states before/after the tactic.
- get_annotated_tactic() Tuple[str, List[Dict[str, Any]]] [source]
Return the tactic annotated with premise information.
Premises in the tactic are marked by
<a> ... </a>
. For example,rw [add_comm b]
contains a premiseadd_comm
and therefore becomesrw [<a>add_comm</a> b]
. In addition, the function returns the provenance (full name, file path, line/column numbers) of all premises.- Returns:
The first return value is the tactic string marked by
<a> ... </a>
. The second return value is a list of provenances.- Return type:
Tuple[str, List[Dict[str, Any]]]
- property state_after: str
Pretty-printed state after applying the tactic.
- property state_before: str
Pretty-printed state before applying the tactic.
- property tactic: str
The raw tactic string.
- traced_theorem: TracedTheorem | None = None
The traced theorem this tactic belongs to.
- class lean_dojo.data_extraction.traced_data.TracedTheorem(root_dir: Path, theorem: Theorem, ast: CommandTheoremNode | LemmaNode | MathlibTacticLemmaNode, comments: List[Comment], traced_file: TracedFile | None = None)[source]
Bases:
object
A traced theorem is a theorem with additional information such as the AST.
- ast: CommandTheoremNode | LemmaNode | MathlibTacticLemmaNode
AST of the theorem.
- property file_path: Path
The theorem’s file path (relative to the root directory).
- get_premise_full_names() List[str] [source]
Return the fully qualified names of all premises used in the proof.
- get_traced_tactics(atomic_only: bool = False) List[TracedTactic] [source]
Return a list of traced tactics in the proof.
- property is_private: bool
Check if the theorem is private.
- property repo: LeanGitRepo
The Lean repo this theorem belongs to.
- root_dir: Path
Root directory of the corresponding traced repo.
- traced_file: TracedFile | None = None
The traced file this theorem belongs to.
- property traced_repo: TracedRepo | None
The traced repo this theorem belongs to.