lean_dojo.data_extraction.traced_data
This module defines traced repos/files/theorems.
- class lean_dojo.data_extraction.traced_data.Comment(start: Pos, end: Pos, text: str)[source]
Bases:
objectA comment in a Lean file.
- text: str
- class lean_dojo.data_extraction.traced_data.TracedFile(root_dir: Path, repo: LeanGitRepo, lean_file: LeanFile, ast: FileNode, comments: List[Comment], traced_repo: TracedRepo | None = None)[source]
Bases:
objectA traced file is a Lean source file annotated with syntactic/semantic information such as tactic states, Lean expressions, and abstract syntax trees (ASTs).
- property abs_path: Path
Absolute path of the
*.leanfile.
- ast: FileNode
Abstract syntax tree (AST) of the entire
*.leanfile.AST nodes are defined in lean_dojo.data_extraction.ast.
- check_sanity() None[source]
Perform some basic sanity checks.
The function raises exceptions in case of unsuccessful checks.
- classmethod from_traced_file(root_dir: str | Path, json_path: Path, repo: LeanGitRepo) TracedFile[source]
Construct a
TracedFileobject by parsing a*.ast.jsonfile produced bylean --ast --tsast --tspp(Lean 3) orExtractData.lean(Lean 4).- Parameters:
root_dir (Union[str, Path]) – Root directory of the traced repo.
json_path (Path) – Path of the
*.ast.jsonfile relative toroot_dir.
- classmethod from_xml(root_dir: str | Path, path: str | Path, repo: LeanGitRepo) TracedFile[source]
Load a
TracedFileobject from its*.trace.xmlfile.- Parameters:
root_dir (Union[str, Path]) – Root directory of the traced repo.
path (Union[str, Path]) – Path of the
*.trace.xmlfile relative toroot_dir.repo (LeanGitRepo) – The repo to which the traced file belongs.
- get_direct_dependencies(repo: LeanGitRepo) List[Tuple[str, Path]][source]
Return the names and paths of all modules imported by the current
*.leanfile.
- get_premise_definitions() List[Dict[str, Any]][source]
Return all theorems and definitions defined in the current file that can be potentially used as premises, including the premises in the theorem statement and premises in the tactics used to prove the theorem.
- Returns:
_description_
- Return type:
List[Dict[str, Any]]
- get_traced_theorem(thm_or_name: Theorem | str) TracedTheorem | None[source]
Return a
TracedTheoremobject given anTheoremobject or its fully-qualified name.
- get_traced_theorems() List[TracedTheorem][source]
Return a list of traced theorem in this traced file.
- property has_prelude: bool
Check whether the file starts with :code:
prelude.:code:
preludeinstructs Lean NOT to include its built-in library automatically.
- property path: Path
Path of the
*.leanfile relative to the root directory.
- repo: LeanGitRepo
The Lean repo this traced file belongs to.
- root_dir: Path
Root directory (in absolute path) of the corresponding traced repo.
- to_xml() str[source]
Serialize a
TracedFileobject to XML.
- traced_repo: TracedRepo | None = None
The traced repo this traced file belongs to.
Note that
traced_repowill become None after the traced file is serialized/deserialized on its own.
- traverse_preorder(callback, node_cls: type | None = None)[source]
Traverse the AST in preorder.
- Parameters:
callback (function) – Callback function for visiting AST nodes.
node_cls (Optional[type], optional) – Restrict the application of
callbackto only nodes of typenode_cls. Defaults to None, which means applyingcallbackto all.
- class lean_dojo.data_extraction.traced_data.TracedRepo(repo: LeanGitRepo, dependencies: Dict[str, LeanGitRepo], root_dir: Path, traced_files: List[TracedFile], traced_files_graph: DiGraph | None)[source]
Bases:
objectA traced repo is a Lean repo of traced files and additional information, such as other repos it depends on, as well as the dependency graph between files.
- check_sanity() None[source]
Perform some basic sanity checks.
The function raises exceptions in case of unsuccessful checks.
- dependencies: Dict[str, LeanGitRepo]
Dictionary mapping the name of each dependency to a
LeanGitRepoobject.
- classmethod from_traced_files(root_dir: str | Path, build_deps: bool = True) TracedRepo[source]
- Construct a
TracedRepoobject by parsing*.ast.jsonand*.pathfiles produced by
lean --ast --tsast --tspp(Lean 3) orExtractData.lean(Lean 4).
- Parameters:
root_dir (Union[str, Path]) – Root directory of the traced repo.
build_deps (bool, optional) – Whether to build the dependency graph between files.
- Construct a
- get_traced_file(path: str | Path) TracedFile[source]
Return a traced file by its path.
- get_traced_theorem(thm: Theorem) TracedTheorem | None[source]
Return a
TracedTheoremobject corresponding tothm
- get_traced_theorems() List[TracedTheorem][source]
Return all traced theorems in the repo.
- classmethod load_from_disk(root_dir: str | Path, build_deps: bool = True) TracedRepo[source]
Load a traced repo from
*.trace.xmlfiles.
- property name: str
Name of the repo.
- repo: LeanGitRepo
The corresponding Lean repo.
- root_dir: Path
Root directory of the traced repo.
- traced_files: List[TracedFile]
List of traced files in the repo.
- traced_files_graph: DiGraph | None
Dependency graph between files in the repo.
The graph is a DAG, and there is an edge from file
Xto fileYif and only ifXimportsY
- class lean_dojo.data_extraction.traced_data.TracedTactic(ast: Node, traced_theorem: TracedTheorem | None = None)[source]
Bases:
objectA traced tactic is a tactic annotated with additional information including its AST and the states before/after the tactic.
- get_annotated_tactic() Tuple[str, List[Dict[str, Any]]][source]
Return the tactic annotated with premise information.
Premises in the tactic are marked by
<a> ... </a>. For example,rw [add_comm b]contains a premiseadd_command therefore becomesrw [<a>add_comm</a> b]. In addition, the function returns the provenance (full name, file path, line/column numbers) of all premises.- Returns:
The first return value is the tactic string marked by
<a> ... </a>. The second return value is a list of provenances.- Return type:
Tuple[str, List[Dict[str, Any]]]
- property state_after: str
Pretty-printed state after applying the tactic.
- property state_before: str
Pretty-printed state before applying the tactic.
- property tactic: str
The raw tactic string.
- traced_theorem: TracedTheorem | None = None
The traced theorem this tactic belongs to.
- class lean_dojo.data_extraction.traced_data.TracedTheorem(root_dir: Path, theorem: Theorem, ast: CommandTheoremNode | LemmaNode | MathlibTacticLemmaNode, comments: List[Comment], traced_file: TracedFile | None = None)[source]
Bases:
objectA traced theorem is a theorem with additional information such as the AST.
- ast: CommandTheoremNode | LemmaNode | MathlibTacticLemmaNode
AST of the theorem.
- property file_path: Path
The theorem’s file path (relative to the root directory).
- get_premise_full_names() List[str][source]
Return the fully qualified names of all premises used in the proof.
- get_traced_tactics(atomic_only: bool = False) List[TracedTactic][source]
Return a list of traced tactics in the proof.
- property is_private: bool
Check if the theorem is private.
- property repo: LeanGitRepo
The Lean repo this theorem belongs to.
- root_dir: Path
Root directory of the corresponding traced repo.
- traced_file: TracedFile | None = None
The traced file this theorem belongs to.
- property traced_repo: TracedRepo | None
The traced repo this theorem belongs to.