lean_dojo.data_extraction.traced_data

This module defines traced repos/files/theorems.

class lean_dojo.data_extraction.traced_data.Comment(start: Pos, end: Pos, text: str)[source]

Bases: object

A comment in a Lean file.

end: Pos
classmethod from_xml(tree: Element) Comment[source]
start: Pos
text: str
to_xml(parent: Element) None[source]
class lean_dojo.data_extraction.traced_data.TracedFile(root_dir: Path, repo: LeanGitRepo, lean_file: LeanFile, ast: FileNode, comments: List[Comment], traced_repo: TracedRepo | None = None)[source]

Bases: object

A traced file is a Lean source file annotated with syntactic/semantic information such as tactic states, Lean expressions, and abstract syntax trees (ASTs).

property abs_path: Path

Absolute path of the *.lean file.

ast: FileNode

Abstract syntax tree (AST) of the entire *.lean file.

AST nodes are defined in lean_dojo.data_extraction.ast.

check_sanity() None[source]

Perform some basic sanity checks.

The function raises exceptions in case of unsuccessful checks.

comments: List[Comment]

All comments in the *.lean file.

classmethod from_traced_file(root_dir: str | Path, json_path: Path, repo: LeanGitRepo) TracedFile[source]

Construct a TracedFile object by parsing a *.ast.json file produced by lean --ast --tsast --tspp (Lean 3) or ExtractData.lean (Lean 4).

Parameters:
  • root_dir (Union[str, Path]) – Root directory of the traced repo.

  • json_path (Path) – Path of the *.ast.json file relative to root_dir.

classmethod from_xml(root_dir: str | Path, path: str | Path, repo: LeanGitRepo) TracedFile[source]

Load a TracedFile object from its *.trace.xml file.

Parameters:
  • root_dir (Union[str, Path]) – Root directory of the traced repo.

  • path (Union[str, Path]) – Path of the *.trace.xml file relative to root_dir.

  • repo (LeanGitRepo) – The repo to which the traced file belongs.

get_direct_dependencies(repo: LeanGitRepo) List[Tuple[str, Path]][source]

Return the names and paths of all modules imported by the current *.lean file.

get_premise_definitions() List[Dict[str, Any]][source]

Return all theorems and definitions defined in the current file that can be potentially used as premises.

Returns:

_description_

Return type:

List[Dict[str, Any]]

get_traced_theorem(thm_or_name: Theorem | str) TracedTheorem | None[source]

Return a TracedTheorem object given an Theorem object or its fully-qualified name.

get_traced_theorems() List[TracedTheorem][source]

Return a list of traced theorem in this traced file.

property has_prelude: bool

Check whether the file starts with :code:prelude.

:code:prelude instructs Lean NOT to include its built-in library automatically.

lean_file: LeanFile

Lean source file of this traced file.

property path: Path

Path of the *.lean file relative to the root directory.

repo: LeanGitRepo

The Lean repo this traced file belongs to.

root_dir: Path

Root directory (in absolute path) of the corresponding traced repo.

to_xml() str[source]

Serialize a TracedFile object to XML.

traced_repo: TracedRepo | None = None

The traced repo this traced file belongs to.

Note that traced_repo will become None after the traced file is serialized/deserialized on its own.

traverse_preorder(callback, node_cls: type | None = None)[source]

Traverse the AST in preorder.

Parameters:
  • callback (function) – Callback function for visiting AST nodes.

  • node_cls (Optional[type], optional) – Restrict the application of callback to only nodes of type node_cls. Defaults to None, which means applying callback to all.

class lean_dojo.data_extraction.traced_data.TracedRepo(repo: LeanGitRepo, dependencies: Dict[str, LeanGitRepo], root_dir: Path, traced_files: List[TracedFile], traced_files_graph: DiGraph | None)[source]

Bases: object

A traced repo is a Lean repo of traced files and additional information, such as other repos it depends on, as well as the dependency graph between files.

check_sanity() None[source]

Perform some basic sanity checks.

The function raises exceptions in case of unsuccessful checks.

dependencies: Dict[str, LeanGitRepo]

Dictionary mapping the name of each dependency to a LeanGitRepo object.

classmethod from_traced_files(root_dir: str | Path, build_deps: bool = True) TracedRepo[source]
Construct a TracedRepo object by parsing *.ast.json and *.path files

produced by lean --ast --tsast --tspp (Lean 3) or ExtractData.lean (Lean 4).

Parameters:
  • root_dir (Union[str, Path]) – Root directory of the traced repo.

  • build_deps (bool, optional) – Whether to build the dependency graph between files.

get_traced_file(path: str | Path) TracedFile[source]

Return a traced file by its path.

get_traced_theorem(thm: Theorem) TracedTheorem | None[source]

Return a TracedTheorem object corresponding to thm

get_traced_theorems() List[TracedTheorem][source]

Return all traced theorems in the repo.

classmethod load_from_disk(root_dir: str | Path, build_deps: bool = True) TracedRepo[source]

Load a traced repo from *.trace.xml files.

property name: str

Name of the repo.

repo: LeanGitRepo

The corresponding Lean repo.

root_dir: Path

Root directory of the traced repo.

save_to_disk() None[source]

Save all traced files in the repo to the disk as *.trace.xml files.

show() None[source]

Show the repo in the default browser.

traced_files: List[TracedFile]

List of traced files in the repo.

traced_files_graph: DiGraph | None

Dependency graph between files in the repo.

The graph is a DAG, and there is an edge from file X to file Y if and only if X imports Y

class lean_dojo.data_extraction.traced_data.TracedTactic(ast: Node, traced_theorem: TracedTheorem | None = None)[source]

Bases: object

A traced tactic is a tactic annotated with additional information including its AST and the states before/after the tactic.

ast: Node

AST of the tactic.

property end: Pos

End position in *.lean file.

get_annotated_tactic() Tuple[str, List[Dict[str, Any]]][source]

Return the tactic annotated with premise information.

Premises in the tactic are marked by <a> ... </a>. For example, rw [add_comm b] contains a premise add_comm and therefore becomes rw [<a>add_comm</a> b]. In addition, the function returns the provenance (full name, file path, line/column numbers) of all premises.

Returns:

The first return value is the tactic string marked by <a> ... </a>. The second return value is a list of provenances.

Return type:

Tuple[str, List[Dict[str, Any]]]

property start: Pos

Start position in *.lean file.

property state_after: str

Pretty-printed state after applying the tactic.

property state_before: str

Pretty-printed state before applying the tactic.

property tactic: str

The raw tactic string.

to_string() str[source]
traced_theorem: TracedTheorem | None = None

The traced theorem this tactic belongs to.

class lean_dojo.data_extraction.traced_data.TracedTheorem(root_dir: Path, theorem: Theorem, ast: CommandTheoremNode | LemmaNode | MathlibTacticLemmaNode, comments: List[Comment], traced_file: TracedFile | None = None)[source]

Bases: object

A traced theorem is a theorem with additional information such as the AST.

ast: CommandTheoremNode | LemmaNode | MathlibTacticLemmaNode

AST of the theorem.

comments: List[Comment]

All comments in the theorem/proof.

property end: Pos

End position in *.lean file.

property file_path: Path

The theorem’s file path (relative to the root directory).

get_num_tactics() int[source]

Return the number of tactics in the proof.

get_premise_full_names() List[str][source]

Return the fully qualified names of all premises used in the proof.

get_proof_node() Node[source]

Return the AST of the theorem’s proof.

get_tactic_proof() str | None[source]

Return the tactic-style proof (if any).

get_theorem_statement() str[source]

Return the theorem statement.

get_traced_tactics(atomic_only: bool = False) List[TracedTactic][source]

Return a list of traced tactics in the proof.

has_tactic_proof() bool[source]

Check if the theorem has a tactic-style proof.

property is_private: bool

Check if the theorem is private.

locate_proof() Tuple[Pos, Pos][source]

Return the start/end positions of the proof.

property repo: LeanGitRepo

The Lean repo this theorem belongs to.

root_dir: Path

Root directory of the corresponding traced repo.

show() None[source]

Show the theorem in the default browser.

property start: Pos

Start position in *.lean file.

theorem: Theorem

The corresponding Theorem object.

traced_file: TracedFile | None = None

The traced file this theorem belongs to.

property traced_repo: TracedRepo | None

The traced repo this theorem belongs to.

lean_dojo.data_extraction.traced_data.get_code_without_comments(lean_file: LeanFile, start: Pos, end: Pos, comments: List[Comment]) str[source]

Return the code in lean_file from start to end with comments removed.

Parameters:
  • lean_file (LeanFile) – The lean source file.

  • start (Pos) – The start position.

  • end (Pos) – The end position.

  • comments (List[Comment]) – A list of Comment objects.

Returns:

Human-written code with comments removed.

Return type:

str