lean_dojo.data_extraction.traced_data

This module defines traced repos/files/theorems.

class lean_dojo.data_extraction.traced_data.Comment(start: Pos, end: Pos, text: str)[source]

Bases: object

A comment in a Lean file.

end: Pos
classmethod from_xml(tree: Element) Comment[source]
start: Pos
text: str
to_xml(parent: Element) None[source]
class lean_dojo.data_extraction.traced_data.TracedFile(root_dir: Path, repo: LeanGitRepo, lean_file: LeanFile, ast: FileNode, comments: List[Comment], traced_repo: TracedRepo | None = None)[source]

Bases: object

A traced file is a Lean source file annotated with syntactic/semantic information such as tactic states, Lean expressions, and abstract syntax trees (ASTs).

property abs_path: Path

Absolute path of the *.lean file.

ast: FileNode

Abstract syntax tree (AST) of the entire *.lean file.

AST nodes are defined in lean_dojo.data_extraction.ast.

check_sanity() None[source]

Perform some basic sanity checks.

The function raises exceptions in case of unsuccessful checks.

comments: List[Comment]

All comments in the *.lean file.

classmethod from_traced_file(root_dir: str | Path, json_path: Path, repo: LeanGitRepo) TracedFile[source]

Construct a TracedFile object by parsing a *.ast.json file produced by lean --ast --tsast --tspp (Lean 3) or ExtractData.lean (Lean 4).

Parameters:
  • root_dir (Union[str, Path]) – Root directory of the traced repo.

  • json_path (Path) – Path of the *.ast.json file relative to root_dir.

classmethod from_xml(root_dir: str | Path, path: str | Path, repo: LeanGitRepo) TracedFile[source]

Load a TracedFile object from its *.trace.xml file.

Parameters:
  • root_dir (Union[str, Path]) – Root directory of the traced repo.

  • path (Union[str, Path]) – Path of the *.trace.xml file relative to root_dir.

  • repo (LeanGitRepo) – The repo to which the traced file belongs.

get_direct_dependencies(repo: LeanGitRepo) List[Tuple[str, Path]][source]

Return the names and paths of all modules imported by the current *.lean file.

get_premise_definitions() List[Dict[str, Any]][source]

Return all theorems and definitions defined in the current file that can be potentially used as premises, including the premises in the theorem statement and premises in the tactics used to prove the theorem.

Returns:

_description_

Return type:

List[Dict[str, Any]]

get_traced_theorem(thm_or_name: Theorem | str) TracedTheorem | None[source]

Return a TracedTheorem object given an Theorem object or its fully-qualified name.

get_traced_theorems() List[TracedTheorem][source]

Return a list of traced theorem in this traced file.

property has_prelude: bool

Check whether the file starts with :code:prelude.

:code:prelude instructs Lean NOT to include its built-in library automatically.

lean_file: LeanFile

Lean source file of this traced file.

property path: Path

Path of the *.lean file relative to the root directory.

repo: LeanGitRepo

The Lean repo this traced file belongs to.

root_dir: Path

Root directory (in absolute path) of the corresponding traced repo.

to_xml() str[source]

Serialize a TracedFile object to XML.

traced_repo: TracedRepo | None = None

The traced repo this traced file belongs to.

Note that traced_repo will become None after the traced file is serialized/deserialized on its own.

traverse_preorder(callback, node_cls: type | None = None)[source]

Traverse the AST in preorder.

Parameters:
  • callback (function) – Callback function for visiting AST nodes.

  • node_cls (Optional[type], optional) – Restrict the application of callback to only nodes of type node_cls. Defaults to None, which means applying callback to all.

class lean_dojo.data_extraction.traced_data.TracedRepo(repo: LeanGitRepo, dependencies: Dict[str, LeanGitRepo], root_dir: Path, traced_files: List[TracedFile], traced_files_graph: DiGraph | None)[source]

Bases: object

A traced repo is a Lean repo of traced files and additional information, such as other repos it depends on, as well as the dependency graph between files.

check_sanity() None[source]

Perform some basic sanity checks.

The function raises exceptions in case of unsuccessful checks.

dependencies: Dict[str, LeanGitRepo]

Dictionary mapping the name of each dependency to a LeanGitRepo object.

classmethod from_traced_files(root_dir: str | Path, build_deps: bool = True) TracedRepo[source]
Construct a TracedRepo object by parsing *.ast.json and *.path files

produced by lean --ast --tsast --tspp (Lean 3) or ExtractData.lean (Lean 4).

Parameters:
  • root_dir (Union[str, Path]) – Root directory of the traced repo.

  • build_deps (bool, optional) – Whether to build the dependency graph between files.

get_traced_file(path: str | Path) TracedFile[source]

Return a traced file by its path.

get_traced_theorem(thm: Theorem) TracedTheorem | None[source]

Return a TracedTheorem object corresponding to thm

get_traced_theorems() List[TracedTheorem][source]

Return all traced theorems in the repo.

classmethod load_from_disk(root_dir: str | Path, build_deps: bool = True) TracedRepo[source]

Load a traced repo from *.trace.xml files.

property name: str

Name of the repo.

repo: LeanGitRepo

The corresponding Lean repo.

root_dir: Path

Root directory of the traced repo.

save_to_disk() None[source]

Save all traced files in the repo to the disk as *.trace.xml files.

show() None[source]

Show the repo in the default browser.

traced_files: List[TracedFile]

List of traced files in the repo.

traced_files_graph: DiGraph | None

Dependency graph between files in the repo.

The graph is a DAG, and there is an edge from file X to file Y if and only if X imports Y

class lean_dojo.data_extraction.traced_data.TracedTactic(ast: Node, traced_theorem: TracedTheorem | None = None)[source]

Bases: object

A traced tactic is a tactic annotated with additional information including its AST and the states before/after the tactic.

ast: Node

AST of the tactic.

property end: Pos

End position in *.lean file.

get_annotated_tactic() Tuple[str, List[Dict[str, Any]]][source]

Return the tactic annotated with premise information.

Premises in the tactic are marked by <a> ... </a>. For example, rw [add_comm b] contains a premise add_comm and therefore becomes rw [<a>add_comm</a> b]. In addition, the function returns the provenance (full name, file path, line/column numbers) of all premises.

Returns:

The first return value is the tactic string marked by <a> ... </a>. The second return value is a list of provenances.

Return type:

Tuple[str, List[Dict[str, Any]]]

property start: Pos

Start position in *.lean file.

property state_after: str

Pretty-printed state after applying the tactic.

property state_before: str

Pretty-printed state before applying the tactic.

property tactic: str

The raw tactic string.

to_string() str[source]
traced_theorem: TracedTheorem | None = None

The traced theorem this tactic belongs to.

class lean_dojo.data_extraction.traced_data.TracedTheorem(root_dir: Path, theorem: Theorem, ast: CommandTheoremNode | LemmaNode | MathlibTacticLemmaNode, comments: List[Comment], traced_file: TracedFile | None = None)[source]

Bases: object

A traced theorem is a theorem with additional information such as the AST.

ast: CommandTheoremNode | LemmaNode | MathlibTacticLemmaNode

AST of the theorem.

comments: List[Comment]

All comments in the theorem/proof.

property end: Pos

End position in *.lean file.

property file_path: Path

The theorem’s file path (relative to the root directory).

get_num_tactics() int[source]

Return the number of tactics in the proof.

get_premise_full_names() List[str][source]

Return the fully qualified names of all premises used in the proof.

get_proof_node() Node[source]

Return the AST of the theorem’s proof.

get_tactic_proof() str | None[source]

Return the tactic-style proof (if any).

get_theorem_statement() str[source]

Return the theorem statement.

get_traced_tactics(atomic_only: bool = False) List[TracedTactic][source]

Return a list of traced tactics in the proof.

has_tactic_proof() bool[source]

Check if the theorem has a tactic-style proof.

property is_private: bool

Check if the theorem is private.

locate_proof() Tuple[Pos, Pos][source]

Return the start/end positions of the proof.

property repo: LeanGitRepo

The Lean repo this theorem belongs to.

root_dir: Path

Root directory of the corresponding traced repo.

show() None[source]

Show the theorem in the default browser.

property start: Pos

Start position in *.lean file.

theorem: Theorem

The corresponding Theorem object.

traced_file: TracedFile | None = None

The traced file this theorem belongs to.

property traced_repo: TracedRepo | None

The traced repo this theorem belongs to.

lean_dojo.data_extraction.traced_data.get_code_without_comments(lean_file: LeanFile, start: Pos, end: Pos, comments: List[Comment]) str[source]

Return the code in lean_file from start to end with comments removed.

Parameters:
  • lean_file (LeanFile) – The lean source file.

  • start (Pos) – The start position.

  • end (Pos) – The end position.

  • comments (List[Comment]) – A list of Comment objects.

Returns:

Human-written code with comments removed.

Return type:

str