lean_dojo.data_extraction.lean

This module define classes for repos, files, and theorems in Lean. Objects of these classes contain only surface information, without extracting any trace.

lean_dojo.data_extraction.lean.GITHUB_ACCESS_TOKEN = None

GiHub personal access token is optional. If provided, it can increase the rate limit for GitHub API calls.

lean_dojo.data_extraction.lean.LEAN4_NIGHTLY_REPO = None

The GitHub Repo for Lean 4 nightly releases.

lean_dojo.data_extraction.lean.LEAN4_REPO = None

The GitHub Repo for Lean 4 itself.

class lean_dojo.data_extraction.lean.LeanFile(root_dir: Path, path: Path)[source]

Bases: object

A Lean source file (*.lean).

__getitem__(key) str[source]

Return a code segment given its start/end positions.

This enables lean_file[start:end].

Parameters:

key (slice) – A slice of two Pos objects for the start/end of the code segment.

property abs_path: Path

Absolute path of a LeanFile object.

E.g., /home/kaiyu/traced_lean-example/lean-example/src/example.lean

code: List[str]

Raw source code as a list of lines.

convert_pos(byte_idx: int) Pos[source]

Convert a byte index (String.Pos in Lean 4) to a Pos object.

property end_pos: Pos

Return the end position of a source file.

Returns:

A Pos object representing the end of this file.

Return type:

Pos

endwith_newline: bool

Whether the last line ends with a newline.

get_line(line_nb: int) str[source]

Return a given line of the source file.

Parameters:

line_nb (int) – Line number (1-indexed).

is_empty() bool[source]
num_bytes: List[int]

The number of UTF-8 bytes of each line, including newlines.

num_columns(line_nb: int) int[source]

Number of columns in a source file.

property num_lines: int

Number of lines in a source file.

offset(pos: Pos, delta: int) Pos[source]

Off set a position by a given number.

path: Path

Relative path w.r.t. root_dir

E.g., lean-example/src/example.lean

root_dir: Path

Root directory of the traced repo this LeanFile object belongs to.

root_dir must be an absolute path, e.g., /home/kaiyu/traced_lean-example/lean-example

property start_pos: Pos

Return the start position of a source file.

Returns:

A Pos object representing the start of this file.

Return type:

Pos

class lean_dojo.data_extraction.lean.LeanGitRepo(url: str, commit: str)[source]

Bases: object

Git repo of a Lean project.

clone_and_checkout() None[source]

Clone the repo to the current working directory and checkout a specific commit.

commit: str

The repo’s commit hash.

You can also use tags such as v3.5.0. They will be converted to commit hashes.

property commit_url: str
exists() bool[source]
classmethod from_path(path: Path | str) LeanGitRepo[source]

Construct a LeanGitRepo object from the path to a local Git repo.

get_cache_dirname() Path[source]

Return the formatted cache directory name

get_config(filename: str, num_retries: int = 2) Dict[str, Any][source]

Return the repo’s files.

get_dependencies(path: str | Path | None = None) Dict[str, LeanGitRepo][source]

Return the dependencies required by the target repo.

Parameters:

path (Union[str, Path, None], optional) – Root directory of the repo if it is on the disk.

Returns:

A dictionary mapping the name of each dependency to its LeanGitRepo object.

Return type:

Dict[str, LeanGitRepo]

get_license() str | None[source]

Return the content of the LICENSE file.

property is_lean4: bool
lean_version: str

Required Lean version.

property name: str
repo: Repository | Repo

A github.Repository object for GitHub repos or a git.Repo object for local or remote Git repos.

repo_type: RepoType

Type of the repo. It can be GITHUB, LOCAL or REMOTE.

show() None[source]

Show the repo in the default browser.

url: str

The repo’s URL.

It can be a GitHub URL that starts with https:// or git@github.com, a local path, or any other valid Git URL.

uses_lakefile_lean() bool[source]

Check if the repo uses a lakefile.lean.

uses_lakefile_toml() bool[source]

Check if the repo uses a lakefile.toml.

class lean_dojo.data_extraction.lean.Pos(line_nb: int, column_nb: int)[source]

Bases: object

Position in source files.

We use 1-index to keep it consistent with code editors such as Visual Studio Code.

column_nb: int

Column number

classmethod from_str(s: str) Pos[source]

Construct a Pos object from its string representation, e.g., "(323, 1109)".

line_nb: int

Line number

class lean_dojo.data_extraction.lean.RepoInfoCache(tag2commit: ~typing.Dict[~typing.Tuple[str, str], str] = <factory>, lean_version: ~typing.Dict[~typing.Tuple[str, str], str] = <factory>)[source]

Bases: object

To minize the number of network requests, we cache and re-use the info of all repos, assuming it does not change during the execution of LeanDojo.

lean_version: Dict[Tuple[str, str], str]
tag2commit: Dict[Tuple[str, str], str]
class lean_dojo.data_extraction.lean.RepoType(value)[source]

Bases: Enum

An enumeration.

GITHUB = 0
LOCAL = 2
REMOTE = 1
class lean_dojo.data_extraction.lean.Theorem(repo: LeanGitRepo, file_path: Path, full_name: str)[source]

Bases: object

Theorem in Lean.

Theorems are named constants of type Prop. They are typically defined using the keywords theorem or lemma, but it’s possible to use other keywords such as def or instance

file_path: Path

Lean source file the theorem comes from.

full_name: str

Fully qualified name of the theorem.

repo: LeanGitRepo

Lean repo the theorem comes from.

property uhash: str

Unique hash of the theorem.

property uid: str

Unique identifier of the theorem.

lean_dojo.data_extraction.lean.cleanse_string(s: str | Path) str[source]

Replace : and / with _ in a string.

lean_dojo.data_extraction.lean.get_latest_commit(url: str) str[source]

Get the hash of the latest commit of the Git repo at url.

lean_dojo.data_extraction.lean.get_lean4_commit_from_config(config_dict: Dict[str, Any]) str[source]

Return the required Lean commit given a lean-toolchain config.

lean_dojo.data_extraction.lean.get_lean4_version_from_config(toolchain: str) str[source]

Return the required Lean version given a lean-toolchain config.

lean_dojo.data_extraction.lean.get_repo_type(url: str) RepoType | None[source]

Get the type of the repository.

Parameters:

url (str) – The URL of the repository.

Returns:

The type of the repository (None if the repo cannot be found).

Return type:

Optional[str]

lean_dojo.data_extraction.lean.is_commit_hash(s: str)[source]

Check if a string is a valid commit hash.

lean_dojo.data_extraction.lean.is_supported_version(v) bool[source]

Check if v is at least v4.3.0-rc2.

lean_dojo.data_extraction.lean.normalize_url(url: str, repo_type: RepoType = RepoType.GITHUB) str[source]
lean_dojo.data_extraction.lean.ssh_to_https(url: str) str[source]
lean_dojo.data_extraction.lean.url_to_repo(url: str, num_retries: int = 2, repo_type: RepoType | None = None, tmp_dir: Path | None = None) Repo | Repository[source]

Convert a URL to a Repo object.

Parameters:
  • url (str) – The URL of the repository.

  • num_retries (int) – Number of retries in case of failure.

  • repo_type (Optional[RepoType]) – The type of the repository. Defaults to None.

  • tmp_dir (Optional[Path]) – The temporary directory to clone the repo to. Defaults to None.

Returns:

A Git Repo object.

Return type:

Repo