lean_dojo.data_extraction.lean
This module define classes for repos, files, and theorems in Lean. Objects of these classes contain only surface information, without extracting any trace.
- lean_dojo.data_extraction.lean.GITHUB_ACCESS_TOKEN = None
GiHub personal access token is optional. If provided, it can increase the rate limit for GitHub API calls.
- lean_dojo.data_extraction.lean.LEAN4_NIGHTLY_REPO = None
The GitHub Repo for Lean 4 nightly releases.
- lean_dojo.data_extraction.lean.LEAN4_REPO = None
The GitHub Repo for Lean 4 itself.
- class lean_dojo.data_extraction.lean.LeanFile(root_dir: Path, path: Path)[source]
Bases:
objectA Lean source file (
*.lean).- __getitem__(key) str[source]
Return a code segment given its start/end positions.
This enables
lean_file[start:end].- Parameters:
key (slice) – A slice of two
Posobjects for the start/end of the code segment.
- property abs_path: Path
Absolute path of a
LeanFileobject.E.g.,
/home/kaiyu/traced_lean-example/lean-example/src/example.lean
- code: List[str]
Raw source code as a list of lines.
- convert_pos(byte_idx: int) Pos[source]
Convert a byte index (
String.Posin Lean 4) to aPosobject.
- endwith_newline: bool
Whether the last line ends with a newline.
- get_line(line_nb: int) str[source]
Return a given line of the source file.
- Parameters:
line_nb (int) – Line number (1-indexed).
- num_bytes: List[int]
The number of UTF-8 bytes of each line, including newlines.
- property num_lines: int
Number of lines in a source file.
- path: Path
Relative path w.r.t.
root_dirE.g.,
lean-example/src/example.lean
- class lean_dojo.data_extraction.lean.LeanGitRepo(url: str, commit: str)[source]
Bases:
objectGit repo of a Lean project.
- clone_and_checkout() None[source]
Clone the repo to the current working directory and checkout a specific commit.
- commit: str
The repo’s commit hash.
You can also use tags such as
v3.5.0. They will be converted to commit hashes.
- property commit_url: str
- classmethod from_path(path: Path | str) LeanGitRepo[source]
Construct a
LeanGitRepoobject from the path to a local Git repo.
- get_dependencies(path: str | Path | None = None) Dict[str, LeanGitRepo][source]
Return the dependencies required by the target repo.
- Parameters:
path (Union[str, Path, None], optional) – Root directory of the repo if it is on the disk.
- Returns:
A dictionary mapping the name of each dependency to its
LeanGitRepoobject.- Return type:
Dict[str,
LeanGitRepo]
- property is_lean4: bool
- lean_version: str
Required Lean version.
- property name: str
- repo: Repository | Repo
A
github.Repositoryobject for GitHub repos or agit.Repoobject for local or remote Git repos.
- url: str
The repo’s URL.
It can be a GitHub URL that starts with https:// or git@github.com, a local path, or any other valid Git URL.
- class lean_dojo.data_extraction.lean.Pos(line_nb: int, column_nb: int)[source]
Bases:
objectPosition in source files.
We use 1-index to keep it consistent with code editors such as Visual Studio Code.
- column_nb: int
Column number
- classmethod from_str(s: str) Pos[source]
Construct a
Posobject from its string representation, e.g.,"(323, 1109)".
- line_nb: int
Line number
- class lean_dojo.data_extraction.lean.RepoInfoCache(tag2commit: ~typing.Dict[~typing.Tuple[str, str], str] = <factory>, lean_version: ~typing.Dict[~typing.Tuple[str, str], str] = <factory>)[source]
Bases:
objectTo minize the number of network requests, we cache and re-use the info of all repos, assuming it does not change during the execution of LeanDojo.
- lean_version: Dict[Tuple[str, str], str]
- tag2commit: Dict[Tuple[str, str], str]
- class lean_dojo.data_extraction.lean.RepoType(value)[source]
Bases:
EnumAn enumeration.
- GITHUB = 0
- LOCAL = 2
- REMOTE = 1
- class lean_dojo.data_extraction.lean.Theorem(repo: LeanGitRepo, file_path: Path, full_name: str)[source]
Bases:
objectTheorem in Lean.
Theorems are named constants of type
Prop. They are typically defined using the keywordstheoremorlemma, but it’s possible to use other keywords such asdeforinstance- file_path: Path
Lean source file the theorem comes from.
- full_name: str
Fully qualified name of the theorem.
- repo: LeanGitRepo
Lean repo the theorem comes from.
- property uhash: str
Unique hash of the theorem.
- property uid: str
Unique identifier of the theorem.
- lean_dojo.data_extraction.lean.cleanse_string(s: str | Path) str[source]
Replace : and / with _ in a string.
- lean_dojo.data_extraction.lean.get_latest_commit(url: str) str[source]
Get the hash of the latest commit of the Git repo at
url.
- lean_dojo.data_extraction.lean.get_lean4_commit_from_config(config_dict: Dict[str, Any]) str[source]
Return the required Lean commit given a
lean-toolchainconfig.
- lean_dojo.data_extraction.lean.get_lean4_version_from_config(toolchain: str) str[source]
Return the required Lean version given a
lean-toolchainconfig.
- lean_dojo.data_extraction.lean.get_repo_type(url: str) RepoType | None[source]
Get the type of the repository.
- Parameters:
url (str) – The URL of the repository.
- Returns:
The type of the repository (None if the repo cannot be found).
- Return type:
Optional[str]
- lean_dojo.data_extraction.lean.is_commit_hash(s: str)[source]
Check if a string is a valid commit hash.
- lean_dojo.data_extraction.lean.is_supported_version(v) bool[source]
Check if
vis at least v4.3.0-rc2.
- lean_dojo.data_extraction.lean.normalize_url(url: str, repo_type: RepoType = RepoType.GITHUB) str[source]
- lean_dojo.data_extraction.lean.url_to_repo(url: str, num_retries: int = 2, repo_type: RepoType | None = None, tmp_dir: Path | None = None) Repo | Repository[source]
Convert a URL to a Repo object.
- Parameters:
url (str) – The URL of the repository.
num_retries (int) – Number of retries in case of failure.
repo_type (Optional[RepoType]) – The type of the repository. Defaults to None.
tmp_dir (Optional[Path]) – The temporary directory to clone the repo to. Defaults to None.
- Returns:
A Git Repo object.
- Return type:
Repo