lean_dojo.data_extraction.trace
This module provides the main interfaces for tracing Lean repos, i.e., extracting data from them.
To estimate the time for tracing a repo, a good rule of thumb is 1.5x the time for compiling the repo using leanpkg build
.
A repo has to be traced only once, and the traced repo will be stored in a cache for fast access in the future.
- lean_dojo.data_extraction.trace.check_files(packages_path: Path, no_deps: bool) None [source]
Check if all *.lean files have been processed to produce *.ast.json and *.dep_paths files.
- lean_dojo.data_extraction.trace.get_traced_repo_path(repo: LeanGitRepo, build_deps: bool = True) Path [source]
Return the path of a traced repo in the cache.
The function will trace a repo if it is not available in the cache. See Caching for details.
- Parameters:
repo (LeanGitRepo) – The Lean repo to trace.
build_deps (bool) – Whether to build the dependencies of
repo
. Defaults to True.
- Returns:
The path of the traced repo in the cache, e.g.
/home/kaiyu/.cache/lean_dojo/leanprover-community-mathlib-2196ab363eb097c008d4497125e0dde23fb36db2
- Return type:
Path
- lean_dojo.data_extraction.trace.is_available_in_cache(repo: LeanGitRepo) bool [source]
Check if
repo
has a traced repo available in the cache (including the remote cache).
- lean_dojo.data_extraction.trace.is_new_version(v: str) bool [source]
Check if
v
is at least 4.3.0-rc2.
- lean_dojo.data_extraction.trace.launch_progressbar(paths: List[Path]) Generator[None, None, None] [source]
Launch an async progressbar to monitor the progress of tracing the repo.
- lean_dojo.data_extraction.trace.trace(repo: LeanGitRepo, dst_dir: str | Path | None = None, build_deps: bool = True) TracedRepo [source]
Trace a repo (and its dependencies), saving the results to
dst_dir
.The function only traces the repo when it’s not available in the cache. Otherwise, it directly copies the traced repo from the cache to
dst_dir
. See Caching for details.- Parameters:
repo (LeanGitRepo) – The Lean repo to trace.
dst_dir (Union[str, Path]) – The directory for saving the traced repo. If None, the traced repo is only saved in the cahe.
build_deps (bool) – Whether to build the dependencies of
repo
. Defaults to True.
- Returns:
A
TracedRepo
object corresponding to the files atdst_dir
.- Return type: