lean_dojo.data_extraction.trace
This module provides the main interfaces for tracing Lean repos, i.e., extracting data from them.
To estimate the time for tracing a repo, a good rule of thumb is 1.5x the time for compiling the repo using leanpkg build.
A repo has to be traced only once, and the traced repo will be stored in a cache for fast access in the future.
- lean_dojo.data_extraction.trace.check_files(packages_path: Path, no_deps: bool) None[source]
Check if all *.lean files have been processed to produce *.ast.json and *.dep_paths files.
- lean_dojo.data_extraction.trace.get_traced_repo_path(repo: LeanGitRepo, build_deps: bool = True) Path[source]
Return the path of a traced repo in the cache.
The function will trace a repo if it is not available in the cache. See Caching for details.
- Parameters:
repo (LeanGitRepo) – The Lean repo to trace.
build_deps (bool) – Whether to build the dependencies of
repo. Defaults to True.
- Returns:
The path of the traced repo in the cache, e.g.
/home/kaiyu/.cache/lean_dojo/leanprover-community-mathlib-2196ab363eb097c008d4497125e0dde23fb36db2- Return type:
Path
- lean_dojo.data_extraction.trace.is_available_in_cache(repo: LeanGitRepo) bool[source]
Check if
repohas a traced repo available in the cache (including the remote cache).
- lean_dojo.data_extraction.trace.is_new_version(v: str) bool[source]
Check if
vis at least 4.3.0-rc2.
- lean_dojo.data_extraction.trace.launch_progressbar(paths: List[Path]) Generator[None, None, None][source]
Launch an async progressbar to monitor the progress of tracing the repo.
- lean_dojo.data_extraction.trace.trace(repo: LeanGitRepo, dst_dir: str | Path | None = None, build_deps: bool = True) TracedRepo[source]
Trace a repo (and its dependencies), saving the results to
dst_dir.The function only traces the repo when it’s not available in the cache. Otherwise, it directly copies the traced repo from the cache to
dst_dir. See Caching for details.- Parameters:
repo (LeanGitRepo) – The Lean repo to trace.
dst_dir (Union[str, Path]) – The directory for saving the traced repo. If None, the traced repo is only saved in the cahe.
build_deps (bool) – Whether to build the dependencies of
repo. Defaults to True.
- Returns:
A
TracedRepoobject corresponding to the files atdst_dir.- Return type: