lean_dojo.data_extraction.trace

This module provides the main interfaces for tracing Lean repos, i.e., extracting data from them. To estimate the time for tracing a repo, a good rule of thumb is 1.5x the time for compiling the repo using leanpkg build. A repo has to be traced only once, and the traced repo will be stored in a cache for fast access in the future.

lean_dojo.data_extraction.trace.get_traced_repo_path(repo: LeanGitRepo, build_deps: bool = True) Path[source]

Return the path of a traced repo in the cache.

The function will trace a repo if it is not available in the cache. See Caching for details.

Parameters:
  • repo (LeanGitRepo) – The Lean repo to trace.

  • build_deps (bool) – Whether to build the dependencies of repo. Defaults to True.

Returns:

The path of the traced repo in the cache, e.g. /home/kaiyu/.cache/lean_dojo/leanprover-community-mathlib-2196ab363eb097c008d4497125e0dde23fb36db2

Return type:

Path

lean_dojo.data_extraction.trace.is_available_in_cache(repo: LeanGitRepo) bool[source]

Check if repo has a traced repo available in the cache (including the remote cache).

lean_dojo.data_extraction.trace.trace(repo: LeanGitRepo, dst_dir: str | Path | None = None, build_deps: bool = True) TracedRepo[source]

Trace a repo (and its dependencies), saving the results to dst_dir.

The function only traces the repo when it’s not available in the cache. Otherwise, it directly copies the traced repo from the cache to dst_dir. See Caching for details.

Parameters:
  • repo (LeanGitRepo) – The Lean repo to trace.

  • dst_dir (Union[str, Path]) – The directory for saving the traced repo. If None, the traced repo is only saved in the cahe.

  • build_deps (bool) – Whether to build the dependencies of repo. Defaults to True.

Returns:

A TracedRepo object corresponding to the files at dst_dir.

Return type:

TracedRepo