This skill is a boundary/artifact generator. It produces a single, versioned repository snapshot document.
Output artifacts
- Codebook: `docs/artifacts/repo_codebook.md`
- Persistent config (state): `docs/artifacts/repo_codebook.config.json`
- Rationale: these are generated documentation artifacts, not production source.
Non-negotiables
- Do NOT include secrets or env files (e.g., `.env`, `.env.*`).
- Exclude generated/build/runtime artifacts and other non-source noise.
- File descriptions must be 1 line max, objective, and accurate.
- Skip empty/whitespace-only files in the code section.
- If you add comments to code while generating/fixing: ONLY essential comments, in ENGLISH.
Persistent config (stateful)
The generator maintains a stateful config file so users can add more ignores without editing .gitignore or the skill code.
- Path: `docs/artifacts/repo_codebook.config.json`
- Behavior: created automatically on first run if missing (bootstrapped from the skill template when available).
Config fields
- `version`: config schema version (integer).
- `codebook_version`: last generated codebook version (semver string, e.g., `1.0.7`). Used to persist versioning even if `repo_codebook.md` is deleted.
- `ignore_globs_extra`: additional glob patterns to exclude (e.g., `data/**`, `*.pdf`).
- `skip_empty_files`: if true, empty/whitespace-only files are omitted from the code section.
- `max_text_file_bytes`: maximum size for text files to include (bytes).
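To make the schema concrete, here is a minimal sketch of how these fields could be modeled and loaded. `CodebookConfig`, `load_or_create_config`, and `save_config` are hypothetical names, not the skill's actual code. Only the 512 KB size threshold comes from this document; the other defaults (`version=1`, `skip_empty_files=True`) are assumptions, and bootstrapping from the skill template is not modeled.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Optional


# Hypothetical model of repo_codebook.config.json; defaults other than the
# 512 KB threshold are illustrative assumptions.
@dataclass
class CodebookConfig:
    version: int = 1
    codebook_version: Optional[str] = None
    ignore_globs_extra: list[str] = field(default_factory=list)
    skip_empty_files: bool = True
    max_text_file_bytes: int = 512 * 1024


def save_config(path: Path, cfg: CodebookConfig) -> None:
    """Write the config as pretty-printed JSON, creating parent dirs if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(asdict(cfg), indent=2) + "\n", encoding="utf-8")


def load_or_create_config(path: Path) -> CodebookConfig:
    """Read the config, or write a default one on first run."""
    if not path.exists():
        cfg = CodebookConfig()
        save_config(path, cfg)
        return cfg
    raw = json.loads(path.read_text(encoding="utf-8"))
    defaults = asdict(CodebookConfig())
    # Missing keys fall back to defaults; unknown keys are ignored.
    merged = {key: raw.get(key, value) for key, value in defaults.items()}
    return CodebookConfig(**merged)
```

Falling back to defaults per key keeps older config files usable when new fields are added, which is one reason a schema `version` field is useful.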
How to run (recommended)
Run from the repository root you want to document:
uv run python ~/.codex/skills/repo-codebook-generator/scripts/generate_repo_codebook.py --repo-root "$PWD"
Non-interactive mode (CI / automation)
To skip prompts and generate immediately:
uv run python ~/.codex/skills/repo-codebook-generator/scripts/generate_repo_codebook.py --repo-root "$PWD" --non-interactive
Manage ignores (recommended)
Add patterns:
uv run python ~/.codex/skills/repo-codebook-generator/scripts/generate_repo_codebook.py --repo-root "$PWD" --add-ignore "data/**" --add-ignore "*.pdf"
Remove patterns:
uv run python ~/.codex/skills/repo-codebook-generator/scripts/generate_repo_codebook.py --repo-root "$PWD" --remove-ignore "*.pdf"
Update config only (no generation):
uv run python ~/.codex/skills/repo-codebook-generator/scripts/generate_repo_codebook.py --repo-root "$PWD" --config-only --add-ignore "out/**"
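The repeatable `--add-ignore` / `--remove-ignore` flags above could be handled roughly as follows. This is a sketch under assumptions: the parser only mirrors the flags documented here (the real script's argument handling is not shown in this document), and `add_ignores` / `remove_ignores` are hypothetical helpers operating on `ignore_globs_extra`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the documented flags; the real script may define more.
    parser = argparse.ArgumentParser()
    parser.add_argument("--repo-root", default=".")
    parser.add_argument("--non-interactive", action="store_true")
    parser.add_argument("--config-only", action="store_true")
    parser.add_argument("--add-ignore", action="append")  # repeatable
    parser.add_argument("--remove-ignore", action="append")  # repeatable
    return parser


def add_ignores(current: list[str], new_patterns: list[str]) -> list[str]:
    """Append patterns that are not already present, preserving order."""
    result = list(current)
    for pattern in new_patterns:
        pattern = pattern.strip()
        if pattern and pattern not in result:
            result.append(pattern)
    return result


def remove_ignores(current: list[str], patterns: list[str]) -> list[str]:
    """Drop exact matches only; patterns not in the list are ignored."""
    to_remove = {p.strip() for p in patterns}
    return [p for p in current if p not in to_remove]
```

With `--config-only`, the updated list would be persisted and the script would exit before generating the codebook.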
What counts as "generated/build/runtime artifacts"
Common examples: .env, .venv, __pycache__, .mypy_cache, .ruff_cache, .pytest_cache, *.egg-info, .coverage, htmlcov, lockfiles, etc.
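A built-in exclude check over these examples might look like the sketch below. The split into exact path components and per-segment globs follows the "components + globs" wording of the Ignore Summary, but the exact lists (especially which lockfiles count) are assumptions, and `is_builtin_excluded` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Assumed lists based on the examples above; the real generator's lists may differ.
BUILTIN_EXCLUDED_COMPONENTS = {
    ".venv", "__pycache__", ".mypy_cache", ".ruff_cache", ".pytest_cache", "htmlcov",
}
BUILTIN_EXCLUDED_GLOBS = [
    ".env", ".env.*", "*.egg-info", ".coverage", "*.lock", "package-lock.json",
]


def is_builtin_excluded(rel_path: str) -> bool:
    """True if any path segment is an excluded component or matches an excluded glob."""
    for part in PurePosixPath(rel_path).parts:
        if part in BUILTIN_EXCLUDED_COMPONENTS:
            return True
        if any(fnmatchcase(part, glob) for glob in BUILTIN_EXCLUDED_GLOBS):
            return True
    return False
```

Matching per segment means an excluded directory name such as `__pycache__` removes everything beneath it, wherever it appears in the tree.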
Steps (must follow in order)
1) Ensure output directory exists
Create:
docs/artifacts/
2) Load persistent config
Read (or create) docs/artifacts/repo_codebook.config.json and apply:
- built-in excludes + `ignore_globs_extra`
- file-size threshold (`max_text_file_bytes`)
- empty-file behavior (`skip_empty_files`)
- persisted `codebook_version` (for version continuity)
3) Interactive preflight (before generation)
By default (when running in a TTY), the generator runs an interactive preflight before writing docs/artifacts/repo_codebook.md:
- Print an "Ignore Summary" showing:
  - Layer 1: `.gitignore` / git excludes behavior
  - Layer 2: built-in excludes (components + globs)
  - Layer 3: current `ignore_globs_extra` from config
- Prompt with enumerated choices:
  1. Add more files/folders/patterns to ignore (persistent)
  2. Continue without changes
- If the user chooses 1, accept multiple entries (one per line; an empty line ends input).
  - Directories are canonicalized and persisted as `dir/**` (ignore the directory and all descendants).
  - Globs like `*.pdf` are kept as-is.
- After saving, print what was added and show the full `ignore_globs_extra` list.
- Prompt again:
  1. Generate `repo_codebook.md` now
  2. Add more ignores (loops back)
Note:
- Use `--non-interactive` to disable prompts (CI / automation).
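The entry handling in the preflight can be sketched as below. The function names are hypothetical, and two details are assumptions: an entry counts as a directory if it ends with `/` or exists as a directory under the repo root, and anything containing `*`, `?`, or `[` is treated as a glob.

```python
import sys
from pathlib import Path
from typing import Iterable


def should_prompt(non_interactive: bool) -> bool:
    """Prompt only on a TTY and only when --non-interactive was not given."""
    return not non_interactive and sys.stdin.isatty()


def read_entries(lines: Iterable[str]) -> list[str]:
    """Collect one entry per line until the first empty line."""
    entries = []
    for line in lines:
        if not line.strip():
            break
        entries.append(line.strip())
    return entries


def canonicalize_ignore(entry: str, repo_root: Path) -> str:
    """Map a user entry to the pattern persisted in ignore_globs_extra."""
    entry = entry.strip().replace("\\", "/")
    if any(ch in entry for ch in "*?["):
        return entry  # a glob such as *.pdf is kept as-is
    bare = entry.rstrip("/")
    if entry.endswith("/") or (repo_root / bare).is_dir():
        return f"{bare}/**"  # directory: ignore it and all descendants
    return entry  # plain file path
```

In an interactive session, `read_entries(sys.stdin)` would read until the user submits an empty line, and each entry would then pass through `canonicalize_ignore` before being saved.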
4) Generate project structure (tree) respecting .gitignore
Preferred: use `tree --gitignore` plus extra excludes via `-I`.
Recommended command (no colors, include dot dirs, directories first):
bash skills/repo-codebook-generator/scripts/get_tree.sh
Notes:
- `--gitignore` ensures `.gitignore` rules are applied.
- Extra ignores from config are applied best-effort (converted to a `tree -I` expression via `IGNORE_PATTERN_EXTRA`).
- If `tree` is not installed, the generator falls back to a `find`-based listing (best-effort).
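The "best-effort" conversion into a `tree -I` expression could work roughly like this hypothetical Python equivalent of what `get_tree.sh` does with `IGNORE_PATTERN_EXTRA`; the shell script's actual logic is not shown here. It relies on `tree` options that do exist (`-a`, `--dirsfirst`, `-n`, `--gitignore` in tree 2.x, and `-I` with `|`-separated alternatives). Reducing `data/**` to `data` is an assumption: `tree -I` matches entry names, so the reduced pattern can also hide same-named entries elsewhere, which is why the result is only best-effort.

```python
import shutil


def extra_to_tree_pattern(globs: list[str]) -> str:
    """Best-effort conversion of config globs into a `tree -I` alternation."""
    names: list[str] = []
    for glob in globs:
        name = glob[:-3] if glob.endswith("/**") else glob  # "data/**" -> "data"
        name = name.rstrip("/")
        if name and name not in names:
            names.append(name)
    return "|".join(names)


def build_tree_command(extra_globs: list[str]) -> list[str]:
    # -a: include dot entries, --dirsfirst: directories first,
    # -n: no colors, --gitignore: honor .gitignore rules.
    cmd = ["tree", "-a", "--dirsfirst", "-n", "--gitignore"]
    pattern = extra_to_tree_pattern(extra_globs)
    if pattern:
        cmd += ["-I", pattern]
    return cmd


def build_listing_command(extra_globs: list[str]) -> list[str]:
    """Use tree when available, otherwise a find listing that prunes .git."""
    if shutil.which("tree"):
        return build_tree_command(extra_globs)
    return ["find", ".", "-path", "./.git", "-prune", "-o", "-print"]
```

Note that the `find` fallback does not read `.gitignore`, so ignore filtering would have to happen afterwards.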
5) Build the file list to document (matching tree semantics)
Use Git as the source of truth for "not ignored":
git ls-files -co --exclude-standard
Then apply:
- built-in excludes
- persistent `ignore_globs_extra` from config
Directory semantics:
- Directory-like ignore entries are expanded to exclude both the directory itself and all descendants (e.g., `data` -> `data` and `data/**`) so directory pruning works correctly.
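Step 5 can be sketched as follows. The helper names are hypothetical. `-z` is a real `git ls-files` option, added here (an addition to the documented command) so that paths with unusual characters split safely. A built-in exclude predicate can be passed in. Matching uses `fnmatchcase`, whose `*` also matches `/`, so `*.pdf` excludes PDFs at any depth and `data/**` excludes everything under `data/`; this differs slightly from gitignore semantics and is an assumption of the sketch.

```python
import subprocess
from fnmatch import fnmatchcase
from pathlib import Path
from typing import Callable


def list_candidate_files(repo_root: Path) -> list[str]:
    """Tracked + untracked files that git does not ignore."""
    out = subprocess.run(
        ["git", "ls-files", "-co", "--exclude-standard", "-z"],
        cwd=repo_root, check=True, capture_output=True, text=True,
    ).stdout
    return [p for p in out.split("\0") if p]


def expand_dir_patterns(globs: list[str]) -> list[str]:
    """Expand directory-like entries to both 'dir' and 'dir/**'."""
    expanded: list[str] = []
    for g in globs:
        if g.endswith("/**"):
            candidates = [g[:-3], g]
        elif not any(ch in g for ch in "*?["):
            base = g.rstrip("/")
            candidates = [base, f"{base}/**"]
        else:
            candidates = [g]  # a real glob such as *.pdf
        for c in candidates:
            if c not in expanded:
                expanded.append(c)
    return expanded


def select_files(
    files: list[str],
    extra_globs: list[str],
    builtin_excluded: Callable[[str], bool] = lambda p: False,
) -> list[str]:
    patterns = expand_dir_patterns(extra_globs)
    return [
        f for f in files
        if not builtin_excluded(f) and not any(fnmatchcase(f, p) for p in patterns)
    ]
```

Typical use would be `select_files(list_candidate_files(root), cfg_globs)`, where `cfg_globs` stands for the loaded `ignore_globs_extra`.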
6) Write / update docs/artifacts/repo_codebook.md
The document must contain:
## Project Info
- name: <short representative name>
- description:
- <bullet 1>
- <bullet 2>
- codebook_version: <semver>
## Project Structure
```bash
<tree output>
```
### Descriptions
- <path>: <one-line objective description>
...
## Project Current Code
```<path>
<full file contents>
```
...
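A renderer for this layout might look like the following sketch. `render_codebook` and `fence_for` are hypothetical, and layout details such as the two-space indent for description bullets are assumptions. One practical detail: a documented file may itself contain a triple-backtick fence (e.g., a Markdown file), so the sketch picks a fence one backtick longer than the longest run inside the file.

```python
import re


def fence_for(content: str) -> str:
    """Pick a fence longer than any backtick run inside the file."""
    longest = max((len(run) for run in re.findall(r"`{3,}", content)), default=0)
    return "`" * max(3, longest + 1)


def render_codebook(
    name: str,
    description_bullets: list[str],
    version: str,
    tree_text: str,
    descriptions: dict[str, str],
    code_files: dict[str, str],
) -> str:
    lines = ["## Project Info", f"- name: {name}", "- description:"]
    lines += [f"  - {bullet}" for bullet in description_bullets]
    lines += [f"- codebook_version: {version}", "", "## Project Structure"]
    lines += ["```bash", tree_text.rstrip("\n"), "```", "", "### Descriptions"]
    lines += [f"- {path}: {desc}" for path, desc in descriptions.items()]
    lines += ["", "## Project Current Code"]
    for path, content in code_files.items():
        fence = fence_for(content)
        lines += [f"{fence}{path}", content.rstrip("\n"), fence, ""]
    return "\n".join(lines).rstrip("\n") + "\n"
```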
7) Versioning rule for codebook_version
- If the file is created for the first time: `1.0.0`
- If it already exists: bump PATCH by default (e.g., `1.0.0` -> `1.0.1`)
- If `repo_codebook.md` is missing but the config contains `codebook_version`, bump PATCH from the config value to preserve continuity.
- Only bump MINOR/MAJOR if explicitly requested.
- After successful generation, persist the new `codebook_version` into `docs/artifacts/repo_codebook.config.json`.
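The versioning rule can be expressed as a small function. This is a sketch with a hypothetical name, `next_codebook_version`. It assumes that the current version of an existing codebook is read from its `- codebook_version:` line and that, when both sources exist, the file's value takes precedence over the config's.

```python
import re
from typing import Optional

_VERSION_LINE = re.compile(r"^- codebook_version:\s*(\d+)\.(\d+)\.(\d+)\s*$", re.MULTILINE)


def next_codebook_version(
    existing_md: Optional[str], config_version: Optional[str], bump: str = "patch"
) -> str:
    current = None
    if existing_md is not None:
        match = _VERSION_LINE.search(existing_md)
        if match:
            current = tuple(int(g) for g in match.groups())
    if current is None and config_version:
        # Codebook missing (or unversioned): continue from the persisted config value.
        current = tuple(int(x) for x in config_version.strip().split("."))
    if current is None:
        return "1.0.0"  # first generation
    major, minor, patch = current
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"  # default: PATCH
```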
8) Size / binary / empty-file safety
- Skip binary files and very large files (default threshold: 512 KB, or `max_text_file_bytes` from config) and add a note like: `- <path>: skipped (binary or too large)`
- Empty/whitespace-only files:
  - Descriptions show `skipped (empty file)`
  - Code blocks are omitted (when `skip_empty_files=true`)
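These safety rules could be implemented along the lines of the sketch below. The function names are hypothetical, and the binary check (a NUL byte, or content that is not valid UTF-8) is an assumed heuristic; the document does not say how binaries are detected.

```python
from pathlib import Path
from typing import Optional

DEFAULT_MAX_TEXT_FILE_BYTES = 512 * 1024


def classify_file(path: Path, max_bytes: int = DEFAULT_MAX_TEXT_FILE_BYTES) -> str:
    """Return 'too_large', 'binary', 'empty', or 'text'."""
    if path.stat().st_size > max_bytes:
        return "too_large"
    data = path.read_bytes()
    if b"\0" in data:  # assumed heuristic: NUL bytes indicate binary content
        return "binary"
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "binary"
    return "empty" if not text.strip() else "text"


def description_note(kind: str) -> Optional[str]:
    """Note for the Descriptions section, or None for normal text files."""
    return {
        "too_large": "skipped (binary or too large)",
        "binary": "skipped (binary or too large)",
        "empty": "skipped (empty file)",
    }.get(kind)


def include_code_block(kind: str, skip_empty_files: bool = True) -> bool:
    if kind in ("too_large", "binary"):
        return False
    if kind == "empty":
        return not skip_empty_files
    return True
```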
How to run (recommended)
Generate the codebook:
uv run python ~/.codex/skills/repo-codebook-generator/scripts/generate_repo_codebook.py --repo-root "$PWD"
This will:
- Ensure `docs/artifacts/` exists
- Ensure `docs/artifacts/repo_codebook.config.json` exists (create if missing, bootstrapped from the template when possible)
- Run an interactive ignore preflight (unless `--non-interactive` is used)
- Produce `tree` output using `.gitignore` + built-in excludes + config excludes (best-effort)
- Generate/update the codebook with a bumped patch version (persisted in config for continuity)
- Include a one-line description per file + full code blocks (excluding empty/binary/too-large files)
