Performance Playbook

Large File Guidance

Prefer --streaming for high-row-count CSV, JSONL, and Parquet inputs.
Start with --chunk-size 500 for low-memory runners and increase to 1000 or 2000 on developer machines.
Reuse bundled templates through the registry instead of repeatedly resolving custom paths.
Use scripts/benchmark_large_batches.py to measure the current branch on representative data before changing chunk sizes.
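
To illustrate why streaming with a bounded chunk size keeps memory flat, here is a minimal Python sketch of chunked iteration over a CSV source. It is not the tool's implementation: the helper name iter_chunks and the in-memory sample data are hypothetical, and only the standard library is used.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def iter_chunks(
    rows: Iterable[Dict[str, str]], chunk_size: int
) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most chunk_size rows without loading the whole input."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    iterator = iter(rows)
    while True:
        # islice pulls only chunk_size rows from the underlying reader.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Hypothetical in-memory input standing in for a large CSV file.
sample = io.StringIO("id,name\n" + "".join(f"{i},row{i}\n" for i in range(1200)))
chunk_lengths = [len(chunk) for chunk in iter_chunks(csv.DictReader(sample), 500)]
print(chunk_lengths)  # [500, 500, 200]
```

Only one chunk is held in memory at a time, so peak memory scales with the chunk size rather than with the total row count.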

Smaller chunks reduce peak memory but increase file count.
Larger chunks reduce file count but increase render and validation time per output file.
Validation dominates runtime for complex schemas, so benchmark the end-to-end run rather than template rendering alone.
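
The file-count side of this trade-off can be estimated with simple arithmetic. The sketch below assumes one output file per chunk, which is an assumption for illustration rather than a documented guarantee; the function name output_file_count and the row total are hypothetical.

```python
def output_file_count(total_rows: int, chunk_size: int) -> int:
    """Number of chunks needed to cover total_rows (ceiling division)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if total_rows < 0:
        raise ValueError("total_rows must not be negative")
    # Integer ceiling division avoids floating-point rounding on large counts.
    return (total_rows + chunk_size - 1) // chunk_size


total_rows = 1_000_000  # hypothetical input size
for chunk_size in (500, 1000, 2000):
    # Assumes one output file per chunk.
    print(f"chunk size {chunk_size}: {output_file_count(total_rows, chunk_size)} files")
```

Under that assumption, moving from a chunk size of 500 to 2000 cuts the file count by a factor of four, while each file carries four times as many rows to render and validate.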