Performance Playbook
Large File Guidance
Prefer --streaming for high-row-count CSV, JSONL, and Parquet inputs.
Start with --chunk-size 500 for low-memory runners and increase to 1000 or 2000 on developer machines.
Reuse bundled templates through the registry instead of repeatedly resolving custom paths.
Use scripts/benchmark_large_batches.py to measure the current branch on representative data before changing chunk sizes.
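To make the memory effect of the chunk size concrete, here is a minimal sketch of chunked streaming in plain Python. It is not the tool's implementation: `iter_chunks` is a hypothetical helper, and the in-memory CSV is a stand-in for a large input file. The point is only that at most `chunk_size` rows are held at once.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator, List


def iter_chunks(rows: Iterable[dict], chunk_size: int) -> Iterator[List[dict]]:
    """Yield lists of at most chunk_size rows without loading the full input.

    Hypothetical helper for illustration; not part of the project's API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    iterator = iter(rows)
    while True:
        # islice pulls only the next chunk_size rows from the lazy reader.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Stand-in for a large CSV file; csv.DictReader reads it lazily, row by row.
sample = io.StringIO("id,name\n" + "".join(f"{i},row{i}\n" for i in range(1200)))
chunk_lengths = [len(chunk) for chunk in iter_chunks(csv.DictReader(sample), 500)]
print(chunk_lengths)  # [500, 500, 200]
```

With a chunk size of 500, the 1200 sample rows are processed as three batches, and peak memory scales with the batch size rather than with the input size.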
CPU And Memory Trade-Offs
Smaller chunks reduce peak memory but increase file count.
Larger chunks reduce file count but increase render and validation time per output file.
Validation dominates total runtime for complex schemas, so benchmark the end-to-end run rather than template rendering alone.
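The trade-offs above can be roughed out before running the real benchmark. The sketch below assumes one output file per chunk, which this page implies but does not state, and uses placeholder `render` and `validate` callables in place of the project's actual stages. `output_file_count` and `time_stages` are hypothetical names.

```python
import time
from typing import Callable, Dict, List


def output_file_count(total_rows: int, chunk_size: int) -> int:
    """Number of output files, assuming one file per chunk (an assumption)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Integer ceiling division: a partial final chunk still produces a file.
    return (total_rows + chunk_size - 1) // chunk_size


def time_stages(
    chunk: List[dict],
    stages: Dict[str, Callable[[List[dict]], object]],
) -> Dict[str, float]:
    """Time each stage separately so no single stage is measured in isolation."""
    timings: Dict[str, float] = {}
    for name, stage in stages.items():
        start = time.perf_counter()
        stage(chunk)
        timings[name] = time.perf_counter() - start
    return timings


# File counts for one million rows at the chunk sizes suggested above.
counts = {size: output_file_count(1_000_000, size) for size in (500, 1000, 2000)}
print(counts)  # {500: 2000, 1000: 1000, 2000: 500}

# Placeholder stages; swap in the real render and validation calls.
chunk = [{"id": i} for i in range(1000)]
timings = time_stages(
    chunk,
    {
        "render": lambda rows: [str(row) for row in rows],
        "validate": lambda rows: all("id" in row for row in rows),
    },
)
```

Comparing the per-stage timings across chunk sizes shows whether rendering or validation is driving the per-file cost, which is why the end-to-end measurement matters.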