Large-Batch Performance

Pain001 now includes a benchmark script for large batches and a streaming generation mode that processes the input in chunks.

Recommended approach for large input files:

Use --streaming with an explicit --chunk-size to keep memory bounded.
Benchmark with data shaped like your real input before increasing the chunk size.
Prefer chunk sizes in the 500 to 5000 range unless profiling shows otherwise.
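To show why a fixed chunk size keeps memory bounded, here is a minimal Python sketch of chunked CSV reading using only the standard library. It is not Pain001's internal implementation, and the `iter_chunks` helper is hypothetical. Rows are read lazily and only one chunk is held in memory at a time, so peak memory grows with the chunk size rather than with the size of the input file.

```python
import csv
import io
from itertools import islice
from typing import Iterator, TextIO


def iter_chunks(stream: TextIO, chunk_size: int) -> Iterator[list[dict[str, str]]]:
    """Yield lists of at most `chunk_size` CSV rows (hypothetical helper).

    csv.DictReader reads lazily, so at most one chunk is materialised at a time.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    reader = csv.DictReader(stream)
    while True:
        # Pull the next `chunk_size` rows; an empty list means the input is exhausted.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        yield chunk


# Usage: 2,500 synthetic rows split into chunks of 1,000.
sample = "id,amount\n" + "".join(f"{i},{i}.00\n" for i in range(2500))
sizes = [len(chunk) for chunk in iter_chunks(io.StringIO(sample), 1000)]
print(sizes)  # prints: [1000, 1000, 500]
```

If streaming mode writes one XML file per chunk (an assumption worth confirming against your own runs), this input would produce three files.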

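Example:

```shell
# Generate pain.001.001.03 XML in streaming mode with a chunk size of 1000
pain001 -t pain.001.001.03 -m template.xml -s schema.xsd -d payments.csv --streaming --chunk-size 1000

# Run the large-batch benchmark script (from the repository root)
poetry run python scripts/benchmark_large_batches.py
```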
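Following the advice to benchmark before raising the chunk size, the sketch below times the documented CLI command at several chunk sizes. It assumes `pain001` is on your `PATH` and that `template.xml`, `schema.xsd` and `payments.csv` exist in the working directory. The `sweep_chunk_sizes` function and its `run` parameter are hypothetical, and each timing covers the whole process (startup included), not only XML generation.

```python
import subprocess
import time
from typing import Callable, Sequence

# The documented streaming command; only --chunk-size varies between runs.
BASE_CMD = [
    "pain001", "-t", "pain.001.001.03", "-m", "template.xml",
    "-s", "schema.xsd", "-d", "payments.csv", "--streaming",
]


def sweep_chunk_sizes(
    sizes: Sequence[int],
    run: Callable[..., object] = subprocess.run,
) -> dict[int, float]:
    """Return end-to-end wall-clock seconds for each chunk size (hypothetical helper).

    `run` defaults to subprocess.run and can be replaced by a stub in tests.
    """
    results: dict[int, float] = {}
    for size in sizes:
        cmd = [*BASE_CMD, "--chunk-size", str(size)]
        start = time.perf_counter()
        # With subprocess.run, check=True raises CalledProcessError on a non-zero exit.
        run(cmd, check=True)
        results[size] = time.perf_counter() - start
    return results
```

Calling `sweep_chunk_sizes([500, 1000, 2000, 5000])` runs the command once per size and returns one timing for each. Consider repeating each size a few times, since the first run may pay for a cold file cache.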

What to measure:

Total rows processed
Wall-clock generation time
Per-chunk generation time
Number of XML files emitted in streaming mode
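The four measurements above can be collected in a single pass. The sketch below is a minimal, hypothetical harness and not the interface of `scripts/benchmark_large_batches.py`: `BatchMetrics`, `measure` and `toy_generate` are illustrative names, `toy_generate` stands in for whatever renders one chunk of rows to XML, and the harness assumes one XML document is emitted per chunk.

```python
import time
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class BatchMetrics:
    total_rows: int = 0
    wall_clock_s: float = 0.0
    per_chunk_s: list[float] = field(default_factory=list)
    files_emitted: int = 0  # assumes one XML document per chunk


def measure(
    chunks: Iterable[list[dict[str, str]]],
    generate_chunk: Callable[[list[dict[str, str]]], bytes],
) -> BatchMetrics:
    """Time each chunk and the whole run, counting rows and emitted documents."""
    metrics = BatchMetrics()
    start = time.perf_counter()
    for chunk in chunks:
        t0 = time.perf_counter()
        generate_chunk(chunk)  # a real run would also write this XML to disk
        metrics.per_chunk_s.append(time.perf_counter() - t0)
        metrics.total_rows += len(chunk)
        metrics.files_emitted += 1
    metrics.wall_clock_s = time.perf_counter() - start
    return metrics


def toy_generate(chunk: list[dict[str, str]]) -> bytes:
    """Hypothetical stand-in: one <Pmt> element per row, not a valid pain.001 message."""
    root = ET.Element("Document")
    for row in chunk:
        ET.SubElement(root, "Pmt", id=row["id"])
    return ET.tostring(root)


rows = [{"id": str(i)} for i in range(2500)]
chunks = [rows[i:i + 1000] for i in range(0, len(rows), 1000)]
m = measure(chunks, toy_generate)
print(m.total_rows, m.files_emitted, len(m.per_chunk_s))  # prints: 2500 3 3
```

Keeping per-chunk times as a list, rather than only a total, makes it easy to spot chunks that are much slower than the rest.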