pain001.data package

Submodules

pain001.data.loader module

Universal data loader supporting multiple input sources.

pain001.data.loader.load_payment_data(data_source: str | list[dict[str, Any]] | dict[str, Any]) → list[dict[str, Any]]

Universal data loader supporting multiple input sources.

This function provides a unified interface for loading payment data from various sources while maintaining backward compatibility with existing file-based workflows.

Parameters:

data_source – The payment data source. Supports:

- str: File path to a CSV (.csv), SQLite (.db), JSON (.json/.jsonl), or Parquet (.parquet) file
- list: List of dictionaries with payment data
- dict: Single payment transaction as a dictionary

Returns:

List of payment data dictionaries

Return type:

list[dict[str, Any]]

Raises:

DataSourceError – If the data source type is unsupported. Errors from the underlying loaders (e.g. FileNotFoundError for missing files, PaymentValidationError for invalid rows) propagate unchanged.

Examples

# Existing file-based usage (backward compatible)
>>> data = load_payment_data('payments.csv')
>>> data = load_payment_data('payments.db')

# New JSON formats
>>> data = load_payment_data('payments.json')
>>> data = load_payment_data('payments.jsonl')  # JSON Lines

# New Parquet format (requires pyarrow)
>>> data = load_payment_data('payments.parquet')

# New direct Python data usage
>>> data = load_payment_data([
...     {'id': 'MSG001', 'amount': '1000.00', ...},
...     {'id': 'MSG002', 'amount': '500.00', ...}
... ])

# Single transaction
>>> data = load_payment_data({
...     'id': 'MSG001', 'amount': '1000.00', ...
... })
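The dispatch described above (dict, list, or file path selected by extension) can be illustrated with a minimal standard-library sketch. This is not pain001's implementation: `load_records_sketch` is a hypothetical name, `ValueError` and `TypeError` stand in for `DataSourceError`, and the SQLite and Parquet branches are omitted to keep the example short.

```python
import csv
import json
from pathlib import Path
from typing import Any, Union

Record = dict[str, Any]


def load_records_sketch(source: Union[str, list[Record], Record]) -> list[Record]:
    """Hypothetical stand-in for a universal loader (not pain001's code)."""
    if isinstance(source, dict):
        # A single transaction is wrapped so callers always get a list.
        return [source]
    if isinstance(source, list):
        # Copy the list so the caller's object is not shared.
        return list(source)
    if isinstance(source, str):
        suffix = Path(source).suffix.lower()
        if suffix == ".csv":
            with open(source, newline="", encoding="utf-8") as fh:
                return list(csv.DictReader(fh))
        if suffix == ".json":
            with open(source, encoding="utf-8") as fh:
                loaded = json.load(fh)
            return [loaded] if isinstance(loaded, dict) else list(loaded)
        if suffix == ".jsonl":
            # JSON Lines: one JSON object per non-empty line.
            with open(source, encoding="utf-8") as fh:
                return [json.loads(line) for line in fh if line.strip()]
        # Assumption: the real loader raises DataSourceError here.
        raise ValueError(f"unsupported file type: {suffix!r}")
    raise TypeError(f"unsupported data source: {type(source).__name__}")
```

Whatever the input shape, the caller receives a list of dictionaries, which is the property that lets file-based and in-memory workflows share the same downstream code.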

pain001.data.loader.load_payment_data_streaming(data_source: str | list[dict[str, Any]], chunk_size: int = 1000, validate: bool = True) → Generator[list[dict[str, Any]], None, None]

Memory-efficient streaming loader supporting multiple input sources.

This function yields chunks of payment data instead of loading everything into memory, making it suitable for large datasets (millions of rows).

Parameters:

data_source – The payment data source. Supports:

- str: File path to a CSV (.csv) or SQLite (.db) file
- list: List of dictionaries with payment data

chunk_size – Number of records to yield per chunk. Default is 1000.

validate – If True (the default), validate each chunk. Set to False for testing or when the data has already been validated.

Yields:

list[dict[str, Any]] – Chunks of payment data dictionaries

Raises:

DataSourceError – If the data source type is unsupported or a chunk fails validation. Errors from the underlying streaming loaders (e.g. FileNotFoundError for missing files) propagate unchanged.

Examples

# Streaming from large CSV file
>>> for chunk in load_payment_data_streaming('large_payments.csv', chunk_size=500):
...     process_batch(chunk)

# Streaming from large SQLite database
>>> for chunk in load_payment_data_streaming('payments.db', chunk_size=1000):
...     generate_xml_batch(chunk)

# Streaming from large Python list (useful for APIs)
>>> large_data = [{'id': f'TX{i}', ...} for i in range(100000)]
>>> for chunk in load_payment_data_streaming(large_data, chunk_size=500):
...     validate_and_process(chunk)

Performance:

Memory usage: O(chunk_size) instead of O(total_records)
Enables processing datasets larger than available RAM
~10-15% slower than load_payment_data() due to yielding overhead
Best for files/datasets with 10,000+ records
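To make the O(chunk_size) memory claim concrete, here is a minimal sketch of chunked iteration using only the standard library. It does not reproduce pain001's streaming loader: `iter_chunks` and `stream_csv_chunks` are hypothetical names, and validation is left out.

```python
import csv
from itertools import islice
from typing import Any, Iterable, Iterator


def iter_chunks(
    records: Iterable[dict[str, Any]], chunk_size: int = 1000
) -> Iterator[list[dict[str, Any]]]:
    """Yield lists of at most chunk_size records (hypothetical helper)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(records)
    while True:
        # Only one chunk is materialised at a time.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def stream_csv_chunks(
    path: str, chunk_size: int = 1000
) -> Iterator[list[dict[str, Any]]]:
    """Read a CSV file lazily and yield it chunk by chunk (hypothetical helper)."""
    with open(path, newline="", encoding="utf-8") as fh:
        # DictReader pulls rows on demand, so the file is never fully loaded.
        yield from iter_chunks(csv.DictReader(fh), chunk_size)
```

Because each chunk is built and released before the next one is read, peak memory depends on `chunk_size` rather than on the total number of rows.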

Note

A single dict is not supported as input in streaming mode. Wrap it in a list first.
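The conversion mentioned in the note amounts to wrapping the dictionary in a one-element list. A small sketch follows, in which `as_record_list` is a hypothetical helper and not part of pain001:

```python
from typing import Any, Union

Record = dict[str, Any]


def as_record_list(source: Union[Record, list[Record]]) -> list[Record]:
    """Wrap a single transaction in a list; pass lists through unchanged."""
    return [source] if isinstance(source, dict) else source


payment = {"id": "MSG001", "amount": "1000.00"}
records = as_record_list(payment)  # a one-element list holding the transaction
```

The resulting `records` list has the shape that `load_payment_data_streaming` documents for its list input.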

Module contents

Data loading and validation module.

pain001.data.load_payment_data(data_source: str | list[dict[str, Any]] | dict[str, Any]) → list[dict[str, Any]]

Universal data loader supporting multiple input sources.

This function provides a unified interface for loading payment data from various sources while maintaining backward compatibility with existing file-based workflows.

Parameters:

data_source – The payment data source. Supports:

- str: File path to a CSV (.csv), SQLite (.db), JSON (.json/.jsonl), or Parquet (.parquet) file
- list: List of dictionaries with payment data
- dict: Single payment transaction as a dictionary

Returns:

List of payment data dictionaries

Return type:

list[dict[str, Any]]

Raises:

DataSourceError – If the data source type is unsupported. Errors from the underlying loaders (e.g. FileNotFoundError for missing files, PaymentValidationError for invalid rows) propagate unchanged.

Examples

# Existing file-based usage (backward compatible)
>>> data = load_payment_data('payments.csv')
>>> data = load_payment_data('payments.db')

# New JSON formats
>>> data = load_payment_data('payments.json')
>>> data = load_payment_data('payments.jsonl')  # JSON Lines

# New Parquet format (requires pyarrow)
>>> data = load_payment_data('payments.parquet')

# New direct Python data usage
>>> data = load_payment_data([
...     {'id': 'MSG001', 'amount': '1000.00', ...},
...     {'id': 'MSG002', 'amount': '500.00', ...}
... ])

# Single transaction
>>> data = load_payment_data({
...     'id': 'MSG001', 'amount': '1000.00', ...
... })