pain001.data package
Submodules
pain001.data.loader module
Universal data loader supporting multiple input sources.
- pain001.data.loader.load_payment_data(data_source: str | list[dict[str, Any]] | dict[str, Any]) → list[dict[str, Any]]
Universal data loader supporting multiple input sources.
This function provides a unified interface for loading payment data from various sources while maintaining backward compatibility with existing file-based workflows.
- Parameters:
data_source – The payment data source. Supports:
- str: File path to a CSV (.csv), SQLite (.db), JSON (.json/.jsonl), or Parquet (.parquet) file
- list: List of dictionaries with payment data
- dict: Single payment transaction as a dictionary
- Returns:
List of payment data dictionaries
- Return type:
List[Dict[str, Any]]
- Raises:
ValueError – If data source type is unsupported or data is invalid
FileNotFoundError – If file path doesn’t exist
Examples
# Existing file-based usage (backward compatible)
>>> data = load_payment_data('payments.csv')
>>> data = load_payment_data('payments.db')
# New JSON formats
>>> data = load_payment_data('payments.json')
>>> data = load_payment_data('payments.jsonl')  # JSON Lines
# New Parquet format (requires pyarrow)
>>> data = load_payment_data('payments.parquet')
# New direct Python data usage
>>> data = load_payment_data([
...     {'id': 'MSG001', 'amount': '1000.00', ...},
...     {'id': 'MSG002', 'amount': '500.00', ...}
... ])
# Single transaction
>>> data = load_payment_data({
...     'id': 'MSG001', 'amount': '1000.00', ...
... })
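The three accepted input shapes (path string, list of dictionaries, single dictionary) can be illustrated with a small normalization routine. This is a minimal sketch and not pain001's implementation: `normalize_source` is a hypothetical helper, it covers only CSV and JSON paths using the standard library, and its error messages are assumptions.

```python
import csv
import json
from pathlib import Path
from typing import Any, Union

Row = dict[str, Any]


def normalize_source(data_source: Union[str, list[Row], Row]) -> list[Row]:
    """Hypothetical helper: return a list of row dictionaries for any supported input."""
    # A single transaction is wrapped so callers always receive a list.
    if isinstance(data_source, dict):
        return [data_source]
    # A list is accepted as-is once every item is confirmed to be a dictionary.
    if isinstance(data_source, list):
        if not all(isinstance(row, dict) for row in data_source):
            raise ValueError("every list item must be a dictionary")
        return data_source
    # A string is treated as a file path and dispatched on its extension.
    if isinstance(data_source, str):
        path = Path(data_source)
        if not path.exists():
            raise FileNotFoundError(data_source)
        suffix = path.suffix.lower()
        if suffix == ".csv":
            with path.open(newline="", encoding="utf-8") as handle:
                return list(csv.DictReader(handle))
        if suffix == ".json":
            with path.open(encoding="utf-8") as handle:
                loaded = json.load(handle)
            return loaded if isinstance(loaded, list) else [loaded]
        raise ValueError(f"unsupported file type: {suffix}")
    raise ValueError(f"unsupported data source type: {type(data_source).__name__}")
```

Whatever the input shape, the caller gets back a list of dictionaries, which mirrors the documented return type.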
- pain001.data.loader.load_payment_data_streaming(data_source: str | list[dict[str, Any]], chunk_size: int = 1000, validate: bool = True) → Generator[list[dict[str, Any]], None, None]
Memory-efficient streaming loader supporting multiple input sources.
This function yields chunks of payment data instead of loading everything into memory, making it suitable for large datasets (millions of rows).
- Parameters:
data_source – The payment data source. Supports:
- str: File path to a CSV (.csv) or SQLite (.db) file
- list: List of dictionaries with payment data
chunk_size – Number of records to yield per chunk. Default is 1000.
validate – If True, validate each chunk. Default is True. Set to False for testing or when the data is already validated.
- Yields:
List[Dict[str, Any]] – Chunks of payment data dictionaries
- Raises:
ValueError – If data source type is unsupported or data is invalid
FileNotFoundError – If file path doesn’t exist
DataSourceError – If data source is empty or invalid
Examples
# Streaming from large CSV file
>>> for chunk in load_payment_data_streaming('large_payments.csv', chunk_size=500):
...     process_batch(chunk)
# Streaming from large SQLite database
>>> for chunk in load_payment_data_streaming('payments.db', chunk_size=1000):
...     generate_xml_batch(chunk)
# Streaming from large Python list (useful for APIs)
>>> large_data = [{'id': f'TX{i}', ...} for i in range(100000)]
>>> for chunk in load_payment_data_streaming(large_data, chunk_size=500):
...     validate_and_process(chunk)
- Performance:
Memory usage: O(chunk_size) instead of O(total_records)
Enables processing datasets larger than available RAM
~10-15% slower than load_payment_data() due to yielding overhead
Best for files/datasets with 10,000+ records
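The O(chunk_size) memory behaviour comes from reading only one batch of rows at a time. The following is a minimal sketch of that technique for CSV input, assuming a header row; `stream_csv_chunks` is a hypothetical stand-in built on the standard library, not the function documented here, and it performs no validation.

```python
import csv
from collections.abc import Iterator
from itertools import islice
from typing import Any


def stream_csv_chunks(path: str, chunk_size: int = 1000) -> Iterator[list[dict[str, Any]]]:
    """Hypothetical helper: yield lists of at most chunk_size rows from a CSV file."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        while True:
            # islice pulls at most chunk_size rows from the reader,
            # so only one chunk is held in memory at any time.
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                return
            yield chunk
```

Because the function is a generator, the file is opened on the first iteration and closed once the last chunk has been consumed.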
Note
A single dict input is not supported in streaming mode. Wrap it in a list first.
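Wrapping a single transaction in a one-element list is enough to turn it into a list source. The sketch below shows the conversion; `chunk_records` is a hypothetical stand-in for the streaming loader so that the example runs without pain001 installed, and it only slices the list without validating it.

```python
from collections.abc import Iterator
from typing import Any


def chunk_records(records: list[dict[str, Any]], chunk_size: int = 1000) -> Iterator[list[dict[str, Any]]]:
    """Hypothetical helper: yield consecutive slices of at most chunk_size records."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


# A single transaction, using the field names from the examples above.
single = {"id": "MSG001", "amount": "1000.00"}

# Wrap the dict in a list before handing it to a list-based streaming API.
chunks = list(chunk_records([single], chunk_size=500))
```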
Module contents
Data loading and validation module.
- pain001.data.load_payment_data(data_source: str | list[dict[str, Any]] | dict[str, Any]) → list[dict[str, Any]]
Universal data loader supporting multiple input sources.
This function provides a unified interface for loading payment data from various sources while maintaining backward compatibility with existing file-based workflows.
- Parameters:
data_source – The payment data source. Supports:
- str: File path to a CSV (.csv), SQLite (.db), JSON (.json/.jsonl), or Parquet (.parquet) file
- list: List of dictionaries with payment data
- dict: Single payment transaction as a dictionary
- Returns:
List of payment data dictionaries
- Return type:
List[Dict[str, Any]]
- Raises:
ValueError – If data source type is unsupported or data is invalid
FileNotFoundError – If file path doesn’t exist
Examples
# Existing file-based usage (backward compatible)
>>> data = load_payment_data('payments.csv')
>>> data = load_payment_data('payments.db')
# New JSON formats
>>> data = load_payment_data('payments.json')
>>> data = load_payment_data('payments.jsonl')  # JSON Lines
# New Parquet format (requires pyarrow)
>>> data = load_payment_data('payments.parquet')
# New direct Python data usage
>>> data = load_payment_data([
...     {'id': 'MSG001', 'amount': '1000.00', ...},
...     {'id': 'MSG002', 'amount': '500.00', ...}
... ])
# Single transaction
>>> data = load_payment_data({
...     'id': 'MSG001', 'amount': '1000.00', ...
... })