Validator

Validation primitives for ETL data quality checks.

Provides a small framework for validating extracted data prior to loading. It includes a Validator base class, several concrete validators, a ValidationSequence to compose multiple validators, and the structured ValidationResult object returned from a validation run.

class algomancy_data.validator.ValidationSeverity(*values)[source]

Bases: StrEnum

Severity levels used in validation messages.

INFO = 'INFO'
WARNING = 'WARNING'
ERROR = 'ERROR'
CRITICAL = 'CRITICAL'
exception algomancy_data.validator.ValidationError(message='Validation failed.', context=None)[source]

Bases: Exception

Exception raised for validation errors in the data pipeline.

Retained for backwards-compatibility. The modern flow (ETLPipeline.run returning ETLResult) no longer raises this exception for data-quality failures; callers should inspect ETLResult.validation_result instead.

message

Explanation of the error.

context

Optional dictionary or object with additional context.

class algomancy_data.validator.ValidationMessage(severity, message, table=None, column=None, row=None, code=None)[source]

Bases: object

Container for a validation outcome with optional structured location.

severity
message
table
column
row
code
static clean(message)[source]

Normalize message by escaping newlines/tabs for single-line logs.
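As a rough illustration of the normalization described above, the following standalone sketch escapes newlines and tabs so that a message occupies a single log line. It is not the library's implementation; the exact set of escaped characters is an assumption based on the one-line description.

```python
def clean(message: str) -> str:
    """Escape newlines and tabs so the message fits on one log line.

    Assumption: only "\n" and "\t" are escaped, as the docstring suggests.
    """
    return message.replace("\n", "\\n").replace("\t", "\\t")


raw = "row 3:\tbad value\nexpected int"
print(clean(raw))  # the output contains literal backslash sequences, no line break
```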

to_dict()[source]
class algomancy_data.validator.ValidationResult(is_valid, messages=<factory>, halt_on=ValidationSeverity.CRITICAL, counts_by_severity=<factory>)[source]

Bases: object

Structured outcome of a ValidationSequence run.

is_valid

True if no message met or exceeded the halt threshold.

Type:

bool

messages

All messages collected during the run.

Type:

List[algomancy_data.validator.ValidationMessage]

halt_on

Severity threshold that determined is_valid.

Type:

algomancy_data.validator.ValidationSeverity

counts_by_severity

Count of messages per severity level.

Type:

Dict[str, int]

is_valid: bool
messages: List[ValidationMessage]
halt_on: ValidationSeverity = 'CRITICAL'
counts_by_severity: Dict[str, int]
messages_by_severity(severity)[source]

Return all messages matching severity.

messages_at_least(severity)[source]

Return all messages whose severity is at or above the given severity.
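A threshold filter of this kind needs an explicit ordering of severities, because comparing the string values directly would sort them alphabetically ('CRITICAL' before 'ERROR'). The sketch below is a standalone model, not the library's code. The `Severity` enum, the `RANK` table and the tuple-based messages are hypothetical stand-ins, and the ordering INFO < WARNING < ERROR < CRITICAL is assumed from the order in which the levels are listed.

```python
from enum import Enum


class Severity(str, Enum):
    """Stand-in for ValidationSeverity (str + Enum works before Python 3.11)."""
    INFO = "INFO"
    WARNING = "WARNING"
    ERROR = "ERROR"
    CRITICAL = "CRITICAL"


# Assumed ranking: definition order, lowest to highest.
RANK = {severity: position for position, severity in enumerate(Severity)}


def messages_at_least(messages, threshold):
    """Keep (severity, text) pairs whose severity ranks at or above threshold."""
    return [m for m in messages if RANK[m[0]] >= RANK[threshold]]


msgs = [
    (Severity.INFO, "loaded 3 tables"),
    (Severity.ERROR, "duplicate key"),
    (Severity.CRITICAL, "table missing"),
]
selected = messages_at_least(msgs, Severity.ERROR)
```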

as_dataframe()[source]

Render messages as a pandas DataFrame for display/inspection.

class algomancy_data.validator.Validator[source]

Bases: ABC

Abstract base class for validators. Subclasses append messages while validate runs.

property messages: List[ValidationMessage]
add_message(severity, message, table=None, column=None, row=None, code=None)[source]
buffer_message(severity, message, table=None, column=None, row=None, code=None)[source]
flush_buffer(success_message=None)[source]

Move buffered messages into the main list; add optional success note.
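The buffer-then-flush pattern can be pictured with the minimal model below. It is a sketch under assumptions, not the library's class: `SketchValidator` is a hypothetical name, messages are plain tuples, and the rule that the success note is added only when nothing was buffered is a guess at the intended behavior.

```python
class SketchValidator:
    """Toy model of a validator with a main message list and a buffer."""

    def __init__(self):
        self.messages = []   # messages that are part of the final outcome
        self._buffer = []    # findings held back until the check finishes

    def add_message(self, severity, message):
        self.messages.append((severity, message))

    def buffer_message(self, severity, message):
        self._buffer.append((severity, message))

    def flush_buffer(self, success_message=None):
        if self._buffer:
            # Move buffered findings into the main list and reset the buffer.
            self.messages.extend(self._buffer)
            self._buffer.clear()
        elif success_message is not None:
            # Assumption: the success note is emitted only if nothing was buffered.
            self.messages.append(("INFO", success_message))


clean_run = SketchValidator()
clean_run.flush_buffer(success_message="all checks passed")

failing_run = SketchValidator()
failing_run.buffer_message("ERROR", "orders.id contains nulls")
failing_run.flush_buffer(success_message="all checks passed")
```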

abstractmethod validate(data)[source]

Validate the provided data and return collected messages.

class algomancy_data.validator.DefaultValidator[source]

Bases: Validator

No-op validator that always returns a single success INFO message.

validate(data)[source]

Validate the provided data and return collected messages.

class algomancy_data.validator.ExtractionSuccessVerification[source]

Bases: Validator

Validator that ensures extracted DataFrames are not empty.

validate(data)[source]

Validate the provided data and return collected messages.

class algomancy_data.validator.SchemaValidator(schemas=None, severity=ValidationSeverity.ERROR)[source]

Bases: Validator

Validate DataFrames against a list of Schema declarations.

Checks each known table for unexpected columns and dtype mismatches.

_schemas

Mapping of file name → Schema (or subschema).

_severity

Severity used for column/schema mismatches.

validate(data)[source]

Validate the provided data and return collected messages.

class algomancy_data.validator.RequiredColumnsValidator(schemas, severity=ValidationSeverity.ERROR)[source]

Bases: Validator

Fail when a schema’s required columns are missing from the extracted data.

Emits one structured message per missing column with table and column populated.
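The "one message per missing column" behavior can be sketched as follows. This is an illustrative model only: `missing_required_columns` is a hypothetical helper, schemas are reduced to a mapping of table name to required column names, and extracted tables are reduced to the set of columns they contain.

```python
def missing_required_columns(required, present, severity="ERROR"):
    """Return one message dict per required column absent from the data.

    required: {table: [required column names]}   (stand-in for schemas)
    present:  {table: set of columns extracted}  (stand-in for DataFrames)
    """
    messages = []
    for table, columns in required.items():
        available = present.get(table, set())
        for column in columns:
            if column not in available:
                messages.append({
                    "severity": severity,
                    "message": f"Required column '{column}' is missing",
                    "table": table,
                    "column": column,
                })
    return messages


required = {"orders": ["id", "customer_id", "total"]}
present = {"orders": {"id", "total_amount"}}
report = missing_required_columns(required, present)
```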

_schemas

Schemas to enforce.

_severity

Severity used for missing-column reports.

validate(data)[source]

Validate the provided data and return collected messages.

class algomancy_data.validator.PrimaryKeyValidator(schemas, severity=ValidationSeverity.ERROR)[source]

Bases: Validator

Enforce that each schema’s primary key is unique and contains no nulls.

Supports joint primary keys. Skips schemas with no declared primary key.
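For a joint primary key, uniqueness applies to the combination of key columns rather than to each column alone. The sketch below models that check on rows represented as dictionaries. `primary_key_violations` is a hypothetical helper, and treating a key as null when any of its parts is None is an assumption.

```python
from collections import Counter


def primary_key_violations(rows, key):
    """Return (row indexes with a null key part, key tuples seen more than once)."""
    null_rows = []
    seen = Counter()
    for index, row in enumerate(rows):
        values = tuple(row.get(column) for column in key)
        if any(value is None for value in values):
            null_rows.append(index)   # assumed: any null part violates the key
        else:
            seen[values] += 1
    duplicates = [values for values, count in seen.items() if count > 1]
    return null_rows, duplicates


rows = [
    {"order_id": 1, "line": 1},
    {"order_id": 1, "line": 2},     # same order_id, different line: still unique
    {"order_id": 1, "line": 2},     # exact repeat of the joint key
    {"order_id": None, "line": 1},  # null key part
]
null_rows, duplicates = primary_key_violations(rows, ["order_id", "line"])
```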

_schemas

Schemas to enforce primary-key constraints for.

_severity

Severity used when violations are detected.

validate(data)[source]

Validate the provided data and return collected messages.

class algomancy_data.validator.UniqueValueValidator(table, columns, severity=ValidationSeverity.ERROR)[source]

Bases: Validator

Flag duplicate values within one or more columns of a single table.

Each column is checked independently (not as a composite key).
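The difference between independent and composite checking is easy to miss, so here is a small standalone model. `duplicate_values` is a hypothetical helper working on rows as dictionaries; how the real validator treats nulls is not stated and is not modeled here.

```python
from collections import Counter


def duplicate_values(rows, columns):
    """Map each checked column to the values that occur more than once in it."""
    result = {}
    for column in columns:
        counts = Counter(row[column] for row in rows)
        repeated = [value for value, count in counts.items() if count > 1]
        if repeated:
            result[column] = repeated
    return result


rows = [
    {"sku": "A", "batch": 1},
    {"sku": "A", "batch": 2},
    {"sku": "B", "batch": 3},
]
# Every (sku, batch) pair is distinct, yet "sku" alone repeats the value "A".
found = duplicate_values(rows, ["sku", "batch"])
```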

table

Table name to inspect.

columns

Column names whose values must be unique.

severity

Severity used for violations.

validate(data)[source]

Validate the provided data and return collected messages.

class algomancy_data.validator.MissingValueValidator(table, columns, severity=ValidationSeverity.ERROR)[source]

Bases: Validator

Flag null cells in columns that are declared non-nullable.

table

Table name to inspect.

columns

Column names that must not be null.

severity

Severity used for violations.

validate(data)[source]

Validate the provided data and return collected messages.

class algomancy_data.validator.ForeignKeyValidator(left_table, left_col, right_table, right_col, severity=ValidationSeverity.ERROR)[source]

Bases: Validator

Cross-table integrity check.

Verifies that every (non-null) value of left_table[left_col] exists in right_table[right_col]. Supports composite keys when left_col and right_col are lists of equal length.
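The check described above amounts to a set-membership test on key tuples. The following sketch models it on rows represented as dictionaries. `foreign_key_violations` is a hypothetical helper, and skipping a composite key when any of its parts is null is an assumption about how "non-null" extends to multiple columns.

```python
def foreign_key_violations(left_rows, left_cols, right_rows, right_cols):
    """Return (row index, key) for left keys that have no match on the right."""
    # Accept a single column name or a list of names on either side.
    if isinstance(left_cols, str):
        left_cols = [left_cols]
    if isinstance(right_cols, str):
        right_cols = [right_cols]
    if len(left_cols) != len(right_cols):
        raise ValueError("left_col and right_col must have the same length")

    referenced = {tuple(row[c] for c in right_cols) for row in right_rows}
    missing = []
    for index, row in enumerate(left_rows):
        key = tuple(row[c] for c in left_cols)
        if any(value is None for value in key):
            continue  # assumed: null keys are not checked
        if key not in referenced:
            missing.append((index, key))
    return missing


customers = [{"id": 1}, {"id": 2}]
orders = [{"customer_id": 1}, {"customer_id": 3}, {"customer_id": None}]
orphans = foreign_key_violations(orders, "customer_id", customers, "id")
```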

left_table

Table that holds the foreign key values.

left_col

Column name (or list of names) on the left side.

right_table

Table that holds the referenced values.

right_col

Column name (or list of names) on the right side.

severity

Severity used when a value is not found.

left_col: List[str]
right_col: List[str]

validate(data)[source]

Validate the provided data and return collected messages.

classmethod from_schemas(schemas, severity=ValidationSeverity.ERROR)[source]

Build a list of validators from Column.foreign_key declarations.

Walks each schema’s columns; for every column with a non-null foreign_key declaration, returns a ForeignKeyValidator instance covering that relation. Columns sharing the same parent table on the same schema are collapsed into a single composite-key validator.

Parameters:
  • schemas (Iterable[type]) – Iterable of Schema subclasses.

  • severity (ValidationSeverity) – Severity for emitted FK-violation messages.

Returns:

List of ForeignKeyValidator instances, one per derived relation. The list is empty if no schema declares a FK.

Return type:

List[ForeignKeyValidator]
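The collapsing rule, where columns of one schema that point at the same parent table form a single composite relation, can be sketched as a grouping step. Everything below is a simplified model: the shape of a foreign_key declaration is not documented here, so it is represented as a hypothetical (parent_table, parent_column) tuple, and each derived relation is a plain tuple rather than a ForeignKeyValidator.

```python
from collections import defaultdict


def derive_relations(schemas):
    """Group FK columns by (child table, parent table) into composite relations.

    schemas: {table: {column: (parent_table, parent_column) or None}}
    Returns a list of (left_table, left_cols, right_table, right_cols).
    """
    relations = []
    for table, columns in schemas.items():
        by_parent = defaultdict(list)
        for column, foreign_key in columns.items():
            if foreign_key is not None:
                parent_table, parent_column = foreign_key
                by_parent[parent_table].append((column, parent_column))
        for parent_table, pairs in by_parent.items():
            left_cols = [child for child, _ in pairs]
            right_cols = [parent for _, parent in pairs]
            relations.append((table, left_cols, parent_table, right_cols))
    return relations


schemas = {
    "order_lines": {
        "order_id": ("orders", "id"),
        "order_region": ("orders", "region"),
        "sku": ("products", "sku"),
        "qty": None,
    },
    "orders": {"id": None, "region": None},
}
relations = derive_relations(schemas)
```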

class algomancy_data.validator.ValidationSequence(validators=None, logger=None, halt_on=ValidationSeverity.CRITICAL)[source]

Bases: object

A sequence of validators executed in order with message aggregation.

halt_on

Severity at or above which the run is considered invalid. Defaults to ValidationSeverity.CRITICAL.

property is_valid: bool

Return True when the run has completed and no message met the halt_on threshold.

property messages: List[ValidationMessage]
property completed: bool
run_validation(data)[source]

Execute validators, collect messages, and return a ValidationResult.
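The aggregation performed by a run can be modeled in a few lines. This is a sketch under assumptions rather than the library's logic: validators are plain functions returning (severity, text) pairs, `Result` is a hypothetical stand-in for ValidationResult, and every validator is assumed to run even after a message reaches the halt threshold.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed ordering, lowest to highest.
ORDER = ["INFO", "WARNING", "ERROR", "CRITICAL"]


@dataclass
class Result:
    is_valid: bool
    messages: List[Tuple[str, str]] = field(default_factory=list)
    halt_on: str = "CRITICAL"
    counts_by_severity: Dict[str, int] = field(default_factory=dict)


def run_validation(validators, data, halt_on="CRITICAL"):
    """Run each validator in order and summarize the collected messages."""
    messages = []
    for validate in validators:
        messages.extend(validate(data))
    threshold = ORDER.index(halt_on)
    # Valid only if no message met or exceeded the halt threshold.
    is_valid = all(ORDER.index(severity) < threshold for severity, _ in messages)
    counts = dict(Counter(severity for severity, _ in messages))
    return Result(is_valid, messages, halt_on, counts)


def always_ok(data):
    return [("INFO", "ok")]


def not_empty(data):
    return [("CRITICAL", f"{name} is empty") for name, rows in data.items() if not rows]


data = {"orders": [{"id": 1}], "customers": []}
result = run_validation([always_ok, not_empty], data)
```

Lowering halt_on (for example to "ERROR") makes the same run stricter, since more messages then count against is_valid.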

add_validators(validators)[source]

Append multiple validators to the sequence.

add_validator(validator)[source]

Append a single validator to the sequence.