Migration guide

Versions 0.6 through 0.8 delivered three coordinated overhauls of the ETL machinery. This page lists the breaking changes, each with the minimal before/after snippet needed to migrate.

v0.6.0 — Schema API modernization

_DATATYPES → Column instances

The new declarative Column carries dtype together with optional metadata (optional, primary_key, default, nullable, unique, description). The legacy _DATATYPES dict still works but emits a DeprecationWarning via Schema.columns().

Before — v0.5
class OrdersSchema(Schema):
    _FILENAME = "orders"
    _EXTENSION = FileExtension.CSV
    _SCHEMA_TYPE = SchemaType.SINGLE

    _DATATYPES = {
        "id": DataType.STRING,
        "qty": DataType.INTEGER,
    }
After — v0.6
class OrdersSchema(Schema):
    _FILENAME = "orders"
    _EXTENSION = FileExtension.CSV
    _SCHEMA_TYPE = SchemaType.SINGLE

    ID = Column(name="id", dtype=DataType.STRING, primary_key=True)
    QTY = Column(name="qty", dtype=DataType.INTEGER)
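
While migrating, it can help to make any remaining _DATATYPES usage fail loudly instead of scrolling past in the logs. The sketch below uses only Python's standard warnings module. legacy_columns is a hypothetical stand-in for whichever library call emits the DeprecationWarning; it is not part of the library.

```python
import warnings


def legacy_columns():
    """Hypothetical stand-in for a call that emits the deprecation warning."""
    warnings.warn(
        "_DATATYPES is deprecated; declare Column attributes instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return {"id": "STRING", "qty": "INTEGER"}


def find_deprecations(func):
    """Run func and return the DeprecationWarning messages it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        func()
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, DeprecationWarning)
    ]


def fails_under_strict_mode(func):
    """Return True if func raises once DeprecationWarning is escalated to an error."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        try:
            func()
        except DeprecationWarning:
            return True
    return False
```

The same escalation is available for a whole test run by starting the interpreter with -W error::DeprecationWarning.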

Classmethod identity accessors

All schema-level accessors are now @classmethod, so the call form gains parentheses:

Before
schema.file_name
schema.extension
schema.datatypes
After
schema.file_name()
schema.extension()
schema.datatypes()

get_subschema(key) now returns a synthetic schema class, not an instance — call its classmethods directly.
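
One migration pitfall follows from how Python treats methods: leaving the parentheses off does not raise, it yields a bound method object, so comparisons against it quietly evaluate to False. The sketch below illustrates this with a stand-in class. SchemaSketch and the way it builds its subschema are assumptions for illustration only, not the library's implementation.

```python
class SchemaSketch:
    """Stand-in for a schema class; not the library's Schema."""

    _FILENAME = "orders"

    @classmethod
    def file_name(cls):
        return cls._FILENAME

    @classmethod
    def get_subschema(cls, key):
        # Build a new class on the fly and return the class itself,
        # not an instance of it.
        return type(
            f"{cls.__name__}_{key}",
            (cls,),
            {"_FILENAME": f"{cls._FILENAME}_{key}"},
        )


sub = SchemaSketch.get_subschema("eu")

# Forgetting the parentheses compares a bound method with a string.
stale_comparison = SchemaSketch.file_name == "orders"
migrated_comparison = SchemaSketch.file_name() == "orders"
```

Because the returned subschema is a class, sub.file_name() works without instantiating anything.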

v0.7.0 — Structured validation framework

ValidationMessage: structured location fields

Positional construction (severity, message) still works. New optional keyword fields (table, column, row, code) make messages machine-readable for downstream rendering.

Before
msg = ValidationMessage(ValidationSeverity.ERROR, "bad row 42 in widgets.price")
After
msg = ValidationMessage(
    ValidationSeverity.ERROR,
    "bad row",
    table="widgets",
    column="price",
    row=42,
    code="DTYPE_MISMATCH",
)

ValidationSequence.run_validation() → ValidationResult

Before
is_valid, messages = sequence.run_validation(data)
After
result = sequence.run_validation(data)
result.is_valid
result.messages
result.counts_by_severity
result.as_dataframe()
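
If many call sites still unpack the old tuple, a small adapter can keep them working while they are migrated one at a time. This is a sketch, not a library helper: run_validation_as_tuple and FakeSequence are hypothetical names, and the adapter relies only on the is_valid and messages attributes shown above.

```python
from types import SimpleNamespace


def run_validation_as_tuple(sequence, data):
    """Adapt the result-object return value to the old (is_valid, messages) tuple."""
    result = sequence.run_validation(data)
    return result.is_valid, result.messages


class FakeSequence:
    """Stand-in sequence so the sketch runs without the library installed."""

    def run_validation(self, data):
        messages = [] if data else ["no rows"]
        return SimpleNamespace(is_valid=bool(data), messages=messages)


is_valid, messages = run_validation_as_tuple(FakeSequence(), [])
```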

Configurable halt threshold

sequence = ValidationSequence(
    [...],
    halt_on=ValidationSeverity.ERROR,  # default is CRITICAL
)
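
To make the effect of halt_on concrete, here is a minimal sketch of a threshold-based halt. It assumes that severities are ordered and that the sequence stops after the first validator reporting a message at or above the threshold; the page does not spell out either detail. Severity, run_checks, and the two check functions are hypothetical stand-ins.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Stand-in severity scale; the ordering is an assumption."""

    INFO = 0
    WARNING = 1
    ERROR = 2
    CRITICAL = 3


def run_checks(checks, data, halt_on=Severity.CRITICAL):
    """Run checks in order and stop after the first one that reports
    a message at or above halt_on."""
    collected = []
    for check in checks:
        found = check(data)
        collected.extend(found)
        if any(severity >= halt_on for severity, _ in found):
            break
    return collected


def has_rows(data):
    return [] if data else [(Severity.ERROR, "no rows")]


def has_header(data):
    return [(Severity.WARNING, "header not checked")]


lenient = run_checks([has_rows, has_header], [])
strict = run_checks([has_rows, has_header], [], halt_on=Severity.ERROR)
```

With the default threshold both checks run; lowering it to ERROR stops the run after the first check.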

New built-in validators

Validator                                      Replaces ad-hoc check
RequiredColumnsValidator                       manual “is column X here?” checks
PrimaryKeyValidator                            per-project uniqueness/non-null checks
UniqueValueValidator / MissingValueValidator   per-column checks
ForeignKeyValidator (M5)                       per-project FK checks

The OptionalColumnGuard transformer (which injects missing optional columns using Column.default) replaces manual df[col] = default lines.

v0.8.0 — Predictable ETL termination

ETLPipeline.run() returns ETLResult (no longer raises)

Data-quality failures (validation, missing/malformed files, dtype conversion errors) arrive as ETLResult(status='failed'). Programmer errors (e.g. KeyError from a custom transformer) still propagate.

Before
try:
    datasource = pipeline.run()
except ValidationError as exc:
    report(exc)
After
result = pipeline.run()
if result.is_success:
    use(result.datasource)
else:
    report(result.validation_result)
    if result.raised is not None:
        # Expected ETL exception (e.g. FileNotFoundError) was caught
        # and converted; the original is preserved here.
        ...
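
The split between converted and propagated exceptions can be sketched as follows. This is an illustration of the pattern, not the library's code: ResultSketch, run_safely, the contents of EXPECTED_ERRORS, and the "success" status string are assumptions; only status='failed', is_success, datasource, and raised come from this page.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Assumed set of data-quality errors that get converted into a result.
EXPECTED_ERRORS = (FileNotFoundError, ValueError)


@dataclass
class ResultSketch:
    status: str
    datasource: Any = None
    raised: Optional[BaseException] = None

    @property
    def is_success(self):
        return self.status == "success"


def run_safely(step):
    """Call step(); convert expected failures to a result and let
    every other exception propagate."""
    try:
        return ResultSketch(status="success", datasource=step())
    except EXPECTED_ERRORS as exc:
        return ResultSketch(status="failed", raised=exc)


def missing_file():
    raise FileNotFoundError("orders.csv")


def buggy_transformer():
    return {}["missing"]  # KeyError: a programmer error, not caught above


failed = run_safely(missing_file)
```

Calling run_safely(buggy_transformer) raises KeyError, which mirrors the rule that programmer errors still propagate.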

DataManager.etl_data() returns the result

Before
dm.etl_data(files, "orders_2026")  # raised on failure
After
result = dm.etl_data(files, "orders_2026")
if result.is_failure:
    show_messages_to_user(result.validation_result.messages)

Conversion failures surface as validation messages

DataTypeConverter no longer prints and swallows coercion errors. They now arrive on the final ValidationResult as messages with code="CONVERSION_FAILED" and with the table, column, and row fields populated.
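
Because the failures are ordinary messages, they can be filtered and grouped like any other. A minimal sketch, assuming each message exposes code, table, column, and row as attributes named after the keyword fields introduced in v0.7. conversion_failures_by_column is a hypothetical helper, and the sample messages are stand-ins built with SimpleNamespace.

```python
from collections import defaultdict
from types import SimpleNamespace


def conversion_failures_by_column(messages):
    """Map (table, column) to the list of rows that failed conversion."""
    grouped = defaultdict(list)
    for msg in messages:
        if getattr(msg, "code", None) == "CONVERSION_FAILED":
            grouped[(msg.table, msg.column)].append(msg.row)
    return dict(grouped)


# Stand-in messages so the sketch runs without the library installed.
sample = [
    SimpleNamespace(code="CONVERSION_FAILED", table="widgets", column="price", row=42),
    SimpleNamespace(code="CONVERSION_FAILED", table="widgets", column="price", row=57),
    SimpleNamespace(code="DTYPE_MISMATCH", table="widgets", column="qty", row=3),
]

failures = conversion_failures_by_column(sample)
```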

Bonus: M4 boilerplate reductions

These are not breaking changes — old subclasses keep working — but you can now delete a lot of plumbing:

  • SimpleETLFactory(schemas) replaces full ETLFactory subclasses for the common case.

  • ETLFactory ships with default create_extraction_sequence / create_validation_sequence / create_transformation_sequence / create_loader implementations; only override the ones you need.

  • DataManager.prepare_files now drives file-type dispatch off the schema-declared _EXTENSION.

See Extending file types and data types for the public register_extractor API introduced in M5.

Bonus: M7 relational cascade cleanup

M7 is fully additive — no existing pipeline changes behavior unless you add the new transformer. Three new optional Column fields and two new transformers let you declaratively clean up incomplete input data.

Adding FK declarations to existing schemas

Before — relations are implicit
class OrderSchema(Schema):
    _FILENAME = "order"
    _EXTENSION = FileExtension.CSV
    _SCHEMA_TYPE = SchemaType.SINGLE

    ID = Column(name="id", dtype=DataType.STRING, primary_key=True)
    PRODUCT_ID = Column(name="product_id", dtype=DataType.STRING)
After — relations declared on the FK column
class OrderSchema(Schema):
    _FILENAME = "order"
    _EXTENSION = FileExtension.CSV
    _SCHEMA_TYPE = SchemaType.SINGLE

    ID = Column(name="id", dtype=DataType.STRING, primary_key=True)
    PRODUCT_ID = Column(
        name="product_id",
        dtype=DataType.STRING,
        foreign_key=("product", "id"),
        parent_requires_child=True,   # opt-in
    )
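
To show what a declared relation makes checkable, here is a plain-Python sketch of orphan detection. It reads foreign_key=("product", "id") as (parent table, parent key column), which is an assumption, and it does not model parent_requires_child. find_orphans and the sample rows are hypothetical and unrelated to the library's transformers.

```python
def find_orphans(child_rows, fk_column, parent_rows, parent_key):
    """Return child rows whose foreign key has no matching parent row."""
    parent_keys = {row[parent_key] for row in parent_rows}
    return [row for row in child_rows if row[fk_column] not in parent_keys]


products = [{"id": "P1"}, {"id": "P2"}]
orders = [
    {"id": "O1", "product_id": "P1"},
    {"id": "O2", "product_id": "P9"},  # no such product
]

orphans = find_orphans(orders, "product_id", products, "id")
```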

Wiring CascadeDropTransformer into a SimpleETLFactory

Opt-in cleanup
from algomancy_data import CascadeDropTransformer, SimpleETLFactory

factory = SimpleETLFactory(
    schemas=[ProductSchema, OrderSchema],
    transformers=[CascadeDropTransformer(schemas=[ProductSchema, OrderSchema])],
)

See Relational cascade cleanup for the full feature description, including partial-loss detection via CascadeSnapshot.