(migration-ref)=
# Migration guide

The 0.6 → 0.8 versions delivered three coordinated overhauls of the ETL
machinery. This page lists the breaking changes together with the
minimal before/after snippets you need to migrate.

## v0.6.0 — Schema API modernization

### `_DATATYPES` → `Column` instances

The new declarative `Column` carries dtype together with optional
metadata (`optional`, `primary_key`, `default`, `nullable`, `unique`,
`description`). The legacy `_DATATYPES` dict still works but emits a
`DeprecationWarning` via `Schema.columns()`.

```{code-block} python
:caption: Before — v0.5
class OrdersSchema(Schema):
    _FILENAME = "orders"
    _EXTENSION = FileExtension.CSV
    _SCHEMA_TYPE = SchemaType.SINGLE

    _DATATYPES = {
        "id": DataType.STRING,
        "qty": DataType.INTEGER,
    }
```

```{code-block} python
:caption: After — v0.6
class OrdersSchema(Schema):
    _FILENAME = "orders"
    _EXTENSION = FileExtension.CSV
    _SCHEMA_TYPE = SchemaType.SINGLE

    ID = Column(name="id", dtype=DataType.STRING, primary_key=True)
    QTY = Column(name="qty", dtype=DataType.INTEGER)
```

### Classmethod identity accessors

All schema-level accessors are now `@classmethod`, so the call form
gains parentheses:

```{code-block} python
:caption: Before
schema.file_name
schema.extension
schema.datatypes
```

```{code-block} python
:caption: After
schema.file_name()
schema.extension()
schema.datatypes()
```

`get_subschema(key)` now returns a synthetic schema **class**, not an
instance — call its classmethods directly.

## v0.7.0 — Structured validation framework

### `ValidationMessage`: structured location fields

Positional construction `(severity, message)` still works. New optional
keyword fields (`table`, `column`, `row`, `code`) make messages
machine-readable for downstream rendering.

```{code-block} python
:caption: Before
msg = ValidationMessage(ValidationSeverity.ERROR, "bad row 42 in widgets.price")
```

```{code-block} python
:caption: After
msg = ValidationMessage(
    ValidationSeverity.ERROR,
    "bad row",
    table="widgets",
    column="price",
    row=42,
    code="DTYPE_MISMATCH",
)
```

### `ValidationSequence.run_validation()` → `ValidationResult`

```{code-block} python
:caption: Before
is_valid, messages = sequence.run_validation(data)
```

```{code-block} python
:caption: After
result = sequence.run_validation(data)
result.is_valid
result.messages
result.counts_by_severity
result.as_dataframe()
```

### Configurable halt threshold

```{code-block} python
sequence = ValidationSequence(
    [...],
    halt_on=ValidationSeverity.ERROR,  # default is CRITICAL
)
```

### New built-in validators

| Validator | Replaces ad-hoc check |
|---|---|
| `RequiredColumnsValidator` | manual "is column X here?" checks |
| `PrimaryKeyValidator` | per-project uniqueness/non-null checks |
| `UniqueValueValidator` / `MissingValueValidator` | per-column checks |
| `ForeignKeyValidator` (M5) | per-project FK checks |

The `OptionalColumnGuard` transformer (which injects missing optional
columns using `Column.default`) replaces manual `df[col] = default` lines.

## v0.8.0 — Predictable ETL termination

### `ETLPipeline.run()` returns `ETLResult` (no longer raises)

Data-quality failures (validation, missing/malformed files, dtype
conversion errors) arrive as `ETLResult(status='failed')`. Programmer
errors (e.g. `KeyError` from a custom transformer) still propagate.

```{code-block} python
:caption: Before
try:
    datasource = pipeline.run()
except ValidationError as exc:
    report(exc)
```

```{code-block} python
:caption: After
result = pipeline.run()
if result.is_success:
    use(result.datasource)
else:
    report(result.validation_result)
    if result.raised is not None:
        # Expected ETL exception (e.g. FileNotFoundError) was caught
        # and converted; the original is preserved here.
        ...
```

### `DataManager.etl_data()` returns the result

```{code-block} python
:caption: Before
dm.etl_data(files, "orders_2026")  # raised on failure
```

```{code-block} python
:caption: After
result = dm.etl_data(files, "orders_2026")
if result.is_failure:
    show_messages_to_user(result.validation_result.messages)
```

### Conversion failures surface as validation messages

`DataTypeConverter` no longer prints + swallows coercion errors; they
arrive on the final `ValidationResult` as messages with `code="CONVERSION_FAILED"`,
populated `table`/`column`/`row`.

## Bonus: M4 boilerplate reductions

These are not breaking changes — old subclasses keep working — but you
can now delete a lot of plumbing:

- `SimpleETLFactory(schemas)` replaces full `ETLFactory` subclasses for the common case.
- `ETLFactory` ships with default `create_extraction_sequence` / `create_validation_sequence` / `create_transformation_sequence` / `create_loader` implementations; only override the ones you need.
- `DataManager.prepare_files` now drives file-type dispatch off the schema-declared `_EXTENSION`.

See [Extending file types and data types](extending-ref) for the public
`register_extractor` API introduced in M5.

## Bonus: M7 relational cascade cleanup

M7 is **fully additive** — no existing pipeline changes behavior unless
you add the new transformer. Three new optional `Column` fields and two
new transformers let you declaratively clean up incomplete input data.

### Adding FK declarations to existing schemas

```{code-block} python
:caption: Before — relations are implicit
class OrderSchema(Schema):
    _FILENAME = "order"
    _EXTENSION = FileExtension.CSV
    _SCHEMA_TYPE = SchemaType.SINGLE

    ID = Column(name="id", dtype=DataType.STRING, primary_key=True)
    PRODUCT_ID = Column(name="product_id", dtype=DataType.STRING)
```

```{code-block} python
:caption: After — relations declared on the FK column
class OrderSchema(Schema):
    _FILENAME = "order"
    _EXTENSION = FileExtension.CSV
    _SCHEMA_TYPE = SchemaType.SINGLE

    ID = Column(name="id", dtype=DataType.STRING, primary_key=True)
    PRODUCT_ID = Column(
        name="product_id",
        dtype=DataType.STRING,
        foreign_key=("product", "id"),
        parent_requires_child=True,   # opt-in
    )
```

### Wiring `CascadeDropTransformer` into a `SimpleETLFactory`

```{code-block} python
:caption: Opt-in cleanup
from algomancy_data import CascadeDropTransformer, SimpleETLFactory

factory = SimpleETLFactory(
    schemas=[ProductSchema, OrderSchema],
    transformers=[CascadeDropTransformer(schemas=[ProductSchema, OrderSchema])],
)
```

See [Relational cascade cleanup](cascade-cleanup-ref) for the full feature
description, including partial-loss detection via `CascadeSnapshot`.