Schema
Schema primitives for defining structured tabular data.
This module provides a Schema abstraction that declares columns via
Column instances as class attributes. The legacy _DATATYPES dict is
still accepted but emits a DeprecationWarning; migrate to Column
declarations to silence it.
- class algomancy_data.schema.DataType(*values)
Bases: StrEnum
Enumeration of supported logical data types for schema fields.
- STRING = 'string'
- DATETIME = 'datetime64[ns]'
- INTEGER = 'int64'
- FLOAT = 'float64'
- BOOLEAN = 'boolean'
- CATEGORICAL = 'categorical'
- INTERVAL = 'interval'
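Because DataType is a StrEnum, each member compares equal to its plain string value, so members can be used wherever a string is expected. A minimal sketch, using a local mirror of the enum rather than importing algomancy_data, and assuming Python 3.11+ for enum.StrEnum (with a str-mixin fallback for older interpreters); the `declared` mapping is hypothetical:

```python
from enum import Enum

try:
    from enum import StrEnum  # available from Python 3.11
except ImportError:
    class StrEnum(str, Enum):  # minimal fallback: a str-mixin Enum also equals its value
        pass


# Local mirror of the documented members; not imported from algomancy_data.
class DataType(StrEnum):
    STRING = "string"
    DATETIME = "datetime64[ns]"
    INTEGER = "int64"
    FLOAT = "float64"
    BOOLEAN = "boolean"
    CATEGORICAL = "categorical"
    INTERVAL = "interval"


# Hypothetical column -> DataType declaration, for illustration only.
declared = {"id": DataType.INTEGER, "name": DataType.STRING, "price": DataType.FLOAT}

# Members compare equal to their plain string values...
assert DataType.INTEGER == "int64"
# ...and lookup by value returns the member itself.
assert DataType("datetime64[ns]") is DataType.DATETIME

# Reduce to plain strings, e.g. for logging or serialisation.
dtype_strings = {col: dt.value for col, dt in declared.items()}
print(dtype_strings)  # {'id': 'int64', 'name': 'string', 'price': 'float64'}
```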
- class algomancy_data.schema.FileExtension(*values)
Bases: StrEnum
Supported file extensions for input files.
- CSV = 'csv'
- XLSX = 'xlsx'
- JSON = 'json'
- class algomancy_data.schema.SchemaType(*values)
Bases: StrEnum
Enumeration of supported schema types.
- SINGLE = 'single'
- MULTI = 'multi'
- class algomancy_data.schema.Column(name, dtype, optional=False, primary_key=False, default=None, nullable=False, unique=False, description='', foreign_key=None, parent_requires_child=False, track_partial_loss=False)
Bases: object
Metadata for a single schema column.
- Parameters:
name (str) – Actual column name as it appears in the source data.
dtype (DataType) – The expected DataType of this column.
optional (bool) – If True, the column may be absent in the source data.
primary_key (bool) – If True, this column is part of the (joint) primary key.
default (Any) – Value used when the column is absent and optional=True.
nullable (bool) – If True, the column may contain null/NaN values.
unique (bool) – If True, all values in the column must be distinct.
description (str) – Human-readable description of the column.
foreign_key (Tuple[str, str] | None) – Optional (parent_table, parent_column) tuple declaring that this column references a column on another table. Used by ForeignKeyValidator (for reporting violations) and by CascadeDropTransformer (for cascade cleanup).
parent_requires_child (bool) – If True, the referenced parent row requires at least one referencing child on this relation; parents with zero children are dropped by CascadeDropTransformer. Only meaningful when foreign_key is set.
track_partial_loss (bool) – If True, enables partial-loss cascade for this relation: parents that lose some (but not all) of their children mid-pipeline are dropped. Requires a CascadeSnapshot paired with the cascade transformer. Only meaningful when foreign_key is set.
- name: str
- optional: bool = False
- primary_key: bool = False
- default: Any = None
- nullable: bool = False
- unique: bool = False
- description: str = ''
- foreign_key: Tuple[str, str] | None = None
- parent_requires_child: bool = False
- track_partial_loss: bool = False
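The three relation flags (foreign_key, parent_requires_child, track_partial_loss) are easier to picture on concrete rows. The following stdlib-only sketch models them on plain lists of row dicts. The functions fk_violations, childless_parents and partial_loss_parents are hypothetical illustrations, not ForeignKeyValidator, CascadeDropTransformer or CascadeSnapshot, whose actual behavior (for example around nulls or composite keys) may differ:

```python
from collections import Counter

# Hypothetical tables: orders (parent) and order_lines (child).
# The child column "order_id" would carry foreign_key=("orders", "id").
orders = [{"id": 1}, {"id": 2}, {"id": 3}]
lines_before = [
    {"line": "a", "order_id": 1},
    {"line": "b", "order_id": 1},
    {"line": "c", "order_id": 2},
    {"line": "d", "order_id": 9},  # references a parent that does not exist
]


def fk_violations(children, fk_col, parents, pk_col):
    """Child rows whose foreign key matches no parent key (what an FK check would report)."""
    parent_keys = {p[pk_col] for p in parents}
    return [c for c in children if c[fk_col] not in parent_keys]


def childless_parents(parents, pk_col, children, fk_col):
    """Parents with zero referencing children (the parent_requires_child=True case)."""
    referenced = {c[fk_col] for c in children}
    return [p for p in parents if p[pk_col] not in referenced]


def partial_loss_parents(snapshot, children, fk_col):
    """Keys that kept some, but not all, of the children counted in an earlier
    snapshot (the idea behind track_partial_loss=True)."""
    now = Counter(c[fk_col] for c in children)
    return sorted(k for k, before in snapshot.items() if 0 < now[k] < before)


orphans = fk_violations(lines_before, "order_id", orders, "id")
childless = childless_parents(orders, "id", lines_before, "order_id")
print([c["line"] for c in orphans])  # ['d']
print([p["id"] for p in childless])  # [3]

# Snapshot the child counts per parent key, then lose line "b" mid-pipeline.
snapshot = Counter(c["order_id"] for c in lines_before)
lines_after = [c for c in lines_before if c["line"] != "b"]
partial = partial_loss_parents(snapshot, lines_after, "order_id")
print(partial)  # [1]
```

Order 1 went from two lines to one, so it counts as a partial loss; a parent that lost every child is excluded here because that case is covered by parent_requires_child.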
- class algomancy_data.schema.ColumnGroup(name, columns, source_path=<factory>)
Bases: object
Metadata for one sheet (sub-schema) of a MULTI schema.
Declare ColumnGroup instances as class attributes on a Schema subclass with _SCHEMA_TYPE = SchemaType.MULTI:

    from algomancy_data.schema import (
        Column, ColumnGroup, DataType, FileExtension, Schema, SchemaType,
    )

    class LocationSchema(Schema):
        _FILENAME = "multisheet"
        _EXTENSION = FileExtension.XLSX
        _SCHEMA_TYPE = SchemaType.MULTI

        # One ColumnGroup per sheet; the first argument is the sheet name.
        STEDEN = ColumnGroup("Steden", [
            Column("Country", dtype=DataType.STRING),
            Column("City", dtype=DataType.STRING),
        ])
        KLANTEN = ColumnGroup("Klanten", [
            Column("ID", dtype=DataType.INTEGER, primary_key=True),
            Column("Naam", dtype=DataType.STRING),
        ])
- Parameters:
name (str) – Actual sheet / sub-schema name as it appears in the source file (may contain spaces and mixed case).
columns (List[Column]) – Ordered list of Column objects for this sub-schema.
source_path (Tuple[str, ...]) – For nested sources (e.g. JSON), the path of keys from the root record to the list of dicts that populates this group. () (the default) means the group is built from the root record itself; a tuple like ("PickOrderLines",) means each root record has a nested list at that key whose elements form the rows of this group. Ignored by extractors that do not support nesting (e.g. XLSXMultiExtractor).
- name: str
- source_path: Tuple[str, ...]
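The source_path semantics can be illustrated with a small stdlib sketch that turns parsed JSON root records into the rows of one group. rows_for_group is a hypothetical helper, not the library's JSON extractor. It assumes that on a longer path every intermediate key leads to a nested dict and only the final key holds the list of dicts, and that a missing final key simply yields no rows:

```python
import json

records = json.loads("""
[
  {"OrderID": 1, "Customer": "A",
   "PickOrderLines": [{"Sku": "X", "Qty": 2}, {"Sku": "Y", "Qty": 1}]},
  {"OrderID": 2, "Customer": "B",
   "PickOrderLines": [{"Sku": "Z", "Qty": 5}]}
]
""")


def rows_for_group(records, source_path=()):
    """Collect the rows of one group by following source_path from each root record."""
    rows = []
    for record in records:
        if not source_path:
            # () means the root record itself is one row.
            rows.append(record)
            continue
        node = record
        for key in source_path[:-1]:
            node = node[key]  # assumption: intermediate keys lead to nested dicts
        # Assumption: the final key holds a list of dicts; missing key -> no rows.
        rows.extend(node.get(source_path[-1], []))
    return rows


orders = rows_for_group(records)
lines = rows_for_group(records, ("PickOrderLines",))
print(len(orders), len(lines))  # 2 3
print([line["Sku"] for line in lines])  # ['X', 'Y', 'Z']
```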
- class algomancy_data.schema.Schema
Bases: ABC
Abstract base class for table schemas.
Declare columns as class attributes using Column instances:

    from algomancy_data.schema import Column, DataType, FileExtension, Schema, SchemaType

    class MySchema(Schema):
        _FILENAME = "my_file"
        _EXTENSION = FileExtension.CSV
        _SCHEMA_TYPE = SchemaType.SINGLE

        # Columns are collected in class-definition order.
        ID = Column("id", dtype=DataType.STRING, primary_key=True)
        NAME = Column("name", dtype=DataType.STRING)
        VALUE = Column("value", dtype=DataType.FLOAT, optional=True)
The legacy _DATATYPES dict is still supported but deprecated.
- classmethod extension()
Return the file extension.
Accepts any StrEnum-derived value as _EXTENSION, including user-defined FileExtension subclasses created for custom file formats (see Extending file types and data types). A plain str is upcast to the matching built-in FileExtension member for compatibility, or returned as-is when it does not match any built-in value.
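The upcast rule for plain strings can be sketched as follows. resolve_extension is a hypothetical helper written only to illustrate the documented behavior, not the library's extension() implementation; it uses a local mirror of FileExtension, a hypothetical CustomExtension enum for a custom format, and assumes Python 3.11+ for enum.StrEnum (with a str-mixin fallback):

```python
from enum import Enum

try:
    from enum import StrEnum  # Python 3.11+
except ImportError:
    class StrEnum(str, Enum):  # minimal fallback for older interpreters
        pass


# Local mirror of the built-in members documented above.
class FileExtension(StrEnum):
    CSV = "csv"
    XLSX = "xlsx"
    JSON = "json"


# Hypothetical user-defined enum for a custom file format.
class CustomExtension(StrEnum):
    PARQUET = "parquet"


def resolve_extension(value):
    """Pass enum values through; upcast a plain str to FileExtension when it matches."""
    if isinstance(value, StrEnum):
        return value  # built-in or custom enum member: returned unchanged
    try:
        return FileExtension(value)  # plain str matching a built-in value
    except ValueError:
        return value  # unknown plain str: returned as-is


assert resolve_extension("csv") is FileExtension.CSV
assert resolve_extension(CustomExtension.PARQUET) is CustomExtension.PARQUET
assert resolve_extension("feather") == "feather"
```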
- classmethod columns()
Return an ordered mapping of column name → Column.
For schemas that declare Column class attributes, the mapping is built from those attributes (in class-definition order).
For schemas that still use the legacy _DATATYPES dict, a DeprecationWarning is emitted and Column objects are built automatically with optional=False, primary_key=False, and default=None.
- Raises:
NotImplementedError – If neither Column attributes nor _DATATYPES are defined.
TypeError – If called on a MULTI schema (use datatype_groups()).
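The two code paths of columns() can be modelled with small stand-ins. MiniColumn and MiniSchema below are hypothetical simplifications, not the library's Column and Schema; they rely on the fact that vars(cls) preserves class-definition order, and they reproduce the legacy fallback with its DeprecationWarning:

```python
import warnings
from dataclasses import dataclass
from typing import Any


@dataclass
class MiniColumn:
    """Hypothetical stand-in for Column with only a few of its fields."""
    name: str
    dtype: str
    optional: bool = False
    primary_key: bool = False
    default: Any = None


class MiniSchema:
    @classmethod
    def columns(cls):
        # New API: scan the class namespace, which keeps definition order.
        declared = {c.name: c for c in vars(cls).values() if isinstance(c, MiniColumn)}
        if declared:
            return declared
        # Legacy API: build bare columns from _DATATYPES and warn.
        legacy = getattr(cls, "_DATATYPES", None)
        if legacy is None:
            raise NotImplementedError("declare MiniColumn attributes or _DATATYPES")
        warnings.warn("_DATATYPES is deprecated; declare columns instead",
                      DeprecationWarning, stacklevel=2)
        return {name: MiniColumn(name, dtype) for name, dtype in legacy.items()}


class NewStyle(MiniSchema):
    ID = MiniColumn("id", "string", primary_key=True)
    NAME = MiniColumn("name", "string")
    VALUE = MiniColumn("value", "float64", optional=True)


class OldStyle(MiniSchema):
    _DATATYPES = {"id": "string", "value": "float64"}


print(list(NewStyle.columns()))  # ['id', 'name', 'value']

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_cols = OldStyle.columns()
print(caught[0].category.__name__, legacy_cols["id"].optional)  # DeprecationWarning False
```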
- classmethod column_groups()
Return {group_name: {col_name: Column}} for MULTI schemas.
Scans vars(cls) for ColumnGroup attributes first (new API). Falls back to _DATATYPES for legacy schemas, emitting a DeprecationWarning and constructing bare Column objects (optional=False, primary_key=False, default=None).
- Raises:
ValueError – If called on a SINGLE schema.
NotImplementedError – If neither ColumnGroup attributes nor _DATATYPES are defined.
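For MULTI schemas the same kind of scan runs over group objects and produces the nested {group_name: {col_name: Column}} mapping. The sketch below uses hypothetical stand-ins (Col, Group, MultiSchema) and plain strings instead of SchemaType; it mirrors the documented ValueError and NotImplementedError but omits the legacy _DATATYPES fallback:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Col:
    name: str
    dtype: str


@dataclass
class Group:
    name: str
    columns: List[Col]
    # Mirrors the <factory> default in the ColumnGroup signature: an empty tuple.
    source_path: Tuple[str, ...] = field(default_factory=tuple)


class MultiSchema:
    _SCHEMA_TYPE = "multi"

    @classmethod
    def column_groups(cls):
        if cls._SCHEMA_TYPE != "multi":
            raise ValueError("column_groups() is only valid for MULTI schemas")
        groups = [g for g in vars(cls).values() if isinstance(g, Group)]
        if not groups:
            raise NotImplementedError("no Group attributes declared")
        # Sheet name -> {column name -> column}, both in declaration order.
        return {g.name: {c.name: c for c in g.columns} for g in groups}


class LocationSchema(MultiSchema):
    STEDEN = Group("Steden", [Col("Country", "string"), Col("City", "string")])
    KLANTEN = Group("Klanten", [Col("ID", "int64"), Col("Naam", "string")])


groups = LocationSchema.column_groups()
print({sheet: list(cols) for sheet, cols in groups.items()})
# {'Steden': ['Country', 'City'], 'Klanten': ['ID', 'Naam']}
```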
- classmethod get_subschema(key)
Return a synthetic SINGLE schema class for one sheet of a MULTI schema.
The returned class behaves as a normal Schema subclass and exposes datatypes() for the requested sub-schema.
- Parameters:
key (str) – Sub-schema name (e.g. sheet name in an XLSX file).
- Raises:
ValueError – If called on a SINGLE schema or if key is invalid.
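A synthetic subclass like the one get_subschema() returns can be created at runtime with the three-argument form of type(name, bases, namespace). The sketch below is a hypothetical illustration with dict-based stand-ins (BaseSchema, Workbook, _SHEETS), not the library's implementation; the generated class name pattern and the datatypes() body are assumptions:

```python
class BaseSchema:
    _SCHEMA_TYPE = "single"
    _DATATYPES = {}

    @classmethod
    def datatypes(cls):
        return dict(cls._DATATYPES)


class Workbook(BaseSchema):
    _SCHEMA_TYPE = "multi"
    # Hypothetical layout: sheet name -> {column name -> dtype string}.
    _SHEETS = {
        "Steden": {"Country": "string", "City": "string"},
        "Klanten": {"ID": "int64", "Naam": "string"},
    }

    @classmethod
    def get_subschema(cls, key):
        if cls._SCHEMA_TYPE != "multi":
            raise ValueError("get_subschema() requires a MULTI schema")
        if key not in cls._SHEETS:
            raise ValueError(f"unknown sub-schema: {key!r}")
        # type(name, bases, namespace) builds a SINGLE schema class on the fly.
        return type(f"{cls.__name__}_{key}", (BaseSchema,),
                    {"_SCHEMA_TYPE": "single", "_DATATYPES": cls._SHEETS[key]})


Sub = Workbook.get_subschema("Klanten")
print(Sub.__name__, Sub.datatypes())  # Workbook_Klanten {'ID': 'int64', 'Naam': 'string'}
```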