DataManager
The DataManager class is responsible for data ingestion and internal storage. It is usually not accessed directly, but rather through the ScenarioManager facade. Three concrete implementations are available:
- StatelessDataManager: in-memory only; no persistence.
- StatefulDataManager: persists DataSources to disk as JSON files and reloads them on startup. Deprecated; use DatabaseDataManager for new projects.
- DatabaseDataManager: persists DataSources to a SQL database (requires the [database] extra).
StatelessDataManager / StatefulDataManager
- class algomancy_data.datamanager.DataManager(etl_factory, schemas, save_type, data_object_type, logger=None)
Bases: ABC
Handles all data-related operations: loading, deriving, deleting, and storing datasets.
- property data_object_type
- etl_data(files, dataset_name)
Run the ETL pipeline for dataset_name and store the result.
- Parameters:
files (Dict[str, File]) – Mapping of logical file names to File objects.
dataset_name (str) – Logical name for the resulting dataset.
- Returns:
structured outcome. Inspect result.status to tell success from failure and result.validation_result.messages for details.
- Return type:
- Raises:
ETLConstructionError – If pipeline construction fails.
Exception – Programmer errors from user-supplied components are allowed to propagate unchanged.
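A minimal sketch of how a caller might consume the outcome of etl_data. Only etl_data, result.status and result.validation_result.messages come from the description above. FakeManager, the helper summarize_etl and the status value "failed" are hypothetical stand-ins so that the snippet runs without the library installed; the real status values are not shown here.

```python
from types import SimpleNamespace


def summarize_etl(manager, files, dataset_name):
    """Run etl_data and return (status, messages) for the caller to act on."""
    result = manager.etl_data(files, dataset_name)
    return result.status, list(result.validation_result.messages)


class FakeManager:
    """Hypothetical stand-in that mimics the documented result shape."""

    def etl_data(self, files, dataset_name):
        # "failed" is a placeholder; the real status values are an assumption.
        return SimpleNamespace(
            status="failed",
            validation_result=SimpleNamespace(
                messages=[f"{dataset_name}: missing column 'id'"]
            ),
        )


status, messages = summarize_etl(FakeManager(), {}, "demo")
for message in messages:
    print(status, message)
```

Because validation failures are reported through the returned object rather than raised, a caller would typically branch on the status and reserve try/except for ETLConstructionError.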
- class algomancy_data.datamanager.StatelessDataManager(etl_factory, schemas, save_type, data_object_type, logger=None)
Bases: DataManager
- class algomancy_data.datamanager.StatefulDataManager(etl_factory, schemas, data_folder, save_type, data_object_type, logger=None)
Bases: DataManager
- startup()
Load persisted data sources from the data folder.
Each item is loaded independently; if a single file/directory fails to load it is logged and skipped, and any partial in-memory state for that item is rolled back so the manager remains consistent. Other items continue to load. Failures are surfaced through the configured logger; self.startup_errors collects them so callers can inspect what happened.
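The load-independently, roll-back-on-failure behaviour described for startup() can be illustrated with a small standalone sketch. This is not the library's implementation: load_all, the one-JSON-file-per-dataset layout and the (filename, exception) error tuples are assumptions made for illustration.

```python
import json
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger("startup_sketch")


def load_all(data_folder: Path):
    """Load every *.json file independently; return (loaded, errors)."""
    loaded = {}
    errors = []
    for path in sorted(data_folder.glob("*.json")):
        name = path.stem
        loaded[name] = None  # placeholder registered before parsing
        try:
            loaded[name] = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError) as exc:
            # Roll back the partial entry so the mapping stays consistent.
            loaded.pop(name, None)
            logger.warning("Skipping %s: %s", path.name, exc)
            errors.append((path.name, exc))
    return loaded, errors


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "good.json").write_text('{"rows": 3}', encoding="utf-8")
    (folder / "broken.json").write_text("{not valid json", encoding="utf-8")
    loaded, startup_errors = load_all(folder)
```

After the call, the broken file appears only in the error list and the valid file is still available, which mirrors the role of self.startup_errors.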
DatabaseDataManager
DatabaseDataManager stores DataSources in a SQL database (SQLite by default;
Postgres-compatible). Writes happen immediately after every ETL run, derive, or
add_data_source call. get_data() loads a DataSource into RAM on first
access and caches it, so only accessed datasets occupy memory.
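The load-on-first-access behaviour of get_data() can be sketched as follows. This is an illustration only: it uses the standard-library sqlite3 module rather than SQLAlchemy, and the LazyStore class, the datasets table and the db_reads counter are hypothetical names, not part of the library.

```python
import json
import sqlite3


class LazyStore:
    """Hypothetical store that reads a dataset from SQL only on first access."""

    def __init__(self, conn):
        self._conn = conn
        self._cache = {}
        self.db_reads = 0  # counts database round trips, for demonstration

    def get_data(self, name):
        if name not in self._cache:
            row = self._conn.execute(
                "SELECT payload FROM datasets WHERE name = ?", (name,)
            ).fetchone()
            if row is None:
                raise KeyError(name)
            self.db_reads += 1
            self._cache[name] = json.loads(row[0])
        return self._cache[name]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datasets (name TEXT PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO datasets VALUES (?, ?)",
    [("a", json.dumps({"x": 1})), ("b", json.dumps({"x": 2}))],
)

store = LazyStore(conn)
first = store.get_data("a")
second = store.get_data("a")  # served from the cache, no second query
```

Dataset "b" is never requested, so it never enters the cache; only accessed datasets occupy memory.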
Persistence path selection is dispatched automatically per DataSource:
- Per-sub-table SQL (used when the subclass implements SqlTableLayout): each DataFrame becomes a dedicated SQL table (ds__<session>__<name>__<sub>), externally queryable.
- JSON-blob fallback (used for all other BaseDataSource subclasses): the DataSource is serialised via its abstract to_json() into a payload column on the algomancy_datasets catalogue table.
The bundled DataSource satisfies SqlTableLayout via its tables dict, so
it is always stored as real SQL tables.
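One way to picture the dispatch is a structural check on the DataSource. The sketch below assumes SqlTableLayout behaves like a runtime-checkable protocol with a tables attribute; the real definition may differ. TabularSource, OpaqueSource, choose_path and table_names are hypothetical names. Only the ds__<session>__<name>__<sub> naming pattern is taken from the text above.

```python
from typing import Any, Dict, Protocol, runtime_checkable


@runtime_checkable
class SqlTableLayout(Protocol):
    """Assumed shape: anything exposing a ``tables`` mapping."""

    tables: Dict[str, Any]


class TabularSource:
    """Hypothetical DataSource that exposes sub-tables."""

    def __init__(self):
        self.tables = {"orders": [], "customers": []}


class OpaqueSource:
    """Hypothetical DataSource that can only serialise itself to JSON."""

    def to_json(self):
        return '{"k": 1}'


def choose_path(ds):
    """Pick the persistence path for one DataSource."""
    return "sql_tables" if isinstance(ds, SqlTableLayout) else "json_blob"


def table_names(session, name, ds):
    """Build one table name per sub-table, following the documented pattern."""
    return [f"ds__{session}__{name}__{sub}" for sub in ds.tables]


tabular_path = choose_path(TabularSource())
opaque_path = choose_path(OpaqueSource())
names = table_names("s1", "sales", TabularSource())
```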
Schema drift — if an older database is missing the payload column,
startup() raises immediately with a clear message directing you to drop the
catalogue table and rebuild (there is no automatic migration).
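A fail-fast check of this kind might look like the sketch below. It is an assumption-laden illustration: it targets SQLite through the standard-library sqlite3 module, and check_catalogue_schema, the RuntimeError type and the message wording are hypothetical. Only the algomancy_datasets table and payload column names come from the text above.

```python
import sqlite3


def check_catalogue_schema(conn):
    """Raise if the catalogue table exists but lacks the payload column."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(algomancy_datasets)")}
    if columns and "payload" not in columns:
        raise RuntimeError(
            "algomancy_datasets has no 'payload' column; "
            "drop the catalogue table and rebuild (no automatic migration)."
        )


# Simulate a database created before the payload column existed.
old_db = sqlite3.connect(":memory:")
old_db.execute("CREATE TABLE algomancy_datasets (name TEXT PRIMARY KEY)")

try:
    check_catalogue_schema(old_db)
    drift_detected = False
    drift_message = ""
except RuntimeError as exc:
    drift_detected = True
    drift_message = str(exc)

# A fresh database without the table passes the check.
fresh_db = sqlite3.connect(":memory:")
check_catalogue_schema(fresh_db)
```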
Requires sqlalchemy>=2.0. Install via:
pip install algomancy-data[database]