DataManager

The DataManager class is responsible for data ingestion and internal storage. It is usually accessed through the ScenarioManager facade rather than directly. Three concrete implementations are available:

  • StatelessDataManager — in-memory only; no persistence.

  • StatefulDataManager — persists DataSources to disk as JSON files and reloads them on startup. Deprecated — use DatabaseDataManager for new projects.

  • DatabaseDataManager — persists DataSources to a SQL database (requires the [database] extra).

StatelessDataManager / StatefulDataManager

class algomancy_data.datamanager.DataManager(etl_factory, schemas, save_type, data_object_type, logger=None)[source]

Bases: ABC

Handles all data-related operations: loading, deriving, deleting, and storing datasets.

property data_object_type
abstractmethod startup()[source]
log(message)[source]
get_data_keys()[source]
get_data(data_key)[source]
set_data(data_key, data)[source]
derive_data(existing_key, derived_key)[source]
add_data_source(data_source)[source]
abstractmethod delete_data(data_key, prevent_masterdata_removal=False)[source]
static check_existence_of_files(file_name_to_path)[source]
prepare_files(file_items_with_content=None, file_items_with_path=None)[source]
etl_data(files, dataset_name)[source]

Run the ETL pipeline for dataset_name and store the result.

Parameters:
  • files (Dict[str, File]) – Mapping of logical file names to File objects.

  • dataset_name (str) – Logical name for the resulting dataset.

Returns:

A structured outcome. Inspect result.status to tell success from failure, and result.validation_result.messages for details.

Return type:

ETLResult

Raises:
  • ETLConstructionError – If pipeline construction fails.

  • Exception – Programmer errors raised by user-supplied components are not caught; they propagate unchanged.
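A minimal sketch of how a caller might branch on the returned ETLResult. The stand-in classes below only mirror the two attributes named above (status and validation_result.messages). The class names, the string status values "success" and "failure", and the helper summarize are assumptions made for illustration, not the library's definitions.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-ins: only the attribute names `status` and
# `validation_result.messages` come from the documentation above.
@dataclass
class FakeValidationResult:
    messages: List[str] = field(default_factory=list)


@dataclass
class FakeETLResult:
    status: str  # assumed to be a plain string here; the real type may differ
    validation_result: FakeValidationResult = field(
        default_factory=FakeValidationResult
    )


def summarize(result, success_status="success"):
    """Return a one-line summary of an ETL outcome."""
    if result.status == success_status:
        return "ETL succeeded"
    # On failure, the validation messages carry the details.
    details = "; ".join(str(m) for m in result.validation_result.messages)
    return f"ETL failed: {details or 'no details reported'}"


ok = FakeETLResult(status="success")
bad = FakeETLResult(
    status="failure",
    validation_result=FakeValidationResult(messages=["column 'id' missing"]),
)
print(summarize(ok))
print(summarize(bad))
```

With a real manager, the object passed to summarize would be the return value of etl_data(files, dataset_name). ETLConstructionError would still need its own except clause, since it is raised rather than returned.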

create_validation_sequence()[source]

class algomancy_data.datamanager.StatelessDataManager(etl_factory, schemas, save_type, data_object_type, logger=None)[source]

Bases: DataManager

startup()[source]
delete_data(data_key, prevent_masterdata_removal=False)[source]

class algomancy_data.datamanager.StatefulDataManager(etl_factory, schemas, data_folder, save_type, data_object_type, logger=None)[source]

Bases: DataManager

startup()[source]

Load persisted data sources from the data folder.

Each item is loaded independently. If a file or directory fails to load, the failure is logged, the item is skipped, and any partial in-memory state for that item is rolled back so the manager remains consistent; the remaining items still load. Failures are reported through the configured logger and collected in self.startup_errors so callers can inspect what happened.
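The load-skip-rollback behaviour described above can be sketched in isolation. This is not the library's implementation: load_all, toy_loader, and the snapshot-based rollback are hypothetical, chosen only to show one way a failing item can be skipped without leaving partial state behind.

```python
import logging

logger = logging.getLogger("startup_sketch")


def load_all(items, loader):
    """Load each item independently; skip and record any that fail."""
    loaded = {}
    startup_errors = []
    for name in items:
        snapshot = dict(loaded)  # shallow copy used to roll back partial state
        try:
            loader(name, loaded)
        except Exception as exc:
            loaded = snapshot  # discard whatever the failed load wrote
            startup_errors.append((name, exc))
            logger.warning("Skipping %s: %s", name, exc)
    return loaded, startup_errors


def toy_loader(name, store):
    """Hypothetical loader that writes partial state before it can fail."""
    store[name + ".partial"] = True
    if name == "broken":
        raise ValueError("cannot parse broken")
    del store[name + ".partial"]
    store[name] = f"data for {name}"


loaded, errors = load_all(["a", "broken", "b"], toy_loader)
print(sorted(loaded))            # keys that loaded successfully
print([n for n, _ in errors])    # names of the items that were skipped
```

The shallow-copy snapshot is the simplest rollback that works for a flat dictionary; a manager holding several internal structures would need to restore each of them.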

load_data_from_file(file_name, root=None)[source]
load_data_from_dir(directory, root=None)[source]
delete_data(data_key, prevent_masterdata_removal=False)[source]
store_data(dataset_name, data, USE_OLD_VERSION=True)[source]
store_data_source_as_json(dataset_name, allow_overwrite=False)[source]

DatabaseDataManager

DatabaseDataManager stores DataSources in a SQL database (SQLite by default; Postgres-compatible). Data is written immediately after every ETL run and after every derive_data or add_data_source call. get_data() loads a DataSource into RAM on first access and caches it, so only datasets that have been accessed occupy memory.
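The write-immediately, load-on-first-access pattern can be illustrated with a toy store. The sketch below uses the standard-library sqlite3 module instead of SQLAlchemy, and the LazyStore class, the datasets table, and its columns are all hypothetical; caching on write is also an assumption of the sketch.

```python
import json
import sqlite3


class LazyStore:
    """Toy write-through store with a read cache (illustrative only)."""

    def __init__(self, conn):
        self._conn = conn
        self._cache = {}
        self.db_reads = 0  # counts how often the database was actually read

    def set_data(self, name, data):
        # Write through to the database immediately, then cache.
        self._conn.execute(
            "INSERT OR REPLACE INTO datasets (name, payload) VALUES (?, ?)",
            (name, json.dumps(data)),
        )
        self._conn.commit()
        self._cache[name] = data

    def get_data(self, name):
        # Hit the database only on first access; later calls use the cache.
        if name not in self._cache:
            row = self._conn.execute(
                "SELECT payload FROM datasets WHERE name = ?", (name,)
            ).fetchone()
            if row is None:
                raise KeyError(name)
            self.db_reads += 1
            self._cache[name] = json.loads(row[0])
        return self._cache[name]

    def cached_keys(self):
        return sorted(self._cache)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datasets (name TEXT PRIMARY KEY, payload TEXT NOT NULL)")

writer = LazyStore(conn)
writer.set_data("sales", {"rows": 3})
writer.set_data("stock", {"rows": 7})

# A fresh instance on the same database stands in for a restarted process.
reader = LazyStore(conn)
first = reader.get_data("sales")
second = reader.get_data("sales")
print(reader.db_reads, reader.cached_keys())
```

Only "sales" ends up in the reader's cache; "stock" stays on disk until something asks for it.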

Persistence path selection is dispatched automatically per DataSource:

  • Per-sub-table SQL (used when the subclass implements SqlTableLayout) — each DataFrame becomes a dedicated SQL table (ds__<session>__<name>__<sub>), externally queryable.

  • JSON-blob fallback (used for all other BaseDataSource subclasses) — the DataSource is serialised via its abstract to_json() into a payload column on the algomancy_datasets catalogue table.

The bundled DataSource satisfies SqlTableLayout via its tables dict, so it is always stored as real SQL tables.
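One way such per-DataSource dispatch could look is sketched below. Whether SqlTableLayout is a typing.Protocol, an ABC, or something else is not stated here, so SqlTableLayoutSketch, TabularSource, BlobSource, and persistence_plan are hypothetical stand-ins. Only the table-name pattern and the to_json() fallback come from the text above.

```python
import json
from typing import Any, Dict, Protocol, runtime_checkable


@runtime_checkable
class SqlTableLayoutSketch(Protocol):
    """Hypothetical stand-in: anything exposing a `tables` mapping."""

    tables: Dict[str, Any]


class TabularSource:
    """Toy source with named sub-tables (lists stand in for DataFrames)."""

    def __init__(self):
        self.tables = {"orders": [1, 2], "items": [3]}


class BlobSource:
    """Toy source with no table layout, only JSON serialisation."""

    def to_json(self):
        return json.dumps({"threshold": 3})


def persistence_plan(source, session, name):
    """Pick a storage path for one source, mirroring the two bullets above."""
    if isinstance(source, SqlTableLayoutSketch):
        # One dedicated table per sub-table: ds__<session>__<name>__<sub>
        names = [f"ds__{session}__{name}__{sub}" for sub in source.tables]
        return ("sql_tables", names)
    # Fallback: a single JSON payload for the catalogue table.
    return ("json_blob", source.to_json())


tabular_plan = persistence_plan(TabularSource(), "s1", "sales")
blob_plan = persistence_plan(BlobSource(), "s1", "config")
print(tabular_plan)
print(blob_plan)
```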

Schema drift: if an older database is missing the payload column, startup() raises immediately with a clear message directing you to drop the catalogue table and rebuild it. There is no automatic migration.
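A fail-fast check of this kind might look like the sketch below. It uses the standard-library sqlite3 module and SQLite's PRAGMA table_info rather than SQLAlchemy. check_catalogue_schema, the exception type, the message wording, and the name column are assumptions; only the table name algomancy_datasets and the payload column come from the text above.

```python
import sqlite3


def check_catalogue_schema(conn, table="algomancy_datasets", required=("payload",)):
    """Raise if an existing catalogue table lacks a required column."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    # `table` is a trusted constant here, so interpolating it is acceptable.
    columns = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if not columns:
        return  # table does not exist yet, so there is nothing to drift from
    missing = [c for c in required if c not in columns]
    if missing:
        raise RuntimeError(
            f"Table {table!r} is missing column(s) {missing}; "
            "drop the table and rebuild it (no automatic migration)."
        )


# An "old" database whose catalogue predates the payload column.
old_db = sqlite3.connect(":memory:")
old_db.execute("CREATE TABLE algomancy_datasets (name TEXT PRIMARY KEY)")

# A current database with the payload column present.
new_db = sqlite3.connect(":memory:")
new_db.execute(
    "CREATE TABLE algomancy_datasets (name TEXT PRIMARY KEY, payload TEXT)"
)

check_catalogue_schema(new_db)  # passes silently
try:
    check_catalogue_schema(old_db)
    drift_message = None
except RuntimeError as exc:
    drift_message = str(exc)
print(drift_message)
```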

Requires sqlalchemy>=2.0. Install via:

pip install algomancy-data[database]