API Reference

MELITE's public API consists of the five symbols below. The project is pre-stable, so this API may change before 1.0. Internal modules can be imported directly but are not part of the public contract.

from melite import Config
from melite import load_datasets
from melite import plot_cv_distributions
from melite import predict
from melite import __version__

Config

Configuration container for MELITE.

Loads defaults from `melite/config_default.toml`. If `user_config` is provided, its values are merged over the defaults: user values win, and missing keys fall back to the defaults.

Parameters:

Name	Type	Description	Default
`smoke`	`bool`	If `True`, use reduced CV settings and single-value hyperparameter grids for lightweight runs.	`False`
`user_config`	`Path or None`	Path to a user-supplied TOML file. Only the keys present in this file override the defaults.	`None`

Attributes:

Name	Type	Description
`SMOKE`	`bool`	Whether the instance was created in smoke mode.
`PATHS`	`dict`	Dictionary with keys `"INPUT"`, `"DATASET"`, and `"OUTPUT"` mapping to the corresponding directory paths as strings.
`RESULTS_FILE`	`str`	Full path to the TXT results file (`output/results.txt` by default).
`RANDOM_STATE`	`int`	Global random seed. Default is `42`.
`REDUCTION_TYPES`	`list of str`	Reduction methods to benchmark (e.g. `["PCA", "UMAP"]`).
`REDUCTION_LEVELS`	`list of int`	Variance retention levels to benchmark (e.g. `[70, 75, 80, 85, 90, 95]`).
`DATASETS`	`dict`	Normalized dataset registry keyed by user-defined dataset id. Each entry contains `path`, `label_path`, and `metadata` keys.
`ACTIVE_MODELS`	`list of str`	Model keys to include in the benchmark (e.g. `["svc", "rf", "xgb"]`; add `"stack"` to opt in to experimental stacking).
`CV_CONFIG`	`dict`	Cross-validation settings with keys `n_splits`, `n_repeats`, and `random_state`.
`PARAM_GRID`	`list of dict`	Raw hyperparameter grid definitions, one entry per model configuration.
`PARAM_GRID_BY_MODEL`	`dict`	Compiled `sklearn.model_selection.ParameterGrid` objects keyed by model name (`"svc"`, `"rf"`, `"xgb"`, `"stack"`).

Examples:

Default configuration:

>>> from melite import Config
>>> cfg = Config()
>>> cfg.RANDOM_STATE
42

Smoke mode with a user override:

>>> from pathlib import Path
>>> cfg = Config(smoke=True, user_config=Path("my_config.toml"))
>>> cfg.CV_CONFIG["n_splits"]
3
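The merge rule described for `user_config` (user values win, missing keys fall back to defaults) can be sketched with plain dictionaries. This is a minimal illustration, not MELITE's implementation: the helper name `merge_over_defaults` is hypothetical, the sample values are illustrative only, and whether MELITE merges nested tables recursively (as here) or replaces them wholesale is an assumption.

```python
import copy


def merge_over_defaults(defaults: dict, user: dict) -> dict:
    """Return a new dict where values from `user` override `defaults`.

    Nested dicts are merged recursively (an assumption for this sketch);
    keys missing from `user` keep their default value.
    """
    merged = copy.deepcopy(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_over_defaults(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Illustrative values only; not necessarily MELITE's actual defaults.
defaults = {"RANDOM_STATE": 42, "CV_CONFIG": {"n_splits": 5, "n_repeats": 3}}
user = {"CV_CONFIG": {"n_splits": 10}}

merged = merge_over_defaults(defaults, user)
print(merged)
```

Only `n_splits` is overridden in this sketch; `n_repeats` and `RANDOM_STATE` keep their default values, and the `defaults` dictionary itself is left unmodified.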

`get_cv_config()`

Return the cross-validation configuration dictionary.

Returns:

Type	Description
`dict`	Dictionary with keys `n_splits`, `n_repeats`, and `random_state`.

`get_param_grid(model)`

Return the compiled hyperparameter grid for a given model.

Parameters:

Name	Type	Description	Default
`model`	`str`	Model key. One of `"svc"`, `"rf"`, `"xgb"`, or `"stack"`.	required

Returns:

Type	Description
`ParameterGrid`	Iterable of hyperparameter combinations for the requested model.
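To illustrate what an "iterable of hyperparameter combinations" looks like, the sketch below expands a grid definition into individual combinations using only the standard library. It mimics the Cartesian-product behaviour of a parameter grid and is not the scikit-learn class itself; the function name `expand_grid` and the sample values are hypothetical.

```python
from itertools import product


def expand_grid(grid: dict) -> list:
    """Expand {"param": [values, ...]} into a list of {"param": value} dicts."""
    keys = sorted(grid)  # fixed key order keeps the output deterministic
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]


# Sample grid for illustration; not MELITE's real SVC grid.
svc_grid = {"kernel": ["linear", "rbf"], "C": [1, 10]}

combos = expand_grid(svc_grid)
for params in combos:
    print(params)
```

Two values for each of two parameters give four combinations. With single-value lists, as in smoke mode, the same expansion collapses to one combination per model.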

`setup()`

Create output directories and set random seeds.

Call this method once from the pipeline entry point, before any data is loaded or models are trained. It is intentionally separate from `__init__` so that `Config` can be instantiated in tests without creating directories or modifying global random state.

Notes

Directories are created with `exist_ok=True`, so calling `setup` multiple times is safe.
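The directory and seeding behaviour described above can be sketched as follows. This is an illustration under stated assumptions rather than MELITE's code: the function name `setup_sketch` is hypothetical, and only Python's built-in `random` module is seeded here, whereas the real method may seed other libraries as well.

```python
import random
import tempfile
from pathlib import Path


def setup_sketch(paths: dict, random_state: int) -> None:
    """Create each directory in `paths` and seed Python's global RNG."""
    for directory in paths.values():
        # exist_ok=True makes repeated calls harmless; parents=True (a choice
        # made for this sketch) also builds any missing parent directories.
        Path(directory).mkdir(parents=True, exist_ok=True)
    random.seed(random_state)


# Use a temporary root so the example leaves no files behind.
with tempfile.TemporaryDirectory() as root:
    paths = {"OUTPUT": str(Path(root) / "output")}
    setup_sketch(paths, random_state=42)
    first = random.random()
    setup_sketch(paths, random_state=42)  # second call does not raise
    second = random.random()
    created = Path(paths["OUTPUT"]).is_dir()

print(created, first == second)
```

In this sketch a repeated call also re-seeds the generator, so the random sequence restarts from the same point.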

load_datasets

Load all datasets from config.DATASETS.

Returns:

Type	Description
`dict`	Mapping of dataset id to dictionaries with `X`, `y`, and `metadata` keys. Dataset ids are user-defined identifiers and are not interpreted as method names.
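A sketch of how the returned mapping might be consumed. The dictionary below is a hand-built stand-in with the documented `X`, `y`, and `metadata` keys; the dataset id, the values, and the helper `summarize` are hypothetical, and plain lists stand in for whatever array type MELITE actually returns.

```python
def summarize(datasets: dict) -> dict:
    """Map each dataset id to (n_samples, n_features, n_classes)."""
    summary = {}
    for dataset_id, entry in datasets.items():
        X, y = entry["X"], entry["y"]
        n_features = len(X[0]) if len(X) else 0
        summary[dataset_id] = (len(X), n_features, len(set(y)))
    return summary


# Stand-in for the value returned by load_datasets(); all contents are made up.
datasets = {
    "cohort_a": {
        "X": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
        "y": [0, 1],
        "metadata": {"note": "toy example"},
    }
}

print(summarize(datasets))
```

The dataset id (`"cohort_a"` here) is used only as a dictionary key, in line with the statement that ids are not interpreted as method names.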

plot_cv_distributions

Generate and optionally save a three-panel CV metric distribution plot.

Creates a figure with one panel per metric (F1, Accuracy, AUC-ROC). Each panel shows a box plot overlaid with jittered scatter points representing individual cross-validation fold scores. If auc is None, the AUC-ROC panel is hidden.

Parameters:

Name	Type	Description	Default
`f1`	`iterable of float`	F1-macro scores from each cross-validation fold.	required
`acc`	`iterable of float`	Accuracy scores from each cross-validation fold.	required
`auc`	`iterable of float or None`	AUC-ROC scores from each cross-validation fold. Pass `None` to hide the AUC-ROC panel (e.g. for binary classifiers without probability support).	required
`model_name`	`str`	Model name shown in the figure title (e.g. `"SVC"`).	required
`params`	`str`	Serialised hyperparameter string shown in the figure subtitle.	required
`save_to`	`Path or None`	Destination path for the PNG file. Parent directories are created automatically if they do not exist. If `None`, the figure is displayed interactively via `matplotlib.pyplot.show`.	`None`

Notes

When `save_to` is provided, the figure is saved at 300 DPI with `bbox_inches="tight"`, and any missing parent directories are created. The function does not close the figure after saving; callers are responsible for calling `matplotlib.pyplot.close` if needed.

Examples:

Save a plot for an SVC model to a nested directory:

>>> from pathlib import Path
>>> from melite import plot_cv_distributions
>>> f1  = [0.76, 0.90, 0.82]
>>> acc = [0.77, 0.90, 0.82]
>>> auc = [0.83, 0.95, 0.89]
>>> plot_cv_distributions(
...     f1, acc, auc,
...     model_name="SVC",
...     params="{'kernel': 'linear', 'C': 1}",
...     save_to=Path("output/figures/SVC_PCA70.png"),
... )

predict

Load a MELITE model artifact and run inference on new data.

Parameters:

Name	Type	Description	Default
`model_path`	`str or Path`	Path to a `.pkl` file produced by `melite export`.	required
`X`	`ndarray`	Feature matrix of shape `(n_samples, n_features)`. Must be a 2-D array and should use the same reduction method and level as the training data (e.g. PCA70 → 37 features).	required
`return_proba`	`bool`	If `True` (default) and the loaded model exposes a `predict_proba` method, class probabilities are computed and included in the output. If `False`, or if the model does not support probability estimates, `probabilities` is `None`.	`True`

Returns:

Type	Description
`dict`	Dictionary with the keys listed below.

Key	Type	Description
`"predictions"`	`numpy.ndarray`, shape `(n_samples,)`	Predicted class labels.
`"probabilities"`	`numpy.ndarray` or `None`, shape `(n_samples, n_classes)`	Class probability estimates, or `None` if not available or not requested.
`"model_path"`	`str`	Resolved path to the loaded model file.
`"n_samples"`	`int`	Number of samples in `X`.

Raises:

Type	Description
`FileNotFoundError`	If `model_path` does not exist. The error message includes the path and a hint to run `melite export` first.
`ValueError`	If `X` is not a 2-D numpy array.

Notes

The `.pkl` artifacts produced by `melite export` are serialised with `joblib.dump`. All scikit-learn compatible estimators (`SVC`, `RandomForestClassifier`, `XGBClassifier`) are supported.

Examples:

Load a previously exported SVC model and predict on new data:

>>> import numpy as np
>>> from melite import predict
>>> X_new = np.random.rand(10, 37).astype(np.float32)
>>> result = predict("output/Model_SVC_PCA70.pkl", X_new)
>>> result["predictions"].shape
(10,)
>>> result["probabilities"].shape
(10, 2)
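Because `probabilities` can be `None`, downstream code may want to guard against that case. The helper below is a hypothetical sketch (`confidence_per_sample` is not part of MELITE) that operates on a result dictionary shaped as documented above; a hand-built dictionary with plain lists stands in for a real `predict` call, and the numbers are made up.

```python
def confidence_per_sample(result: dict):
    """Return the highest class probability per sample, or None if unavailable."""
    probabilities = result["probabilities"]
    if probabilities is None:
        return None
    return [float(max(row)) for row in probabilities]


# Hand-built stand-ins for predict() output; the numbers are made up.
with_proba = {
    "predictions": [1, 0],
    "probabilities": [[0.2, 0.8], [0.7, 0.3]],
    "model_path": "output/Model_SVC_PCA70.pkl",
    "n_samples": 2,
}
without_proba = {**with_proba, "probabilities": None}

print(confidence_per_sample(with_proba))
print(confidence_per_sample(without_proba))
```

Checking for `None` explicitly covers both documented cases: `return_proba=False`, and a model without a `predict_proba` method.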

Version

Package version metadata for MELITE.

This module is the single source of truth for the project version. It is read by hatchling at build time via `[tool.hatch.version]` and imported by `result_manager` to stamp generated reports.
