
API Reference

MELITE's public API consists of five symbols. The project is pre-stable, so this API may change before 1.0. Internal modules can be imported directly but are not part of the public contract.

from melite import Config
from melite import load_datasets
from melite import plot_cv_distributions
from melite import predict
from melite import __version__

Config

Configuration container for MELITE.

Loads defaults from melite/config_default.toml. If user_config is provided, its values are merged over the defaults — user values win and missing keys fall back to defaults.

Parameters:

Name Type Description Default
smoke bool

If True, use reduced CV settings and single-value hyperparameter grids for lightweight runs. Default is False.

False
user_config Path or None

Path to a user-supplied TOML file. Only the keys present in this file override the defaults. Default is None.

None
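
The merge rule described above (user values win, missing keys fall back to defaults) can be pictured with a small sketch. It assumes nested tables are merged recursively, which this page does not confirm. `merge_config` is a hypothetical helper, plain dicts stand in for the two parsed TOML files, and the `CV_CONFIG` values are made up for illustration.

```python
from copy import deepcopy


def merge_config(defaults: dict, user: dict) -> dict:
    """Return a new dict with user values layered over defaults.

    Hypothetical helper, not part of MELITE. Nested tables are merged
    recursively (an assumption); any other value is replaced outright.
    """
    merged = deepcopy(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Plain dicts standing in for the parsed default and user TOML files.
defaults = {"RANDOM_STATE": 42, "CV_CONFIG": {"n_splits": 5, "n_repeats": 3}}
user = {"CV_CONFIG": {"n_splits": 10}}

merged = merge_config(defaults, user)
print(merged)
```

Under this assumption, only `n_splits` changes; `n_repeats` and `RANDOM_STATE` keep their default values, and the `defaults` dict itself is left unmodified.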

Attributes:

Name Type Description
SMOKE bool

Whether the instance was created in smoke mode.

PATHS dict

Dictionary with keys "INPUT", "DATASET", and "OUTPUT" mapping to the corresponding directory paths as strings.

RESULTS_FILE str

Full path to the TXT results file (output/results.txt by default).

RANDOM_STATE int

Global random seed. Default is 42.

REDUCTION_TYPES list of str

Reduction methods to benchmark (e.g. ["PCA", "UMAP"]).

REDUCTION_LEVELS list of int

Variance retention levels to benchmark (e.g. [70, 75, 80, 85, 90, 95]).

DATASETS dict

Normalized dataset registry keyed by user-defined dataset id. Each entry contains path, label_path, and metadata keys.

ACTIVE_MODELS list of str

Model keys to include in the benchmark (e.g. ["svc", "rf", "xgb"]).

CV_CONFIG dict

Cross-validation settings with keys n_splits, n_repeats, and random_state.

PARAM_GRID list of dict

Raw hyperparameter grid definitions, one entry per model configuration.

PARAM_GRID_BY_MODEL dict

Compiled sklearn.model_selection.ParameterGrid objects keyed by model name ("svc", "rf", "xgb").

Examples:

Default configuration:

>>> from melite import Config
>>> cfg = Config()
>>> cfg.RANDOM_STATE
42

Smoke mode with a user override:

>>> from pathlib import Path
>>> from melite import Config
>>> cfg = Config(smoke=True, user_config=Path("my_config.toml"))
>>> cfg.CV_CONFIG["n_splits"]
3

get_cv_config()

Return the cross-validation configuration dictionary.

Returns:

Type Description
dict

Dictionary with keys n_splits, n_repeats, and random_state.
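
The three keys have the same names as the constructor arguments of scikit-learn's RepeatedKFold and RepeatedStratifiedKFold. Whether MELITE passes them to either class is an assumption. The sketch below is a stdlib stand-in that shows what such settings imply: `n_splits * n_repeats` train/test pairs. It is not stratified, its shuffling differs from scikit-learn's, and `repeated_kfold` and the sample values are hypothetical.

```python
import random


def repeated_kfold(n_samples, n_splits, n_repeats, random_state):
    """Yield (train_idx, test_idx) pairs for plain repeated k-fold.

    Stdlib stand-in for illustration only: no stratification, and the
    shuffling differs from scikit-learn's.
    """
    rng = random.Random(random_state)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Deal the shuffled indices into n_splits folds of near-equal size.
        folds = [indices[i::n_splits] for i in range(n_splits)]
        for k, test_idx in enumerate(folds):
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train_idx, test_idx


# Hypothetical values; the real ones would come from cfg.get_cv_config().
cv_config = {"n_splits": 3, "n_repeats": 2, "random_state": 42}
splits = list(repeated_kfold(n_samples=12, **cv_config))
print(len(splits))  # n_splits * n_repeats = 6 train/test pairs
```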

get_param_grid(model)

Return the compiled hyperparameter grid for a given model.

Parameters:

Name Type Description Default
model str

Model key. One of "svc", "rf", or "xgb".

required

Returns:

Type Description
ParameterGrid

Iterable of hyperparameter combinations for the requested model.
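
A parameter grid enumerates the Cartesian product of the value lists it is given. The sketch below illustrates that expansion with `itertools.product` as a pure-Python stand-in; it does not call scikit-learn's ParameterGrid. `expand_grid` and the SVC grid values are hypothetical.

```python
from itertools import product


def expand_grid(grid: dict) -> list:
    """Expand {name: [values]} into a list of {name: value} combinations."""
    names = sorted(grid)
    return [dict(zip(names, combo)) for combo in product(*(grid[n] for n in names))]


# Hypothetical SVC grid; MELITE's actual grids are defined in its configuration.
svc_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

combos = expand_grid(svc_grid)
print(len(combos))  # 3 values of C x 2 kernels = 6
print(combos[0])    # {'C': 0.1, 'kernel': 'linear'}
```

A loop such as `for params in cfg.get_param_grid("svc")` would, on this reading, visit one such dictionary per iteration.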

setup()

Create output directories and set random seeds.

This method must be called once from the pipeline entry point before any data is loaded or models are trained. It is intentionally separated from __init__ so that Config can be instantiated in tests without creating directories or modifying global random state.

Notes

Directories are created with exist_ok=True, so calling setup multiple times is safe.
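
The idempotency noted above comes from the `exist_ok=True` flag. A minimal sketch of the pattern, assuming pathlib is used for directory creation: `setup_sketch` is a hypothetical stand-in, not MELITE's implementation, and it seeds only the stdlib generator because this page does not say which generators `setup` seeds.

```python
import random
import tempfile
from pathlib import Path


def setup_sketch(paths: dict, random_state: int) -> None:
    """Hypothetical stand-in for Config.setup(): make directories, seed RNG."""
    for directory in paths.values():
        # parents=True builds missing ancestors; exist_ok=True makes reruns a no-op.
        Path(directory).mkdir(parents=True, exist_ok=True)
    # Only the stdlib generator is seeded in this sketch.
    random.seed(random_state)


with tempfile.TemporaryDirectory() as tmp:
    paths = {"OUTPUT": str(Path(tmp) / "output" / "figures")}
    setup_sketch(paths, random_state=42)
    first_draw = random.random()
    setup_sketch(paths, random_state=42)  # second call: no error, seed reset
    second_draw = random.random()
    created = Path(paths["OUTPUT"]).is_dir()

print(created, first_draw == second_draw)  # True True
```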


load_datasets

Load all datasets from config.DATASETS.

Returns:

Type Description
dict

Mapping of dataset id to dictionaries with X, y, and metadata keys. Dataset ids are user-defined identifiers and are not interpreted as method names.
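
The returned mapping can be walked like any nested dictionary. The sketch below uses a hand-built stand-in with the documented keys; the dataset id, the values, and the `summarize` helper are made up, and nested lists are used because this page does not state the container type of X and y.

```python
def summarize(datasets: dict) -> dict:
    """Map each dataset id to (n_samples, n_features, n_classes)."""
    summary = {}
    for dataset_id, entry in datasets.items():
        X, y = entry["X"], entry["y"]
        summary[dataset_id] = (len(X), len(X[0]), len(set(y)))
    return summary


# Hand-built stand-in shaped like the documented return value.
datasets = {
    "cohort_a": {
        "X": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
        "y": [0, 1, 0],
        "metadata": {},
    },
}
print(summarize(datasets))  # {'cohort_a': (3, 2, 2)}
```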


plot_cv_distributions

Generate and optionally save a three-panel CV metric distribution plot.

Creates a figure with one panel per metric (F1, Accuracy, AUC-ROC). Each panel shows a box plot overlaid with jittered scatter points representing individual cross-validation fold scores. If auc is None, the AUC-ROC panel is hidden.

Parameters:

Name Type Description Default
f1 iterable of float

F1-macro scores from each cross-validation fold.

required
acc iterable of float

Accuracy scores from each cross-validation fold.

required
auc iterable of float or None

AUC-ROC scores from each cross-validation fold. Pass None to hide the AUC-ROC panel (e.g. for binary classifiers without probability support).

required
model_name str

Model name shown in the figure title (e.g. "SVC").

required
params str

Serialised hyperparameter string shown in the figure subtitle.

required
save_to Path or None

Destination path for the PNG file. Parent directories are created automatically if they do not exist. If None, the figure is displayed interactively via matplotlib.pyplot.show. Default is None.

None

Notes

When save_to is provided, the figure is saved at 300 DPI with bbox_inches="tight" and the directory tree is created automatically. The function does not close the figure after saving; callers are responsible for calling matplotlib.pyplot.close if needed.

Examples:

Save a plot for an SVC model to a nested directory:

>>> from pathlib import Path
>>> from melite import plot_cv_distributions
>>> f1  = [0.76, 0.90, 0.82]
>>> acc = [0.77, 0.90, 0.82]
>>> auc = [0.83, 0.95, 0.89]
>>> plot_cv_distributions(
...     f1, acc, auc,
...     model_name="SVC",
...     params="{'kernel': 'linear', 'C': 1}",
...     save_to=Path("output/figures/SVC_PCA70.png"),
... )

predict

Load a MELITE model artifact and run inference on new data.

Parameters:

Name Type Description Default
model_path str or Path

Path to a .pkl file produced by melite export.

required
X ndarray

Feature matrix of shape (n_samples, n_features). Must be a 2-D array, and its features should come from the same reduction method and level used for the training data (e.g. PCA70 → 37 features).

required
return_proba bool

If True (default) and the loaded model exposes a predict_proba method, class probabilities are computed and included in the output. If False, or if the model does not support probability estimates, probabilities is None.

True

Returns:

Type Description
dict

Dictionary with the following keys:

  • "predictions" : numpy.ndarray, shape (n_samples,): predicted class labels.
  • "probabilities" : numpy.ndarray or None, shape (n_samples, n_classes): class probability estimates, or None if not available or not requested.
  • "model_path" : str: resolved path to the loaded model file.
  • "n_samples" : int: number of samples in X.

Raises:

Type Description
FileNotFoundError

If model_path does not exist. The error message includes the path and a hint to run melite export first.

ValueError

If X is not a 2-D NumPy array.

Notes

The .pkl artifacts produced by melite export are serialised with joblib.dump. All scikit-learn compatible estimators (SVC, RandomForestClassifier, XGBClassifier) are supported.

Examples:

Load a previously exported SVC model and predict on new data:

>>> import numpy as np
>>> from melite import predict
>>> X_new = np.random.rand(10, 37).astype(np.float32)
>>> result = predict("output/Model_SVC_PCA70.pkl", X_new)
>>> result["predictions"].shape
(10,)
>>> result["probabilities"].shape
(10, 2)
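
Because "probabilities" is None when the model lacks predict_proba or when return_proba=False, downstream code may want to branch on it before indexing. A minimal sketch, using hand-built dictionaries shaped like the documented return value; nested lists stand in for the arrays, and `confidence_per_sample` is a hypothetical helper.

```python
def confidence_per_sample(result: dict):
    """Return the top class probability per sample, or None if unavailable."""
    probabilities = result["probabilities"]
    if probabilities is None:
        return None
    return [float(max(row)) for row in probabilities]


# Hand-built stand-ins shaped like the documented return value.
with_proba = {
    "predictions": [1, 0],
    "probabilities": [[0.2, 0.8], [0.9, 0.1]],
    "model_path": "output/Model_SVC_PCA70.pkl",
    "n_samples": 2,
}
without_proba = {**with_proba, "probabilities": None}

print(confidence_per_sample(with_proba))     # [0.8, 0.9]
print(confidence_per_sample(without_proba))  # None
```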

Version

Package version metadata for MELITE.

This module is the single source of truth for the project version. It is read by hatchling at build time via [tool.hatch.version] and imported by result_manager to stamp generated reports.