API Reference
MELITE's intended public API consists of five symbols. The project is pre-stable, so this API may change before 1.0. Internal modules can be imported directly but are not part of the public contract.
from melite import Config
from melite import load_datasets
from melite import plot_cv_distributions
from melite import predict
from melite import __version__
Config
Configuration container for MELITE.
Loads defaults from melite/config_default.toml. If user_config is
provided, its values are merged over the defaults: user values win, and
keys missing from the user file fall back to the defaults.
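The merge rule above can be illustrated with plain dictionaries. This is a minimal sketch, not MELITE's implementation: the helper name `merge_over_defaults`, the key names, and the choice to merge nested tables key by key (rather than replacing them wholesale) are all assumptions made for illustration.

```python
from copy import deepcopy


def merge_over_defaults(defaults: dict, user: dict) -> dict:
    """Return a new dict in which values from `user` override `defaults`.

    Assumption: nested tables are merged key by key. How deep MELITE's
    real merge goes is not stated in this reference.
    """
    merged = deepcopy(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides hold a table: recurse so untouched keys survive.
            merged[key] = merge_over_defaults(merged[key], value)
        else:
            # Scalars, lists, and new keys: the user value wins outright.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical values, chosen only to show the behaviour.
defaults = {"random_state": 42, "cv": {"n_splits": 5, "shuffle": True}}
user = {"cv": {"n_splits": 3}}

merged = merge_over_defaults(defaults, user)
print(merged)
# {'random_state': 42, 'cv': {'n_splits': 3, 'shuffle': True}}
```

Only `n_splits` changes; `random_state` and `shuffle` fall back to the defaults because the user dictionary does not mention them.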
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| smoke | bool | If | False |
| user_config | Path or None | Path to a user-supplied TOML file. Only the keys present in this file override the defaults. Default is | None |
Attributes:
| Name | Type | Description |
|---|---|---|
| SMOKE | bool | Whether the instance was created in smoke mode. |
| PATHS | dict | Dictionary with keys |
| RESULTS_FILE | str | Full path to the TXT results file ( |
| RANDOM_STATE | int | Global random seed. Default is |
| REDUCTION_TYPES | list of str | Reduction methods to benchmark (e.g. |
| REDUCTION_LEVELS | list of int | Variance retention levels to benchmark (e.g. |
| DATASETS | dict | Normalized dataset registry keyed by user-defined dataset id. Each entry contains |
| ACTIVE_MODELS | list of str | Model keys to include in the benchmark (e.g. |
| CV_CONFIG | dict | Cross-validation settings with keys |
| PARAM_GRID | list of dict | Raw hyperparameter grid definitions, one entry per model configuration. |
| PARAM_GRID_BY_MODEL | dict | Compiled :class: |
Examples:
Default configuration:
>>> from melite import Config
>>> cfg = Config()
>>> cfg.RANDOM_STATE
42
Smoke mode with a user override:
>>> from pathlib import Path
>>> cfg = Config(smoke=True, user_config=Path("my_config.toml"))
>>> cfg.CV_CONFIG["n_splits"]
3
get_cv_config()
Return the cross-validation configuration dictionary.
Returns:
| Type | Description |
|---|---|
| dict | Dictionary with keys |
get_param_grid(model)
Return the compiled hyperparameter grid for a given model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | str | Model key. One of | required |
Returns:
| Type | Description |
|---|---|
| ParameterGrid | Iterable of hyperparameter combinations for the requested model. |
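To make "iterable of hyperparameter combinations" concrete, the sketch below expands a grid definition into one dictionary per combination using only the standard library. It is a stand-in for illustration, not the ParameterGrid class itself and not MELITE's code; the helper `expand_grid` and the example grid values are hypothetical.

```python
from itertools import product


def expand_grid(grid: dict) -> list:
    """Expand {"param": [values, ...]} into one dict per combination."""
    names = sorted(grid)  # fixed key order keeps the output deterministic
    value_lists = (grid[name] for name in names)
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Hypothetical grid for an SVC-like model.
svc_grid = {"C": [1, 10], "kernel": ["linear", "rbf"]}

combos = expand_grid(svc_grid)
for params in combos:
    print(params)
# {'C': 1, 'kernel': 'linear'}
# {'C': 1, 'kernel': 'rbf'}
# {'C': 10, 'kernel': 'linear'}
# {'C': 10, 'kernel': 'rbf'}
```

Two values for each of two parameters give four combinations; a caller would typically loop over them and fit one model per dictionary.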
setup()
Create output directories and set random seeds.
This method must be called once from the pipeline entry point before
any data is loaded or models are trained. It is intentionally separated
from __init__ so that Config can be instantiated in tests
without creating directories or modifying global random state.
Notes
Directories are created with exist_ok=True, so calling setup
multiple times is safe.
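The separation between construction and setup can be sketched as follows. `SketchConfig` is a hypothetical stand-in, not MELITE's Config: it manages a single directory and seeds only Python's built-in `random` module, whereas the real class may create several directories and seed other libraries as well.

```python
import random
import tempfile
from pathlib import Path


class SketchConfig:
    """Hypothetical stand-in for Config; not MELITE's implementation."""

    def __init__(self, output_dir: Path, random_state: int = 42):
        # Only store values: no directories are created, no seeds are set.
        self.output_dir = Path(output_dir)
        self.random_state = random_state

    def setup(self) -> None:
        # exist_ok=True makes repeated calls harmless.
        self.output_dir.mkdir(parents=True, exist_ok=True)
        random.seed(self.random_state)


with tempfile.TemporaryDirectory() as tmp:
    cfg = SketchConfig(Path(tmp) / "output" / "figures")
    created_by_init = cfg.output_dir.exists()   # nothing on disk yet
    cfg.setup()
    cfg.setup()                                 # second call does not raise
    created_by_setup = cfg.output_dir.is_dir()  # directory now exists
```

A test can therefore build the object freely and call `setup()` only when it actually needs the side effects.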
load_datasets
Load all datasets from config.DATASETS.
Returns:
| Type | Description |
|---|---|
| dict | Mapping of dataset id to dictionaries with |
plot_cv_distributions
Generate and optionally save a three-panel CV metric distribution plot.
Creates a figure with one panel per metric (F1, Accuracy, AUC-ROC). Each
panel shows a box plot overlaid with jittered scatter points representing
individual cross-validation fold scores. If auc is None, the
AUC-ROC panel is hidden.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| f1 | iterable of float | F1-macro scores from each cross-validation fold. | required |
| acc | iterable of float | Accuracy scores from each cross-validation fold. | required |
| auc | iterable of float or None | AUC-ROC scores from each cross-validation fold. Pass | required |
| model_name | str | Model name shown in the figure title (e.g. | required |
| params | str | Serialised hyperparameter string shown in the figure subtitle. | required |
| save_to | Path or None | Destination path for the PNG file. Parent directories are created automatically if they do not exist. If | None |
Notes
When save_to is provided, the figure is saved at 300 DPI with
bbox_inches="tight" and the directory tree is created automatically.
The function does not close the figure after saving; callers are
responsible for calling matplotlib.pyplot.close if needed.
Examples:
Save a plot for an SVC model to a nested directory:
>>> from pathlib import Path
>>> from melite import plot_cv_distributions
>>> f1 = [0.76, 0.90, 0.82]
>>> acc = [0.77, 0.90, 0.82]
>>> auc = [0.83, 0.95, 0.89]
>>> plot_cv_distributions(
... f1, acc, auc,
... model_name="SVC",
... params="{'kernel': 'linear', 'C': 1}",
... save_to=Path("output/figures/SVC_PCA70.png"),
... )
predict
Load a MELITE model artifact and run inference on new data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model_path | str or Path | Path to a | required |
| X | ndarray | Feature matrix of shape | required |
| return_proba | bool | If | True |
Returns:
| Type | Description |
|---|---|
| dict | Dictionary with the following keys: |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If model_path does not exist. The error message includes the path and a hint to run |
| ValueError | If X is not a 2-D NumPy array. |
Notes
The .pkl artifacts produced by melite export are serialised with
joblib.dump. All scikit-learn-compatible estimators (SVC,
RandomForestClassifier, XGBClassifier) are supported.
Examples:
Load a previously exported SVC model and predict on new data:
>>> import numpy as np
>>> from melite import predict
>>> X_new = np.random.rand(10, 37).astype(np.float32)
>>> result = predict("output/Model_SVC_PCA70.pkl", X_new)
>>> result["predictions"].shape
(10,)
>>> result["probabilities"].shape
(10, 2)
Version
Package version metadata for MELITE.
This module is the single source of truth for the project version.
It is read by hatchling at build time via [tool.hatch.version]
and imported by result_manager to stamp generated reports.