MELITE

Tabular classification benchmarking toolkit for model selection with repeated stratified cross-validation.


Pre-stable

MELITE is in alpha-stage development (v0.2.x). A PyPI release is prepared under the package name melite. Public APIs may change before 1.0.

Workflow

Input: X / y (prepared arrays)
Benchmark: melite run (cross-validation)
Results: results.csv (ranked rows)
Export: melite export (final retraining)
Artifact: .pkl (saved model)
Inference: predict() (new matrices)
01 Run

Evaluate configured feature matrices with SVC, Random Forest, and XGBoost model grids.

02 Select

Compare configurations with repeated stratified cross-validation and select by F1-macro.

03 Export

Retrain the selected model on all available data and save a reusable .pkl artifact.

04 Predict

Load the exported artifact and run inference on new matrices with the same feature representation.
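The Run and Select steps can be approximated outside MELITE with scikit-learn. The following is a minimal sketch, not MELITE's internal implementation: the synthetic data, the two SVC candidates, and the fold and repeat counts are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for a prepared numeric X / y pair (assumption).
X, y = make_classification(
    n_samples=120, n_features=8, n_informative=4, random_state=0
)

# Repeated stratified CV: 5 folds x 3 repeats = 15 scores per candidate.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)

# Hypothetical candidates; MELITE's actual model grids may differ.
candidates = {
    "svc_linear": SVC(kernel="linear"),
    "svc_rbf": SVC(kernel="rbf"),
}

mean_f1 = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, scoring="f1_macro", cv=cv)
    mean_f1[name] = float(np.mean(scores))

# Select the candidate with the highest mean F1-macro.
best = max(mean_f1, key=mean_f1.get)
print(best, round(mean_f1[best], 3))
```

Stratification keeps the class proportions roughly constant in every fold, and repeating the split with different shuffles reduces the variance of the mean score used for ranking.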

Scope

MELITE is tabular at the modeling level. The learning algorithms only consume numeric X and y arrays, so the feature matrix may come from PCA, UMAP, fingerprints, descriptors, clinical variables, experimental measurements, industrial features, or manually selected numeric features.

| MELITE does | MELITE does not |
| --- | --- |
| Accept prepared X and y arrays. | Generate PCA or UMAP representations. |
| Benchmark SVC, Random Forest, and XGBoost classifiers. | Engineer molecular fingerprints or descriptors. |
| Select the best row by F1-macro. | Handle raw molecular data directly. |
| Export a final retrained .pkl model. | Require internet access at runtime. |
| Run artifact-based inference through predict(). | Train deep learning models. |
| Handle any numeric tabular matrix. | Generate descriptors or reductions from raw data. |

MELITE uses a dataset registry under [datasets.<dataset_id>]. Each dataset_id names one concrete numeric X matrix candidate.

Registry pattern: one dataset id, one numeric matrix.

Use metadata for reporting and traceability; execution follows the registered files, not hardcoded dataset families.

[datasets.morgan_r2_2048]
path = "data/morgan_r2_2048.npz"
label_path = "raw/labels.npy"
family = "fingerprints"
method = "Morgan"
variant = "radius2_2048"

morgan_r2_2048 is just a user-defined id. MELITE treats it as a concrete feature matrix candidate and reports the metadata with its results.

[datasets.rdkit_descriptors]
path = "data/rdkit_descriptors.npz"
label_path = "raw/labels.npy"
family = "descriptors"
method = "RDKit"
description = "Curated numeric descriptor table"

Descriptor tables follow the same strict contract: numeric, two-dimensional X, plus a label vector loaded from label_path.
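Preparing files that match this contract happens upstream of MELITE. As a rough sketch, assuming NumPy is used to write them, a numeric matrix can be stored under the key X and the labels saved as a separate .npy vector. The random data and the temporary output directory are placeholders, not part of MELITE.

```python
import tempfile
from pathlib import Path

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))        # 20 samples x 5 numeric features
y = rng.integers(0, 2, size=20)     # one label per row of X

out_dir = Path(tempfile.mkdtemp())  # placeholder for data/ and raw/
np.savez(out_dir / "rdkit_descriptors.npz", X=X)  # array stored under key "X"
np.save(out_dir / "labels.npy", y)

# Read the archive back to confirm the key and the shape.
saved = np.load(out_dir / "rdkit_descriptors.npz")
print(saved.files, saved["X"].shape)
```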

[datasets.pca85]
path = "data/PCA85.npz"
label_path = "raw/labels.npy"
family = "dimensionality"
method = "PCA"
level = 85

[datasets.umap90]
path = "data/UMAP90.npz"
label_path = "raw/labels.npy"
family = "dimensionality"
method = "UMAP"
level = 90

PCA and UMAP are ordinary dataset entries. method and level preserve legacy reporting context without driving special execution logic.

Required fields are path and label_path; optional metadata fields are family, method, variant, level, and description. Legacy [benchmark].reduction_types and levels configs are still normalized into dataset entries when [datasets] is absent.

Each .npz dataset must contain an explicit X array; missing X fails strict dataset loading.
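The same contract can be checked by hand before a file is registered. This is a sketch of such a check with NumPy, not MELITE's own loader: load_matrix is a hypothetical helper, and the row-count comparison between X and y is an assumption about what a sensible loader would enforce.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_matrix(npz_path, label_path):
    """Load X and y, raising if the contract described above is not met."""
    with np.load(npz_path) as archive:
        if "X" not in archive.files:
            raise KeyError(f"{npz_path}: no 'X' array (keys: {archive.files})")
        X = archive["X"]
    y = np.load(label_path)
    if X.ndim != 2:
        raise ValueError(f"X must be two-dimensional, got shape {X.shape}")
    if not np.issubdtype(X.dtype, np.number):
        raise TypeError(f"X must be numeric, got dtype {X.dtype}")
    if y.shape[0] != X.shape[0]:  # assumption: one label per row
        raise ValueError("X and y must have the same number of rows")
    return X, y


tmp = Path(tempfile.mkdtemp())
np.savez(tmp / "good.npz", X=np.zeros((4, 3)))
np.savez(tmp / "bad.npz", features=np.zeros((4, 3)))  # wrong key on purpose
np.save(tmp / "labels.npy", np.array([0, 1, 0, 1]))

X, y = load_matrix(tmp / "good.npz", tmp / "labels.npy")
print(X.shape, y.shape)

try:
    load_matrix(tmp / "bad.npz", tmp / "labels.npy")
except KeyError as exc:
    print("rejected:", exc)
```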

Quick Example

python -m pip install melite
melite run --smoke --config examples/example_config.toml
melite export --row 0 --csv examples/output/results.csv --outdir examples/output/

import numpy as np
from melite import predict

X_new = np.load("examples/sample_PCA70.npz")["X"]
result = predict("examples/output/Model_SVC_sample_pca70.pkl", X_new)
print(result["predictions"])
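Before exporting, it can help to look at which row of the results table scores highest. The sketch below does this with the standard-library csv module on an in-memory sample. The column names (dataset_id, model, f1_macro) and the zero-based row indexing are assumptions for illustration; check the header of the actual results.csv.

```python
import csv
import io

# Hypothetical results.csv content; real column names may differ.
SAMPLE = """dataset_id,model,f1_macro
pca85,SVC,0.81
umap90,RandomForest,0.78
morgan_r2_2048,XGBoost,0.86
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Find the zero-based index of the row with the highest F1-macro.
best_index, best_row = max(
    enumerate(rows), key=lambda pair: float(pair[1]["f1_macro"])
)
print(best_index, best_row["dataset_id"], best_row["model"])
```

For a real run, replace io.StringIO(SAMPLE) with an opened results.csv file.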

Documentation

| Page | Purpose |
| --- | --- |
| Installation | Supported Python versions, local install, and optional dependencies. |
| Quick Start | Minimal CLI and Python workflow using the bundled example data. |
| CLI Reference | melite run, melite export, smoke mode, config files, and version checks. |
| Configuration | Default TOML settings, user overrides, inputs, and outputs. |
| API Reference | Public Python API generated from docstrings. |
| Release Notes | Version history and validation notes. |

Citation

If you use MELITE in your research, please cite it using the metadata in CITATION.cff.

Contreras-Torres, F. F., & Murrieta, A. C. (2026). MELITE: Multi-model Evaluation and Learning for Inference-ready Tabular Experiments. Zenodo. https://doi.org/10.5281/zenodo.20382752

License

This project is licensed under the terms of the GNU Lesser General Public License v3.0 or later. SPDX identifier: LGPL-3.0-or-later.