MELITE
Tabular classification benchmarking
Tabular classification benchmarking toolkit for model selection with repeated stratified cross-validation.
Pre-stable
MELITE is in alpha-stage development (v0.2.x). Publication on PyPI is
prepared under the package name melite, and public APIs may change
before 1.0.
Workflow
Run
Evaluate configured feature matrices with SVC, Random Forest, and XGBoost model grids.
Select
Compare configurations with repeated stratified cross-validation and select by F1-macro.
Export
Retrain the selected model on all available data and save a reusable
.pkl artifact.
Predict
Load the exported artifact and run inference on new matrices with the same feature representation.
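The Run, Select, and Export steps can be pictured with plain scikit-learn. The block below is a minimal sketch under stated assumptions, not MELITE's internal code: the synthetic data, the two candidate models, and the fold and repeat counts are illustrative choices, and MELITE's real model grids come from its configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for a prepared numeric feature matrix and label vector.
X, y = make_classification(
    n_samples=120, n_features=10, n_informative=5, random_state=0
)

# Hypothetical candidates; these are not MELITE's configured grids.
candidates = {
    "SVC": SVC(C=1.0, kernel="rbf"),
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Repeated stratified cross-validation: 5 folds, repeated twice (assumed values).
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)

# Run: score every candidate with macro-averaged F1.
mean_f1 = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1_macro")
    mean_f1[name] = float(np.mean(scores))

# Select: keep the candidate with the highest mean F1-macro.
best_name = max(mean_f1, key=mean_f1.get)

# Export (first half): retrain the selected model on all available data.
final_model = candidates[best_name].fit(X, y)
print(best_name, round(mean_f1[best_name], 3))
```

Stratified folds keep the class proportions of y roughly constant in every split, and macro-averaging weights each class equally, which is why this pairing is a common choice for imbalanced classification benchmarks.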
Scope
MELITE is tabular at the modeling level. The learning algorithms only consume
numeric X and y arrays, so the feature matrix may come from PCA, UMAP,
fingerprints, descriptors, clinical variables, experimental measurements,
industrial features, or manually selected numeric features.
| MELITE does | MELITE does not |
|---|---|
| Accept prepared X and y arrays. | Generate PCA or UMAP representations. |
| Benchmark SVC, Random Forest, and XGBoost classifiers. | Engineer molecular fingerprints or descriptors. |
| Select the best row by F1-macro. | Handle raw molecular data directly. |
| Export a final retrained .pkl model. | Require internet access at runtime. |
| Run artifact-based inference through predict(). | Train deep learning models. |
| Handle any numeric tabular matrix. | Generate descriptors or reductions from raw data. |
MELITE uses a dataset registry under [datasets.<dataset_id>]. Each
dataset_id names one concrete numeric X matrix candidate.
Use metadata for reporting and traceability; execution follows the registered files, not hardcoded dataset families.
[datasets.morgan_r2_2048]
path = "data/morgan_r2_2048.npz"
label_path = "raw/labels.npy"
family = "fingerprints"
method = "Morgan"
variant = "radius2_2048"
morgan_r2_2048 is just a user-defined id. MELITE treats it as a concrete
feature matrix candidate and reports the metadata with its results.
[datasets.rdkit_descriptors]
path = "data/rdkit_descriptors.npz"
label_path = "raw/labels.npy"
family = "descriptors"
method = "RDKit"
description = "Curated numeric descriptor table"
Descriptor tables follow the same strict contract: numeric, two-dimensional
X, plus a label vector loaded from label_path.
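The contract can be checked on your own arrays before registering them. The function below is an illustrative sketch, not MELITE's loader: the name check_dataset_arrays is hypothetical, and the requirements that y be one-dimensional and match the row count of X are assumptions about what "label vector" implies.

```python
import numpy as np


def check_dataset_arrays(X, y):
    """Illustrative pre-flight check; not part of the MELITE API."""
    X = np.asarray(X)
    y = np.asarray(y)
    # Stated contract: X is two-dimensional and numeric.
    if X.ndim != 2:
        raise ValueError(f"X must be two-dimensional, got ndim={X.ndim}")
    if not np.issubdtype(X.dtype, np.number):
        raise ValueError(f"X must be numeric, got dtype={X.dtype}")
    # Assumed: the label vector is 1-D with one label per row of X.
    if y.ndim != 1:
        raise ValueError(f"y must be one-dimensional, got ndim={y.ndim}")
    if y.shape[0] != X.shape[0]:
        raise ValueError(
            f"X has {X.shape[0]} rows but y has {y.shape[0]} labels"
        )
    return X, y


X_ok, y_ok = check_dataset_arrays(np.zeros((6, 3)), np.array([0, 1, 0, 1, 0, 1]))
print(X_ok.shape, y_ok.shape)
```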
[datasets.pca85]
path = "data/PCA85.npz"
label_path = "raw/labels.npy"
family = "dimensionality"
method = "PCA"
level = 85
[datasets.umap90]
path = "data/UMAP90.npz"
label_path = "raw/labels.npy"
family = "dimensionality"
method = "UMAP"
level = 90
PCA and UMAP are ordinary dataset entries. method and level preserve
legacy reporting context without driving special execution logic.
Required fields are path and label_path; optional metadata fields are
family, method, variant, level, and description. Legacy
[benchmark].reduction_types and levels configs are still normalized into
dataset entries when [datasets] is absent.
Each .npz dataset must contain an explicit X array; missing X fails
strict dataset loading.
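One way to produce files that satisfy this rule is NumPy's savez, where the keyword name becomes the array key inside the archive. The file names and random data below are placeholders, and the explicit key check only mirrors the strict behavior described above; it is not MELITE's loader.

```python
import os
import tempfile

import numpy as np

# Placeholder data standing in for a prepared feature matrix and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
y = rng.integers(0, 2, size=20)

workdir = tempfile.mkdtemp()
npz_path = os.path.join(workdir, "features.npz")
label_path = os.path.join(workdir, "labels.npy")

# The keyword name "X" is stored as the key inside the .npz archive.
np.savez(npz_path, X=X)
np.save(label_path, y)

# Read back and fail loudly if the explicit X key is absent.
with np.load(npz_path) as archive:
    if "X" not in archive.files:
        raise KeyError(f"{npz_path} has no 'X' array; found {archive.files}")
    X_loaded = archive["X"]
y_loaded = np.load(label_path)

print(X_loaded.shape, y_loaded.shape)
```

Calling np.savez(npz_path, X) without the keyword would store the array under the default key arr_0, which is the usual way an archive ends up without an X entry.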
Quick Example
python -m pip install melite
melite run --smoke --config examples/example_config.toml
melite export --row 0 --csv examples/output/results.csv --outdir examples/output/
import numpy as np
from melite import predict
X_new = np.load("examples/sample_PCA70.npz")["X"]
result = predict("examples/output/Model_SVC_sample_pca70.pkl", X_new)
print(result["predictions"])
Documentation
| Page | Purpose |
|---|---|
| Installation | Supported Python versions, local install, and optional dependencies. |
| Quick Start | Minimal CLI and Python workflow using the bundled example data. |
| CLI Reference | melite run, melite export, smoke mode, config files, and version checks. |
| Configuration | Default TOML settings, user overrides, inputs, and outputs. |
| API Reference | Public Python API generated from docstrings. |
| Release Notes | Version history and validation notes. |
Citation
If you use MELITE in your research, please cite it using the metadata in CITATION.cff.
Contreras-Torres, F. F., & Murrieta, A. C. (2026). MELITE: Multi-model Evaluation and Learning for Inference-ready Tabular Experiments. Zenodo. https://doi.org/10.5281/zenodo.20382752
License
This project is licensed under the terms of the
GNU Lesser General Public License v3.0 or later.
SPDX identifier: LGPL-3.0-or-later.