MELITE

Tabular classification benchmarking toolkit for model selection with repeated stratified cross-validation.

Pre-stable

MELITE is currently in alpha-stage development (v0.2.x). A PyPI release is being prepared under the package name melite. Public APIs may change before 1.0.

Workflow

Input: X / y (prepared arrays)

Benchmark: melite run (cross-validation)

Results: results.csv (ranked rows)

Export: melite export (final retraining)

Artifact: .pkl (saved model)

Inference: predict() (new matrices)

01. Run: Evaluate configured feature matrices with SVC, Random Forest, and XGBoost model grids.

02. Select: Compare configurations with repeated stratified cross-validation and select by F1-macro.

03. Export: Retrain the selected model on all available data and save a reusable .pkl artifact.

04. Predict: Load the exported artifact and run inference on new matrices with the same feature representation.
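
The Run and Select steps can be sketched directly against scikit-learn, without going through MELITE's own API. The synthetic data, the two candidate models, and the fold and repeat counts are illustrative assumptions; MELITE's actual model grids and defaults may differ, and XGBoost is left out to keep the example dependency-light.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for a prepared numeric feature matrix and label vector.
X, y = make_classification(
    n_samples=120, n_features=10, n_informative=5, random_state=0
)

# Repeated stratified CV: 5 folds, repeated 2 times (illustrative values).
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)

# Hypothetical candidate configurations, not MELITE's real grids.
candidates = {
    "SVC": SVC(kernel="rbf", C=1.0),
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Mean F1-macro over all folds and repeats for each candidate.
scores = {
    name: float(np.mean(cross_val_score(model, X, y, cv=cv, scoring="f1_macro")))
    for name, model in candidates.items()
}

# Select the configuration with the highest mean F1-macro.
best_name = max(scores, key=scores.get)
print(best_name, round(scores[best_name], 3))
```

Stratification keeps the class proportions roughly equal across folds, and repeating the split with different shuffles reduces the variance of the mean score, which matters when the ranking between configurations is close.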

Scope

MELITE is tabular at the modeling level. The learning algorithms only consume numeric X and y arrays, so the feature matrix may come from PCA, UMAP, fingerprints, descriptors, clinical variables, experimental measurements, industrial features, or manually selected numeric features.

| MELITE does | MELITE does not |
| --- | --- |
| Accept prepared `X` and `y` arrays. | Generate PCA or UMAP representations. |
| Benchmark SVC, Random Forest, XGBoost, and opt-in experimental stacking classifiers. | Engineer molecular fingerprints or descriptors. |
| Select the best row by F1-macro. | Handle raw molecular data directly. |
| Export a final retrained `.pkl` model. | Require internet access at runtime. |
| Run artifact-based inference through `predict()`. | Train deep learning models. |
| Handle any numeric tabular matrix. | Generate descriptors or reductions from raw data. |

MELITE uses a dataset registry under [datasets.<dataset_id>]. Each dataset_id names one concrete numeric X matrix candidate.

Registry pattern: one dataset id, one numeric matrix.

Use metadata for reporting and traceability; execution follows the registered files, not hardcoded dataset families.

Example entries for three feature families: fingerprints, descriptors, and dimensionality reductions.

[datasets.morgan_r2_2048]
path = "data/morgan_r2_2048.npz"
label_path = "raw/labels.npy"
family = "fingerprints"
method = "Morgan"
variant = "radius2_2048"

morgan_r2_2048 is just a user-defined id. MELITE treats it as a concrete feature matrix candidate and reports the metadata with its results.

[datasets.rdkit_descriptors]
path = "data/rdkit_descriptors.npz"
label_path = "raw/labels.npy"
family = "descriptors"
method = "RDKit"
description = "Curated numeric descriptor table"

Descriptor tables follow the same strict contract: numeric, two-dimensional X, plus a label vector loaded from label_path.

[datasets.pca85]
path = "data/PCA85.npz"
label_path = "raw/labels.npy"
family = "dimensionality"
method = "PCA"
level = 85

[datasets.umap90]
path = "data/UMAP90.npz"
label_path = "raw/labels.npy"
family = "dimensionality"
method = "UMAP"
level = 90

PCA and UMAP are ordinary dataset entries. method and level preserve legacy reporting context without driving special execution logic.

Required fields are path and label_path; optional metadata fields are family, method, variant, level, and description. Legacy [benchmark].reduction_types and levels configs are still normalized into dataset entries when [datasets] is absent.
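
To make the required/optional split concrete, the following sketch parses a registry fragment and checks each entry for `path` and `label_path`. It is not MELITE's loader: `read_registry`, `REQUIRED`, and `OPTIONAL` are hypothetical names introduced for illustration, and how MELITE itself treats unknown keys is not covered here.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport for older interpreters

REQUIRED = ("path", "label_path")
OPTIONAL = ("family", "method", "variant", "level", "description")

CONFIG = """
[datasets.morgan_r2_2048]
path = "data/morgan_r2_2048.npz"
label_path = "raw/labels.npy"
family = "fingerprints"
method = "Morgan"
variant = "radius2_2048"

[datasets.pca85]
path = "data/PCA85.npz"
label_path = "raw/labels.npy"
family = "dimensionality"
method = "PCA"
level = 85
"""


def read_registry(text):
    """Return {dataset_id: entry}, raising if a required field is missing."""
    datasets = tomllib.loads(text).get("datasets", {})
    for dataset_id, entry in datasets.items():
        missing = [key for key in REQUIRED if key not in entry]
        if missing:
            raise ValueError(f"{dataset_id}: missing required field(s) {missing}")
    return datasets


registry = read_registry(CONFIG)
for dataset_id, entry in registry.items():
    # Metadata is descriptive only; absent optional fields are simply skipped.
    metadata = {key: entry[key] for key in OPTIONAL if key in entry}
    print(dataset_id, entry["path"], metadata)
```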

Each .npz dataset must contain an explicit X array; missing X fails strict dataset loading.
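
A dataset pair that satisfies this contract can be written and checked with NumPy alone. The sketch below is an assumption-laden illustration: `check_dataset` is a hypothetical helper, not part of MELITE, and the requirement that `y` be one-dimensional with one label per row of `X` is an assumption about what "label vector" means here.

```python
import os
import tempfile

import numpy as np


def check_dataset(path, label_path):
    """Load X and y and enforce: numeric, 2-D X, one label per row."""
    with np.load(path) as archive:
        if "X" not in archive.files:
            raise KeyError(f"{path}: no 'X' array in archive")
        X = archive["X"]
    y = np.load(label_path)
    if X.ndim != 2:
        raise ValueError(f"X must be two-dimensional, got shape {X.shape}")
    if not np.issubdtype(X.dtype, np.number):
        raise TypeError(f"X must be numeric, got dtype {X.dtype}")
    if y.ndim != 1 or y.shape[0] != X.shape[0]:
        raise ValueError("y must be a 1-D vector with one label per row of X")
    return X, y


# Write a small random dataset to a temporary directory and validate it.
rng = np.random.default_rng(0)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "features.npz")
    label_path = os.path.join(tmp, "labels.npy")
    np.savez(path, X=rng.normal(size=(20, 4)))  # stored under the key "X"
    np.save(label_path, rng.integers(0, 2, size=20))
    X, y = check_dataset(path, label_path)

print(X.shape, y.shape)
```

Passing the matrix as the keyword argument `X=` to `np.savez` is what gives the array its explicit name inside the archive; positional arguments would be stored as `arr_0`, `arr_1`, and so on.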

Quick Example

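# Install MELITE, then run a smoke-mode benchmark with the example config.
python -m pip install melite
melite run --smoke --config examples/example_config.toml
# Export the model for row 0 of the results CSV as a .pkl artifact.
melite export --row 0 --csv examples/output/results.csv --outdir examples/output/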

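import numpy as np
from melite import predict

# Load a new feature matrix; the .npz file stores it under the key "X".
X_new = np.load("examples/sample_PCA70.npz")["X"]
# Run inference with the exported artifact.
result = predict("examples/output/Model_SVC_sample_pca70.pkl", X_new)
print(result["predictions"])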
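
Conceptually, the export and predict calls above form a retrain, serialize, reload cycle. The sketch below reproduces that cycle with plain scikit-learn and `pickle` on synthetic data. It is an illustration under stated assumptions, not MELITE's artifact format: the real `.pkl` may bundle additional metadata, and this version returns a raw prediction array rather than a result keyed by `"predictions"`.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in for the full labelled dataset.
X, y = make_classification(n_samples=100, n_features=8, random_state=0)

# "Export": retrain the chosen configuration on all available data ...
model = SVC(kernel="rbf", C=1.0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    artifact = os.path.join(tmp, "model.pkl")
    # ... and serialize it to a reusable .pkl file.
    with open(artifact, "wb") as fh:
        pickle.dump(model, fh)

    # "Predict": reload the artifact as a separate object.
    with open(artifact, "rb") as fh:
        loaded = pickle.load(fh)

# New rows must use the same feature representation (same column count).
X_new = X[:5]
if X_new.shape[1] != loaded.n_features_in_:
    raise ValueError("feature count differs from the training matrix")
predictions = loaded.predict(X_new)
print(predictions)
```

Because unpickling can execute arbitrary code, `.pkl` artifacts should only be loaded from sources you trust.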

Documentation

| Page | Purpose |
| --- | --- |
| Installation | Supported Python versions, local install, and optional dependencies. |
| Quick Start | Minimal CLI and Python workflow using the bundled example data. |
| CLI Reference | `melite run`, `melite export`, smoke mode, config files, and version checks. |
| Configuration | Default TOML settings, user overrides, inputs, and outputs. |
| API Reference | Public Python API generated from docstrings. |
| Release Notes | Version history and validation notes. |

Citation

If you use MELITE in your research, please cite it using the metadata in CITATION.cff.

Contreras-Torres, F. F., & Murrieta, A. C. (2026). MELITE: Multi-model Evaluation and Learning for Inference-ready Tabular Experiments. Zenodo. https://doi.org/10.5281/zenodo.20382752

License

This project is licensed under the terms of the GNU Lesser General Public License v3.0 or later. SPDX identifier: LGPL-3.0-or-later.