HDDFlyzer

Cheminformatics descriptor-space analysis

Traceable, reproducible molecular descriptor-space workflows for CLI and Python.

Pre-stable

HDDFlyzer is currently in alpha-stage development. The public API is still being hardened and should not be treated as stable until stability is declared.

Registry (REG): Build a canonical molecule registry from local SDF, CSV, or SMILES collections.

Descriptors & similarity (DSC): Compute molecular descriptors and Tanimoto similarity matrices with full provenance.

Projection & visualization (VIZ): Reduce dimensionality with PCA, t-SNE, and UMAP, then generate publication-ready figures.

Why HDDFlyzer?

Exploratory cheminformatics often starts with a practical question: how are the molecules in a collection distributed across a descriptor space? In practice, answering that question usually requires several connected steps. The input molecules are prepared, descriptors or fingerprints are calculated, similarity relationships are computed, dimensionality-reduction methods are applied, and figures are generated for inspection.

When these steps are handled with separate scripts, notebooks, or output folders, the analysis can become difficult to revisit. A figure may exist without an obvious link to the molecule table behind it. A projection file may be separated from the descriptors or similarity matrix used to create it. After several runs, it may be unclear which outputs belong together, which parameters were used, or whether two visualizations were generated from comparable molecular representations.

HDDFlyzer (High-Dimensional Descriptor-based Feature Space Analyzer) addresses this fragmentation by treating the prepared molecular dataset as the anchor of the analysis. It provides a local, CLI-first workflow with a Python API for organizing descriptor-space exploration as a connected computational record. Each run links the molecule registry, descriptor tables, Tanimoto similarity outputs, projection coordinates (PCA, t-SNE, and UMAP), figures, manifests, and execution metadata within a structured result folder.

This connected record allows completed analyses to be revisited, inspected, compared, and extended while preserving the relationship between molecules, computed representations, and downstream artifacts.

Molecular Features Represented

HDDFlyzer builds descriptor tables from molecular features calculated for each compound. These features describe complementary aspects of molecular structure and provide the numerical basis for similarity analysis, dimensionality reduction, and visualization.

| Descriptor group | Representative features used by HDDFlyzer |
| --- | --- |
| Size and composition | MW, HeavyAtomCount, HeavyAtomMolWt, NumHeteroatoms, NumValenceElectrons |
| Polarity and molecular surface | TPSA, LabuteASA, MolMR, PolarSurfaceArea_Fraction, PolarAtom_Fraction |
| Hydrogen bonding | NumHDonors, NumHAcceptors, NHOHCount, NOCount, HDonor_Acceptor_Ratio |
| Lipophilicity and refractivity | MolLogP, MolLogP_MW_Ratio, SlogP_VSA1, SlogP_VSA2 |
| Rings, flexibility, and topology | RingCount, NumRotatableBonds, FractionCSP3, BalabanJ, BertzCT, Kappa1, NumAromaticRings |
| Electronic and VSA descriptors | MaxPartialCharge, MinPartialCharge, PEOE_VSA1, SMR_VSA1 |
| Shape-related descriptors | PMI1, PMI2, PMI3, NPR1, NPR2 |
| Composite molecular scores | QED, LeadLikeness_Score, Pharma_Complexity, Synthetic_Accessibility, Desirability_Profile |

Fingerprint-based Tanimoto outputs such as morgan_tanimoto, atompair_tanimoto, and maccs_tanimoto are not per-molecule features: they describe structural similarity relationships between molecules. Likewise, PCA, t-SNE, and UMAP coordinates are projections derived from descriptor or similarity spaces, not original molecular features.
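To make the distinction between per-molecule features and pairwise similarity concrete, here is a minimal sketch of the Tanimoto coefficient. It is not HDDFlyzer's implementation. It assumes each fingerprint is represented as the set of its "on" bit indices. The molecule IDs, the bit sets, and the functions `tanimoto` and `similarity_matrix` are hypothetical and exist only for illustration.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two sets of on-bits."""
    union = len(a | b)
    if union == 0:
        # Assumption: two empty fingerprints are scored 0.0 in this sketch.
        return 0.0
    return len(a & b) / union


def similarity_matrix(fingerprints: dict) -> dict:
    """Build a symmetric pairwise matrix keyed by molecule ID."""
    ids = sorted(fingerprints)
    # Start with the diagonal (each molecule compared with itself).
    matrix = {i: {i: tanimoto(fingerprints[i], fingerprints[i])} for i in ids}
    # Fill both triangles from each unordered pair.
    for i, j in combinations(ids, 2):
        score = tanimoto(fingerprints[i], fingerprints[j])
        matrix[i][j] = score
        matrix[j][i] = score
    return matrix


# Toy fingerprints (hypothetical bit indices, not real Morgan or MACCS bits).
toy_fingerprints = {
    "mol_a": frozenset({1, 2, 3, 4}),
    "mol_b": frozenset({3, 4, 5, 6}),
    "mol_c": frozenset({1, 2, 3, 4, 7}),
}

matrix = similarity_matrix(toy_fingerprints)
for row_id in sorted(matrix):
    print(row_id, [round(matrix[row_id][col_id], 3) for col_id in sorted(matrix)])
```

Every entry in the matrix belongs to a pair of molecule IDs, which is why a similarity output sits alongside a descriptor table rather than inside it.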

Descriptor-Space Provenance

In HDDFlyzer, a plot is treated as the visible outcome of a computational path. That path includes the molecular collection, the descriptor or similarity representation, the dimensionality-reduction method, and the parameters used during execution.

By keeping these elements together, HDDFlyzer makes it easier to return to a previous analysis, inspect the generated artifacts, and compare molecular representations using consistent molecule identities. The focus is practical provenance: knowing how a result was produced and how to find the files that support it.
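The idea of practical provenance can be illustrated with a small, generic sketch. It does not reproduce HDDFlyzer's manifest format, which is not described here. The functions `sha256_of` and `build_manifest`, the manifest keys, and the file names `molecules.smi`, `pca_coordinates.csv`, and `manifest.json` are all assumptions made for illustration. The sketch records a checksum of the input collection, the parameters, and a checksum of each artifact, so a figure or coordinate file can later be traced back to the run that produced it.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(tag: str, input_file: Path, parameters: dict,
                   artifacts: list) -> dict:
    """Link the input file, run parameters, and produced artifacts."""
    return {
        "tag": tag,
        "input": {"name": input_file.name, "sha256": sha256_of(input_file)},
        "parameters": parameters,
        "artifacts": [
            {"name": path.name, "sha256": sha256_of(path)}
            for path in sorted(artifacts)
        ],
    }


# Demo in a temporary directory; all file names and contents are made up.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "results" / "demo_tag"
    run_dir.mkdir(parents=True)

    molecules = Path(tmp) / "molecules.smi"
    molecules.write_bytes(b"CCO ethanol\nc1ccccc1 benzene\n")

    coords = run_dir / "pca_coordinates.csv"
    coords.write_bytes(b"id,pc1,pc2\nethanol,0.1,0.2\nbenzene,-0.1,-0.2\n")

    manifest = build_manifest(
        "demo_tag", molecules, {"method": "pca", "n_components": 2}, [coords]
    )
    manifest_path = run_dir / "manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))

    # Reading the manifest back recovers the same record.
    reloaded = json.loads(manifest_path.read_text())

print(json.dumps(reloaded, indent=2, sort_keys=True))
```

Checksums are one simple way to confirm that two runs started from the same molecule file. Comparing the recorded parameters then shows whether two projections are comparable.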

[Figure: Fingerprint-based descriptor and similarity results generated by HDDFlyzer. Caption: Fingerprint-derived molecular relationships.]

[Figure: PCA projection example generated from an HDDFlyzer descriptor-space workflow. Caption: Descriptor-space projection from a reconstructed run.]

What You Provide and Receive

| You provide | HDDFlyzer returns |
| --- | --- |
| A local molecular collection (SDF, CSV, or SMILES file). | A structured results/<tag>/ folder with all outputs. |
| A dataset tag and workflow parameters. | Descriptor tables, similarity matrices, and projection coordinates. |
| Optional group definitions for comparison. | Figures, metadata, a manifest, and a workflow summary. |

Citation

Contreras-Torres, F. F. and Saldivar-González, F. I. (2026). HDDFlyzer: High-Dimensional Descriptor-based Feature Space Analyzer. https://github.com/NanoBiostructuresRG/hddflyzer

License

This project is licensed under the terms of the GNU Lesser General Public License v3.0 or later. SPDX identifier: LGPL-3.0-or-later.