Usage
This page shows how to run HDDFlyzer from the command line and how to interpret the result folder it creates.
Installation
Once v0.1.5 has been merged, tagged, and published through the manual PyPI
workflow, install HDDFlyzer with:
pip install hddflyzer
Python 3.11 or newer is required. RDKit is best installed from conda-forge before installing HDDFlyzer.
conda create -n hddflyzer_env python=3.11
conda activate hddflyzer_env
conda install -c conda-forge rdkit
pip install hddflyzer
UMAP support uses umap-learn when available and is listed as an optional
dependency.
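As a quick check of whether the optional UMAP dependency is present in the active environment, the following minimal sketch may help. It relies only on the standard library and on the fact that umap-learn is imported as `umap`; the helper name `umap_available` is hypothetical and is not part of HDDFlyzer.

```python
import importlib.util


def umap_available() -> bool:
    """Return True if umap-learn (import name: umap) is importable."""
    # find_spec looks the module up without importing it.
    return importlib.util.find_spec("umap") is not None


if __name__ == "__main__":
    if umap_available():
        print("umap-learn found: UMAP projections can be computed.")
    else:
        print("umap-learn not found: install it with `pip install umap-learn`.")
```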
Quick Start
Prepare a molecule registry and run the standard workflow for the aocd
collection:
hddflyzer data prepare aocd
hddflyzer pipeline run aocd
The aocd value is the dataset tag. HDDFlyzer uses that tag to locate input
data and to write outputs under results/aocd/.
Input Data
HDDFlyzer starts from a local molecular collection stored as a CSV file. By default, input collections are placed in:
examples/
For a tag named aocd, a typical input file is:
examples/valid_metadata_aocd.csv
When you run:
hddflyzer data prepare aocd
HDDFlyzer searches the input directory for a CSV file whose filename contains
the tag aocd. You can also pass an explicit CSV path:
hddflyzer data prepare aocd path/to/input.csv
The input table must contain a SMILES column. Columns whose names contain
smiles or canonical_smiles are detected automatically. A compound identifier
column is optional; HDDFlyzer detects identifier, id, compound_id, or
molecule_id when present, and otherwise creates identifiers automatically.
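The column-detection rules above can be illustrated with a short sketch. This is not HDDFlyzer's implementation: the helper `detect_columns` is hypothetical, and case-insensitive matching, exact matching of identifier names, and picking the first matching column are all assumptions.

```python
import csv
import io

# Identifier column names listed in the text above.
ID_NAMES = ("identifier", "id", "compound_id", "molecule_id")


def detect_columns(header):
    """Return (smiles_column, id_column_or_None) for a list of column names."""
    # Assumption: the first column whose name contains "smiles" is used.
    smiles_col = next((c for c in header if "smiles" in c.lower()), None)
    if smiles_col is None:
        raise ValueError("input table must contain a SMILES column")
    # Assumption: identifier names are matched exactly, ignoring case.
    id_col = next((c for c in header if c.lower() in ID_NAMES), None)
    return smiles_col, id_col


if __name__ == "__main__":
    sample = "compound_id,canonical_smiles,name\nC1,CCO,ethanol\n"
    header = next(csv.reader(io.StringIO(sample)))
    print(detect_columns(header))  # ('canonical_smiles', 'compound_id')
```

When no identifier column is found, the second element is `None`, which corresponds to the case where HDDFlyzer creates identifiers automatically.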
To use a different input directory, set HDDFLYZER_DATA_DIR:
# PowerShell
$env:HDDFLYZER_DATA_DIR = "C:\path\to\csvs"
hddflyzer data prepare aocd
# Bash / zsh
HDDFLYZER_DATA_DIR=/path/to/csvs hddflyzer data prepare aocd
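The lookup described in this section (an input directory that defaults to examples/ and can be overridden with HDDFLYZER_DATA_DIR, then a search for a CSV whose filename contains the tag) might look roughly like the sketch below. The function names are hypothetical, and the case-sensitive match and the error raised on multiple matches are assumptions rather than documented behavior.

```python
import os
from pathlib import Path


def resolve_data_dir(default: str = "examples") -> Path:
    """Return the input directory, preferring HDDFLYZER_DATA_DIR when set."""
    return Path(os.environ.get("HDDFLYZER_DATA_DIR") or default)


def find_csv_for_tag(tag: str, data_dir: Path) -> Path:
    """Return the CSV in data_dir whose filename contains the tag."""
    matches = sorted(p for p in data_dir.glob("*.csv") if tag in p.name)
    if not matches:
        raise FileNotFoundError(f"no CSV containing {tag!r} in {data_dir}")
    if len(matches) > 1:
        # Assumption: an ambiguous tag is treated as an error in this sketch.
        raise ValueError(f"ambiguous tag {tag!r}: {[p.name for p in matches]}")
    return matches[0]
```

Passing an explicit CSV path to `hddflyzer data prepare` avoids this search entirely.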
Running the Workflow
The canonical workflow follows this shape:
compound collection
-> registry
-> descriptors and similarity
-> dimensionality reduction
-> visualization
-> manifest/results
Run the full workflow with:
hddflyzer pipeline run aocd
You can also skip stages or run only selected ones:
hddflyzer pipeline run aocd --skip-dimred
hddflyzer pipeline run aocd --stages chem.features,chem.pruning
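For scripted runs, the same commands can be assembled from Python. The sketch below only builds the argument list from the flags shown above (`--skip-dimred` and `--stages`); the helper `pipeline_command` is hypothetical, and whether the two flags can be combined is not stated here, so treat that as an assumption.

```python
import shlex
import subprocess


def pipeline_command(tag, stages=None, skip_dimred=False):
    """Build the argv list for `hddflyzer pipeline run <tag>`."""
    cmd = ["hddflyzer", "pipeline", "run", tag]
    if skip_dimred:
        cmd.append("--skip-dimred")
    if stages:
        # Stages are passed as one comma-separated value.
        cmd += ["--stages", ",".join(stages)]
    return cmd


if __name__ == "__main__":
    cmd = pipeline_command("aocd", stages=["chem.features", "chem.pruning"])
    print(shlex.join(cmd))
    # Uncomment to run the workflow (requires hddflyzer on PATH):
    # subprocess.run(cmd, check=True)
```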
Understanding the Result Folder
All workflow outputs are written under:
results/<tag>/
For aocd, this becomes:
results/aocd/
The preparation step creates the canonical molecule registry:
results/aocd/registry/molecules.csv
This registry records stable identifiers, raw and canonical SMILES, validity flags, source provenance, and row-level input metadata. Downstream descriptor, similarity, dimensionality-reduction, and visualization steps use this registry as the shared molecule base.
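To get a first look at a prepared registry, a small standard-library sketch such as the following can list its columns and count its rows. The exact column names of molecules.csv are not specified on this page, so the code makes no assumption about them; the helper names `registry_path` and `summarize_registry` are hypothetical.

```python
import csv
from pathlib import Path


def registry_path(tag, results_root="results"):
    """Return the registry location results/<tag>/registry/molecules.csv."""
    return Path(results_root) / tag / "registry" / "molecules.csv"


def summarize_registry(path):
    """Return (column_names, row_count) for a registry CSV."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = sum(1 for _ in reader)
        return list(reader.fieldnames or []), rows


if __name__ == "__main__":
    path = registry_path("aocd")
    if path.exists():
        columns, n_rows = summarize_registry(path)
        print(f"{n_rows} molecules, columns: {columns}")
    else:
        print(f"{path} not found: run `hddflyzer data prepare aocd` first.")
```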
Important result files include:
- manifest.json
- workflow_summary.md
- registry, chemistry, feature, dimensionality-reduction, and figure outputs
- operation metadata
Representative outputs include:
- canonical molecule registry;
- descriptor tables;
- Tanimoto similarity matrix;
- PCA, t-SNE, and UMAP projection coordinates;
- figures;
- result manifest and workflow summary.
Workflow Modules
Data
hddflyzer data prepare builds the canonical molecule registry from a local collection.
Chemistry
hddflyzer chem computes descriptors, Tanimoto similarity, and feature pruning.
Dim. reduction & viz
hddflyzer dimred and hddflyzer viz run PCA, t-SNE, UMAP, and generate figures.
| Module | Subcommand | Description | Output category |
|---|---|---|---|
| data | prepare | Build canonical molecule registry | registry |
| chem | features | Compute molecular descriptors | chem |
| chem | tanimoto | Compute Tanimoto similarity matrix | chem |
| chem | pruning | Prune low-variance and correlated features | chem |
| dimred | pca | PCA projection | dimred |
| dimred | tsne | t-SNE projection | dimred |
| dimred | umap | UMAP projection | dimred |
| viz | pca analysis | Generate PCA figures | figures |
Common CLI Commands
# Data preparation
hddflyzer data prepare aocd
hddflyzer data prepare aocd path/to/input.csv
# Pipeline control
hddflyzer pipeline run aocd
hddflyzer pipeline run aocd --skip-dimred
hddflyzer pipeline run aocd --stages chem.features,chem.pruning
# Module-level commands
hddflyzer chem tanimoto aocd
hddflyzer chem features aocd
hddflyzer chem pruning aocd
hddflyzer dimred pca aocd
hddflyzer viz pca analysis aocd
Current Scope and Boundaries
HDDFlyzer is not currently:
- a docking workflow;
- a web dashboard;
- a cloud or server workflow;
- an automatic clustering system;
- an enrichment workflow;
- an automatic chemical interpretation engine;
- a published PyPI package or public release until the v0.1.5 tag and manual publishing workflow have completed successfully.
Safety Notes
Security defaults
- Pickle loading is blocked by default in all public loaders. Use allow_pickle=True only with trusted local files.
- Run tags reject empty values, path traversal, absolute paths, and path separators.
- Reconstructed artifacts must remain inside results/<tag>/.
- update_manifest() rejects files outside the run directory.
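The tag and path rules listed above can be sketched as follows. This illustrates the kind of checks described and is not HDDFlyzer's own code: `validate_tag` and `is_inside_run_dir` are hypothetical names, and the exact error messages and the order of the checks are assumptions.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath


def validate_tag(tag: str) -> str:
    """Raise ValueError for tags that could escape results/<tag>/."""
    if not tag or not tag.strip():
        raise ValueError("run tag must not be empty")
    posix, windows = PurePosixPath(tag), PureWindowsPath(tag)
    if posix.is_absolute() or windows.is_absolute():
        raise ValueError("run tag must not be an absolute path")
    if tag in (".", "..") or ".." in posix.parts or ".." in windows.parts:
        raise ValueError("run tag must not contain path traversal")
    if "/" in tag or "\\" in tag:
        raise ValueError("run tag must not contain path separators")
    return tag


def is_inside_run_dir(path, run_dir) -> bool:
    """Return True if path resolves to a location inside run_dir."""
    # resolve() normalizes ".." components, even for paths that do not exist.
    return Path(path).resolve().is_relative_to(Path(run_dir).resolve())
```

Checking the resolved path rather than the raw string means that a file such as results/aocd/../other/x.csv is treated as outside results/aocd/.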