API Overview and Reference¶
This page shows how to use HDDFlyzer programmatically from Python, then lists the current public API surface generated from NumPy-style docstrings.
HDDFlyzer is local pre-release research software. The objects below represent the supported Python surface at this stage; internal helpers are intentionally not listed here.
API Layers¶
| Layer | Purpose |
|---|---|
| Pipeline | Execute the workflow and inspect execution summaries. |
| Results | Reconstruct completed runs and select generated artifacts. |
| Science | Wrap loaded artifacts as descriptor, similarity, or projection spaces. |
| Visualization | Resolve visualization inputs and plot from reconstructed results. |
Execute a Workflow from Python¶
The workflow engine can be called from Python at three levels of control:
from hddflyzer.pipeline import execute_workflow, run_workflow, run_pipeline
# High-level: returns WorkflowExecution
execution = execute_workflow("aocd")
# Mid-level: returns reconstructed WorkflowRun
run = run_workflow("aocd")
# Low-level: returns list[StageResult]
results = run_pipeline("aocd")
Reconstruct Completed Runs¶
Completed runs can be reconstructed from results/<tag>/manifest.json:
from hddflyzer.results import load_workflow_run
run = load_workflow_run("aocd")
run.workflow_contract
run.outputs(category="chem")
run.outputs(category="dimred")
The reconstructed run exposes the workflow contract, current outputs, output categories, artifact metadata, and semantic artifact selectors.
Select and Load Artifacts¶
Artifacts can be selected by semantic kind:
artifact = run.artifact(
    kind="descriptor_table",
    category="chem",
    operation="chem.features",
)
loaded = run.load_artifact(
    kind="descriptor_table",
    category="chem",
    operation="chem.features",
)
loaded.data
loaded.metadata
Pass required="path/fragment.csv" when a kind/category query matches more than one artifact and needs disambiguation.
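The "exactly one match" rule can be illustrated with a stand-in selector. This is a minimal sketch, not HDDFlyzer's implementation: `select_one` and the record dictionaries are hypothetical, the second path is invented, and matching `required` as a substring of the relative path is an assumption. The exception types follow the Raises tables for `WorkflowRun.artifact` in the reference section.

```python
def select_one(records, kind=None, category=None, required=None):
    """Return the single record matching all given filters (hypothetical helper)."""
    fragments = [required] if isinstance(required, str) else list(required or [])
    matches = [
        r for r in records
        if (kind is None or r["kind"] == kind)
        and (category is None or r["category"] == category)
        and all(fragment in r["relative_path"] for fragment in fragments)
    ]
    if not matches:
        raise FileNotFoundError("no artifact matches the filters")
    if len(matches) > 1:
        raise ValueError(f"{len(matches)} artifacts match; add a 'required' fragment")
    return matches[0]


# Illustrative records; the second path is invented for the example.
records = [
    {"kind": "descriptor_table", "category": "chem",
     "relative_path": "features/full/features.csv"},
    {"kind": "descriptor_table", "category": "chem",
     "relative_path": "features/sample/features.csv"},
]

try:
    select_one(records, kind="descriptor_table", category="chem")
except ValueError as exc:
    print("ambiguous:", exc)

chosen = select_one(
    records, kind="descriptor_table", category="chem", required="full/features.csv"
)
print(chosen["relative_path"])  # features/full/features.csv
```

Without the fragment the query is ambiguous and fails loudly, which is usually preferable to silently picking the first match.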
Supported artifact kinds include:
| Kind | Description |
|---|---|
| molecule_registry | Canonical molecule registry |
| descriptor_table | Molecular descriptor matrix |
| tanimoto_matrix | Pairwise Tanimoto similarity |
| projection_coordinates | PCA / t-SNE / UMAP coordinates |
| figure | Generated plot files |
| metadata | Operation metadata |
| workflow_summary | Human-readable run summary |
| unknown | Unclassified artifact |
Scientific Views¶
Loaded artifacts can be wrapped as lightweight scientific spaces:
DescriptorSpace, SimilaritySpace, ProjectionSpace
WorkflowRun can load scientific views over existing artifacts:
descriptors = run.descriptor_space(
    category="chem",
    operation="chem.features",
)
similarity = run.similarity_space(category="chem")
projection = run.projection_space(
    category="dimred",
    operation="dimred.pca",
)
These views expose molecule identifiers when available:
descriptors.molecule_ids
similarity.molecule_ids
projection.molecule_ids
These views operate on existing artifacts. They do not recalculate descriptors, similarity matrices, or projections, and they do not create plots.
The science layer also provides molecule identity helpers, cross-space alignment, structural metrics, descriptor-projection correlation, neighborhood preservation, and group comparison for explicitly defined groups.
Alignment and Science Helpers¶
from hddflyzer.science import (
    align_spaces,
    compare_descriptor_groups,
    descriptor_projection_correlations,
    projection_neighborhood_preservation,
    similarity_projection_correlation,
    similarity_projection_neighbor_overlap,
)
descriptors_aligned, projection_aligned = align_spaces(descriptors, projection)
global_corr = similarity_projection_correlation(similarity, projection)
neighbor_overlap = similarity_projection_neighbor_overlap(similarity, projection, k=10)
desc_ranking = descriptor_projection_correlations(descriptors, projection)
local_preserv = projection_neighborhood_preservation(similarity, projection, k=10)
group_diff = compare_descriptor_groups(descriptors, labels="class_label")
Note: these helpers operate on existing artifacts. They do not recalculate descriptors, similarity, or projections; do not generate plots; and do not perform automatic clustering, enrichment, or chemical interpretation.
Visualization from Reconstructed Results¶
from hddflyzer.viz import resolve_viz_inputs, plot_hddf_scatters
inputs = resolve_viz_inputs(
    run,
    kind="descriptor_table",
    category="chem",
    required="features/full/features.csv",
)
plot_hddf_scatters(inputs)
plot_hddf_scatters() also accepts a loaded descriptor-table artifact directly.
Reference¶
Pipeline¶
PipelineContext
dataclass
¶
Runtime options shared by pipeline stages.
Attributes:
| Name | Type | Description |
|---|---|---|
tag |
str
|
Dataset tag for the run. Pipeline stages use this tag to resolve
inputs and outputs under |
save_pickle |
bool, default=False
|
Whether stages that support optional pickle output should write it. |
continue_on_error |
bool, default=False
|
Whether the pipeline runner should continue after a failed stage. |
options |
dict
|
Additional stage options. The core pipeline keeps this mapping generic so stages can receive small, stage-specific values without changing the shared context contract. |
StageResult
dataclass
¶
Result returned by a pipeline stage.
Attributes:
| Name | Type | Description |
|---|---|---|
name |
str
|
Stage name, for example |
ok |
bool
|
|
message |
str, default=""
|
Optional human-readable status or error message. |
Stage¶
Bases: Protocol
Executable pipeline stage interface.
Stages are small objects with a name and a run method. They are consumed by run_pipeline and return StageResult instances.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Unique stage name used for selection and reporting. |
run(context)¶
Execute the stage with the given pipeline context.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| context | PipelineContext | Shared runtime options for the current workflow execution. | required |
Returns:
| Type | Description |
|---|---|
| StageResult | Result describing whether the stage succeeded. |
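The stage contract can be sketched with stand-in types. The dataclasses below only mirror the attributes documented on this page. `EchoStage`, `run_stages`, and the `"fail"` option key are hypothetical, and stopping at the first failure when `continue_on_error` is false is an assumption about the runner, not a confirmed detail of `run_pipeline`.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class PipelineContext:
    """Stand-in mirroring the documented context attributes."""
    tag: str
    save_pickle: bool = False
    continue_on_error: bool = False
    options: dict = field(default_factory=dict)


@dataclass
class StageResult:
    """Stand-in mirroring the documented result attributes."""
    name: str
    ok: bool
    message: str = ""


class Stage(Protocol):
    name: str

    def run(self, context: PipelineContext) -> StageResult: ...


class EchoStage:
    """Hypothetical stage that succeeds unless listed under options['fail']."""

    def __init__(self, name: str) -> None:
        self.name = name

    def run(self, context: PipelineContext) -> StageResult:
        if self.name in context.options.get("fail", ()):
            return StageResult(self.name, ok=False, message="forced failure")
        return StageResult(self.name, ok=True)


def run_stages(stages: list[Stage], context: PipelineContext) -> list[StageResult]:
    """Run stages in order; stop at the first failure unless continue_on_error is set."""
    results = []
    for stage in stages:
        result = stage.run(context)
        results.append(result)
        if not result.ok and not context.continue_on_error:
            break
    return results


stages = [EchoStage("a"), EchoStage("b"), EchoStage("c")]
stopped = run_stages(stages, PipelineContext("demo", options={"fail": ["b"]}))
continued = run_stages(
    stages, PipelineContext("demo", continue_on_error=True, options={"fail": ["b"]})
)
print([r.name for r in stopped])    # ['a', 'b']
print([r.name for r in continued])  # ['a', 'b', 'c']
```

Because `Stage` is a `Protocol`, `EchoStage` satisfies it structurally without inheriting from it.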
WorkflowExecution
dataclass
¶
Complete programmatic result of a workflow execution.
Attributes:
| Name | Type | Description |
|---|---|---|
tag |
str
|
Sanitized dataset tag that was executed. |
stage_results |
list of StageResult
|
Per-stage execution results returned by |
run |
WorkflowRun or None
|
Reconstructed workflow run when |
run_pipeline(tag, stage_names=None, include_sample=True, include_dimred=True, save_pickle=False, continue_on_error=False)¶
Run an ordered HDDFlyzer pipeline for a dataset tag.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tag | str | Dataset tag to process. | required |
| stage_names | iterable of str | Stage names to run. When omitted, the default stage sequence is used. | None |
| include_sample | bool | Whether to include the Tanimoto sampling stage in the default stage sequence. | True |
| include_dimred | bool | Whether to include dimensionality-reduction stages in the default stage sequence. | True |
| save_pickle | bool | Whether stages that support optional pickle output should write it. | False |
| continue_on_error | bool | Whether to continue executing stages after a failed stage. | False |
Returns:
| Type | Description |
|---|---|
| list of StageResult | Per-stage results in execution order. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If |
Notes
This function performs the file-based workflow and writes the normal
HDDFlyzer outputs under results/<tag>/. It returns stage status only;
use execute_workflow when both status and reconstructed results are
needed.
run_workflow(tag, stage_names=None, include_sample=True, include_dimred=True, save_pickle=False, continue_on_error=False)¶
Run a pipeline and return the reconstructed workflow run.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tag | str | Dataset tag to process. | required |
| stage_names | iterable of str | Stage names to run. When omitted, the default stage sequence is used. | None |
| include_sample | bool | Whether to include the Tanimoto sampling stage. | True |
| include_dimred | bool | Whether to include dimensionality-reduction stages. | True |
| save_pickle | bool | Whether stages that support optional pickle output should write it. | False |
| continue_on_error | bool | Whether to continue after failed stages. | False |
Returns:
| Type | Description |
|---|---|
WorkflowRun
|
Reconstructed run loaded from |
Raises:
| Type | Description |
|---|---|
RuntimeError
|
If one or more stages fail and |
FileNotFoundError
|
If the run manifest cannot be found after execution. |
ValueError
|
If the run manifest is invalid or cannot be reconstructed. |
execute_workflow(tag, stage_names=None, include_sample=True, include_dimred=True, save_pickle=False, continue_on_error=False)¶
Run a pipeline and return execution status plus reconstructed results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tag | str | Dataset tag to process. | required |
| stage_names | iterable of str | Stage names to run. When omitted, the default stage sequence is used. | None |
| include_sample | bool | Whether to include the Tanimoto sampling stage. | True |
| include_dimred | bool | Whether to include dimensionality-reduction stages. | True |
| save_pickle | bool | Whether stages that support optional pickle output should write it. | False |
| continue_on_error | bool | Whether to continue after failed stages. | False |
Returns:
| Type | Description |
|---|---|
WorkflowExecution
|
Execution object containing stage results, global success status, and
the reconstructed |
Notes
execute_workflow is the common programmatic contract used by the CLI.
It does not hide stage failures; inspect execution.stage_results and
execution.ok for status.
Results¶
ResultArtifact
dataclass
¶
A result file with workflow and scientific semantics.
Attributes:
| Name | Type | Description |
|---|---|---|
path |
Path
|
Absolute or resolved filesystem path for the artifact. |
relative_path |
str
|
Manifest-relative path inside |
category |
str
|
Workflow area, such as |
kind |
str
|
Semantic artifact kind, for example |
operation |
str or None, default=None
|
Manifest operation that produced the artifact, when known. |
metadata |
dict or None, default=None
|
Operation metadata associated with the artifact, when available. |
LoadedArtifact
dataclass
¶
Loaded data plus its semantic result artifact.
Attributes:
| Name | Type | Description |
|---|---|---|
artifact |
ResultArtifact
|
Artifact that was loaded. |
data |
Any
|
Loaded Python object. The type depends on |
metadata |
dict
|
Loader metadata and artifact metadata useful for traceability. |
classify_artifact(relative_path, operation=None)
¶
Classify a manifest output path into a semantic artifact kind.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
relative_path
|
str
|
Manifest-relative output path. |
required |
operation
|
str
|
Operation name that produced the output, when known. |
None
|
Returns:
| Type | Description |
|---|---|
str
|
Semantic artifact kind. Unknown paths return |
load_artifact(artifact, allow_pickle=False)¶
Load a supported result artifact into Python data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| artifact | ResultArtifact | Semantic artifact to load. | required |
| allow_pickle | bool | Whether pickle-backed table artifacts may be loaded. Pickle is disabled by default and should be enabled only for trusted local files. | False |
Returns:
| Type | Description |
|---|---|
| LoadedArtifact | Loaded data, metadata, and the source artifact. |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the artifact file or required companion file is missing. |
| ValueError | If the artifact kind is unsupported, the file format is invalid, pickle loading is not allowed, or the loaded data violates its minimum contract. |
Notes
Supported loaded kinds include descriptor tables, projection coordinates, molecule registries, metadata JSON, workflow summaries, and Tanimoto matrices. Tanimoto matrices are loaded with numpy.load(..., allow_pickle=False).
WorkflowRun
dataclass
¶
Queryable view of a completed HDDFlyzer run.
WorkflowRun reconstructs a completed run from an existing
results/<tag>/manifest.json file. It does not execute pipeline stages,
recalculate outputs, or create new state.
Attributes:
| Name | Type | Description |
|---|---|---|
tag |
str
|
Dataset tag recorded in the manifest. |
manifest_path |
Path
|
Path to the reconstructed |
manifest |
dict
|
Parsed manifest content. |
current_outputs (property)¶
list of str: Currently registered manifest output paths.
operations (property)¶
list of dict: Operation records from the manifest.
output_categories (property)¶
dict: Output paths grouped by workflow area.
workflow_contract (property)¶
dict: Workflow contract recorded in the manifest.
artifact(kind=None, category=None, operation=None, required=None)
¶
Return exactly one artifact matching the requested filters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
kind
|
str | None
|
Filters passed to |
None
|
category
|
str | None
|
Filters passed to |
None
|
operation
|
str | None
|
Filters passed to |
None
|
required
|
str | None
|
Filters passed to |
None
|
Returns:
| Type | Description |
|---|---|
| ResultArtifact | The single matching artifact. |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If no artifact matches the filters. |
| ValueError | If multiple artifacts match the filters. |
artifacts(kind=None, category=None, operation=None, required=None)¶
Return semantic result artifacts derived from manifest outputs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| kind | str | Semantic artifact kind to select. | None |
| category | str | Output category to select. | None |
| operation | str | Producing operation to select. | None |
| required | str or iterable of str | Required path fragment or fragments used to disambiguate outputs. | None |
Returns:
| Type | Description |
|---|---|
| list of ResultArtifact | Matching artifacts with resolved paths, categories, kinds, and operation metadata. |
Raises:
| Type | Description |
|---|---|
| ValueError | If a manifest output path is absolute, contains traversal, or would resolve outside the run directory. |
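The three rejection cases in the Raises entry (absolute path, traversal, escape from the run directory) can be sketched with `pathlib`. This is an illustration of the kind of check described, under stated assumptions: `resolve_output` is a hypothetical helper, and the library's actual validation may differ.

```python
from pathlib import Path, PurePosixPath


def resolve_output(run_dir: Path, relative_path: str) -> Path:
    """Resolve a manifest-relative path, rejecting anything outside run_dir."""
    candidate = PurePosixPath(relative_path)
    if candidate.is_absolute():
        raise ValueError(f"absolute output path: {relative_path!r}")
    if ".." in candidate.parts:
        raise ValueError(f"path traversal in output path: {relative_path!r}")
    root = run_dir.resolve()
    resolved = root.joinpath(*candidate.parts).resolve()
    # A symlink could still point outside the run directory, so check after resolving.
    if not resolved.is_relative_to(root):  # Python 3.9+
        raise ValueError(f"output path escapes run directory: {relative_path!r}")
    return resolved


run_dir = Path("results") / "aocd"
print(resolve_output(run_dir, "features/full/features.csv").name)  # features.csv

for bad in ("/etc/passwd", "../other/manifest.json"):
    try:
        resolve_output(run_dir, bad)
    except ValueError as exc:
        print("rejected:", exc)
```

Checking both the raw parts and the resolved location is deliberate: the first catches obvious traversal early, the second catches indirection through symlinks.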
descriptor_space(category=None, operation=None, required=None, allow_pickle=False)
¶
Load a descriptor-table artifact as a scientific descriptor space.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
category
|
str | None
|
Artifact filters used to select one descriptor table. |
None
|
operation
|
str | None
|
Artifact filters used to select one descriptor table. |
None
|
required
|
str | None
|
Artifact filters used to select one descriptor table. |
None
|
allow_pickle
|
bool
|
Whether pickle-backed descriptor tables may be loaded. |
False
|
Returns:
| Type | Description |
|---|---|
DescriptorSpace
|
Descriptor-space view over an existing loaded artifact. |
Notes
This method does not recalculate descriptors.
load_artifact(kind=None, category=None, operation=None, required=None, allow_pickle=False)
¶
Select and load exactly one artifact.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
kind
|
str | None
|
Filters passed to |
None
|
category
|
str | None
|
Filters passed to |
None
|
operation
|
str | None
|
Filters passed to |
None
|
required
|
str | None
|
Filters passed to |
None
|
allow_pickle
|
bool
|
Whether pickle-backed table artifacts may be loaded. Enable only for trusted local files. |
False
|
Returns:
| Type | Description |
|---|---|
LoadedArtifact
|
Loaded data, metadata, and source artifact. |
Raises:
| Type | Description |
|---|---|
FileNotFoundError
|
If no artifact matches or a required file is missing. |
ValueError
|
If multiple artifacts match, loading is unsupported, pickle loading is blocked, or loaded data is invalid. |
operation_metadata(operation_name)
¶
Return metadata for the latest recorded operation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
operation_name
|
str
|
Operation name, for example |
required |
Returns:
| Type | Description |
|---|---|
dict or None
|
Operation metadata when present. |
operations_by_stage(stage)
¶
Return operation records associated with a workflow stage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
stage
|
str
|
Workflow stage or area, such as |
required |
Returns:
| Type | Description |
|---|---|
list of dict
|
Matching operation records in manifest order. |
outputs(category=None)¶
Return output paths from the reconstructed manifest.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| category | str | Workflow category to select. When omitted, all current outputs are returned. | None |
Returns:
| Type | Description |
|---|---|
| list of str | Manifest-relative output paths. |
projection_space(category=None, operation=None, required=None, allow_pickle=False)
¶
Load projection coordinates as a scientific projection space.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
category
|
str | None
|
Artifact filters used to select one projection-coordinate table. |
None
|
operation
|
str | None
|
Artifact filters used to select one projection-coordinate table. |
None
|
required
|
str | None
|
Artifact filters used to select one projection-coordinate table. |
None
|
allow_pickle
|
bool
|
Whether pickle-backed projection tables may be loaded. |
False
|
Returns:
| Type | Description |
|---|---|
ProjectionSpace
|
Projection-space view over existing dimensionality-reduction coordinates. |
Notes
This method does not recalculate PCA, t-SNE, UMAP, or other projections.
similarity_space(category=None, operation=None, required=None, allow_pickle=False)
¶
Load a Tanimoto matrix artifact as a scientific similarity space.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
category
|
str | None
|
Artifact filters used to select one Tanimoto matrix. |
None
|
operation
|
str | None
|
Artifact filters used to select one Tanimoto matrix. |
None
|
required
|
str | None
|
Artifact filters used to select one Tanimoto matrix. |
None
|
allow_pickle
|
bool
|
Present for API consistency. Tanimoto matrices are loaded from
|
False
|
Returns:
| Type | Description |
|---|---|
SimilaritySpace
|
Similarity-space view over an existing Tanimoto matrix. |
Notes
This method does not recalculate fingerprints or similarity.
summary()¶
Return a compact programmatic summary of the reconstructed run.
Returns:
| Type | Description |
|---|---|
| dict | Summary containing tag, manifest path, operation count, current output count, output categories, and workflow contract. |
to_dict()¶
Return a dictionary representation of this reconstructed run.
Returns:
| Type | Description |
|---|---|
| dict | Dictionary containing tag, manifest path, and manifest content. |
load_workflow_run(tag, results_dir=None)
¶
Load a completed run from results/<tag>/manifest.json.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
tag
|
str
|
Run tag to reconstruct. Tags are validated and must not contain path separators, traversal, or absolute paths. |
required |
results_dir
|
Path or str
|
Root results directory. Defaults to |
None
|
Returns:
| Type | Description |
|---|---|
WorkflowRun
|
Reconstructed run backed by the parsed manifest. |
Raises:
| Type | Description |
|---|---|
FileNotFoundError
|
If |
ValueError
|
If the tag is invalid, the resolved manifest escapes |
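The lookup described above (validate the tag, locate `results/<tag>/manifest.json`, parse it) can be sketched with the standard library. `validate_tag` and `read_manifest` are hypothetical stand-ins, the tag rules shown are a plausible reading of "no path separators, traversal, or absolute paths", and the manifest content is invented for the example.

```python
import json
import tempfile
from pathlib import Path


def validate_tag(tag: str) -> str:
    """Reject tags that could address anything other than one directory name."""
    if not tag or tag in {".", ".."} or "/" in tag or "\\" in tag:
        raise ValueError(f"invalid tag: {tag!r}")
    return tag


def read_manifest(tag: str, results_dir: Path) -> dict:
    """Load results_dir/<tag>/manifest.json (hypothetical stand-in)."""
    manifest_path = results_dir / validate_tag(tag) / "manifest.json"
    if not manifest_path.is_file():
        raise FileNotFoundError(manifest_path)
    try:
        manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid manifest: {manifest_path}") from exc
    if not isinstance(manifest, dict):
        raise ValueError("manifest must be a JSON object")
    return manifest


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "aocd"
    run_dir.mkdir()
    # The manifest keys here are invented for the example.
    (run_dir / "manifest.json").write_text(json.dumps({"tag": "aocd"}), encoding="utf-8")
    manifest = read_manifest("aocd", Path(tmp))
    print(manifest["tag"])  # aocd
```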
Science¶
DescriptorSpace
dataclass
¶
Descriptor table interpreted as a molecular descriptor space.
Attributes:
| Name | Type | Description |
|---|---|---|
artifact |
ResultArtifact
|
Source descriptor-table artifact. |
data |
DataFrame
|
Loaded descriptor table. |
metadata |
dict
|
Loader and operation metadata. |
n_molecules |
int
|
Number of rows in |
feature_names |
tuple of str
|
Descriptor feature columns, excluding common identity columns. |
molecule_ids
property
¶
tuple of str: Molecular identifiers when present, else empty.
SimilaritySpace (dataclass)¶
Pairwise molecular similarity matrix with aligned identifiers.
Attributes:
| Name | Type | Description |
|---|---|---|
| artifact | ResultArtifact | Source Tanimoto matrix artifact. |
| matrix | ndarray | Pairwise similarity matrix. |
| ids | tuple of str | Identifiers aligned to matrix rows and columns. |
| metadata | dict | Loader and operation metadata. |
| n_molecules | int | Number of molecules in the similarity matrix. |
molecule_ids (property)¶
tuple of str: Molecular identifiers aligned to the matrix.
ProjectionSpace
dataclass
¶
Dimensionality-reduction coordinates for a molecular collection.
Attributes:
| Name | Type | Description |
|---|---|---|
artifact |
ResultArtifact
|
Source projection-coordinate artifact. |
data |
DataFrame
|
Loaded coordinate table. |
metadata |
dict
|
Loader and operation metadata. |
coordinate_columns |
tuple of str
|
Numeric coordinate columns used as the projection axes. |
n_molecules |
int
|
Number of rows in |
molecule_ids
property
¶
tuple of str: Molecular identifiers when present, else empty.
to_descriptor_space(loaded)
¶
Convert a loaded descriptor table artifact into a descriptor space.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
loaded
|
LoadedArtifact
|
Loaded artifact with kind |
required |
Returns:
| Type | Description |
|---|---|
DescriptorSpace
|
Scientific view over the loaded descriptor table. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If the artifact kind or data type is incompatible. |
Notes
This converter wraps existing loaded data and does not recalculate descriptors.
to_similarity_space(loaded)
¶
Convert a loaded Tanimoto matrix artifact into a similarity space.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
loaded
|
LoadedArtifact
|
Loaded artifact with kind |
required |
Returns:
| Type | Description |
|---|---|
SimilaritySpace
|
Scientific view over the loaded similarity matrix. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If the artifact kind, data type, or ID alignment is incompatible. |
Notes
This converter wraps an existing matrix and does not recalculate fingerprints or similarity.
to_projection_space(loaded)
¶
Convert loaded projection coordinates into a projection space.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
loaded
|
LoadedArtifact
|
Loaded artifact with kind |
required |
Returns:
| Type | Description |
|---|---|
ProjectionSpace
|
Scientific view over existing projection coordinates. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If the artifact kind, data type, or coordinate columns are incompatible. |
Notes
This converter does not recalculate PCA, t-SNE, UMAP, or other projections.
shared_molecule_ids(space_a, space_b)
¶
Return shared molecule IDs, preserving first-space order.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
space_a
|
object
|
Objects exposing a |
required |
space_b
|
object
|
Objects exposing a |
required |
Returns:
| Type | Description |
|---|---|
tuple of str
|
IDs present in both spaces, ordered as in |
has_aligned_molecule_ids(space_a, space_b)
¶
Return whether two spaces have the same non-empty molecule ID order.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
space_a
|
object
|
Objects exposing a |
required |
space_b
|
object
|
Objects exposing a |
required |
Returns:
| Type | Description |
|---|---|
bool
|
|
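The two identity helpers above reduce to simple operations on ID sequences. The sketch below works on plain tuples rather than space objects. `shared_ids` and `same_nonempty_order` are hypothetical names, and duplicate IDs are not handled.

```python
def shared_ids(ids_a, ids_b):
    """IDs present in both sequences, in the order they appear in ids_a."""
    in_b = set(ids_b)
    return tuple(i for i in ids_a if i in in_b)


def same_nonempty_order(ids_a, ids_b):
    """True only when both sequences are non-empty and identical element by element."""
    return len(ids_a) > 0 and tuple(ids_a) == tuple(ids_b)


descriptor_ids = ("m3", "m1", "m2")
projection_ids = ("m1", "m2", "m4", "m3")

print(shared_ids(descriptor_ids, projection_ids))           # ('m3', 'm1', 'm2')
print(same_nonempty_order(descriptor_ids, projection_ids))  # False
print(same_nonempty_order(descriptor_ids, descriptor_ids))  # True
```

Building a set from the second sequence keeps the membership test cheap while the comprehension preserves first-sequence order.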
align_spaces(*spaces)¶
Return spaces filtered and reordered to shared molecule IDs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *spaces | DescriptorSpace, ProjectionSpace, or SimilaritySpace | Two or more scientific spaces to align. | () |
Returns:
| Type | Description |
|---|---|
| tuple | New space instances of the same types, filtered to shared IDs and ordered according to the first space. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If fewer than two spaces are provided, any space type is unsupported,
any space has empty |
Notes
Alignment uses existing data only. It does not recalculate descriptors, similarity matrices, or projections, and it does not mutate the input spaces.
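Alignment as described (filter to shared IDs, order by the first space, leave inputs untouched) can be sketched on a simplified row table. `Table` and `align_tables` are hypothetical stand-ins, and raising when no IDs are shared is an assumption. A similarity matrix would additionally need its columns reordered, which this sketch does not cover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    """Hypothetical stand-in for a space: IDs plus one row of values per ID."""
    ids: tuple
    rows: tuple


def align_tables(*tables):
    """Return new tables restricted to shared IDs, ordered as in the first table."""
    if len(tables) < 2:
        raise ValueError("need at least two tables")
    if any(not t.ids for t in tables):
        raise ValueError("every table needs molecule IDs")
    common = set(tables[0].ids).intersection(*(t.ids for t in tables[1:]))
    order = tuple(i for i in tables[0].ids if i in common)
    if not order:
        raise ValueError("no shared molecule IDs")
    aligned = []
    for t in tables:
        row_of = dict(zip(t.ids, t.rows))
        aligned.append(Table(order, tuple(row_of[i] for i in order)))
    return tuple(aligned)


descriptors = Table(("m1", "m2", "m3"), ((0.1,), (0.2,), (0.3,)))
projection = Table(("m3", "m1", "m9"), ((3.0, 3.5), (1.0, 1.5), (9.0, 9.5)))

d_aligned, p_aligned = align_tables(descriptors, projection)
print(d_aligned.ids)   # ('m1', 'm3')
print(p_aligned.rows)  # ((1.0, 1.5), (3.0, 3.5))
```

The frozen dataclass makes the "does not mutate the input spaces" property explicit: alignment can only return new instances.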
SpaceMetricResult
dataclass
¶
Scalar result of a structural comparison between scientific spaces.
Attributes:
| Name | Type | Description |
|---|---|---|
name |
str
|
Metric name. |
value |
float
|
Scalar metric value. Some metrics may return |
metadata |
dict
|
Metric metadata such as molecule counts, pair counts, or coordinate columns. |
DescriptorProjectionCorrelationResult
dataclass
¶
Ranked descriptor/projection coordinate correlations.
Attributes:
| Name | Type | Description |
|---|---|---|
data |
DataFrame
|
Ranked table with descriptor-coordinate correlation rows. |
metadata |
dict
|
Result metadata including molecule count, feature names, and coordinate columns. |
top_features(n=10)
¶
Return the top ranked descriptor-coordinate rows.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
n
|
int
|
Number of rows to return. |
10
|
Returns:
| Type | Description |
|---|---|
DataFrame
|
Copy of the top |
Raises:
| Type | Description |
|---|---|
ValueError
|
If |
NeighborhoodPreservationResult
dataclass
¶
Per-molecule neighborhood preservation diagnostics.
Attributes:
| Name | Type | Description |
|---|---|---|
data |
DataFrame
|
Per-molecule table with neighbor overlap counts, fractions, and neighbor ID lists. |
metadata |
dict
|
Result metadata including molecule count, |
worst_preserved(n=10)
¶
Return molecules with the lowest neighborhood overlap.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
n
|
int
|
Number of rows to return. |
10
|
Returns:
| Type | Description |
|---|---|
DataFrame
|
Copy of the lowest-overlap rows, sorted by overlap fraction and molecule ID. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If |
DescriptorGroupComparisonResult
dataclass
¶
Ranked descriptor differences for explicit groups.
Attributes:
| Name | Type | Description |
|---|---|---|
data |
DataFrame
|
Group-feature summary table ranked by absolute deviation from the global descriptor mean. |
metadata |
dict
|
Result metadata including molecule count, retained groups, feature names, and minimum group size. |
top_differences(n=10)
¶
Return the top group-feature differences.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
n
|
int
|
Number of rows to return. |
10
|
Returns:
| Type | Description |
|---|---|
DataFrame
|
Copy of the top |
Raises:
| Type | Description |
|---|---|
ValueError
|
If |
similarity_projection_correlation(similarity, projection)¶
Correlate pairwise similarity with projected-space proximity.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| similarity | SimilaritySpace | Similarity space containing an existing pairwise similarity matrix. | required |
| projection | ProjectionSpace | Projection space containing existing coordinates. | required |
Returns:
| Type | Description |
|---|---|
| SpaceMetricResult | Scalar correlation result and metadata. |
Raises:
| Type | Description |
|---|---|
| ValueError | If inputs have wrong types, molecule IDs cannot be aligned, fewer than two aligned molecules are available, or projection coordinates are insufficient. |
Notes
The function aligns inputs by molecule ID and uses existing artifacts only. It does not recalculate similarity or projections.
similarity_projection_neighbor_overlap(similarity, projection, k=10)¶
Return mean overlap between similarity and projection neighbors.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| similarity | SimilaritySpace | Similarity space containing an existing pairwise similarity matrix. | required |
| projection | ProjectionSpace | Projection space containing existing coordinates. | required |
| k | int | Number of neighbors to compare for each molecule. | 10 |
Returns:
| Type | Description |
|---|---|
| SpaceMetricResult | Mean neighbor-overlap fraction and metadata. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If inputs have wrong types, molecule IDs cannot be aligned, |
Notes
The function uses existing similarity and projection artifacts only. It does not perform clustering or automatic chemical interpretation.
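The neighbor-overlap idea can be sketched as follows: for each molecule, take its k most similar molecules and its k closest molecules in the projection, then measure the shared fraction. The names `top_k` and `neighbor_overlap`, the tie-breaking rule, and the toy data are all assumptions for illustration. The same per-molecule fractions are the kind of quantity a neighborhood-preservation diagnostic would tabulate.

```python
import math


def top_k(scores, k):
    """Indices of the k highest scores (ties broken by lower index)."""
    return set(sorted(range(len(scores)), key=lambda j: (-scores[j], j))[:k])


def neighbor_overlap(matrix, coords, k):
    """Per-molecule fraction of shared k-nearest neighbors, plus the mean."""
    n = len(coords)
    if not 1 <= k < n:
        raise ValueError("k must be between 1 and n - 1")
    fractions = []
    for i in range(n):
        # Exclude the molecule itself by giving it the worst possible score.
        sim_scores = [matrix[i][j] if j != i else -math.inf for j in range(n)]
        proj_scores = [
            -math.dist(coords[i], coords[j]) if j != i else -math.inf for j in range(n)
        ]
        fractions.append(len(top_k(sim_scores, k) & top_k(proj_scores, k)) / k)
    return fractions, sum(fractions) / n


# Invented toy data: molecules 0 and 1 keep their nearest neighbor, 2 and 3 do not.
coords = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
matrix = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.15],
    [0.2, 0.1, 0.15, 1.0],
]

fractions, mean_overlap = neighbor_overlap(matrix, coords, k=1)
print(fractions)     # [1.0, 1.0, 0.0, 0.0]
print(mean_overlap)  # 0.5
```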
descriptor_projection_correlations(descriptors, projection)¶
Correlate numeric descriptors with projection coordinates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| descriptors | DescriptorSpace | Descriptor space containing numeric descriptor columns. | required |
| projection | ProjectionSpace | Projection space with at least two coordinate columns. | required |
Returns:
| Type | Description |
|---|---|
| DescriptorProjectionCorrelationResult | Ranked descriptor-coordinate correlations. |
Raises:
| Type | Description |
|---|---|
| ValueError | If inputs have wrong types, molecule IDs cannot be aligned, no numeric descriptor features exist, or the projection lacks sufficient coordinates. |
Notes
Inputs are aligned by molecule ID before calculation. The function uses existing descriptor values and projection coordinates only; it does not recalculate descriptors or projections and does not make automatic chemical interpretations.
projection_neighborhood_preservation(similarity, projection, k=10)¶
Evaluate local neighbor preservation for each molecule.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| similarity | SimilaritySpace | Similarity space containing an existing pairwise similarity matrix. | required |
| projection | ProjectionSpace | Projection space containing existing coordinates. | required |
| k | int | Number of neighbors to compare for each molecule. | 10 |
Returns:
| Type | Description |
|---|---|
| NeighborhoodPreservationResult | Per-molecule overlap diagnostics and metadata. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If inputs have wrong types, IDs cannot be aligned, |
Notes
This diagnostic compares neighbors from existing similarity and projection spaces. It does not recalculate fingerprints, similarity, projections, or clusters.
compare_descriptor_groups(descriptors, labels, *, min_group_size=2)
¶
Compare numeric descriptors across explicit groups.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
descriptors
|
DescriptorSpace
|
Descriptor space containing numeric descriptor columns. |
required |
labels
|
str or sequence
|
Group labels. A string is interpreted as a column name in
|
required |
min_group_size
|
int
|
Minimum number of molecules required for a group to be included. |
2
|
Returns:
| Type | Description |
|---|---|
DescriptorGroupComparisonResult
|
Ranked group-feature summary table and metadata. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If |
Notes
This function compares groups explicitly provided by the user or by an existing column. It does not perform clustering, enrichment, or automatic chemical interpretation.
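The group comparison can be sketched as a ranking of (group, feature) pairs by how far the group mean sits from the global mean, after dropping groups smaller than `min_group_size`. `compare_groups` is a hypothetical name and the data is invented. Computing the global mean over all molecules, including those in dropped groups, and using raw rather than standardized deviations are both assumptions.

```python
from collections import defaultdict
from statistics import fmean


def compare_groups(features, labels, min_group_size=2):
    """Rank (group, feature) pairs by |group mean - global mean|."""
    members = defaultdict(list)
    for index, label in enumerate(labels):
        members[label].append(index)
    kept = {g: idx for g, idx in members.items() if len(idx) >= min_group_size}
    rows = []
    for name, values in features.items():
        global_mean = fmean(values)
        for group, idx in kept.items():
            group_mean = fmean(values[i] for i in idx)
            rows.append((group, name, group_mean - global_mean))
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)


# Invented data: one label per molecule, supplied explicitly by the caller.
features = {
    "logp": [1.0, 1.2, 3.0, 3.2, 9.0],
    "rings": [2.0, 2.0, 2.0, 2.0, 2.0],
}
labels = ["acid", "acid", "base", "base", "other"]

ranked = compare_groups(features, labels)
print(ranked[0][:2])  # ('acid', 'logp')
```

The single-member group "other" is excluded from the ranking, and no grouping is inferred: the labels come entirely from the caller.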
Visualization¶
VizInputs (dataclass)¶
Resolved file inputs for visualization code.
Attributes:
| Name | Type | Description |
|---|---|---|
| category | str | Workflow category used to resolve inputs. |
| root | Path | Root directory of the reconstructed run. |
| paths | tuple of pathlib.Path | Existing input paths selected from the run manifest. |
as_dict()
¶
Return a serializable representation of the resolved inputs.
Returns:
| Type | Description |
|---|---|
dict
|
Dictionary with |
resolve_viz_inputs(run, category=None, required=None, kind=None)
¶
Resolve visualization input paths from a reconstructed workflow run.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
run
|
WorkflowRun
|
Reconstructed run whose manifest contains registered outputs. |
required |
category
|
str
|
Output category to select, such as |
None
|
required
|
str or iterable of str
|
Required path fragment or fragments used to select specific files. |
None
|
kind
|
str
|
Semantic artifact kind to select through |
None
|
Returns:
| Type | Description |
|---|---|
VizInputs
|
Existing paths suitable for visualization functions. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If no outputs or artifacts match the requested category/kind. |
FileNotFoundError
|
If |
Notes
This function resolves inputs from an existing WorkflowRun. It does not
create files, run pipeline stages, or generate plots.
plot_hddf_scatters(source)
¶
Generate scatter plots for HDDF descriptor pairs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
source
|
str, VizInputs, or LoadedArtifact
|
Input source. A string is interpreted as a collection tag. |
required |
Returns:
| Type | Description |
|---|---|
bool
|
|
Notes
This function writes
results/<tag>/figures/correlations/hddf_corr_scatters_trendline.png.
When a reconstructed input object is supplied, the function uses existing
descriptor-table data and does not recalculate descriptors.