API Reference
CHAMANP exposes a minimal, stable public API through four symbols and a version
string. Internal implementation modules under chamanp/_core/ and
chamanp/_utils/ are private and not part of this contract.
import chamanp
from chamanp import ChamanpConfig, ChamanpResult, validate_config, run
ChamanpConfig
Runtime configuration contract for a CHAMANP pipeline execution.
All fields have defaults matching the current COCONUT reference molecular dataset
configuration. Construct a custom configuration by passing field values
directly, or load an external profile with from_module or
from_toml. Loaded configurations are not preflight-validated until
validate_config is called.
Attributes:
| Name | Type | Description |
|---|---|---|
| DATABASE_PATH | str | Path to the input molecular dataset CSV file. Default: |
| REPORTS_PATH | str | Directory for pipeline execution reports. Default: |
| COLLECTION_TAXONOMY_PATH | str | Path to the collection taxonomy JSON file. Default: |
| TARGET_COLLECTIONS | list of str | Collection labels to include in the filtered dataset. Default: |
| COLLECTION_TAG | str | Short alphanumeric tag used in artifact file names. Default: |
| COLLECTION_LOGIC | str | Logical operator applied when filtering across target collections. Must be |
| MORGAN_RADIUS | int | Morgan fingerprint radius. Must be an integer >= 0. Default: |
| MORGAN_BITS | int | Morgan fingerprint bit length. Must be a positive integer. Default: |
| SELECTED_PROPERTIES | list of str | Column names retained from the molecular dataset after curation. Default: the eight columns in |
| REMOVE_STEREO_DUPLICATES | bool | Whether stereochemical duplicates are removed during curation. Default: |
Examples:
Construct a configuration with default values:
>>> from chamanp import ChamanpConfig
>>> config = ChamanpConfig()
>>> config.COLLECTION_TAG
'pubchem'
Construct a configuration with custom values:
>>> config = ChamanpConfig(
... DATABASE_PATH="data/my_dataset.csv",
... COLLECTION_TAXONOMY_PATH="data/taxonomy.json",
... TARGET_COLLECTIONS=["Marine NPs"],
... COLLECTION_TAG="marine",
... COLLECTION_LOGIC="OR",
... MORGAN_RADIUS=2,
... MORGAN_BITS=2048,
... )
>>> config.COLLECTION_TAG
'marine'
from_module(module)
classmethod
Build a ChamanpConfig from a module-like object.
Reads every ChamanpConfig field name as an attribute from
module and returns a new ChamanpConfig instance. The loaded
configuration is not preflight-validated; call validate_config
to validate before execution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| module | module-like | An object that exposes all | required |
Returns:
| Type | Description |
|---|---|
| ChamanpConfig | A new configuration instance populated from the module attributes. |
Raises:
| Type | Description |
|---|---|
| AttributeError | If any |
Examples:
Load configuration from the repository-level config.py:
>>> import config
>>> from chamanp import ChamanpConfig
>>> cfg = ChamanpConfig.from_module(config)
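The attribute-reading behavior described above can be illustrated without CHAMANP installed. The following is a minimal sketch, not the library's implementation: `read_fields` and `FIELD_NAMES` are hypothetical names, and only three of the documented fields are used. It shows that any object exposing the expected attributes counts as module-like, and that a missing attribute surfaces as AttributeError.

```python
from types import SimpleNamespace

# Hypothetical subset of ChamanpConfig field names, for illustration only.
FIELD_NAMES = ("DATABASE_PATH", "COLLECTION_TAG", "MORGAN_RADIUS")


def read_fields(module, field_names=FIELD_NAMES):
    """Collect each named attribute from a module-like object.

    getattr raises AttributeError for the first missing attribute.
    """
    return {name: getattr(module, name) for name in field_names}


# Any object with the right attributes qualifies as module-like.
profile = SimpleNamespace(
    DATABASE_PATH="data/my_dataset.csv",
    COLLECTION_TAG="marine",
    MORGAN_RADIUS=2,
)
values = read_fields(profile)

# An object that lacks a field fails with AttributeError.
incomplete = SimpleNamespace(DATABASE_PATH="data/my_dataset.csv")
try:
    read_fields(incomplete)
    missing_detected = False
except AttributeError:
    missing_detected = True
```

In practice this means a plain namespace or a test double may be able to stand in for config.py, provided it defines every field.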
from_toml(path)
classmethod
Build a ChamanpConfig from a TOML file.
Reads configuration values from a TOML file at path. TOML keys
must be lowercase versions of ChamanpConfig field names (for
example, database_path maps to DATABASE_PATH). Unknown keys
raise a ValueError. The loaded configuration is not
preflight-validated; call validate_config to validate before
execution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str or path-like | File system path to the TOML configuration profile. | required |
Returns:
| Type | Description |
|---|---|
| ChamanpConfig | A new configuration instance populated from the TOML file. |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If path does not exist. |
| ValueError | If the file is not valid TOML, or if it contains unknown keys. |
Notes
from_toml does not perform preflight validation. File paths
referenced in the loaded configuration are not checked for existence
until validate_config is called.
Examples:
Load configuration from a user TOML profile:
>>> from chamanp import ChamanpConfig
>>> config = ChamanpConfig.from_toml("my-chamanp-profile.toml")
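The key-mapping rule for TOML profiles (lowercase keys map to upper-case field names, unknown keys raise ValueError) can be sketched as follows. This is an illustration under stated assumptions, not CHAMANP's code: `map_profile_keys` and `FIELD_NAMES` are hypothetical, the input is the dictionary a TOML parser would return, and the wording of the error message is invented.

```python
# Hypothetical subset of ChamanpConfig field names, for illustration only.
FIELD_NAMES = ("DATABASE_PATH", "COLLECTION_TAG", "MORGAN_BITS")


def map_profile_keys(parsed, field_names=FIELD_NAMES):
    """Map lowercase profile keys to upper-case field names.

    `parsed` is the dict a TOML parser would return for the profile.
    Keys that are not the lowercase form of a known field raise ValueError.
    """
    lookup = {name.lower(): name for name in field_names}
    unknown = sorted(key for key in parsed if key not in lookup)
    if unknown:
        raise ValueError(f"Unknown configuration keys: {', '.join(unknown)}")
    return {lookup[key]: value for key, value in parsed.items()}


# Equivalent TOML profile:
#   database_path = "data/my_dataset.csv"
#   morgan_bits = 2048
parsed = {"database_path": "data/my_dataset.csv", "morgan_bits": 2048}
kwargs = map_profile_keys(parsed)

# A misspelled key is rejected rather than silently ignored.
try:
    map_profile_keys({"morgan_bitz": 1024})
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Rejecting unknown keys catches typos in a profile early, before any default value is silently used in their place.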
required_fields()
classmethod
Return the names of all ChamanpConfig fields.
Returns:
| Type | Description |
|---|---|
| tuple of str | Names of all configuration fields in declaration order. |
ChamanpResult
Frozen summary of a completed CHAMANP pipeline execution.
Returned by run after a successful pipeline run. All path fields
are strings. Count fields may be None if the pipeline did not
produce them. Execution failures are exception-based and do not
produce a result object.
Notes
CHAMANP is currently pre-stable (Alpha). Field names and types may change before a stable release is declared.
Attributes:
| Name | Type | Description |
|---|---|---|
| status | str | Execution status. Always |
| version | str | CHAMANP package version at the time of execution. |
| collection_tag | str | Short collection tag used to name output artifacts, taken from |
| curated_path | str | File system path to the curated molecular dataset CSV. |
| filtered_path | str | File system path to the collection-filtered dataset CSV. |
| metadata_path | str | File system path to the fingerprint metadata CSV. |
| fingerprints_path | str | File system path to the Morgan fingerprint matrix ( |
| invalid_smiles_path | str | File system path to the invalid-SMILES traceability CSV. |
| report_path | str | File system path to the pipeline execution report. |
| fingerprint_radius | int | Morgan fingerprint radius used during generation, taken from |
| fingerprint_bits | int | Morgan fingerprint bit length used during generation, taken from |
| total_input_size | int or None | Total number of data rows in the input CSV, excluding the header. |
| total_after_dedup | int or None | Number of rows remaining after stereochemical deduplication. |
| stereo_removed_count | int or None | Number of rows removed during stereochemical deduplication ( |
| filtered_count | int or None | Number of molecular dataset entries remaining after collection filtering. |
| valid_molecules_count | int or None | Number of molecular dataset entries for which a valid fingerprint was generated. ( |
| invalid_smiles_count | int or None | Number of compounds whose SMILES string could not be parsed by RDKit during fingerprint generation. |
to_dict()
Return a plain-dictionary representation of the execution result.
Returns:
| Type | Description |
|---|---|
| dict | A dictionary with field names as keys and field values as values, produced by |
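To make the "frozen summary" and plain-dictionary behavior concrete, here is a small sketch built on a stand-in class. Everything in it is an assumption made for illustration: `ResultSketch` is hypothetical, it carries only four of the documented fields, the status and path values are placeholders, and the use of `dataclasses.asdict` is a guess at one reasonable way to produce such a dictionary, not a statement about how ChamanpResult does it.

```python
import json
from dataclasses import FrozenInstanceError, asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ResultSketch:
    """Hypothetical stand-in with a few ChamanpResult-like fields."""

    status: str
    collection_tag: str
    report_path: str
    filtered_count: Optional[int] = None  # count fields may be None

    def to_dict(self):
        # Assumption: a plain dict with field names as keys.
        return asdict(self)


result = ResultSketch(
    status="example-status",            # placeholder value
    collection_tag="marine",
    report_path="reports/example.txt",  # placeholder path
)

# A plain dictionary serializes directly to JSON for logging or archiving.
payload = json.dumps(result.to_dict(), sort_keys=True)

# Frozen instances reject attribute assignment.
try:
    result.status = "changed"
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Callers that read count fields should handle None explicitly, since the reference states that counts may be absent.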
validate_config
Validate a CHAMANP runtime configuration object.
Checks that required file paths exist, that collection settings are
well-formed, and that fingerprint parameters are valid integers.
COLLECTION_LOGIC and COLLECTION_TAG values are normalized
in place (stripped and upper-cased where applicable) before the
validated configuration is returned.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | ChamanpConfig | Configuration to validate. When | None |
Returns:
| Type | Description |
|---|---|
| ChamanpConfig | The validated configuration object, with |
Raises:
| Type | Description |
|---|---|
| ConfigurationError | If one or more validation checks fail. The error message lists all failing checks. |
Examples:
Validate a configuration before running the pipeline:
>>> from chamanp import ChamanpConfig, validate_config
>>> config = ChamanpConfig(
... DATABASE_PATH="data/coconut.csv",
... COLLECTION_TAXONOMY_PATH="data/taxonomy.json",
... TARGET_COLLECTIONS=["PubChem NPs"],
... COLLECTION_TAG="pubchem",
... )
>>> validated = validate_config(config)
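The validation contract (normalize first, then report every failing check in a single error) can be sketched independently of CHAMANP. This is not the library's implementation: `validate_sketch` and `ConfigurationErrorSketch` are hypothetical, the settings are a plain dictionary rather than a ChamanpConfig, only the two fingerprint checks are modeled, and the choice to upper-case COLLECTION_LOGIC while only stripping COLLECTION_TAG is an assumption.

```python
class ConfigurationErrorSketch(Exception):
    """Hypothetical stand-in for CHAMANP's ConfigurationError."""


def validate_sketch(settings):
    """Normalize string settings, then report every failing check at once."""
    normalized = dict(settings)
    # Assumption: logic is stripped and upper-cased, the tag is only stripped.
    normalized["COLLECTION_LOGIC"] = str(settings.get("COLLECTION_LOGIC", "")).strip().upper()
    normalized["COLLECTION_TAG"] = str(settings.get("COLLECTION_TAG", "")).strip()

    errors = []
    radius = normalized.get("MORGAN_RADIUS")
    # bool is a subclass of int, so it is excluded explicitly.
    if not isinstance(radius, int) or isinstance(radius, bool) or radius < 0:
        errors.append("MORGAN_RADIUS must be an integer >= 0")
    bits = normalized.get("MORGAN_BITS")
    if not isinstance(bits, int) or isinstance(bits, bool) or bits <= 0:
        errors.append("MORGAN_BITS must be a positive integer")

    # Collect all failures before raising, so one run reports everything.
    if errors:
        raise ConfigurationErrorSketch("; ".join(errors))
    return normalized


ok = validate_sketch({
    "COLLECTION_LOGIC": " or ",
    "COLLECTION_TAG": " marine ",
    "MORGAN_RADIUS": 2,
    "MORGAN_BITS": 2048,
})

try:
    validate_sketch({"MORGAN_RADIUS": -1, "MORGAN_BITS": 0})
    message = None
except ConfigurationErrorSketch as exc:
    message = str(exc)
```

Aggregating failures lets a user fix a broken profile in one pass instead of rerunning after each individual error.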
run
Validate and execute CHAMANP, writing configured artifacts to disk.
Calls validate_config on config, then runs the private pipeline
implementation. The pipeline curates the molecular dataset, filters it by
target collections, generates Morgan fingerprints, and writes a summary
report.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | ChamanpConfig | Runtime configuration. When | None |
Returns:
| Type | Description |
|---|---|
| ChamanpResult | A frozen result object containing execution status, artifact paths, and summary counts. See |
Raises:
| Type | Description |
|---|---|
| ConfigurationError | If configuration validation fails before execution begins. |
Notes
run validates the configuration, instantiates the private pipeline
implementation internally, and writes configured artifacts to disk during
execution.
The internal pipeline implementation is private and should not be imported or used directly.
Examples:
Run the pipeline with a custom configuration:
>>> from chamanp import ChamanpConfig, run
>>> config = ChamanpConfig(
... DATABASE_PATH="data/coconut.csv",
... COLLECTION_TAXONOMY_PATH="data/taxonomy.json",
... TARGET_COLLECTIONS=["PubChem NPs"],
... COLLECTION_TAG="pubchem",
... )
>>> result = run(config)