ML Module

Machine learning integration utilities for PyTorch training with RadiObject data.

Factory Functions

Convenience functions for creating DataLoaders with common configurations.

radiobject.ml.create_training_dataloader(collections, labels=None, batch_size=4, patch_size=None, num_workers=4, pin_memory=True, persistent_workers=True, transform=None, patches_per_volume=1)

Create a DataLoader configured for training from VolumeCollection(s).

Parameters:

- `collections` (`VolumeCollection | Sequence[VolumeCollection]`, required): Single VolumeCollection or list for multi-modal training. Multi-modal collections are stacked along the channel dimension.
- `labels` (`LabelSource`, default `None`): Label source. Can be:
  - `str`: column name in the collection's obs DataFrame
  - `pd.DataFrame`: with obs_id as column/index and label values
  - `dict[str, Any]`: mapping from obs_id to label
  - `Callable[[str], Any]`: function taking an obs_id and returning its label
  - `None`: no labels
- `batch_size` (`int`, default `4`): Samples per batch.
- `patch_size` (`tuple[int, int, int] | None`, default `None`): If provided, extract random patches of this size.
- `num_workers` (`int`, default `4`): Number of DataLoader worker processes.
- `pin_memory` (`bool`, default `True`): Pin tensors to CUDA memory.
- `persistent_workers` (`bool`, default `True`): Keep workers alive between epochs.
- `transform` (`Callable[[dict[str, Any]], dict[str, Any]] | None`, default `None`): Transform function applied to each sample. MONAI dict transforms (e.g., `RandFlipd`) work directly.
- `patches_per_volume` (`int`, default `1`): Number of patches to extract per volume per epoch.

Returns:

- `DataLoader`: DataLoader configured for training with shuffle enabled.

radiobject.ml.create_validation_dataloader(collections, labels=None, batch_size=4, patch_size=None, num_workers=4, pin_memory=True, transform=None)

Create a DataLoader configured for validation (no shuffle, no drop_last).

Parameters:

- `collections` (`VolumeCollection | Sequence[VolumeCollection]`, required): Single VolumeCollection or list for multi-modal validation.
- `labels` (`LabelSource`, default `None`): Label source (see create_training_dataloader for options).
- `batch_size` (`int`, default `4`): Samples per batch.
- `patch_size` (`tuple[int, int, int] | None`, default `None`): If provided, extract patches of this size.
- `num_workers` (`int`, default `4`): Number of DataLoader worker processes.
- `pin_memory` (`bool`, default `True`): Pin tensors to CUDA memory.
- `transform` (`Callable[[dict[str, Any]], dict[str, Any]] | None`, default `None`): Transform function applied to each sample. MONAI dict transforms work directly.

Returns:

- `DataLoader`: DataLoader configured for validation.

Example::

    from monai.transforms import Compose, NormalizeIntensityd

    transform = Compose([NormalizeIntensityd(keys="image")])
    loader = create_validation_dataloader(radi.CT, labels="has_tumor", transform=transform)

radiobject.ml.create_inference_dataloader(collections, batch_size=1, num_workers=4, pin_memory=True, transform=None)

Create a DataLoader configured for inference (full volumes, no shuffle).

Parameters:

- `collections` (`VolumeCollection | Sequence[VolumeCollection]`, required): Single VolumeCollection or list for multi-modal inference.
- `batch_size` (`int`, default `1`): Samples per batch.
- `num_workers` (`int`, default `4`): Number of DataLoader worker processes.
- `pin_memory` (`bool`, default `True`): Pin tensors to CUDA memory.
- `transform` (`Callable[[dict[str, Any]], dict[str, Any]] | None`, default `None`): Transform function applied to each sample.

Returns:

- `DataLoader`: DataLoader configured for inference.

Example::

    from monai.transforms import NormalizeIntensityd

    transform = NormalizeIntensityd(keys="image")
    loader = create_inference_dataloader(radi.CT, transform=transform)

radiobject.ml.create_segmentation_dataloader(image, mask, batch_size=4, patch_size=None, num_workers=4, pin_memory=True, persistent_workers=True, transform=None, foreground_sampling=False, patches_per_volume=1)

Create a DataLoader for segmentation training with separate image/mask handling.

Unlike create_training_dataloader which stacks collections as channels, this returns separate "image" and "mask" tensors — the standard interface for segmentation workflows.

Parameters:

- `image` (`VolumeCollection`, required): VolumeCollection containing input images (CT, MRI, etc.).
- `mask` (`VolumeCollection`, required): VolumeCollection containing segmentation masks.
- `batch_size` (`int`, default `4`): Samples per batch.
- `patch_size` (`tuple[int, int, int] | None`, default `None`): If provided, extract random patches of this size.
- `num_workers` (`int`, default `4`): Number of DataLoader worker processes.
- `pin_memory` (`bool`, default `True`): Pin tensors to CUDA memory.
- `persistent_workers` (`bool`, default `True`): Keep workers alive between epochs.
- `transform` (`Callable[[dict[str, Any]], dict[str, Any]] | None`, default `None`): Transform function applied to each sample dict. MONAI dict transforms work directly; use key selection to control which tensors are affected (e.g., `RandFlipd(keys=["image", "mask"])` for spatial transforms, `NormalizeIntensityd(keys="image")` for image-only transforms).
- `foreground_sampling` (`bool`, default `False`): If True, bias patch sampling toward regions with foreground (non-zero mask values). Foreground coordinates are pre-computed once at init (no extra I/O during training).
- `patches_per_volume` (`int`, default `1`): Number of patches to extract per volume per epoch.

Returns:

- `DataLoader`: DataLoader yielding `{"image": (B, 1, X, Y, Z), "mask": (B, 1, X, Y, Z), ...}`.
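The foreground_sampling option biases random patch origins toward non-zero mask voxels. A rough NumPy sketch of the idea (illustrative only, not radiobject's implementation):

```python
import numpy as np


def sample_patch_origin(mask, patch_size, foreground_sampling, rng):
    """Pick the corner of a random patch, optionally centered on a foreground voxel."""
    shape = np.array(mask.shape)
    size = np.array(patch_size)
    if foreground_sampling:
        coords = np.argwhere(mask > 0)   # in practice, pre-computable once at init
        if len(coords):
            center = coords[rng.integers(len(coords))]
            # shift so the patch surrounds the chosen voxel, clamped to the volume
            return tuple(np.clip(center - size // 2, 0, shape - size))
    # fall back to uniform sampling over all valid origins
    return tuple(rng.integers(0, shape - size + 1))


rng = np.random.default_rng(0)
mask = np.zeros((8, 8, 8), dtype=np.uint8)
mask[4:6, 4:6, 4:6] = 1                  # small foreground blob
ox, oy, oz = sample_patch_origin(mask, (2, 2, 2), True, rng)
patch = mask[ox:ox + 2, oy:oy + 2, oz:oz + 2]
assert patch.any()                       # foreground-biased patch hits the blob
```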

Configuration

radiobject.ml.DatasetConfig

Bases: BaseModel

Configuration for ML datasets.

validate_patch_config()

Validate patch configuration consistency.

validate_patches_per_volume(v) classmethod

Ensure patches_per_volume is positive.

radiobject.ml.LoadingMode

Bases: str, Enum

Volume loading strategy.

- Full-volume mode: load entire 3D volumes. Requires uniform shapes. Best for small volumes or whole-volume models.
- `PATCH`: extract random 3D sub-arrays via TileDB slice reads. Supports heterogeneous shapes. Primary mode for large-volume training.
- `SLICE_2D`: extract 2D slices along a configurable orientation axis. Requires uniform shapes.
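In `SLICE_2D` mode each sample is one 2D plane; the core indexing is equivalent to `np.take` along the orientation axis:

```python
import numpy as np


def take_slice(volume: np.ndarray, axis: int, index: int) -> np.ndarray:
    """Extract one 2D slice along the configurable orientation axis."""
    return np.take(volume, index, axis=axis)


vol = np.arange(2 * 3 * 4).reshape(2, 3, 4)   # toy (X, Y, Z) volume
assert take_slice(vol, 0, 1).shape == (3, 4)  # plane perpendicular to X
assert take_slice(vol, 2, 0).shape == (2, 3)  # plane perpendicular to Z
```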

Datasets

radiobject.ml.VolumeCollectionDataset

Bases: Dataset

PyTorch Dataset for VolumeCollection(s); the primary ML interface.

collection_names property

Names of collections being loaded (channel order).

n_channels property

Number of channels (collections) in output tensors.

volume_shape property

Shape of each volume (X, Y, Z).

__init__(collections, config=None, labels=None, transform=None)

Initialize dataset from VolumeCollection(s).

Parameters:

- `collections` (`VolumeCollection | Sequence[VolumeCollection]`, required): Single VolumeCollection or sequence of collections. Multiple collections are stacked along the channel dimension.
- `config` (`DatasetConfig | None`, default `None`): Dataset configuration (loading mode, patch size, etc.). If None, uses full volume mode.
- `labels` (`LabelSource`, default `None`): Label source. Can be:
  - `str`: column name in the collection's obs DataFrame
  - `pd.DataFrame`: with obs_id as column/index and label values
  - `dict[str, Any]`: mapping from obs_id to label
  - `Callable[[str], Any]`: function taking an obs_id and returning its label
  - `None`: no labels
- `transform` (`Callable[[dict[str, Any]], dict[str, Any]] | None`, default `None`): Transform function applied to each sample dict. MONAI dict transforms (e.g., `RandFlipd`) work directly.
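With multiple collections, each sample's tensor is channel-first with one channel per collection; in NumPy terms, the stacking amounts to:

```python
import numpy as np

# Hypothetical per-sample volumes from two aligned collections (e.g., CT and MR)
ct = np.zeros((64, 64, 32), dtype=np.float32)
mr = np.ones((64, 64, 32), dtype=np.float32)

# Collections are stacked along a new leading channel dimension
sample = np.stack([ct, mr], axis=0)
assert sample.shape == (2, 64, 64, 32)  # (n_channels, X, Y, Z)
assert sample[1].mean() == 1.0          # channel order follows collection order
```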

radiobject.ml.SegmentationDataset

Bases: Dataset

PyTorch Dataset for segmentation training with explicit image/mask separation.

n_volumes property

Number of image/mask pairs.

volume_shape property

Shape of each volume (X, Y, Z).

TorchIO Compatibility

radiobject.ml.VolumeCollectionSubjectsDataset

Bases: Dataset

TorchIO-compatible dataset yielding Subject objects from VolumeCollection(s).

collection_names property

Names of collections in each Subject.

__getitem__(idx)

Return TorchIO Subject with images for all collections.

__init__(collections, labels=None, transform=None)

Initialize TorchIO-compatible dataset.

Parameters:

- `collections` (`VolumeCollection | Sequence[VolumeCollection]`, required): Single VolumeCollection or sequence of collections. Each collection becomes a separate image in the Subject.
- `labels` (`LabelSource`, default `None`): Label source. Can be:
  - `str`: column name in the collection's obs DataFrame
  - `pd.DataFrame`: with obs_id as column/index and label values
  - `dict[str, Any]`: mapping from obs_id to label
  - `Callable[[str], Any]`: function taking an obs_id and returning its label
  - `None`: no labels
- `transform` (`Any | None`, default `None`): TorchIO transform (e.g., `tio.Compose`) applied to each Subject.