ML Module

Machine learning integration utilities for PyTorch training with RadiObject data.

Factory Functions

Convenience functions for creating DataLoaders with common configurations.

radiobject.ml.create_training_dataloader(collections, labels=None, batch_size=4, patch_size=None, num_workers=4, pin_memory=True, persistent_workers=True, transform=None, patches_per_volume=1)

Create a DataLoader configured for training from VolumeCollection(s).

Parameters:

- `collections` (`VolumeCollection | Sequence[VolumeCollection]`, required): Single VolumeCollection or list for multi-modal training. Multi-modal collections are stacked along the channel dimension.
- `labels` (`LabelSource`, default `None`): Label source. Can be:
  - `str`: column name in the collection's obs DataFrame
  - `pd.DataFrame`: with obs_id as column/index and label values
  - `dict[str, Any]`: mapping from obs_id to label
  - `Callable[[str], Any]`: function taking an obs_id and returning its label
  - `None`: no labels
- `batch_size` (`int`, default `4`): Samples per batch.
- `patch_size` (`tuple[int, int, int] | None`, default `None`): If provided, extract random patches of this size.
- `num_workers` (`int`, default `4`): Number of DataLoader worker processes.
- `pin_memory` (`bool`, default `True`): Pin tensors to CUDA memory.
- `persistent_workers` (`bool`, default `True`): Keep workers alive between epochs.
- `transform` (`Callable[[dict[str, Any]], dict[str, Any]] | None`, default `None`): Transform function applied to each sample. MONAI dict transforms (e.g., `RandFlipd`) work directly.
- `patches_per_volume` (`int`, default `1`): Number of patches to extract per volume per epoch.

Returns:

- `DataLoader`: DataLoader configured for training with shuffle enabled.

radiobject.ml.create_validation_dataloader(collections, labels=None, batch_size=4, patch_size=None, num_workers=4, pin_memory=True, transform=None)

Create a DataLoader configured for validation (no shuffle, no drop_last).

Parameters:

- `collections` (`VolumeCollection | Sequence[VolumeCollection]`, required): Single VolumeCollection or list for multi-modal validation.
- `labels` (`LabelSource`, default `None`): Label source (see create_training_dataloader for options).
- `batch_size` (`int`, default `4`): Samples per batch.
- `patch_size` (`tuple[int, int, int] | None`, default `None`): If provided, extract patches of this size.
- `num_workers` (`int`, default `4`): Number of DataLoader worker processes.
- `pin_memory` (`bool`, default `True`): Pin tensors to CUDA memory.
- `transform` (`Callable[[dict[str, Any]], dict[str, Any]] | None`, default `None`): Transform function applied to each sample. MONAI dict transforms work directly.

Returns:

- `DataLoader`: DataLoader configured for validation.

Example::

    from monai.transforms import Compose, NormalizeIntensityd

    transform = Compose([NormalizeIntensityd(keys="image")])
    loader = create_validation_dataloader(radi.CT, labels="has_tumor", transform=transform)

radiobject.ml.create_inference_dataloader(collections, batch_size=1, num_workers=4, pin_memory=True, transform=None)

Create a DataLoader configured for inference (full volumes, no shuffle).

Parameters:

- `collections` (`VolumeCollection | Sequence[VolumeCollection]`, required): Single VolumeCollection or list for multi-modal inference.
- `batch_size` (`int`, default `1`): Samples per batch.
- `num_workers` (`int`, default `4`): Number of DataLoader worker processes.
- `pin_memory` (`bool`, default `True`): Pin tensors to CUDA memory.
- `transform` (`Callable[[dict[str, Any]], dict[str, Any]] | None`, default `None`): Transform function applied to each sample.

Returns:

- `DataLoader`: DataLoader configured for inference.

Example::

    from monai.transforms import NormalizeIntensityd

    transform = NormalizeIntensityd(keys="image")
    loader = create_inference_dataloader(radi.CT, transform=transform)

radiobject.ml.create_segmentation_dataloader(image, mask, batch_size=4, patch_size=None, num_workers=4, pin_memory=True, persistent_workers=True, transform=None, foreground_sampling=False, patches_per_volume=1)

Create a DataLoader for segmentation training with separate image/mask handling.

Unlike create_training_dataloader which stacks collections as channels, this returns separate "image" and "mask" tensors — the standard interface for segmentation workflows.

Parameters:

- `image` (`VolumeCollection`, required): VolumeCollection containing input images (CT, MRI, etc.).
- `mask` (`VolumeCollection`, required): VolumeCollection containing segmentation masks.
- `batch_size` (`int`, default `4`): Samples per batch.
- `patch_size` (`tuple[int, int, int] | None`, default `None`): If provided, extract random patches of this size.
- `num_workers` (`int`, default `4`): Number of DataLoader worker processes.
- `pin_memory` (`bool`, default `True`): Pin tensors to CUDA memory.
- `persistent_workers` (`bool`, default `True`): Keep workers alive between epochs.
- `transform` (`Callable[[dict[str, Any]], dict[str, Any]] | None`, default `None`): Transform function applied to each sample dict. MONAI dict transforms work directly; use key selection to control which tensors are affected (e.g., `RandFlipd(keys=["image", "mask"])` for spatial transforms, `NormalizeIntensityd(keys="image")` for image-only transforms).
- `foreground_sampling` (`bool`, default `False`): If True, bias patch sampling toward regions with foreground (non-zero mask values). Foreground coordinates are pre-computed once at init (no extra I/O during training).
- `patches_per_volume` (`int`, default `1`): Number of patches to extract per volume per epoch.

Returns:

- `DataLoader`: DataLoader yielding `{"image": (B, 1, X, Y, Z), "mask": (B, 1, X, Y, Z), ...}`.
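The foreground_sampling option biases random patch origins toward non-zero mask voxels. A rough NumPy sketch of the idea (illustrative only, not radiobject's implementation):

```python
import numpy as np


def sample_patch_origin(mask, patch_size, foreground_sampling, rng):
    """Pick the corner of a random patch, optionally centered on a foreground voxel."""
    shape = np.array(mask.shape)
    size = np.array(patch_size)
    if foreground_sampling:
        coords = np.argwhere(mask > 0)   # in practice, pre-computable once at init
        if len(coords):
            center = coords[rng.integers(len(coords))]
            # shift so the patch surrounds the chosen voxel, clamped to the volume
            return tuple(np.clip(center - size // 2, 0, shape - size))
    # fall back to uniform sampling over all valid origins
    return tuple(rng.integers(0, shape - size + 1))


rng = np.random.default_rng(0)
mask = np.zeros((8, 8, 8), dtype=np.uint8)
mask[4:6, 4:6, 4:6] = 1                  # small foreground blob
ox, oy, oz = sample_patch_origin(mask, (2, 2, 2), True, rng)
patch = mask[ox:ox + 2, oy:oy + 2, oz:oz + 2]
assert patch.any()                       # foreground-biased patch hits the blob
```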

Configuration

radiobject.ml.DatasetConfig

Bases: BaseModel

Configuration for ML datasets.

validate_patch_config()

Validate patch configuration consistency.

validate_patches_per_volume(v) classmethod

Ensure patches_per_volume is positive.

radiobject.ml.LoadingMode

Bases: str, Enum

Volume loading strategy.

- Full-volume mode: load entire 3D volumes. Requires uniform shapes. Best for small volumes or whole-volume models.
- `PATCH`: extract random 3D sub-arrays via TileDB slice reads. Supports heterogeneous shapes. Primary mode for large-volume training.
- `SLICE_2D`: extract 2D slices along a configurable orientation axis. Requires uniform shapes.
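In `SLICE_2D` mode each sample is one 2D plane; the core indexing is equivalent to `np.take` along the orientation axis:

```python
import numpy as np


def take_slice(volume: np.ndarray, axis: int, index: int) -> np.ndarray:
    """Extract one 2D slice along the configurable orientation axis."""
    return np.take(volume, index, axis=axis)


vol = np.arange(2 * 3 * 4).reshape(2, 3, 4)   # toy (X, Y, Z) volume
assert take_slice(vol, 0, 1).shape == (3, 4)  # plane perpendicular to X
assert take_slice(vol, 2, 0).shape == (2, 3)  # plane perpendicular to Z
```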

Datasets

radiobject.ml.VolumeCollectionDataset

Bases: Dataset

PyTorch Dataset for VolumeCollection(s); the primary ML interface.

collection_names property

Names of collections being loaded (channel order).

n_channels property

Number of channels (collections) in output tensors.

volume_shape property

Shape of each volume (X, Y, Z).

__init__(collections, config=None, labels=None, transform=None)

Initialize dataset from VolumeCollection(s).

Parameters:

- `collections` (`VolumeCollection | Sequence[VolumeCollection]`, required): Single VolumeCollection or sequence of collections. Multiple collections are stacked along the channel dimension.
- `config` (`DatasetConfig | None`, default `None`): Dataset configuration (loading mode, patch size, etc.). If None, uses full volume mode.
- `labels` (`LabelSource`, default `None`): Label source. Can be:
  - `str`: column name in the collection's obs DataFrame
  - `pd.DataFrame`: with obs_id as column/index and label values
  - `dict[str, Any]`: mapping from obs_id to label
  - `Callable[[str], Any]`: function taking an obs_id and returning its label
  - `None`: no labels
- `transform` (`Callable[[dict[str, Any]], dict[str, Any]] | None`, default `None`): Transform function applied to each sample dict. MONAI dict transforms (e.g., `RandFlipd`) work directly.
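With multiple collections, each sample's tensor is channel-first with one channel per collection; in NumPy terms, the stacking amounts to:

```python
import numpy as np

# Hypothetical per-sample volumes from two aligned collections (e.g., CT and MR)
ct = np.zeros((64, 64, 32), dtype=np.float32)
mr = np.ones((64, 64, 32), dtype=np.float32)

# Collections are stacked along a new leading channel dimension
sample = np.stack([ct, mr], axis=0)
assert sample.shape == (2, 64, 64, 32)  # (n_channels, X, Y, Z)
assert sample[1].mean() == 1.0          # channel order follows collection order
```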

radiobject.ml.SegmentationDataset

Bases: Dataset

PyTorch Dataset for segmentation training with explicit image/mask separation.

n_volumes property

Number of image/mask pairs.

volume_shape property

Shape of each volume (X, Y, Z).

TorchIO Compatibility

radiobject.ml.VolumeCollectionSubjectsDataset

Bases: Dataset

TorchIO-compatible dataset yielding Subject objects from VolumeCollection(s).

collection_names property

Names of collections in each Subject.

__getitem__(idx)

Return TorchIO Subject with images for all collections.

__init__(collections, labels=None, transform=None)

Initialize TorchIO-compatible dataset.

Parameters:

- `collections` (`VolumeCollection | Sequence[VolumeCollection]`, required): Single VolumeCollection or sequence of collections. Each collection becomes a separate image in the Subject.
- `labels` (`LabelSource`, default `None`): Label source. Can be:
  - `str`: column name in the collection's obs DataFrame
  - `pd.DataFrame`: with obs_id as column/index and label values
  - `dict[str, Any]`: mapping from obs_id to label
  - `Callable[[str], Any]`: function taking an obs_id and returning its label
  - `None`: no labels
- `transform` (`Any | None`, default `None`): TorchIO transform (e.g., `tio.Compose`) applied to each Subject.