ML Module
Machine learning integration utilities for PyTorch training with RadiObject data.
Factory Functions
Convenience functions for creating DataLoaders with common configurations.
radiobject.ml.create_training_dataloader(collections, labels=None, batch_size=4, patch_size=None, num_workers=4, pin_memory=True, persistent_workers=True, transform=None, patches_per_volume=1)
Create a DataLoader configured for training from VolumeCollection(s).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `collections` | `VolumeCollection \| Sequence[VolumeCollection]` | Single VolumeCollection or list for multi-modal training. Multiple collections are stacked along the channel dimension. | *required* |
| `labels` | `LabelSource` | Label source: a `str` column name in the collection's obs DataFrame; a `pd.DataFrame` with obs_id as column/index and label values; a `dict[str, Any]` mapping obs_id to label; a `Callable[[str], Any]` taking an obs_id and returning a label; or `None` for no labels. | `None` |
| `batch_size` | `int` | Samples per batch. | `4` |
| `patch_size` | `tuple[int, int, int] \| None` | If provided, extract random patches of this size. | `None` |
| `num_workers` | `int` | Number of DataLoader worker processes. | `4` |
| `pin_memory` | `bool` | Pin batches in page-locked host memory for faster CPU-to-GPU transfer. | `True` |
| `persistent_workers` | `bool` | Keep workers alive between epochs. | `True` |
| `transform` | `Callable[[dict[str, Any]], dict[str, Any]] \| None` | Transform applied to each sample. MONAI dict transforms (e.g. `RandFlipd`) work directly. | `None` |
| `patches_per_volume` | `int` | Number of patches extracted per volume per epoch. | `1` |
Returns:
| Type | Description |
|---|---|
| `DataLoader` | DataLoader configured for training with shuffling enabled. |
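A minimal sketch of typical usage with the callable `labels` form. The collection handles `ct` and `pet` and the obs_id naming scheme are assumptions for illustration, not part of the API; the import is deferred into the builder so the label helper stands alone:

```python
def binary_label(obs_id: str) -> int:
    """Callable LabelSource: maps an obs_id to a class label."""
    return int(obs_id.endswith("_tumor"))

def build_training_loader(ct, pet):
    from radiobject.ml import create_training_dataloader

    # Two collections stack along the channel dimension -> 2-channel input.
    return create_training_dataloader(
        [ct, pet],
        labels=binary_label,          # any LabelSource form works here
        batch_size=2,
        patch_size=(96, 96, 96),      # random 96^3 patches
        patches_per_volume=4,         # 4 patches per volume per epoch
    )
```

Each batch then carries a 2-channel image tensor of shape `(B, 2, 96, 96, 96)` plus the labels resolved per obs_id.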
radiobject.ml.create_validation_dataloader(collections, labels=None, batch_size=4, patch_size=None, num_workers=4, pin_memory=True, transform=None)
Create a DataLoader configured for validation (no shuffle, no drop_last).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `collections` | `VolumeCollection \| Sequence[VolumeCollection]` | Single VolumeCollection or list for multi-modal validation. | *required* |
| `labels` | `LabelSource` | Label source (see `create_training_dataloader` for the accepted forms). | `None` |
| `batch_size` | `int` | Samples per batch. | `4` |
| `patch_size` | `tuple[int, int, int] \| None` | If provided, extract patches of this size. | `None` |
| `num_workers` | `int` | Number of DataLoader worker processes. | `4` |
| `pin_memory` | `bool` | Pin batches in page-locked host memory for faster CPU-to-GPU transfer. | `True` |
| `transform` | `Callable[[dict[str, Any]], dict[str, Any]] \| None` | Transform applied to each sample. MONAI dict transforms work directly. | `None` |
Returns:
| Type | Description |
|---|---|
| `DataLoader` | DataLoader configured for validation. |
Example::
from monai.transforms import Compose, NormalizeIntensityd
transform = Compose([NormalizeIntensityd(keys="image")])
loader = create_validation_dataloader(radi.CT, labels="has_tumor", transform=transform)
radiobject.ml.create_inference_dataloader(collections, batch_size=1, num_workers=4, pin_memory=True, transform=None)
Create a DataLoader configured for inference (full volumes, no shuffle).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `collections` | `VolumeCollection \| Sequence[VolumeCollection]` | Single VolumeCollection or list for multi-modal inference. | *required* |
| `batch_size` | `int` | Samples per batch. | `1` |
| `num_workers` | `int` | Number of DataLoader worker processes. | `4` |
| `pin_memory` | `bool` | Pin batches in page-locked host memory for faster CPU-to-GPU transfer. | `True` |
| `transform` | `Callable[[dict[str, Any]], dict[str, Any]] \| None` | Transform applied to each sample. | `None` |
|
Returns:
| Type | Description |
|---|---|
| `DataLoader` | DataLoader configured for inference. |
Example::
from monai.transforms import NormalizeIntensityd
transform = NormalizeIntensityd(keys="image")
loader = create_inference_dataloader(radi.CT, transform=transform)
radiobject.ml.create_segmentation_dataloader(image, mask, batch_size=4, patch_size=None, num_workers=4, pin_memory=True, persistent_workers=True, transform=None, foreground_sampling=False, patches_per_volume=1)
Create a DataLoader for segmentation training with separate image/mask handling.
Unlike create_training_dataloader which stacks collections as channels, this returns separate "image" and "mask" tensors — the standard interface for segmentation workflows.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `image` | `VolumeCollection` | VolumeCollection containing input images (CT, MRI, etc.). | *required* |
| `mask` | `VolumeCollection` | VolumeCollection containing segmentation masks. | *required* |
| `batch_size` | `int` | Samples per batch. | `4` |
| `patch_size` | `tuple[int, int, int] \| None` | If provided, extract random patches of this size. | `None` |
| `num_workers` | `int` | Number of DataLoader worker processes. | `4` |
| `pin_memory` | `bool` | Pin batches in page-locked host memory for faster CPU-to-GPU transfer. | `True` |
| `persistent_workers` | `bool` | Keep workers alive between epochs. | `True` |
| `transform` | `Callable[[dict[str, Any]], dict[str, Any]] \| None` | Transform applied to each sample dict. MONAI dict transforms work directly; use key selection to control which tensors are affected. | `None` |
| `foreground_sampling` | `bool` | If True, bias patch sampling toward regions with foreground (non-zero mask values). Foreground coordinates are pre-computed once at init, so no extra I/O occurs during training. | `False` |
| `patches_per_volume` | `int` | Number of patches extracted per volume per epoch. | `1` |
|
Returns:
| Type | Description |
|---|---|
| `DataLoader` | DataLoader yielding `{"image": (B, 1, X, Y, Z), "mask": (B, 1, X, Y, Z), ...}`. |
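A sketch of a segmentation setup with foreground-biased patch sampling. The dict transform below is plain Python and illustrates the sample-dict contract (it touches only `"image"`); `ct` and `seg` are assumed collection handles:

```python
def scale_image_only(sample: dict) -> dict:
    """Dict transform: rescales "image" and leaves "mask" untouched."""
    sample["image"] = sample["image"] * 0.001
    return sample

def build_seg_loader(ct, seg):
    from radiobject.ml import create_segmentation_dataloader

    return create_segmentation_dataloader(
        image=ct,
        mask=seg,
        batch_size=2,
        patch_size=(64, 64, 64),
        foreground_sampling=True,   # bias patches toward non-zero mask voxels
        transform=scale_image_only,
    )
```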
Configuration
radiobject.ml.DatasetConfig
Bases: BaseModel
Configuration for ML datasets.
validate_patch_config()
Validate patch configuration consistency.
validate_patches_per_volume(v)
classmethod
Ensure patches_per_volume is positive.
radiobject.ml.LoadingMode
Bases: str, Enum
Volume loading strategy.
FULL_VOLUME: Load entire 3D volumes. Requires uniform shapes. Best for small volumes or whole-volume models.
PATCH: Extract random 3D sub-arrays via TileDB slice reads. Supports heterogeneous shapes. Primary mode for large-volume training.
SLICE_2D: Extract 2D slices along a configurable orientation axis. Requires uniform shapes.
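A sketch of building a patch-mode configuration directly. The field names used here (`loading_mode`, `patch_size`, `patches_per_volume`) are assumptions inferred from the documented parameters, not confirmed attribute names:

```python
def make_patch_config():
    from radiobject.ml import DatasetConfig, LoadingMode

    # Field names are assumptions; validate_patch_config() enforces
    # consistency, and patches_per_volume must be positive.
    return DatasetConfig(
        loading_mode=LoadingMode.PATCH,
        patch_size=(64, 64, 64),
        patches_per_volume=8,
    )
```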
Datasets
radiobject.ml.VolumeCollectionDataset
Bases: Dataset
PyTorch Dataset for VolumeCollection(s) - primary ML interface.
collection_names
property
Names of collections being loaded (channel order).
n_channels
property
Number of channels (collections) in output tensors.
volume_shape
property
Shape of each volume (X, Y, Z).
__init__(collections, config=None, labels=None, transform=None)
Initialize dataset from VolumeCollection(s).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `collections` | `VolumeCollection \| Sequence[VolumeCollection]` | Single VolumeCollection or sequence of collections. Multiple collections are stacked along the channel dimension. | *required* |
| `config` | `DatasetConfig \| None` | Dataset configuration (loading mode, patch size, etc.). If None, uses full-volume mode. | `None` |
| `labels` | `LabelSource` | Label source: a `str` column name in the collection's obs DataFrame; a `pd.DataFrame` with obs_id as column/index and label values; a `dict[str, Any]` mapping obs_id to label; a `Callable[[str], Any]` taking an obs_id and returning a label; or `None` for no labels. | `None` |
| `transform` | `Callable[[dict[str, Any]], dict[str, Any]] \| None` | Transform applied to each sample dict. MONAI dict transforms (e.g. `RandFlipd`) work directly. | `None` |
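Using the Dataset directly, e.g. to pair it with a custom DataLoader, is a short sketch; `ct` is an assumed collection handle and the dict label keys are illustrative obs_ids:

```python
def build_dataset(ct, config=None):
    from radiobject.ml import VolumeCollectionDataset

    labels = {"case_001": 0, "case_002": 1}   # dict LabelSource: obs_id -> label
    ds = VolumeCollectionDataset(ct, config=config, labels=labels)
    # ds.n_channels is 1 for a single collection; ds.volume_shape is (X, Y, Z).
    return ds
```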
radiobject.ml.SegmentationDataset
Bases: Dataset
PyTorch Dataset for segmentation training with explicit image/mask separation.
n_volumes
property
Number of image/mask pairs.
volume_shape
property
Shape of each volume (X, Y, Z).
TorchIO Compatibility
radiobject.ml.VolumeCollectionSubjectsDataset
Bases: Dataset
TorchIO-compatible dataset yielding Subject objects from VolumeCollection(s).
collection_names
property
Names of collections in each Subject.
__getitem__(idx)
Return TorchIO Subject with images for all collections.
__init__(collections, labels=None, transform=None)
Initialize TorchIO-compatible dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `collections` | `VolumeCollection \| Sequence[VolumeCollection]` | Single VolumeCollection or sequence of collections. Each collection becomes a separate image in the Subject. | *required* |
| `labels` | `LabelSource` | Label source: a `str` column name in the collection's obs DataFrame; a `pd.DataFrame` with obs_id as column/index and label values; a `dict[str, Any]` mapping obs_id to label; a `Callable[[str], Any]` taking an obs_id and returning a label; or `None` for no labels. | `None` |
| `transform` | `Any \| None` | TorchIO transform (e.g. `tio.Compose`) applied to each Subject. | `None` |
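A sketch pairing the dataset with a TorchIO transform pipeline; `ct` and `mr` are assumed collection handles:

```python
def build_subjects_dataset(ct, mr):
    import torchio as tio
    from radiobject.ml import VolumeCollectionSubjectsDataset

    # Each collection becomes a separate image in every Subject, so a
    # TorchIO spatial transform applies to both modalities consistently.
    transform = tio.Compose([tio.RandomFlip(axes=(0, 1, 2))])
    return VolumeCollectionSubjectsDataset([ct, mr], transform=transform)
```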