VolumeCollection

A collection of volumes sharing the same modality or acquisition type (e.g., all T1-weighted scans). Provides pandas-like indexing with iloc and loc accessors.

radiobject.VolumeCollection

TileDB-backed volume collection indexed by obs_id. Supports uniform or heterogeneous shapes.

iloc cached property

Integer-location based indexing for selecting volumes by position.

index property

Volume index for bidirectional ID/position lookups.
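To make "bidirectional" concrete, here is a minimal sketch of a two-way lookup between obs_id labels and integer positions. `ToyIndex` and its method names `get_position` and `get_id` are hypothetical stand-ins written for illustration; they are not taken from radiobject, and the real index object may expose a different interface.

```python
class ToyIndex:
    """Toy bidirectional index; a stand-in, not radiobject's index class."""

    def __init__(self, obs_ids):
        self._ids = list(obs_ids)
        # Reverse map built once so label -> position lookups are O(1).
        self._pos = {obs_id: i for i, obs_id in enumerate(self._ids)}
        if len(self._pos) != len(self._ids):
            raise ValueError("obs_id values must be unique")

    def get_position(self, obs_id):
        # Hypothetical method name: label -> integer position.
        return self._pos[obs_id]

    def get_id(self, position):
        # Hypothetical method name: integer position -> label.
        return self._ids[position]


idx = ToyIndex(["sub-01_T1w", "sub-02_T1w", "sub-03_T1w"])
position = idx.get_position("sub-02_T1w")
label = idx.get_id(position)
```

Keeping both a list and a dict means either direction of lookup avoids a linear scan, at the cost of storing each ID twice.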

is_uniform property

Whether all volumes in this collection have the same shape.

is_view property

True if this VolumeCollection is a filtered view of another.

loc cached property

Label-based indexing for selecting volumes by obs_id.

name property

Collection name (if set during creation).

obs property

Observational metadata per volume.

obs_ids property

All obs_id values in index order (respects view filter).

obs_subject_ids property

Get obs_subject_id values for this collection (respects view filter).

shape property

Volume dimensions (X, Y, Z) if uniform, None if heterogeneous.

subjects cached property

Subject-level index (obs_subject_id) for this collection.

uri property

URI of the underlying storage (raises if view without storage).

__getitem__(key)

__getitem__(key: int) -> Volume
__getitem__(key: str) -> Volume
__getitem__(key: slice) -> VolumeCollection
__getitem__(key: list[int]) -> VolumeCollection
__getitem__(key: list[str]) -> VolumeCollection

Index by int, str, slice, or list. Slices/lists return views.
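The dispatch rules above can be sketched with a toy class. `ToyCollection` is a hypothetical stand-in that stores only obs_id strings: where the real class would return a Volume it returns the ID, and where the real class would return a view it returns a new `ToyCollection`. It illustrates the key-type rules only and is not radiobject's implementation.

```python
class ToyCollection:
    """Illustrative stand-in for VolumeCollection indexing; not radiobject code."""

    def __init__(self, obs_ids):
        self.obs_ids = list(obs_ids)

    def __len__(self):
        return len(self.obs_ids)

    def __getitem__(self, key):
        # int -> single item by position
        if isinstance(key, int):
            return self.obs_ids[key]
        # str -> single item by obs_id label
        if isinstance(key, str):
            if key not in self.obs_ids:
                raise KeyError(key)
            return key
        # slice -> sub-collection (a view in the real class)
        if isinstance(key, slice):
            return ToyCollection(self.obs_ids[key])
        # list of ints or list of strs -> sub-collection
        if isinstance(key, list):
            return ToyCollection([self[k] for k in key])
        raise TypeError(f"unsupported key type: {type(key).__name__}")


coll = ToyCollection(["sub-01_T1w", "sub-02_T1w", "sub-03_T1w"])
first = coll[0]                               # by position
same = coll["sub-01_T1w"]                     # by obs_id
subset = coll[1:]                             # slice returns a collection
picked = coll[["sub-03_T1w", "sub-01_T1w"]]   # list returns a collection
```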

__iter__()

Iterate over volumes in index order (respects view filter).

__len__()

Number of volumes in collection (respects view filter).

__repr__()

Concise representation of the VolumeCollection.

append(niftis=None, dicom_dirs=None, reorient=None, progress=False)

Append new volumes atomically.

Volume data and obs metadata are written together to maintain consistency. Cannot be called on a view; call write() first to persist the view as its own collection, then append to that collection.

Parameters:

niftis (Sequence[tuple[str | Path, str]] | None, default: None): List of (nifti_path, obs_subject_id) tuples.
dicom_dirs (Sequence[tuple[str | Path, str]] | None, default: None): List of (dicom_dir, obs_subject_id) tuples.
reorient (bool | None, default: None): Reorient to canonical orientation (None uses config default).
progress (bool, default: False): Show tqdm progress bar during volume writes.

Example

Append new NIfTI files:

radi.T1w.append(
    niftis=[
        ("sub101_T1w.nii.gz", "sub-101"),
        ("sub102_T1w.nii.gz", "sub-102"),
    ],
)

copy()

Create detached copy of this collection (views remain views).

filter(expr)

Filter volumes using TileDB QueryCondition on obs. Returns view.

from_dicoms(uri, dicom_dirs, obs=None, reorient=None, validate_dimensions=True, valid_subject_ids=None, name=None, ctx=None, progress=False) classmethod

Create VolumeCollection from DICOM series with full metadata capture.

Parameters:

uri (str, required): Target URI for the VolumeCollection.
dicom_dirs (Sequence[tuple[str | Path, str]], required): List of (dicom_dir, obs_subject_id) tuples.
obs (DataFrame | None, default: None): Per-volume metadata with custom obs_id values. Positionally aligned with input files. Requires obs_id and obs_subject_id columns. Imaging metadata is always extracted; raises ValueError on column collisions.
reorient (bool | None, default: None): Reorient to canonical orientation (None uses config default).
validate_dimensions (bool, default: True): Raise if dimensions are inconsistent.
valid_subject_ids (set[str] | None, default: None): Optional whitelist for FK validation.
name (str | None, default: None): Collection name (stored in metadata).
ctx (Ctx | None, default: None): TileDB context.
progress (bool, default: False): Show tqdm progress bar during volume writes.

Returns:

VolumeCollection: VolumeCollection with obs containing DICOM metadata.

from_niftis(uri, niftis, obs=None, reorient=None, validate_dimensions=True, valid_subject_ids=None, name=None, ctx=None, progress=False) classmethod

Create VolumeCollection from NIfTI files with full metadata capture.

Parameters:

uri (str, required): Target URI for the VolumeCollection.
niftis (Sequence[tuple[str | Path, str]], required): List of (nifti_path, obs_subject_id) tuples.
obs (DataFrame | None, default: None): Per-volume metadata with custom obs_id values. Positionally aligned with input files. Requires obs_id and obs_subject_id columns. Imaging metadata is always extracted; raises ValueError on column collisions.
reorient (bool | None, default: None): Reorient to canonical orientation (None uses config default).
validate_dimensions (bool, default: True): Raise if dimensions are inconsistent.
valid_subject_ids (set[str] | None, default: None): Optional whitelist for FK validation.
name (str | None, default: None): Collection name (stored in metadata).
ctx (Ctx | None, default: None): TileDB context.
progress (bool, default: False): Show tqdm progress bar during volume writes.

Returns:

VolumeCollection: VolumeCollection with obs containing NIfTI metadata.
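Because obs is aligned to the input files by position, a length or column mistake is easy to make before calling from_niftis or from_dicoms. The sketch below is a hypothetical pre-flight check over plain dict records (the kind one might later pass to a DataFrame constructor). `check_obs_records` is not part of radiobject, and the extracted column name `voxel_spacing` is invented for illustration; the actual set of extracted columns is defined by the library.

```python
REQUIRED_COLUMNS = ("obs_id", "obs_subject_id")


def check_obs_records(inputs, obs_records, extracted_columns=()):
    """Hypothetical pre-flight check; not part of radiobject.

    inputs: list of (path, obs_subject_id) tuples, as passed to from_niftis.
    obs_records: list of dicts, one per input, in the same order.
    extracted_columns: names expected from automatic metadata extraction
        (supplied by the caller here; the real set is library-defined).
    """
    # Positional alignment only makes sense with one row per input.
    if len(obs_records) != len(inputs):
        raise ValueError(
            f"obs has {len(obs_records)} rows but there are {len(inputs)} inputs"
        )
    for row_number, row in enumerate(obs_records):
        missing = [c for c in REQUIRED_COLUMNS if c not in row]
        if missing:
            raise ValueError(f"row {row_number} is missing columns: {missing}")
        # User columns must not share a name with an extracted column.
        clashes = sorted(set(row) & set(extracted_columns))
        if clashes:
            raise ValueError(f"row {row_number} collides with: {clashes}")
    return True


niftis = [("sub101_T1w.nii.gz", "sub-101"), ("sub102_T1w.nii.gz", "sub-102")]
obs_records = [
    {"obs_id": "sub-101_T1w", "obs_subject_id": "sub-101", "site": "A"},
    {"obs_id": "sub-102_T1w", "obs_subject_id": "sub-102", "site": "B"},
]
# "voxel_spacing" is a made-up extracted column name used only for this demo.
ok = check_obs_records(niftis, obs_records, extracted_columns=("voxel_spacing",))
```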

get_obs_row_by_obs_id(obs_id)

Get observation row by obs_id string identifier.

groupby_subject()

Group volumes by obs_subject_id. Yields (subject_id, view) pairs.
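The grouping pattern can be sketched in plain Python. Here `group_by_subject` is a toy analogue that works on (obs_id, obs_subject_id) pairs and yields lists of IDs where the real method yields views. The first-seen subject ordering shown is a property of this sketch; the ordering used by radiobject is not stated in this reference.

```python
from collections import defaultdict


def group_by_subject(pairs):
    """Yield (subject_id, [obs_id, ...]); a toy analogue of groupby_subject()."""
    groups = defaultdict(list)
    for obs_id, subject_id in pairs:
        groups[subject_id].append(obs_id)
    # dicts preserve insertion order, so subjects come out in first-seen order.
    for subject_id, obs_ids in groups.items():
        yield subject_id, obs_ids


pairs = [
    ("sub-01_ses-1", "sub-01"),
    ("sub-02_ses-1", "sub-02"),
    ("sub-01_ses-2", "sub-01"),
]
grouped = dict(group_by_subject(pairs))
```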

head(n=5)

Return view of first n volumes.

lazy()

Enter lazy mode for deferred transform pipelines.

map(fn)

Apply fn(volume, obs_row) to each volume eagerly. Returns EagerQuery for chaining.

map_batches(fn, batch_size=8)

Apply fn to batches of (volume, obs_row) pairs. Returns EagerQuery.
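As a rough picture of what a batch function receives, the sketch below chunks a list of pairs into groups of at most batch_size and applies a function to each group. `iter_batches` and `count_per_batch` are hypothetical helpers, strings and dicts stand in for Volume objects and obs rows, and the short final batch is an assumption of this sketch rather than documented radiobject behavior.

```python
def iter_batches(items, batch_size=8):
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def count_per_batch(batch):
    # In this sketch a batch function receives a list of (volume, obs_row) pairs.
    return len(batch)


# Strings and dicts stand in for Volume objects and obs rows.
pairs = [(f"volume-{i}", {"obs_id": f"scan-{i}"}) for i in range(20)]
sizes = [count_per_batch(batch) for batch in iter_batches(pairs, batch_size=8)]
```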

sample(n=5, seed=None)

Return view of n randomly sampled volumes.
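The role of seed can be illustrated with the standard library: a fixed seed makes the draw repeatable. `sample_ids` is a hypothetical helper, not radiobject code, and clamping n to the collection size is an assumption of this sketch.

```python
import random


def sample_ids(obs_ids, n=5, seed=None):
    """Pick up to n IDs without replacement; a fixed seed repeats the draw."""
    rng = random.Random(seed)  # local generator leaves global random state untouched
    k = min(n, len(obs_ids))   # assumption: never ask for more than is available
    return rng.sample(list(obs_ids), k)


obs_ids = [f"scan-{i:02d}" for i in range(10)]
draw_a = sample_ids(obs_ids, n=3, seed=42)
draw_b = sample_ids(obs_ids, n=3, seed=42)  # same seed, same selection
```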

sel(*, subject)

Select volumes by obs_subject_id.

Parameters:

subject (str | list[str], required): Single subject ID (returns Volume if exactly one match) or list of subject IDs (returns VolumeCollection view).
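Since the return type depends on the argument, a small sketch may help. `select_by_subject` is a toy analogue over (obs_id, obs_subject_id) pairs: it returns a single ID where sel would return a Volume and a list where sel would return a view. What happens when a single subject ID matches several volumes is not stated in this reference; returning a list in that case is an assumption of the sketch.

```python
def select_by_subject(records, subject):
    """Toy analogue of sel(subject=...); records are (obs_id, obs_subject_id) pairs."""
    if isinstance(subject, str):
        matches = [obs_id for obs_id, sid in records if sid == subject]
        # Exactly one match collapses to a single item, mirroring the
        # documented "returns Volume if exactly one match" behavior.
        if len(matches) == 1:
            return matches[0]
        # Assumption: any other count falls back to a list (a view in the real class).
        return matches
    # A list of subject IDs always yields a list (a view in the real class).
    wanted = set(subject)
    return [obs_id for obs_id, sid in records if sid in wanted]


records = [
    ("sub-01_T1w", "sub-01"),
    ("sub-02_T1w_run-1", "sub-02"),
    ("sub-02_T1w_run-2", "sub-02"),
]
single = select_by_subject(records, "sub-01")
several = select_by_subject(records, ["sub-01", "sub-02"])
```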

tail(n=5)

Return view of last n volumes.

to_dataset(patch_size=None, labels=None, transform=None)

Create PyTorch Dataset from this collection.

Convenience method for ML training integration.

Parameters:

patch_size (tuple[int, int, int] | None, default: None): If provided, extract random patches of this size.
labels (DataFrame | dict | str | None, default: None): Label source. Can be:
    - str: Column name in this collection's obs DataFrame
    - pd.DataFrame: With obs_id as column/index and label values
    - dict[str, Any]: Mapping from obs_id to label
    - None: No labels
transform (Callable[..., Any] | None, default: None): Transform function applied to each sample. MONAI dict transforms (e.g., RandFlipd) work directly.

Returns:

VolumeCollectionDataset: VolumeCollectionDataset ready for use with DataLoader.

Examples:

Full volumes with labels from obs column:

dataset = radi.CT.to_dataset(labels="has_tumor")

Patch extraction:

dataset = radi.CT.to_dataset(patch_size=(64, 64, 64), labels="grade")

With MONAI transforms:

from monai.transforms import NormalizeIntensityd
dataset = radi.CT.to_dataset(
    labels="has_tumor",
    transform=NormalizeIntensityd(keys="image"),
)
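The different label sources accepted by labels can be pictured as a small dispatch on type. `resolve_label` below is a hypothetical helper, not radiobject's implementation; it covers the str, dict, and None cases with plain Python and leaves out the DataFrame case to stay dependency-free.

```python
def resolve_label(labels, obs_id, obs_row):
    """Hypothetical helper showing how each label source could be read."""
    if labels is None:
        return None                # no labels requested
    if isinstance(labels, str):
        return obs_row[labels]     # column name in the obs row
    if isinstance(labels, dict):
        return labels[obs_id]      # explicit obs_id -> label mapping
    raise TypeError(f"unsupported label source: {type(labels).__name__}")


# A plain dict stands in for one row of the obs DataFrame.
obs_row = {"obs_id": "scan-01", "has_tumor": 1}
from_column = resolve_label("has_tumor", "scan-01", obs_row)
from_mapping = resolve_label({"scan-01": 0}, "scan-01", obs_row)
unlabeled = resolve_label(None, "scan-01", obs_row)
```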

to_obs()

Return obs DataFrame (respects view filter).

validate()

Validate internal consistency of obs vs volume metadata.

write(uri=None, name=None, ctx=None)

Write this collection (or view) to new storage.

Creates a new VolumeCollection at the target URI containing all volumes in this view. For views, only the filtered volumes are written.

Parameters:

uri (str | None, default: None): Target URI. If None, a URI is generated adjacent to the source collection.
name (str | None, default: None): Collection name. Also used to derive the URI when uri is None.
ctx (Ctx | None, default: None): TileDB context.