Benchmarks

RadiObject enables 200-560x faster partial reads and native S3 access.

Benchmark overview

Configuration: batch_size=4, patch_size=64^3, n_runs=10, 20 MSD subjects, local SSD + S3 (us-east-2)


Full Volume Load

Full volume load times

framework   scenario  tiling     time_ms  cpu_pct  heap_mb
RadiObject  local     isotropic  99       62       304
nibabel     local     -          33       31       608
numpy       local     -          42       38       608
nibabel     local     gzip       418      24       912
RadiObject  local     axial      705      40       304
TorchIO     local     -          716      38       304
MONAI       local     -          1028     26       608
RadiObject  s3        axial      26256    16       304
zarr        local     axial      49       51       146
zarr        local     isotropic  22       59       177
zarr        s3        axial      1351     34       142

RadiObject isotropic (99ms) is competitive with raw nibabel (33ms) while enabling random access. Zarr full-volume loads are fast (22-49ms local) thanks to lightweight metadata, but Zarr lacks RadiObject's integrated metadata and caching layer.

See Performance: Why Full Volume is Slower for interpretation.
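The measurement pattern behind these numbers can be reproduced with the public nibabel and zarr APIs. A minimal sketch, using hypothetical local paths for one converted MSD subject (RadiObject and the other frameworks are timed the same way, averaged over n_runs=10):

import time

import nibabel as nib
import zarr

nifti_path = "data/BRATS_001.nii.gz"   # hypothetical local copy of one MSD subject
zarr_path = "data/BRATS_001.zarr"      # hypothetical chunked copy of the same volume

t0 = time.perf_counter()
vol_nifti = nib.load(nifti_path).get_fdata()   # decompresses and reads the entire volume
t1 = time.perf_counter()
vol_zarr = zarr.open(zarr_path, mode="r")[:]   # reads every chunk of the array
t2 = time.perf_counter()

print(f"nibabel full load: {(t1 - t0) * 1e3:.1f} ms")
print(f"zarr full load:    {(t2 - t1) * 1e3:.1f} ms")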


2D Slice Extraction

Slice extraction times

method      scenario  tiling     time_ms
RadiObject  local     axial      3.4
zarr        local     axial      1.4
RadiObject  local     isotropic  24
RadiObject  s3        axial      203
zarr        s3        axial      93
MONAI       local     -          1115
TorchIO     local     -          731

Both chunked formats achieve sub-5ms local axial slice extraction. Zarr's lower overhead gives a slight edge for individual slices, while RadiObject's caching benefits repeated access.

See Performance: Why Axial Tiling Gives 200-600x Speedup for interpretation.
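A sketch of the access pattern being measured, assuming a hypothetical axial-tiled Zarr copy of one subject; the slice index is arbitrary:

import nibabel as nib
import zarr

vol = zarr.open("data/BRATS_001.zarr", mode="r")   # hypothetical axial-tiled store, chunks=(1, 240, 240)
axial = vol[77, :, :]                              # touches only the chunk(s) covering slice 77

# The header-only formats have no partial-read path: nibabel must decode the
# whole volume before the same slice can be taken.
full = nib.load("data/BRATS_001.nii.gz").get_fdata()
axial_ref = full[77, :, :]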


3D ROI Extraction

ROI extraction times

method      scenario  tiling     time_ms
RadiObject  local     isotropic  2.0
zarr        local     isotropic  2.9
RadiObject  local     axial      20
RadiObject  s3        isotropic  238
zarr        s3        isotropic  63
MONAI       local     -          1106
TorchIO     local     -          739

RadiObject is 559x faster than MONAI and 374x faster than TorchIO for isotropic 64^3 ROIs. RadiObject and Zarr are comparable for local 3D partial reads.

See Performance: Why Isotropic is Best for 3D Patches for interpretation.
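A sketch of the 3D ROI pattern, assuming a hypothetical isotropic-chunked store:

import numpy as np
import zarr

vol = zarr.open("data/BRATS_001.zarr", mode="r")   # hypothetical isotropic store, chunks=(64, 64, 64)

# Sample a random 64^3 ROI origin and read only the chunks it overlaps.
rng = np.random.default_rng(0)
x, y, z = (int(rng.integers(0, s - 64)) for s in vol.shape)
roi = vol[x:x + 64, y:y + 64, z:z + 64]            # at most 8 chunks, a tiny fraction of the volume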


Framework Speedups

Speedup vs MONAI, TorchIO, and Zarr

operation  monai_ms  torchio_ms  zarr_ms  radiobject_ms  vs_monai  vs_torchio  vs_zarr
slice_2d   1115      731         1.4      3.4            325x      213x        0.4x
roi_3d     1106      739         2.9      2.0            559x      374x        1.5x

MONAI and TorchIO must load the full volume for any access pattern. RadiObject and Zarr both read only the chunks/tiles needed. RadiObject adds integrated metadata, caching, and S3 VFS — Zarr is a raw array format.
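To make the contrast concrete: with TorchIO, even a single slice requires materialising the full tensor first. A sketch with a hypothetical path (tio.ScalarImage is lazy until .data is accessed):

import torchio as tio

image = tio.ScalarImage("data/BRATS_001.nii.gz")   # hypothetical path
full = image.data                                  # loads the full (C, W, H, D) tensor into memory
one_slice = full[0, :, :, 77]                      # only now can a single slice be taken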


S3 vs Local

framework   operation    scenario  tiling     time_ms
RadiObject  full_volume  local     axial      705
RadiObject  full_volume  s3        axial      26256
RadiObject  slice_2d     local     axial      3.4
RadiObject  slice_2d     s3        axial      203
RadiObject  roi_3d       local     isotropic  2.0
RadiObject  roi_3d       s3        isotropic  238
zarr        full_volume  local     axial      49
zarr        full_volume  s3        axial      1351
zarr        slice_2d     local     axial      1.4
zarr        slice_2d     s3        axial      93
zarr        roi_3d       local     isotropic  2.9
zarr        roi_3d       s3        isotropic  63

Partial reads on S3 (203-238ms) are 110-130x faster than full volume S3 reads (26256ms). Zarr S3 reads via fsspec are faster for individual operations but lack TileDB's VFS-level parallelism for batch access.

See Performance: Why S3 is ~14x Slower for interpretation.
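A sketch of the two S3 read paths, with a hypothetical bucket and key; credentials are taken from the environment as in Running Benchmarks below:

import fsspec
import tiledb
import zarr

# Zarr over fsspec/s3fs: one object request per chunk touched by the read.
store = fsspec.get_mapper("s3://my-bucket/msd/BRATS_001.zarr")   # hypothetical bucket/key
vol = zarr.open(store, mode="r")
axial = vol[77, :, :]

# TileDB goes through its own VFS; the S3 region is configured on the context
# rather than per call.
cfg = tiledb.Config({"vfs.s3.region": "us-east-2"})
with tiledb.open("s3://my-bucket/msd/BRATS_001.tiledb", ctx=tiledb.Ctx(cfg)) as arr:
    axial_tdb = arr[77, :, :]   # returns a dict of attribute arrays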


Format and Storage Overhead

Disk space comparison

format           size_gb  compression  partial_read
NIfTI (.nii.gz)  2.1      2.84x        No
NIfTI (.nii)     6.5      0.91x        No
NumPy (.npy)     13.0     0.46x        No
TileDB (axial)   6.1      0.98x        Yes
TileDB (iso)     5.6      1.06x        Yes
Zarr (axial)     0.2      35.8x        Yes
Zarr (iso)       0.2      35.9x        Yes

TileDB uses ~3x more space than gzipped NIfTI, but enables partial reads that are 200-560x faster. Zarr achieves high compression (36x) on sparse medical imaging data (MSD brain tumour) due to low per-chunk overhead with ZSTD.
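For reference, a hedged sketch of how a chunked, ZSTD-compressed store of this kind can be created, using the zarr-python 2.x creation API (keyword names differ slightly in zarr-python 3; the all-zero array stands in for mostly-background MSD data):

import numcodecs
import numpy as np
import zarr

vol = np.zeros((240, 240, 155), dtype="float32")   # stand-in for one mostly-background MSD volume
z = zarr.open(
    "data/BRATS_001.zarr",                 # hypothetical output path
    mode="w",
    shape=vol.shape,
    chunks=(64, 64, 64),                   # isotropic tiling; use (1, 240, 240) for axial
    dtype="float32",
    compressor=numcodecs.Zstd(level=3),    # per-chunk ZSTD drives the high ratio on sparse data
)
z[:] = vol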


Tiling Strategy Impact

Tiling heatmap

access_pattern  axial_ms  isotropic_ms  best_choice
axial_slice     3.8       24            Axial
coronal_slice   85        15            Isotropic
sagittal_slice  83        12            Isotropic
roi_32          10        1.6           Isotropic
roi_64          20        1.5           Isotropic
roi_128         51        4.4           Isotropic

Zarr Chunking:

access_pattern  zarr_axial_ms  zarr_isotropic_ms
axial_slice     1.3            7.4
coronal_slice   29             6.2
sagittal_slice  28             6.4
roi_32          6.0            2.8
roi_64          12             2.9
roi_128         31             6.2

See Performance: Tiling Strategy Guide.
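The tables reduce to a simple rule of thumb; the helper below is a hypothetical illustration of that rule, not a RadiObject API:

def pick_tiling(access_pattern: str) -> str:
    # Axial tiling only wins for pure axial-slice workloads; isotropic tiling
    # wins for every other slice orientation and all ROI sizes in the tables above.
    return "axial" if access_pattern == "axial_slice" else "isotropic"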


Memory Efficiency

operation         radiobject_mb  nifti_load_mb
slice_extraction  1              300-900
full_volume       304            300-600

Partial reads use minimal memory because only the requested tiles are loaded.


ML Training Throughput

Dataloader throughput

framework   ms_per_batch  samples_per_sec  notes
RadiObject  31            128              isotropic, local
zarr        29            138              isotropic, local
TorchIO     3306          1.2              local, full volume load
zarr        4423          0.9              isotropic, S3
RadiObject  31434         0.1              isotropic, S3

Zarr and RadiObject achieve comparable local throughput (~128-138 samples/sec). Both are 100x faster than TorchIO for patch-based training (batch_size=4, patch_size=64^3). On S3, Zarr's fsspec issues async chunk requests more efficiently (0.9 vs 0.1 samples/sec).
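A minimal sketch of the patch-based loading pattern benchmarked here, assuming hypothetical Zarr paths with isotropic 64^3 chunks (num_workers=0 follows the multi-worker guidance below):

import numpy as np
import torch
import zarr
from torch.utils.data import DataLoader, Dataset


class PatchDataset(Dataset):
    # Samples one random 64^3 patch per subject from chunked stores.
    def __init__(self, paths, patch=64):
        self.vols = [zarr.open(p, mode="r") for p in paths]
        self.patch = patch

    def __len__(self):
        return len(self.vols)

    def __getitem__(self, i):
        vol = self.vols[i]
        rng = np.random.default_rng()
        p = self.patch
        x, y, z = (int(rng.integers(0, s - p)) for s in vol.shape)
        block = vol[x:x + p, y:y + p, z:z + p]   # partial read: only overlapping chunks are fetched
        return torch.from_numpy(np.ascontiguousarray(block))


paths = ["data/BRATS_001.zarr"]                  # hypothetical; extend to the full subject list
loader = DataLoader(PatchDataset(paths), batch_size=4, num_workers=0)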

Patch-Based I/O Reduction

loading_mode   data_per_sample  10k_subject_epoch_io
FULL_VOLUME    35.6 MB          356 GB
PATCH (64^3)   262 KB           2.6 GB
PATCH (128^3)  2.1 MB           21 GB

Multi-Worker Scaling

num_workers  time_3_volumes  time_per_volume
0            0.16s           0.05s
1            6.06s           2.02s
2            11.14s          3.71s

Use num_workers=0 for <100 volumes and num_workers=4-8 for >1000 volumes.

See Performance: Why Multi-Worker DataLoaders Slow Down for interpretation.


Cache Hit Rates (TileDB)

TileDB maintains a built-in LRU tile cache within its context object. Zarr v3 has no built-in chunk cache — it relies on OS page cache for local reads and has no caching for S3. This is a key TileDB advantage for workloads with repeated or overlapping access patterns.

access_pattern                 tiledb_hit_rate  zarr
Sequential (shared context)    85-95%           OS page cache only
Random (shared context)        60-75%           OS page cache only
Repeated slices (same volume)  90-99%           OS page cache only
S3 repeated access             85-95%           No cache (re-fetches)
Isolated contexts              0%               0%
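The hit rates above assume reads share one context. A sketch of that pattern with a hypothetical array URI (sm.tile_cache_size controls the size of the LRU tile cache):

import tiledb

# One shared context means one shared LRU tile cache for every read in the process.
ctx = tiledb.Ctx(tiledb.Config({"sm.tile_cache_size": str(512 * 1024 * 1024)}))   # 512 MB cache

uri = "data/BRATS_001.tiledb"          # hypothetical array URI
with tiledb.open(uri, ctx=ctx) as arr:
    cold = arr[0:64, 0:64, 0:64]       # tiles fetched from storage
    warm = arr[0:64, 0:64, 0:64]       # same tiles served from the context's cache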

Parallel Write Scaling

workers  local_ssd_mb_s  s3_mb_s
1        ~140            ~50
4        ~410            ~150
8        ~650            ~250
16       ~800            ~300

Local SSD write throughput scales well up to 4 workers (~100-140 MB/s per worker) but tapers beyond 8 workers; S3 throughput follows the same shape at lower absolute rates.


GIL Interaction

operation             gil_released  parallel_speedup
TileDB I/O            Yes           ~3-6x with 4 workers
TileDB decompression  Yes           ~3-4x with 4 workers
NumPy array ops       Partially     ~1.5-2x
Pure Python           No            ~1x
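Because TileDB releases the GIL during I/O and decompression, a plain thread pool already gives parallel reads. A sketch with hypothetical array URIs:

from concurrent.futures import ThreadPoolExecutor

import tiledb

ctx = tiledb.Ctx()                      # contexts are thread-safe and can be shared
uris = [f"data/BRATS_{i:03d}.tiledb" for i in range(1, 5)]   # hypothetical array URIs


def read_roi(uri):
    # The GIL is released during TileDB I/O and decompression, so threads overlap real work.
    with tiledb.open(uri, ctx=ctx) as arr:
        return arr[0:64, 0:64, 0:64]


with ThreadPoolExecutor(max_workers=4) as pool:
    rois = list(pool.map(read_roi, uris))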

RadiObject + MONAI/TorchIO

RadiObject is a storage layer that complements MONAI and TorchIO transforms:

import torchio as tio

from radiobject.ml.compat import as_torchio_subject

# radi is an opened RadiObject collection with T1w and seg columns
my_torchio_transform = tio.RandomAffine()   # any TorchIO transform works here
subject = as_torchio_subject(radi.T1w.iloc[0], radi.seg.iloc[0])
augmented = my_torchio_transform(subject)

The benchmarks compare I/O performance only. Use RadiObject for data loading, then apply MONAI/TorchIO transforms.


Running Benchmarks

# Export AWS credentials for S3 benchmarks (use your own profile)
eval $(aws configure export-credentials --profile <your-profile> --format env)
python benchmarks/run_experiments.py --all

See benchmarks/README.md for details.