Development Status
- 5 - Production/Stable
Intended Audience
- Developers
- Science/Research
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python
- Python :: 3 :: Only
- Python :: 3.10
- Python :: 3.11
- Python :: 3.12
- Python :: 3.13
- Python :: 3.14
Topic
- Scientific/Engineering :: Artificial Intelligence
- Scientific/Engineering :: Image Processing
- Software Development :: Libraries
- Software Development :: Libraries :: Python Modules
Typing
- Typed
Albucore: High-Performance Image Processing Functions
Albucore is a library of optimized atomic functions designed for efficient image processing. These functions serve as the foundation for AlbumentationsX, a popular image augmentation library.
Overview
Image processing operations can be implemented in various ways, each with its own performance characteristics depending on the image type, size, and number of channels. Albucore aims to provide the fastest implementation for each operation by leveraging different backends such as NumPy, OpenCV, and custom optimized code.
Supported dtypes: uint8 and float32 only.
Key features:
- Optimized atomic image processing functions
- Automatic selection of the fastest implementation based on input image characteristics
- Seamless integration with AlbumentationsX
- Benchmark scripts available (see
./benchmark.sh <data_dir>)
Installation
Requires Python 3.10+. Basic installation (you manage OpenCV separately):
pip install albucore
With OpenCV headless (recommended for servers/CI):
pip install albucore[headless]
With OpenCV GUI support (for local development with cv2.imshow):
pip install albucore[gui]
With OpenCV contrib modules:
pip install albucore[contrib] # GUI version
pip install albucore[contrib-headless] # Headless version
Note: If you already have opencv-python or opencv-contrib-python installed, just use pip install albucore to avoid package conflicts. Albucore will detect and use your existing OpenCV installation.
Usage
import numpy as np
import albucore
# Create a sample RGB image
image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
# Apply a function
result = albucore.multiply(image, 1.5)
# For grayscale images, ensure the channel dimension is present
gray_image = np.random.randint(0, 256, (100, 100, 1), dtype=np.uint8)
gray_result = albucore.multiply(gray_image, 1.5)
Albucore automatically selects the most efficient implementation based on the input image type and characteristics.
Shape Conventions
Albucore expects images to follow specific shape conventions, with the channel dimension always present:
- Single image:
(H, W, C)- Height, Width, Channels - Grayscale image:
(H, W, 1)- Height, Width, 1 channel - Batch of images:
(N, H, W, C)- Number of images, Height, Width, Channels - 3D volume:
(D, H, W, C)- Depth, Height, Width, Channels - Batch of volumes:
(N, D, H, W, C)- Number of volumes, Depth, Height, Width, Channels
Important Notes:
- Channel dimension is always required, even for grayscale images (use shape
(H, W, 1)) - Single-channel images should have shape
(H, W, 1)not(H, W) - Batch vs volume:
(N, H, W, C)is N separate images; a single 3D volume is(D, H, W, C)with depthD. Do not confuseN(batch) withD(slices).
Examples:
import numpy as np
import albucore
# Grayscale image - MUST have explicit channel dimension
gray_image = np.random.randint(0, 256, (100, 100, 1), dtype=np.uint8)
# RGB image
rgb_image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
# Batch of 10 grayscale images
batch_gray = np.random.randint(0, 256, (10, 100, 100, 1), dtype=np.uint8)
# 3D volume with 20 slices
volume = np.random.randint(0, 256, (20, 100, 100, 1), dtype=np.uint8)
# Batch of 5 RGB volumes, each with 20 slices
batch_volumes = np.random.randint(0, 256, (5, 20, 100, 100, 3), dtype=np.uint8)
Functions
All symbols below are exported via from albucore import * (or from albucore.functions import …).
Every function accepts uint8 and float32 images; inputs must have an explicit channel dimension
((H, W, C) — never bare (H, W)).
Arithmetic
| Function | Signature | What it does | How it works |
|---|---|---|---|
multiply |
(img, value, inplace=False) |
img * value, clipped to dtype range |
uint8 scalar/vector → LUT; uint8 array → OpenCV; float32 → NumPy broadcast |
add |
(img, value, inplace=False) |
img + value, clipped to dtype range |
uint8 scalar → OpenCV saturate; uint8 vector → LUT; uint8 array → NumKong/OpenCV; float32 → NumPy |
power |
(img, exponent, inplace=False) |
img ** exponent, clipped to dtype range |
uint8 → LUT; float32 scalar → cv2.pow; float32 array → NumPy |
add_weighted |
(img1, weight1, img2, weight2) |
img1*w1 + img2*w2, clipped |
NumKong SIMD blend; large float32 → cv2.addWeighted |
multiply_add |
(img, factor, value, inplace=False) |
img * factor + value, clipped |
uint8 → LUT (fused, one table); float32 → NumPy broadcast |
value / factor / exponent can be a scalar, a length-C 1-D array (per-channel), or a
full image-shaped array.
Normalization
| Function | Signature | What it does | How it works |
|---|---|---|---|
normalize |
(img, mean, denominator) |
(img - mean) * denominator → float32 |
uint8 → LUT (256-entry float32 table per channel); float32 → NumPy fused. Caller-supplied constants (e.g. ImageNet stats). |
normalize_per_image |
(img, normalization) |
Normalize using stats computed from img → float32 |
uint8 → LUT (except "min_max" → cv2.normalize); float32 → OpenCV/NumPy. normalization ∈ {"image", "image_per_channel", "min_max", "min_max_per_channel"} |
normalize is for fixed per-channel constants (ImageNet-style).
normalize_per_image estimates stats from the image at call time.
Statistics
| Function | Signature | What it does | How it works |
|---|---|---|---|
mean |
(arr, axis=None, *, keepdims=False, dtype=None) |
Population mean | uint8 global → NumKong moments; per-channel HWC ≤ 4ch → cv2.mean; else NumPy |
std |
(arr, axis=None, *, keepdims=False, eps=1e-4, dtype=None) |
Population std + eps | uint8 global → NumKong moments; per-channel HWC ≤ 4ch → cv2.meanStdDev; else NumPy |
mean_std |
(arr, axis=None, *, keepdims=False, eps=1e-4) |
Mean and std+eps jointly | Single NumKong moments pass for uint8 global (faster than separate mean+std) |
reduce_sum |
(arr, axis=None, *, keepdims=False) |
Sum with wide accumulator | uint8 → NumKong (avoids uint8 overflow); float32 → np.sum(dtype=float64) |
axis accepts None/"global" (scalar), "per_channel" (shape (C,)), or any NumPy-style
int/tuple[int, ...].
LUT (lookup tables)
| Function | Signature | What it does | How it works |
|---|---|---|---|
apply_uint8_lut |
(img, lut, *, inplace=False) |
Apply uint8→uint8 LUT; lut shape (256,) or (C, 256) |
Shared (256,): StringZilla or cv2.LUT by size heuristic. Per-channel (C, 256): single cv2.LUT with (256,1,C) table on contiguous HWC; else StringZilla per channel |
sz_lut |
(img, lut, inplace=True) |
Apply shared (256,) uint8 LUT via StringZilla translate |
Raw byte translation — channel-unaware, fastest for small images and single-channel |
Geometric / spatial
| Function | Signature | What it does | How it works |
|---|---|---|---|
hflip |
(img) |
Mirror left-right | cv2.flip(img, 1); chunked for >512 channels |
vflip |
(img) |
Mirror top-bottom | cv2.flip(img, 0) for ≤4 channels; NumPy slice for >4 channels |
median_blur |
(img, ksize) |
Median filter (odd ksize ≥ 3) | cv2.medianBlur; ksize ≥ 7 falls back to chunk processing for >4 channels. Accepts float32 via @uint8_io |
matmul |
(a, b) |
Matrix multiply (a @ b) |
NumPy @ (BLAS-backed); replaces cv2.gemm which lacks uint8 support |
pairwise_distances_squared |
(points1, points2) |
Squared Euclidean distance matrix (N, M) |
Small (N*M < 1000) → NumKong cdist; large → NumPy vectorized ‖a‖²+‖b‖²−2(a·b) |
Type conversion
| Function | Signature | What it does | How it works |
|---|---|---|---|
to_float |
(img, max_value=None) |
Convert to float32 in [0, 1] | float32 → no-op; uint8 → cv2.LUT (256-entry float32 table); others → NumPy divide |
from_float |
(img, target_dtype, max_value=None) |
Scale float32 → integer dtype (round + clip) | float32 → NumPy rint(img * max_value) then clip; non-float32 → generic NumPy path |
Decorators (re-exported)
| Decorator | What it does |
|---|---|
float32_io |
Wrap a function: cast input to float32, cast output back to original dtype |
uint8_io |
Wrap a function: cast input to uint8, cast output back to original dtype |
See docs/decorators.md for @preserve_channel_dim, @contiguous,
@clipped, and @batch_transform (used internally, not re-exported).
Batch Processing
Many functions in Albucore support batch processing out of the box. The library automatically handles different input shapes:
- Single images:
(H, W, C) - Batches:
(N, H, W, C) - Volumes:
(D, H, W, C) - Batch of volumes:
(N, D, H, W, C)
Functions will preserve the input shape structure, applying operations efficiently across all images/slices in the batch.
See docs/decorators.md for internal decorator documentation (@preserve_channel_dim, @contiguous, @clipped, @batch_transform).
Performance
Albucore uses a combination of techniques to achieve high performance:
- Multiple Implementations: Each function may have several implementations using different backends (NumPy, OpenCV, custom code).
- Automatic Selection: The library automatically chooses the fastest implementation based on the input image type, size, and number of channels.
- Optimized Algorithms: Custom implementations are optimized for specific use cases, often outperforming general-purpose libraries.
- NumKong: SIMD
blendforadd_weighted;cdistfor smallpairwise_distances_squared; wide-accumulatormomentsfor uint8 global mean/std instats;scalefor float32multiply_by_constant(see docs/numkong-performance.md).
Micro-benchmarks vs NumPy/OpenCV: see benchmarks/README.md; quick run: uv run python benchmarks/benchmark_numkong.py (OpenCV rows are skipped if cv2 is not installed).
See docs/performance-optimization.md for detailed performance guidelines and best practices.
Documentation
- CLAUDE.md - AI development guidelines for working with this codebase
- docs/image-conventions.md - Image shape conventions and requirements
- docs/decorators.md - Decorator usage and patterns
- docs/performance-optimization.md - Performance optimization guidelines
- docs/numkong-performance.md - NumKong vs OpenCV/NumPy/LUT baselines (benchmark tables; sum/mean/std)
- docs/public-api.md - Star-exported routers vs
albucore.functionsshims - benchmarks/README.md - Python micro-benchmarks (
uv run python benchmarks/…) - docs/research/ - Research notes (extra benchmark writeups; see
benchmarks/README.mdfor scripts)
License
MIT
Acknowledgements
Albucore is part of the AlbumentationsX project. We'd like to thank all contributors to AlbumentationsX and the broader computer vision community for their inspiration and support.