bithuman
This is the Python flavor of Layer 3: a platform-specific library for app developers. It wraps the Layer 1 libessence engine. For the CLI tool see docs/CLI.md.
┌─────────────────────────────────────────────────────────────┐
│ Layer 3: Platform-specific libraries (app developers) │
│ - Python wheel pip install bithuman ◄──── you are here
│ - Swift package SwiftPM Bithuman │
│ - Kotlin AAR ai.bithuman:sdk │
│ - (future) Rust crate, JS/TS, Go, ... │
└─────────────────────────────────────────────────────────────┘
▼ embeds
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: bithuman CLI (end-user tool) │
│ - one cross-platform binary on macOS / Linux / Windows │
│ - brew install bithuman · curl-pipe installer │
└─────────────────────────────────────────────────────────────┘
▼ links
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: libessence engine (cross-platform C++ core) │
│ - portable C ABI, same source on every target │
│ - macOS · iOS · Android · Linux · Windows │
│ - never imported directly by app developers │
└─────────────────────────────────────────────────────────────┘
Python bindings for the bitHuman SDK — the portable C++ avatar engine
(libessence) that powers our cross-platform lipsync pipeline. The wheel
ships a native pybind11 module that talks directly to libessence,
so you get the same per-frame cost as our Swift and Kotlin clients with
none of the GIL noise.
On an Apple M5 with 24 GB unified memory we measure ~640 FPS sustained compose (1.56 ms/frame mean, 2.03 ms p99) for a 1248×704 avatar, with ~206 MB peak RSS end-to-end. Cold load is ~14 ms for the fixture and ~400 ms for the first compose tick (lazy ONNX init).
This package is namespace-isolated from the v0 bithuman SDK; you can
install both side-by-side.
Install
pip install bithuman
Status: the Python wheel lags the rest of the SDK. The PyPI bithuman wheel is v1.12.4 (ABI v4), built from the legacy python/ tree at the root of bithuman-sdk. The new ABI-v6 streaming surface (be_runtime_push_audio / be_runtime_pull_frame / …) is C-level only in this binding tree until the Rust PyO3 wheel at cpp/bindings/rust/crates/bithuman-py ships to PyPI as the canonical replacement. Today's PyPI users keep the legacy AsyncBithuman.push_audio + async for ... in runtime.run() shape; see the legacy quickstart.
Compatibility
- Platforms: macOS arm64, Linux x86_64, Linux arm64 — all ship as wheels. Windows is tracked for a follow-up.
- Python: 3.10 – 3.13 (cp310, cp311, cp312, cp313). CPython only.
- ABI: the published wheel wraps libessence ABI v4. The libessence engine itself is on ABI v6; that surface is currently exposed via the Swift / Kotlin / Rust bindings only. The PyO3 wheel migration is in flight.
- Auth: ships with a live heartbeat against api.bithuman.ai baked into libessence. Avatar.load(api_secret=...) is the entry point; the BITHUMAN_API_SECRET env var works too. Set BITHUMAN_UNMETERED=1 for dev / parity-test runs.
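A minimal sketch of resolving the secret before handing it to the SDK. `resolve_api_secret` is a hypothetical helper, not part of the bithuman package, and the precedence (explicit argument first, then BITHUMAN_API_SECRET) is an assumption; the text above only states that both routes work.

```python
import os


def resolve_api_secret(explicit=None, environ=None):
    """Hypothetical helper: choose the API secret to pass as api_secret=...

    Assumption: an explicitly passed secret wins over the environment.
    """
    env = os.environ if environ is None else environ
    secret = explicit or env.get("BITHUMAN_API_SECRET")
    if not secret:
        raise RuntimeError(
            "No API secret: pass api_secret=... or set BITHUMAN_API_SECRET"
        )
    return secret
```

Failing early with a clear message keeps a missing secret from surfacing later as an authentication error at load time.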
What you get
The package exposes three API tiers (all importable from bithuman):
| Tier | Types | Use when… |
|---|---|---|
| Async | AsyncAvatar, AudioChunk, VideoControl, VideoFrame | Hosting a service / parity with legacy AsyncBithuman |
| Sync facade | Avatar, ComposedFrame, EP | Offline / batch / CLI rendering |
| Low-level | Fixture, Runtime, EP_CPU/EP_AUTO/EP_COREML/EP_NNAPI/EP_QNN | Direct C ABI access, custom audio pipeline |
Error types: BithumanError (base), TokenError /
TokenExpiredError / TokenValidationError / TokenRequestError /
AccountStatusError (auth), ModelError / ModelNotFoundError /
ModelLoadError / ModelSecurityError / ExpressionModelNotSupported
(fixture), RuntimeNotReadyError.
Version info: bithuman.__version__ (Python package),
bithuman.__core_version__ (linked libessence), bithuman.__abi_version__.
Quickstart (legacy AsyncBithuman — PyPI)
This is the shape of the current published wheel. It ports directly from
the v0 bithuman SDK: feed PCM with push_audio, drain frames from the
runtime.run() async generator.
import asyncio
from bithuman import AsyncBithuman

async def main():
    runtime = await AsyncBithuman(
        model_path="model.imx",
        api_secret="...",  # or BITHUMAN_API_SECRET env var
    ).start()
    await runtime.push_audio(pcm_16k_mono_int16_bytes,
                             sample_rate=16000, last_chunk=True)
    async for frame in runtime.run():
        # frame.bgr_image is (H, W, 3) uint8 in BGR order
        ...

asyncio.run(main())
The accepted PCM format is int16 little-endian bytes at 16 kHz mono.
Decoding WAV / MP3 / FLAC / OGG is the caller's responsibility (for
example with soundfile).
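As an illustration of producing those bytes, here is a sketch that uses the standard-library wave module in place of soundfile. It only handles uncompressed WAV files that are already 16 kHz, mono, 16-bit; anything else needs resampling or a real decoder. `wav_to_pcm16_bytes` is a hypothetical helper name.

```python
import wave


def wav_to_pcm16_bytes(path, expected_rate=16000):
    """Return raw int16 mono PCM bytes from an uncompressed WAV file.

    Rejects files that would need resampling or down-mixing instead of
    silently passing mismatched audio on.
    """
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wf.getframerate() != expected_rate:
            raise ValueError(
                f"expected {expected_rate} Hz, got {wf.getframerate()} Hz"
            )
        return wf.readframes(wf.getnframes())
```

The returned bytes can then be passed where the quickstart uses the pcm_16k_mono_int16_bytes placeholder.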
Quickstart (low-level, C-level streaming surface)
The Rust PyO3 wheel will expose the ABI-v6 streaming pair
(runtime.push_audio + runtime.pull_frame) in the same shape as the
Swift / Kotlin bindings. Until it ships to PyPI, use the legacy
Fixture / Runtime types in the published wheel, shown under Low-level API.
CLI
An essence-render console script ships with the wheel:
pip install 'bithuman[cli]'
essence-render \
    --model ~/.cache/bithuman/models/sample-avatar.imx \
    --audio speech.wav \
    --output out.mp4
Pass --output - to stream raw BGR24 frames to stdout (handy for piping
into a separate ffmpeg pipeline or a custom encoder). Other flags:
| Flag | Default | Description |
|---|---|---|
| --fps | 25 | Output FPS for the MP4 container. |
| --quality | 80 | libx264 quality 1..100 (higher = better). |
| --ep | cpu | Execution provider hint (cpu/auto/coreml/…). |
| --threads | 1 | ORT intra-op thread count. |
| --no-audio | – | Skip audio muxing; produce a silent video. |
Example end-to-end run (5 s sine sweep):
essence-render 0.1.0: model=sample-avatar.imx audio=sine_sweep_5s.wav ep=cpu threads=1
essence-render: loaded fixture in 14.9 ms — 1248x704 @ 25 fps, 183 clusters, 202 src frames
essence-render: composed 122 frames in 1.83s (14.96 ms/frame, 66.8 fps)
essence-render: wrote /tmp/sine_sweep_5s.mp4
(Throughput here is bounded by H.264 encode, not Essence inference. Use
--output - if you want to measure raw compose speed.)
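When piping --output - into your own consumer, the raw stream has to be cut into frames. The sketch below assumes the stream is tightly packed BGR24 (width × height × 3 bytes per frame, no header or padding), which the README does not spell out; `iter_bgr24_frames` is a hypothetical helper.

```python
def iter_bgr24_frames(stream, width, height):
    """Yield one bytes object per frame from a binary stream of raw BGR24.

    Assumes tightly packed frames with no header or row padding.
    """
    frame_size = width * height * 3  # 3 bytes per pixel: B, G, R
    while True:
        buf = stream.read(frame_size)
        if not buf:
            return  # clean end of stream
        # Pipes can return short reads; keep reading until the frame is full.
        while len(buf) < frame_size:
            more = stream.read(frame_size - len(buf))
            if not more:
                raise EOFError("stream ended mid-frame")
            buf += more
        yield buf
```

In a consumer script you would pass sys.stdin.buffer as the stream, with the dimensions reported at load time (1248 and 704 in the sample run). Each frame can be viewed as an array with np.frombuffer(frame, dtype=np.uint8).reshape(height, width, 3).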
Low-level API
If you need finer control or want to swap in a custom audio pipeline, the C ABI is exposed directly:
import numpy as np
from bithuman import Fixture, Runtime, EP_CPU
fx = Fixture("model.imx", preferred_ep=EP_CPU, intra_op_threads=1)
rt = Runtime(fx)
pcm = np.fromfile("speech.f32", dtype=np.float32) # 16 kHz mono float32
cluster_idx, bgr = rt.tick_compose(pcm, frame_idx_hint=-1)
# bgr.shape == (fx.frame_height, fx.frame_width, 3), dtype uint8
Pass the entire pcm buffer to each tick_compose call; the runtime
maintains an internal cursor and advances one tick per call until the
audio is exhausted.
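To size a render loop, the number of ticks can be estimated from the audio length. This is a sketch that assumes one tick consumes exactly sample_rate / fps samples (640 at 16 kHz and 25 fps); the runtime's real count may differ slightly, since the sample CLI run above reports 122 frames for a clip named as 5 s rather than 125.

```python
SAMPLE_RATE = 16_000  # Hz, the input rate stated in this README
MODEL_FPS = 25        # frames per second, from the CLI sample output


def samples_per_tick(sample_rate=SAMPLE_RATE, fps=MODEL_FPS):
    """Audio samples covered by one video frame."""
    if sample_rate % fps:
        raise ValueError("sample rate is not a whole multiple of fps")
    return sample_rate // fps


def estimate_ticks(num_samples, sample_rate=SAMPLE_RATE, fps=MODEL_FPS):
    """Estimated number of whole ticks in a buffer (assumption, see above)."""
    return num_samples // samples_per_tick(sample_rate, fps)
```

For a float32 buffer loaded as in the snippet above, estimate_ticks(len(pcm)) gives a starting value for the loop count.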
Zero-alloc hot path (since 1.12.4)
For tight render loops, pre-allocate the BGR buffer once and pass it
via out=. The runtime writes into it in place and returns just the
cluster_idx. This drops wrapper overhead to within ~3 % of raw
libessence (vs ~8 % for the alloc-per-tick path):
out = np.empty((fx.frame_height, fx.frame_width, 3), dtype=np.uint8)
for _ in range(num_ticks):
    cluster_idx = rt.tick_compose(pcm, -1, out=out)
    # `out` now holds this tick's frame; read it before the next call.
The same out= keyword works on tick_compose_to_size. See
docs/ARCHITECTURE.md §9 for the cross-wrapper perf table.
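Because the buffer is reused, any frame you want to keep must be copied before the next tick. The sketch below shows the pitfall with a stand-in: `fake_tick_compose` is a hypothetical function that only mimics the in-place write of the out= path, and a bytearray stands in for the NumPy buffer.

```python
def fake_tick_compose(out, tick):
    """Stand-in for rt.tick_compose(pcm, -1, out=out): overwrites `out` in place."""
    out[:] = bytes([tick % 256]) * len(out)
    return tick


out = bytearray(6)  # stands in for the pre-allocated BGR buffer
kept_alias = []     # wrong: every entry is the same buffer object
kept_copy = []      # right: each entry is a snapshot of one tick

for tick in range(3):
    fake_tick_compose(out, tick)
    kept_alias.append(out)
    kept_copy.append(bytes(out))
```

After the loop, every element of kept_alias shows the last tick's data, while kept_copy holds three distinct frames. With a NumPy buffer the equivalent snapshot is out.copy().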
Build from source
You need the prebuilt parent C++ archive at
cpp/build/libessence.a (run the parent CMake build first), plus
the runtime deps from Homebrew (onnxruntime, webp, ffmpeg,
hdf5, jpeg-turbo).
cd cpp/bindings/python
uv pip install -e '.[cli,test]' --no-build-isolation
The CMake glue links the prebuilt static archive directly. It does NOT re-run the parent build, so you can iterate on the bindings without paying the C++ rebuild cost.
Performance
Measured with tests/bench.py against the v1 compose path
(audio → composited BGR frame) on Apple M5 24 GB, libessence 1.16.0:
| Metric | Alloc per tick | out= reuse buffer |
|---|---|---|
| Steady-state mean | 1.53 ms / frame | 1.45 ms / frame |
| p99 | 1.66 ms | 1.53 ms |
| Sustained throughput | 655 FPS | 692 FPS |
| Overhead vs raw libessence | +8.3 % | +2.6 % |
| Peak RSS (proc) | 192 MB | 182 MB |
Wrapper overhead is within 5 % of raw libessence on the out= path;
see docs/ARCHITECTURE.md §9 for the apples-to-apples methodology and
the cross-wrapper comparison. Reproduce with:
scripts/bench-wrappers.sh
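The table's columns relate to each other by simple arithmetic, sketched below with hypothetical helper names. Note that 1000 / mean computed from the rounded means gives about 654 and 690 FPS, slightly below the 655 and 692 in the table; presumably the table was computed from unrounded timings, which is an assumption.

```python
def fps_from_ms(ms_per_frame):
    """Frames per second implied by a mean per-frame cost in milliseconds."""
    return 1000.0 / ms_per_frame


def realtime_factor(ms_per_frame, target_fps=25):
    """How many times faster than real time compose runs at target_fps."""
    frame_budget_ms = 1000.0 / target_fps  # 40 ms per frame at 25 fps
    return frame_budget_ms / ms_per_frame


def raw_ms(wrapper_ms, overhead_pct):
    """Raw libessence cost implied by a wrapper time and its overhead."""
    return wrapper_ms / (1.0 + overhead_pct / 100.0)
```

Both columns imply roughly the same raw cost (about 1.41 ms per frame), which is a useful sanity check on the overhead row, and the out= path leaves roughly a 27x margin over the 25 fps real-time budget.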
Linux wheels
Pre-built manylinux_2_28 wheels ship for x86_64 + aarch64 across cp310
through cp313 — 8 wheels in total, all auditwheel-repaired with the
full dep tree bundled (ORT, FFmpeg, HDF5, libjpeg-turbo, libwebp,
libcurl, OpenSSL).
To rebuild them locally:
# One-time: build the dep-baked Docker images (~10 min each).
docker build --platform linux/amd64 -t libessence/manylinux-x86_64:0.1 \
    -f scripts/Dockerfile.manylinux-x86_64 scripts/
docker build --platform linux/arm64/v8 -t libessence/manylinux-aarch64:0.1 \
    -f scripts/Dockerfile.manylinux-aarch64 scripts/
# Per wheel build (~2 min):
docker run --rm --platform linux/amd64 -v "$REPO":/src \
    -e PYTAG=cp311 -e ARCH_INSIDE=x86_64 \
    libessence/manylinux-x86_64:0.1 \
    bash /src/cpp/bindings/python/scripts/build-wheel-in-container.sh
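To cover all eight wheels, the per-wheel command can be generated for each architecture and Python tag. This is a sketch, not the project's release tooling: `wheel_build_commands` is a hypothetical helper, and ARCH_INSIDE=aarch64 for the arm64 image is an assumption, since only the x86_64 invocation is shown above.

```python
import itertools

PYTAGS = ["cp310", "cp311", "cp312", "cp313"]
# arch -> (docker platform, image tag), taken from the commands above
TARGETS = {
    "x86_64": ("linux/amd64", "libessence/manylinux-x86_64:0.1"),
    "aarch64": ("linux/arm64/v8", "libessence/manylinux-aarch64:0.1"),
}
BUILD_SCRIPT = "/src/cpp/bindings/python/scripts/build-wheel-in-container.sh"


def wheel_build_commands(repo):
    """Return one docker argv list per (architecture, Python tag) pair."""
    commands = []
    for (arch, (platform, image)), pytag in itertools.product(
        TARGETS.items(), PYTAGS
    ):
        commands.append([
            "docker", "run", "--rm", "--platform", platform,
            "-v", f"{repo}:/src",
            "-e", f"PYTAG={pytag}",
            "-e", f"ARCH_INSIDE={arch}",  # aarch64 value is an assumption
            image, "bash", BUILD_SCRIPT,
        ])
    return commands
```

Each returned list can be passed to subprocess.run(cmd, check=True) so that the first failing build stops the loop.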
Limitations
- Windows wheels not yet built — tracked for v0.2.
- The CLI's output framerate is fixed at 25 fps to match the model's internal rate. Pass --output - and pipe to your own encoder if you need temporal resampling.
- preferred_ep=COREML/NNAPI/QNN is accepted but currently no-ops to CPU in the v0.1 build.
License
Commercial. Contact hello@bithuman.ai.
See also
- Root README.md: install matrix
- cpp/README.md: libessence engine internals + C ABI
- docs/CLI.md: bithuman CLI reference
- cpp/bindings/swift/README.md: Swift binding
- cpp/bindings/kotlin/README.md: Kotlin/Android binding
- docs/BUILD_AND_RELEASE.md: release flow
Wheel compatibility matrix
| Platform | CPython 3.9 | CPython 3.10 | CPython 3.11 | CPython 3.12 | CPython 3.13 | CPython 3.14 |
|---|---|---|---|---|---|---|
| macosx_26_0_arm64 | | | | | | |
| manylinux_2_28_aarch64 | | | | | | |
| manylinux_2_28_x86_64 | | | | | | |