bithuman 1.19.1


pip install bithuman

  Latest version

Released: May 18, 2026


Meta
Author: bitHuman
Requires Python: >=3.9

Classifiers

Development Status
  • 4 - Beta

Intended Audience
  • Developers

Operating System
  • MacOS
  • MacOS :: MacOS X

Programming Language
  • Python
  • Python :: 3
  • Python :: 3.9
  • Python :: 3.10
  • Python :: 3.11
  • Python :: 3.12
  • Python :: 3.13
  • Python :: 3.14
  • C++

Topic
  • Multimedia
  • Multimedia :: Graphics
  • Multimedia :: Sound/Audio
  • Multimedia :: Video

bithuman

This is the Python flavor of Layer 3: a platform-specific library for app developers. It wraps the Layer 1 libessence engine. For the CLI tool see docs/CLI.md.

┌─────────────────────────────────────────────────────────────┐
│ Layer 3: Platform-specific libraries (app developers)       │
│   - Python wheel       pip install bithuman    ◄──── you are here
│   - Swift package      SwiftPM Bithuman                     │
│   - Kotlin AAR         ai.bithuman:sdk                      │
│   - (future) Rust crate, JS/TS, Go, ...                     │
└─────────────────────────────────────────────────────────────┘
                          ▼ embeds
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: bithuman CLI (end-user tool)                       │
│   - one cross-platform binary on macOS / Linux / Windows    │
│   - brew install bithuman · curl-pipe installer             │
└─────────────────────────────────────────────────────────────┘
                          ▼ links
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: libessence engine (cross-platform C++ core)        │
│   - portable C ABI, same source on every target             │
│   - macOS · iOS · Android · Linux · Windows                 │
│   - never imported directly by app developers               │
└─────────────────────────────────────────────────────────────┘

Python bindings for the bitHuman SDK — the portable C++ avatar engine (libessence) that powers our cross-platform lipsync pipeline. The wheel ships a native pybind11 module that talks directly to libessence, so you get the same per-frame cost as our Swift and Kotlin clients with none of the GIL noise.

On an Apple M5 with 24 GB unified memory we measure ~640 FPS sustained compose (1.56 ms/frame mean, 2.03 ms p99) for a 1248×704 avatar, with ~206 MB peak RSS end-to-end. Cold load is ~14 ms for the fixture and ~400 ms for the first compose tick (lazy ONNX init).

This package is namespace-isolated from the v0 bithuman SDK; you can install both side-by-side.

Install

pip install bithuman

Status — Python wheel lags the rest of the SDK. The PyPI bithuman wheel is v1.12.4 (ABI v4), built from the legacy python/ tree at the root of bithuman-sdk. The new ABI-v6 streaming surface (be_runtime_push_audio / be_runtime_pull_frame / …) is C-level only in this binding tree until the Rust PyO3 wheel at cpp/bindings/rust/crates/bithuman-py ships to PyPI as the canonical replacement. Today's PyPI users keep the legacy AsyncBithuman.push_audio + async for ... in runtime.run() shape — see the legacy quickstart.

Compatibility

  • Platforms: macOS arm64, Linux x86_64, Linux arm64 — all ship as wheels. Windows is tracked for a follow-up.
  • Python: 3.10 – 3.13 (cp310, cp311, cp312, cp313). CPython only.
  • ABI: the published wheel wraps libessence ABI v4. The libessence engine itself is on ABI v6 — that surface is currently exposed via the Swift / Kotlin / Rust bindings only. PyO3 wheel migration in flight.
  • Auth: ships with live heartbeat against api.bithuman.ai baked into libessence. Avatar.load(api_secret=...) is the entry point; BITHUMAN_API_SECRET env var works too. Set BITHUMAN_UNMETERED=1 for dev / parity-test runs.

What you get

The package exposes three API tiers (all importable from bithuman):

Tier Types Use when…
Async AsyncAvatar, AudioChunk, VideoControl, VideoFrame Hosting a service / parity with legacy AsyncBithuman
Sync facade Avatar, ComposedFrame, EP Offline / batch / CLI rendering
Low-level Fixture, Runtime, EP_CPU/EP_AUTO/EP_COREML/EP_NNAPI/EP_QNN Direct C ABI access, custom audio pipeline

Error types: BithumanError (base), TokenError / TokenExpiredError / TokenValidationError / TokenRequestError / AccountStatusError (auth), ModelError / ModelNotFoundError / ModelLoadError / ModelSecurityError / ExpressionModelNotSupported (fixture), RuntimeNotReadyError.

Version info: bithuman.__version__ (Python package), bithuman.__core_version__ (linked libessence), bithuman.__abi_version__.

Quickstart (legacy AsyncBithuman — PyPI)

This is the shape of the current published wheel. It ports directly from the v0 bithuman SDK: feed PCM with push_audio, drain frames from the runtime.run() async generator.

import asyncio
from bithuman import AsyncBithuman

async def main():
    runtime = await AsyncBithuman(
        model_path="model.imx",
        api_secret="...",  # or BITHUMAN_API_SECRET env var
    ).start()

    await runtime.push_audio(pcm_16k_mono_int16_bytes,
                             sample_rate=16000, last_chunk=True)

    async for frame in runtime.run():
        # frame.bgr_image is (H, W, 3) uint8 in BGR order
        ...

asyncio.run(main())

PCM accepted is int16 little-endian bytes at 16 kHz mono. WAV / MP3 / FLAC / OGG decoding is the caller's responsibility (use soundfile).

Quickstart (low-level, C-level streaming surface)

The Rust PyO3 wheel will expose the ABI-v6 streaming pair (runtime.push_audio + runtime.pull_frame) on the same shape as the Swift / Kotlin bindings. Until it ships to PyPI, the snippet below uses the legacy Fixture / Runtime types in the published wheel.

CLI

A essence-render console script ships with the wheel:

pip install 'bithuman[cli]'

essence-render \
  --model ~/.cache/bithuman/models/sample-avatar.imx \
  --audio speech.wav \
  --output out.mp4

Pass --output - to stream raw BGR24 frames to stdout (handy for piping into a separate ffmpeg pipeline or a custom encoder). Other flags:

Flag Default Description
--fps 25 Output FPS for the MP4 container.
--quality 80 libx264 quality 1..100 (higher = better).
--ep cpu Execution provider hint (cpu/auto/coreml/…).
--threads 1 ORT intra-op thread count.
--no-audio Skip audio muxing; produce a silent video.

Example end-to-end run (5 s sine sweep):

essence-render 0.1.0: model=sample-avatar.imx audio=sine_sweep_5s.wav ep=cpu threads=1
essence-render: loaded fixture in 14.9 ms — 1248x704 @ 25 fps, 183 clusters, 202 src frames
essence-render: composed 122 frames in 1.83s (14.96 ms/frame, 66.8 fps)
essence-render: wrote /tmp/sine_sweep_5s.mp4

(Throughput here is bounded by H.264 encode, not Essence inference. Use --output - if you want to measure raw compose speed.)

Low-level API

If you need finer control or want to swap in a custom audio pipeline, the C ABI is exposed directly:

import numpy as np
from bithuman import Fixture, Runtime, EP_CPU

fx = Fixture("model.imx", preferred_ep=EP_CPU, intra_op_threads=1)
rt = Runtime(fx)
pcm = np.fromfile("speech.f32", dtype=np.float32)  # 16 kHz mono float32
cluster_idx, bgr = rt.tick_compose(pcm, frame_idx_hint=-1)
# bgr.shape == (fx.frame_height, fx.frame_width, 3), dtype uint8

Pass the entire pcm buffer to each tick_compose call; the runtime maintains an internal cursor and advances one tick per call until the audio is exhausted.

Zero-alloc hot path (since 1.12.4)

For tight render loops, pre-allocate the BGR buffer once and pass it via out=. The runtime writes into it in place and returns just the cluster_idx. This drops wrapper overhead to within ~3 % of raw libessence (vs ~8 % for the alloc-per-tick path):

out = np.empty((fx.frame_height, fx.frame_width, 3), dtype=np.uint8)
for _ in range(num_ticks):
    cluster_idx = rt.tick_compose(pcm, -1, out=out)
    # `out` now holds this tick's frame; read it before the next call.

The same out= keyword works on tick_compose_to_size. See docs/ARCHITECTURE.md §9 for the cross-wrapper perf table.

Build from source

You need the prebuilt parent C++ archive at cpp/build/libessence.a (run the parent CMake build first), plus the runtime deps from Homebrew (onnxruntime, webp, ffmpeg, hdf5, jpeg-turbo).

cd cpp/bindings/python
uv pip install -e '.[cli,test]' --no-build-isolation

The CMake glue links the prebuilt static archive directly — it does NOT re-run the parent build, so iterate on bindings without paying the C++ rebuild cost.

Performance

Measured with tests/bench.py against the v1 compose path (audio → composited BGR frame) on Apple M5 24 GB, libessence 1.16.0:

Metric Alloc per tick out= reuse buffer
Steady-state mean 1.53 ms / frame 1.45 ms / frame
p99 1.66 ms 1.53 ms
Sustained throughput 655 FPS 692 FPS
Overhead vs raw libessence +8.3 % +2.6 %
Peak RSS (proc) 192 MB 182 MB

Wrapper overhead is within 5 % of raw libessence on the out= path; see docs/ARCHITECTURE.md §9 for the apples-to-apples methodology and the cross-wrapper comparison. Reproduce with:

scripts/bench-wrappers.sh

Linux wheels

Pre-built manylinux_2_28 wheels ship for x86_64 + aarch64 across cp310 through cp313 — 8 wheels in total, all auditwheel-repaired with the full dep tree bundled (ORT, FFmpeg, HDF5, libjpeg-turbo, libwebp, libcurl, OpenSSL).

To rebuild them locally:

# One-time: build the dep-baked Docker images (~10 min each).
docker build --platform linux/amd64 -t libessence/manylinux-x86_64:0.1 \
    -f scripts/Dockerfile.manylinux-x86_64 scripts/
docker build --platform linux/arm64/v8 -t libessence/manylinux-aarch64:0.1 \
    -f scripts/Dockerfile.manylinux-aarch64 scripts/

# Per wheel build (~2 min):
docker run --rm --platform linux/amd64 -v "$REPO":/src \
    -e PYTAG=cp311 -e ARCH_INSIDE=x86_64 \
    libessence/manylinux-x86_64:0.1 \
    bash /src/cpp/bindings/python/scripts/build-wheel-in-container.sh

Limitations

  • Windows wheels not yet built — tracked for v0.2.
  • The CLI's output framerate is fixed at 25 fps to match the model's internal rate. Pass --output - and pipe to your own encoder if you need temporal resampling.
  • preferred_ep=COREML/NNAPI/QNN is accepted but currently no-ops to CPU in the v0.1 build.

License

Commercial. Contact hello@bithuman.ai.

See also

Extras:
Dependencies:
numpy (>=1.26.0)
h5py (~=3.13)
loguru (~=0.7)
soxr (>=0.5)
soundfile (~=0.13)
pydantic (~=2.10)
pydantic-settings (~=2.8)
networkx (<4.0,>=3.1)
pyzmq (~=26.2)
msgpack (~=1.1)
PyYAML (~=6.0)
aiohttp (~=3.11)
onnxruntime (>=1.18)
onnxruntime (>=1.14)
onnx (>=1.15)
safetensors (>=0.4)
eval_type_backport (>=0.1.1)
av (>=12.0)
PyJWT (>=2.8)
requests (>=2.31)
lz4 (>=4.3)
PyTurboJPEG (>=1.7)
Pillow (>=9.0)
opencv-python-headless (>=4.8)
tqdm (>=4.60)