bithuman 1.7.9


pip install bithuman

  Latest version

Released: Apr 02, 2026


Meta
Requires Python: >=3.9

Classifiers

Development Status
  • 5 - Production/Stable

Intended Audience
  • Developers

Topic
  • Multimedia :: Graphics :: 3D Rendering
  • Scientific/Engineering :: Artificial Intelligence

Operating System
  • OS Independent
  • MacOS
  • POSIX :: Linux

Programming Language
  • Python :: 3
  • Python :: 3.9
  • Python :: 3.10
  • Python :: 3.11
  • Python :: 3.12
  • Python :: 3.13
  • Python :: 3.14

bitHuman Avatar Runtime

Real-time avatar engine for visual AI agents, digital humans, and creative characters.

bitHuman powers visual AI agents and conversational AI with photorealistic avatars and real-time lip-sync. Build voice agents with faces, video chatbots, AI assistants, and interactive digital humans — all running on edge devices with just 1-2 CPU cores and <200ms latency. Raw generation speed is 100+ FPS on CPU alone, enabling real-time streaming applications.

Installation

pip install bithuman --upgrade

Pre-built wheels for all major platforms — no compilation required:

          Linux        macOS                  Windows
x86_64    yes          yes                    yes
ARM64     yes          yes (Apple Silicon)    no
Python    3.9 — 3.14   3.9 — 3.14             3.9 — 3.14

For LiveKit agent integration:

pip install bithuman[agent]

Quick Start

Generate a lip-synced video

bithuman generate avatar.imx --audio speech.wav --key YOUR_API_KEY

Stream a live avatar to your browser

# Terminal 1: Start the streaming server
bithuman stream avatar.imx --key YOUR_API_KEY

# Terminal 2: Send audio to trigger lip-sync
bithuman speak speech.wav

Open http://localhost:3001 to see the avatar streaming live.

Python API (async)

import asyncio
from bithuman import AsyncBithuman
from bithuman.audio import load_audio, float32_to_int16

async def main():
    runtime = await AsyncBithuman.create(
        model_path="avatar.imx",
        api_secret="YOUR_API_KEY",
    )
    await runtime.start()

    # Load and stream audio
    audio, sr = load_audio("speech.wav")
    audio_int16 = float32_to_int16(audio)

    async def stream_audio():
        chunk_size = sr // 25  # match video FPS
        for i in range(0, len(audio_int16), chunk_size):
            await runtime.push_audio(
                audio_int16[i:i + chunk_size].tobytes(), sr
            )
        await runtime.flush()

    asyncio.create_task(stream_audio())

    # Receive lip-synced video frames
    async for frame in runtime.run():
        if frame.has_image:
            image = frame.bgr_image       # numpy (H, W, 3), uint8
            audio = frame.audio_chunk     # synchronized audio
        if frame.end_of_speech:
            break

    await runtime.stop()

asyncio.run(main())

Python API (sync)

from bithuman import Bithuman
from bithuman.audio import load_audio, float32_to_int16

runtime = Bithuman.create(model_path="avatar.imx", api_secret="YOUR_API_KEY")

audio, sr = load_audio("speech.wav")
audio_int16 = float32_to_int16(audio)

chunk_size = sr // 100
for i in range(0, len(audio_int16), chunk_size):
    runtime.push_audio(audio_int16[i:i+chunk_size].tobytes(), sr)
runtime.flush()

for frame in runtime.run():
    if frame.has_image:
        image = frame.bgr_image
    if frame.end_of_speech:
        break
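Both snippets slice the int16 buffer into fixed-duration chunks: sr // 25 gives 40 ms per chunk (one chunk per video frame at 25 FPS), while sr // 100 gives 10 ms. The arithmetic can be sanity-checked with plain numpy, independent of the bithuman runtime:

```python
import numpy as np

sr = 16000                                 # assumed post-resampling rate
audio = np.zeros(sr * 2, dtype=np.int16)   # 2 s of silence as a stand-in

for chunk_size in (sr // 25, sr // 100):   # 40 ms and 10 ms chunks
    chunks = [audio[i:i + chunk_size] for i in range(0, len(audio), chunk_size)]
    # No samples are dropped: the final chunk is simply shorter if needed
    assert sum(len(c) for c in chunks) == len(audio)
    print(f"{chunk_size} samples = {chunk_size / sr * 1000:.0f} ms, "
          f"{len(chunks)} chunks")
```

Smaller chunks lower latency at the cost of more push_audio calls; sr // 25 keeps audio pushes aligned with the 25 FPS frame clock.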

How It Works

  1. Load model — the .imx file contains the avatar's appearance, animations, and lip-sync data
  2. Push audio — Stream audio bytes in real-time via push_audio(), call flush() when done
  3. Get frames — Iterate runtime.run() to receive lip-synced video frames with synchronized audio

The runtime handles the full motion graph internally: idle animations, talking with lip-sync, head movements, blinking, and smooth transitions between states.

Performance

Metric               Value
Raw FPS              100+ on CPU (Intel i5-12400, Apple M2)
CPU cores            1-2 cores at 25 FPS
End-to-end latency   <200 ms
Memory (IMX v2)      ~200 MB per session
Model load time      <10 ms (IMX v2)
Audio formats        WAV, MP3, FLAC, OGG, M4A

Features

  • Real-time lip-sync — Audio-driven mouth animation at 25 FPS with synchronized audio output
  • Cross-platform — Linux, macOS, Windows; x86_64 and ARM64; Python 3.9-3.14
  • Edge-ready — 1-2 CPU cores, no GPU required for inference
  • Sync + async — Bithuman for threads, AsyncBithuman for async/await
  • Streaming-first — Push audio chunks in real-time, receive frames as they're generated
  • Actions & emotions — Trigger avatar gestures (wave, nod) and emotion states (joy, surprise)
  • Interrupt support — Cancel mid-speech for natural conversation flow
  • LiveKit integration — Built-in support for LiveKit Agents (WebRTC streaming)
  • CLI tools — Generate videos, stream live, convert models, validate setups
  • IMX v2 format — Optimized binary container with O(1) random access and WebP patches
  • Zero scipy dependency — Pure numpy audio pipeline, minimal install footprint

API Reference

AsyncBithuman / Bithuman

The main runtime for avatar animation.

# Create and initialize
runtime = await AsyncBithuman.create(
    model_path="avatar.imx",     # Path to .imx model
    api_secret="API_KEY",        # API secret (recommended)
    # token="JWT_TOKEN",         # Or JWT token directly
)
await runtime.start()

# Push audio (int16 PCM, any sample rate — auto-resampled to 16kHz)
await runtime.push_audio(audio_bytes, sample_rate)
await runtime.flush()            # Signal end of speech
runtime.interrupt()              # Cancel current playback

# Receive frames
async for frame in runtime.run():
    frame.bgr_image              # np.ndarray (H, W, 3) uint8 BGR
    frame.rgb_image              # np.ndarray (H, W, 3) uint8 RGB
    frame.audio_chunk            # AudioChunk — synchronized audio
    frame.end_of_speech          # True when all audio processed
    frame.has_image              # True if image available
    frame.frame_index            # Frame number
    frame.source_message_id      # Correlates to input

# Controls
await runtime.push(VideoControl(action="wave"))          # Trigger action
await runtime.push(VideoControl(target_video="idle"))    # Switch state
runtime.set_muted(True)                                  # Mute processing

# Info
runtime.get_frame_size()          # (width, height)
runtime.get_first_frame()         # First idle frame as np.ndarray
runtime.get_expiration_time()     # Token expiry (unix timestamp)
runtime.is_token_validated()      # Auth status

await runtime.stop()

Data Classes

from bithuman import AudioChunk, VideoControl, VideoFrame, Emotion, EmotionPrediction

# AudioChunk — container for audio data
chunk = AudioChunk(data=np.array([...], dtype=np.int16), sample_rate=16000)
chunk.duration    # float — length in seconds
chunk.bytes       # bytes — raw PCM bytes

# VideoControl — input to the runtime
ctrl = VideoControl(
    audio=chunk,                    # Audio to lip-sync
    action="wave",                  # Trigger action (wave, nod, etc.)
    target_video="talking",         # Switch video state
    end_of_speech=True,             # Mark end of speech
    force_action=False,             # Override action deduplication
    emotion_preds=[                 # Set emotion state
        EmotionPrediction(emotion=Emotion.JOY, score=0.9),
    ],
)

# VideoFrame — output from runtime.run()
frame.bgr_image           # np.ndarray (H, W, 3) uint8 — BGR
frame.rgb_image           # np.ndarray (H, W, 3) uint8 — RGB
frame.audio_chunk         # AudioChunk — synchronized audio
frame.end_of_speech       # bool — True when done
frame.has_image           # bool — True if image available
frame.frame_index         # int — frame number
frame.source_message_id   # Hashable — correlates to VideoControl

# Emotion enum
Emotion.ANGER | Emotion.DISGUST | Emotion.FEAR | Emotion.JOY
Emotion.NEUTRAL | Emotion.SADNESS | Emotion.SURPRISE
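AudioChunk's duration and bytes properties follow directly from PCM layout (int16 means 2 bytes per sample). A stand-in class, not the real AudioChunk, illustrates the arithmetic:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class PCMChunk:                  # stand-in mirroring AudioChunk's arithmetic
    data: np.ndarray             # int16 PCM samples
    sample_rate: int

    @property
    def duration(self) -> float:
        return len(self.data) / self.sample_rate   # length in seconds

    @property
    def bytes(self) -> bytes:
        return self.data.tobytes()                 # 2 bytes per int16 sample

chunk = PCMChunk(np.zeros(8000, dtype=np.int16), sample_rate=16000)
assert chunk.duration == 0.5          # 8000 samples at 16 kHz
assert len(chunk.bytes) == 16000      # 8000 samples * 2 bytes
```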

Audio Utilities

from bithuman.audio import (
    load_audio,               # Load WAV/MP3/FLAC/OGG/M4A -> (float32, sr)
    float32_to_int16,         # float32 -> int16
    int16_to_float32,         # int16 -> float32
    resample,                 # Resample to target rate
    write_video_with_audio,   # Save MP4 with audio track
    AudioStreamBatcher,       # Real-time audio buffer
)

audio, sr = load_audio("speech.mp3")             # Any format
audio_int16 = float32_to_int16(audio)            # Ready for push_audio
audio_16k = resample(audio, sr, 16000)           # Resample
write_video_with_audio("out.mp4", frames, audio, sr, fps=25)
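float32_to_int16 and int16_to_float32 presumably apply standard PCM scaling; a hand-rolled equivalent (an assumption about the exact clipping and rounding, not the library's code) looks like:

```python
import numpy as np

def f32_to_i16(x: np.ndarray) -> np.ndarray:
    # Clip to [-1, 1], then scale into the int16 range
    return (np.clip(x, -1.0, 1.0) * 32767).astype(np.int16)

def i16_to_f32(x: np.ndarray) -> np.ndarray:
    return x.astype(np.float32) / 32768.0

samples = np.array([0.0, 0.5, -1.0], dtype=np.float32)
ints = f32_to_i16(samples)
print(ints)    # values land in [-32767, 32767]
```

The round trip is lossy by at most one quantization step, which is inaudible for speech at 16 kHz.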

Exceptions

All exceptions inherit from BithumanError:

Exception              When
TokenExpiredError      JWT has expired
TokenValidationError   Invalid signature or claims
TokenRequestError      Auth server unreachable
AccountStatusError     Billing or access issue (HTTP 402/403)
ModelNotFoundError     Model file doesn't exist
ModelLoadError         Corrupt or incompatible model
ModelSecurityError     Security restriction triggered
RuntimeNotReadyError   Operation called before initialization
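Because all of these share the BithumanError base, callers can special-case a few and catch the rest generically. A sketch with stand-in classes mirroring the hierarchy (the real ones are importable from the bithuman package):

```python
class BithumanError(Exception): pass          # stand-ins for illustration
class TokenExpiredError(BithumanError): pass
class ModelNotFoundError(BithumanError): pass

def handle(exc_type):
    try:
        raise exc_type("boom")
    except TokenExpiredError:
        return "refresh token and retry"      # recoverable: re-authenticate
    except BithumanError as e:
        return f"fatal: {type(e).__name__}"   # everything else

print(handle(TokenExpiredError))   # refresh token and retry
print(handle(ModelNotFoundError))  # fatal: ModelNotFoundError
```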

LiveKit Agent Integration

Build conversational AI agents with avatar faces using LiveKit Agents:

from bithuman import AsyncBithuman
from bithuman.utils.agent import LocalAvatarRunner, LocalVideoPlayer, LocalAudioIO

# Initialize bitHuman runtime
runtime = await AsyncBithuman.create(
    model_path="avatar.imx",
    api_secret="YOUR_API_KEY",
)

# Connect to LiveKit agent session
avatar = LocalAvatarRunner(
    bithuman_runtime=runtime,
    audio_input=session.audio,
    audio_output=LocalAudioIO(session, agent_output),
    video_output=LocalVideoPlayer(window_size=(1280, 720)),
)
await avatar.start()

See examples/livekit_agent/ for a complete working example with OpenAI Realtime voice.

Optimize Your Models

Convert existing .imx models to IMX v2 for dramatically better performance:

bithuman convert avatar.imx

Metric         Legacy (TAR)   IMX v2     Improvement
Model size     100 MB         50-70 MB   30-50% smaller
Load time      ~10 s          <10 ms     1000x faster
Runtime speed  ~30 FPS        100+ FPS   3-10x faster
Peak memory    ~10 GB         ~200 MB    98% less

Conversion is automatic on first load, but pre-converting saves startup time.

CLI Reference

Command                                    Description
bithuman generate <model> --audio <file>   Generate lip-synced MP4 from model + audio
bithuman stream <model>                    Start live streaming server at localhost:3001
bithuman speak <audio>                     Send audio to a running stream server
bithuman action <name>                     Trigger avatar action (wave, nod, etc.)
bithuman info <model>                      Show model metadata
bithuman list-videos <model>               List all videos in a model
bithuman convert <model>                   Convert legacy model to optimized IMX v2
bithuman validate <path>                   Validate that model files load correctly

Configuration

Environment Variables

Variable                 Description
BITHUMAN_API_SECRET      API secret for authentication
BITHUMAN_RUNTIME_TOKEN   JWT token (alternative to API secret)
BITHUMAN_VERBOSE         Enable debug logging
CONVERT_THREADS          Number of threads for model conversion (0 or unset = auto-detect)
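A launcher script can resolve the secret with plain os.environ; resolve_api_secret below is a hypothetical helper, not part of the package:

```python
import os
from typing import Optional

def resolve_api_secret(explicit: Optional[str] = None) -> Optional[str]:
    # An explicit argument wins; otherwise fall back to the documented env var
    return explicit or os.environ.get("BITHUMAN_API_SECRET")

os.environ["BITHUMAN_API_SECRET"] = "demo-secret"   # for illustration only
assert resolve_api_secret() == "demo-secret"
assert resolve_api_secret("override") == "override"
```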

Runtime Settings

Setting              Default   Description
FPS                  25        Target frames per second
OUTPUT_WIDTH         1280      Output frame width (0 = native resolution)
PRELOAD_TO_MEMORY    False     Cache model in RAM for faster decode
PROCESS_IDLE_VIDEO   True      Run inference during silence (natural idle)

Use Cases

  • Visual AI Agents — Give your voice agents a face with real-time lip-sync
  • Conversational AI — Build video chatbots and AI assistants with human-like presence
  • Live Streaming — Stream avatars to browsers via WebSocket, LiveKit, or WebRTC
  • Video Generation — Generate lip-synced content from audio at 100+ FPS
  • Edge AI — Run locally on Raspberry Pi, Mac Mini, Chromebook, or any edge device
  • Digital Twins — Photorealistic replicas for customer service, education, or entertainment

Examples

Example           Description
example.py        Async runtime with live video + audio playback
example_sync.py   Synchronous runtime with threading
livekit_agent/    LiveKit Agent with OpenAI Realtime voice
livekit_webrtc/   WebRTC streaming server

Troubleshooting

macOS: Duplicate FFmpeg library warnings

objc: Class AVFFrameReceiver is implemented in both .../cv2/.dylibs/libavdevice...
and .../av/.dylibs/libavdevice...

This happens when opencv-python (full) is installed alongside av (PyAV) — both bundle FFmpeg dylibs. Fix by switching to the headless variant:

pip install opencv-python-headless

This replaces opencv-python and removes the duplicate dylibs. The bithuman package already depends on opencv-python-headless, so this only occurs when another package has pulled in the full opencv-python.
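To check which OpenCV variants are installed (having both is what triggers the warning), importlib.metadata works without importing cv2 at all:

```python
from importlib.metadata import distributions

# Collect the names of installed OpenCV distributions; seeing both
# "opencv-python" and "opencv-python-headless" explains the duplicate
# FFmpeg dylib warning on macOS.
opencv_dists = sorted(
    {d.metadata["Name"] for d in distributions()
     if (d.metadata["Name"] or "").lower().startswith("opencv")}
)
print(opencv_dists or "no OpenCV distributions found")
```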

Model conversion fails with TypeError

If you see TypeError: an integer is required during conversion, upgrade to the latest version:

pip install bithuman --upgrade

This was fixed in v1.6.2. The issue affected models in legacy TAR format during auto-conversion.

Getting a bitHuman Model

To create your own avatar model (.imx file):

  1. Visit bithuman.ai
  2. Register and subscribe
  3. Upload a photo or video to create your avatar
  4. Download your .imx model file

License

Commercial license required. See bithuman.ai for pricing.

Wheel compatibility matrix

Platform                 CPython 3.9   3.10   3.11   3.12   3.13   3.14
macosx_10_13_x86_64      no            no     no     yes    yes    no
macosx_10_15_x86_64      no            no     no     no     no     yes
macosx_10_9_x86_64       yes           yes    yes    no     no     no
macosx_11_0_arm64        yes           yes    yes    yes    yes    yes
manylinux2014_aarch64    yes           yes    yes    yes    yes    yes
manylinux2014_x86_64     yes           yes    yes    yes    yes    yes
manylinux_2_17_aarch64   yes           yes    yes    yes    yes    yes
manylinux_2_17_x86_64    yes           yes    yes    yes    yes    yes
manylinux_2_28_aarch64   yes           yes    yes    yes    yes    yes
manylinux_2_28_x86_64    yes           yes    yes    yes    yes    yes
win_amd64                yes           yes    yes    yes    yes    yes

Files in release

bithuman-1.7.9-cp310-cp310-macosx_10_9_x86_64.whl (2.1MiB)
bithuman-1.7.9-cp310-cp310-macosx_11_0_arm64.whl (2.1MiB)
bithuman-1.7.9-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl (2.5MiB)
bithuman-1.7.9-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (2.5MiB)
bithuman-1.7.9-cp310-cp310-win_amd64.whl (2.1MiB)
bithuman-1.7.9-cp311-cp311-macosx_10_9_x86_64.whl (2.1MiB)
bithuman-1.7.9-cp311-cp311-macosx_11_0_arm64.whl (2.1MiB)
bithuman-1.7.9-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl (2.5MiB)
bithuman-1.7.9-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (2.5MiB)
bithuman-1.7.9-cp311-cp311-win_amd64.whl (2.1MiB)
bithuman-1.7.9-cp312-cp312-macosx_10_13_x86_64.whl (2.1MiB)
bithuman-1.7.9-cp312-cp312-macosx_11_0_arm64.whl (2.1MiB)
bithuman-1.7.9-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl (2.5MiB)
bithuman-1.7.9-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (2.5MiB)
bithuman-1.7.9-cp312-cp312-win_amd64.whl (2.1MiB)
bithuman-1.7.9-cp313-cp313-macosx_10_13_x86_64.whl (2.1MiB)
bithuman-1.7.9-cp313-cp313-macosx_11_0_arm64.whl (2.1MiB)
bithuman-1.7.9-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl (2.5MiB)
bithuman-1.7.9-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (2.5MiB)
bithuman-1.7.9-cp313-cp313-win_amd64.whl (2.1MiB)
bithuman-1.7.9-cp314-cp314-macosx_10_15_x86_64.whl (2.1MiB)
bithuman-1.7.9-cp314-cp314-macosx_11_0_arm64.whl (2.1MiB)
bithuman-1.7.9-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl (2.5MiB)
bithuman-1.7.9-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (2.5MiB)
bithuman-1.7.9-cp314-cp314-win_amd64.whl (2.1MiB)
bithuman-1.7.9-cp39-cp39-macosx_10_9_x86_64.whl (2.1MiB)
bithuman-1.7.9-cp39-cp39-macosx_11_0_arm64.whl (2.1MiB)
bithuman-1.7.9-cp39-cp39-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl (2.5MiB)
bithuman-1.7.9-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (2.5MiB)
bithuman-1.7.9-cp39-cp39-win_amd64.whl (2.1MiB)
Extras: agent (LiveKit integration)
Dependencies:
numpy (>=1.26.0)
h5py (~=3.13)
loguru (~=0.7)
soxr (>=0.5)
soundfile (~=0.13)
pydantic (~=2.10)
pydantic-settings (~=2.8)
networkx (<4.0,>=3.1)
pyzmq (~=26.2)
msgpack (~=1.1)
PyYAML (~=6.0)
aiohttp (~=3.11)
onnxruntime (>=1.18)
eval_type_backport (>=0.1.1)
av (>=12.0)
PyJWT (>=2.8)
requests (>=2.31)
lz4 (>=4.3)
PyTurboJPEG (>=1.7)
Pillow (>=9.0)
opencv-python-headless (>=4.8)
tqdm (>=4.60)