Agent Framework plugin for voice synthesis and speech-to-text with Inworld's API.
Meta
Author: LiveKit
Requires Python: >=3.10.0
Classifiers
Intended Audience
- Developers
License
- OSI Approved :: Apache Software License
Programming Language
- Python :: 3
- Python :: 3 :: Only
- Python :: 3.10
Topic
- Multimedia :: Sound/Audio
- Multimedia :: Video
- Scientific/Engineering :: Artificial Intelligence
Inworld plugin for LiveKit Agents
Support for voice synthesis and speech-to-text with Inworld TTS and Inworld STT.
See the Inworld TTS and Inworld STT documentation for more information.
Installation
pip install livekit-plugins-inworld
Authentication
Set INWORLD_API_KEY in your .env file (obtain a key from Inworld).
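It can help to fail fast at startup with a readable message when the key is missing. The sketch below uses only the standard library. `require_env` is a hypothetical helper, not part of the plugin, and it assumes the variable has already been loaded into the process environment (for example by python-dotenv's `load_dotenv()`).

```python
import os
from collections.abc import Mapping


def require_env(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the value of `name`, or raise a readable error if it is unset or blank.

    Hypothetical helper for illustration; the plugin does not provide it.
    """
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or environment")
    return value


# Typical use at startup, after the .env file has been loaded:
#     api_key = require_env("INWORLD_API_KEY")
```

Passing the mapping explicitly, as in `require_env("INWORLD_API_KEY", {"INWORLD_API_KEY": "..."})`, keeps the check easy to exercise in tests without touching the real environment.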
Usage
TTS
Use Inworld TTS within an AgentSession or as a standalone speech generator.
from livekit.plugins import inworld
tts = inworld.TTS()
Or with options:
from livekit.plugins import inworld

tts = inworld.TTS(
    voice="Hades",  # voice ID (default or custom cloned voice)
    model="inworld-tts-1",  # or "inworld-tts-1-max"
    encoding="OGG_OPUS",  # LINEAR16, MP3, OGG_OPUS, ALAW, MULAW, FLAC
    sample_rate=48000,  # 8000-48000 Hz
    bit_rate=64000,  # bits per second (for compressed formats)
    speaking_rate=1.0,  # 0.5-1.5
    temperature=1.1,  # 0-2
    timestamp_type="WORD",  # WORD, CHARACTER, or TIMESTAMP_TYPE_UNSPECIFIED
    text_normalization="OFF",  # ON, OFF, or APPLY_TEXT_NORMALIZATION_UNSPECIFIED
)
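When these options come from user input or a config file, it may be worth checking them against the ranges noted in the comments above before constructing the TTS object. The following is a minimal sketch under that assumption: `check_tts_options` is a hypothetical helper, the limits are copied from the comments in this README rather than from the plugin's source, and the plugin or the Inworld API may apply its own validation.

```python
# Allowed values copied from the option comments in this README (an assumption,
# not read from the plugin itself).
ENCODINGS = {"LINEAR16", "MP3", "OGG_OPUS", "ALAW", "MULAW", "FLAC"}


def check_tts_options(
    *, encoding: str, sample_rate: int, speaking_rate: float, temperature: float
) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if encoding not in ENCODINGS:
        problems.append(f"encoding {encoding!r} is not one of {sorted(ENCODINGS)}")
    if not 8000 <= sample_rate <= 48000:
        problems.append(f"sample_rate {sample_rate} is outside 8000-48000 Hz")
    if not 0.5 <= speaking_rate <= 1.5:
        problems.append(f"speaking_rate {speaking_rate} is outside 0.5-1.5")
    if not 0 <= temperature <= 2:
        problems.append(f"temperature {temperature} is outside 0-2")
    return problems
```

Returning a list instead of raising on the first failure lets a caller report every problem in one pass.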
TTS Streaming
Inworld TTS supports WebSocket streaming for lower-latency, real-time synthesis. Use the
stream() method to push text as it is generated and receive audio as it is synthesized:
from livekit.plugins import inworld

tts = inworld.TTS(
    voice="Hades",
    model="inworld-tts-1",
    buffer_char_threshold=100,  # chars before triggering synthesis (default: 100)
    max_buffer_delay_ms=3000,  # max buffer time in ms (default: 3000)
)


# `async for` is only valid inside a coroutine, so the streaming calls live in one
async def synthesize() -> None:
    # Create a stream for real-time synthesis
    stream = tts.stream()
    # Push text incrementally
    stream.push_text("Hello, ")
    stream.push_text("how are you today?")
    stream.flush()  # Flush any remaining buffered text
    stream.end_input()  # Signal end of input
    # Consume audio as it's generated
    async for audio in stream:
        # Process audio frames
        pass
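The two buffering options above describe a size-or-time flush policy: text is held until enough characters have accumulated or the oldest buffered text has waited too long. The sketch below is only a conceptual model of such a policy, not the plugin's implementation, which this README does not show. `TextBuffer` and its `clock` parameter are hypothetical names, and the delay is checked only when new text arrives, whereas a real implementation would presumably also use a timer.

```python
import time
from collections.abc import Callable
from typing import Optional


class TextBuffer:
    """Conceptual model: flush on a character threshold or a maximum delay."""

    def __init__(
        self,
        char_threshold: int = 100,
        max_delay_ms: int = 3000,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._char_threshold = char_threshold
        self._max_delay_s = max_delay_ms / 1000
        self._clock = clock  # injectable so the policy can be tested without sleeping
        self._parts: list[str] = []
        self._first_push_at = 0.0

    def push(self, text: str) -> Optional[str]:
        """Add text; return the buffered chunk if a flush condition is met, else None."""
        if not self._parts:
            # Start the delay timer when the first piece enters an empty buffer.
            self._first_push_at = self._clock()
        self._parts.append(text)
        size = sum(len(part) for part in self._parts)
        waited = self._clock() - self._first_push_at
        if size >= self._char_threshold or waited >= self._max_delay_s:
            return self.flush()
        return None

    def flush(self) -> Optional[str]:
        """Return everything buffered so far (None if empty) and reset the buffer."""
        if not self._parts:
            return None
        chunk = "".join(self._parts)
        self._parts.clear()
        return chunk
```

Under this model, a lower character threshold sends text for synthesis sooner at the cost of smaller chunks, while the delay bound caps how long a short fragment can sit unsent.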
STT
Use Inworld STT for streaming speech-to-text. Multiple models are supported.
from livekit.agents import AgentSession
from livekit.plugins import inworld

session = AgentSession(
    stt=inworld.STT(),
    # ... llm, tts, etc.
)
With a specific model and voice profile detection:
from livekit.agents import AgentSession
from livekit.plugins import inworld

session = AgentSession(
    stt=inworld.STT(
        model="inworld/inworld-stt-1",
        enable_voice_profile=True,
    ),
    # ... llm, tts, etc.
)
Example
A full voice agent using Inworld for both STT and TTS:
"""Inworld STT + TTS voice agent example.

Demonstrates using Inworld for both speech-to-text and text-to-speech
in a LiveKit voice agent. Save this as ``inworld_agent.py`` and run:

    uv run inworld_agent.py console  # local console mode
    uv run inworld_agent.py dev      # LiveKit Cloud (requires LIVEKIT_URL,
                                     # LIVEKIT_API_KEY, LIVEKIT_API_SECRET)

Then connect via https://agents-playground.livekit.io
"""

import logging

from dotenv import load_dotenv

from livekit.agents import (
    Agent,
    AgentServer,
    AgentSession,
    JobContext,
    JobProcess,
    cli,
    metrics,
    room_io,
)
from livekit.plugins import inworld, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

logger = logging.getLogger("inworld-agent")

load_dotenv()


class InworldAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions=(
                "Your name is Nova. You interact with users via voice. "
                "Keep your responses concise and to the point. "
                "Do not use emojis, asterisks, markdown, or other special characters. "
                "You are helpful, curious, and friendly."
            ),
        )

    async def on_enter(self):
        self.session.generate_reply()


server = AgentServer()


def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()


server.setup_fnc = prewarm


@server.rtc_session()
async def entrypoint(ctx: JobContext):
    ctx.log_context_fields = {"room": ctx.room.name}

    session = AgentSession(
        stt=inworld.STT(model="inworld/inworld-stt-1"),
        llm="openai/gpt-4.1-mini",
        tts=inworld.TTS(voice="Clive"),
        turn_detection=MultilingualModel(),
        vad=ctx.proc.userdata["vad"],
    )

    usage_collector = metrics.UsageCollector()

    @session.on("metrics_collected")
    def _on_metrics(ev):
        metrics.log_metrics(ev.metrics)
        usage_collector.collect(ev.metrics)

    async def log_usage():
        logger.info(f"Usage: {usage_collector.get_summary()}")

    ctx.add_shutdown_callback(log_usage)

    await session.start(
        agent=InworldAgent(),
        room=ctx.room,
        room_options=room_io.RoomOptions(),
    )


if __name__ == "__main__":
    cli.run_app(server)
Combined TTS + STT
from livekit.agents import AgentSession
from livekit.plugins import inworld

session = AgentSession(
    tts=inworld.TTS(voice="Hades"),
    stt=inworld.STT(),
    # ... llm, etc.
)
Release history
- 1.5.12 (May 21, 2026)
- 1.5.11 (May 19, 2026)
- 1.5.10 (May 18, 2026)
- 1.5.9 (May 13, 2026)
- 1.5.8 (May 05, 2026)
- 1.5.7 (Apr 30, 2026)
- 1.5.6 (Apr 22, 2026)
- 1.5.5 (Apr 20, 2026)
- 1.5.4 (Apr 16, 2026)
- 1.5.3 (Apr 15, 2026)
- 1.5.2 (Apr 08, 2026)
- 1.5.1 (Mar 23, 2026)
- 1.5.0 (Mar 19, 2026)
- 1.5.0rc2 (Mar 06, 2026)
- 1.5.0rc1 (Feb 13, 2026)
- 1.4.6 (Mar 16, 2026)
- 1.4.5 (Mar 11, 2026)
- 1.4.4 (Mar 03, 2026)
- 1.4.3 (Feb 23, 2026)
- 1.4.2 (Feb 17, 2026)
- 1.4.1 (Feb 06, 2026)
- 1.4.0rc2 (Jan 23, 2026)
- 1.4.0rc1 (Dec 23, 2025)
- 1.3.12 (Jan 21, 2026)
- 1.3.11 (Jan 14, 2026)
- 1.3.10 (Dec 23, 2025)
- 1.3.9 (Dec 19, 2025)
- 1.3.8 (Dec 17, 2025)
- 1.3.7 (Dec 16, 2025)
- 1.3.6 (Dec 03, 2025)
- 1.3.5 (Nov 25, 2025)
- 1.3.4 (Nov 24, 2025)
- 1.3.3 (Nov 19, 2025)
- 1.3.2 (Nov 17, 2025)
- 1.3.1 (Nov 17, 2025)
- 1.3.0rc2 (Nov 15, 2025)
- 1.3.0rc1 (Nov 06, 2025)
- 1.2.18 (Nov 05, 2025)
- 1.2.17 (Oct 29, 2025)
- 1.2.16 (Oct 27, 2025)
- 1.2.15 (Oct 15, 2025)
- 1.2.14 (Oct 01, 2025)
- 1.2.13 (Oct 01, 2025)
- 1.2.12 (Sep 29, 2025)
- 1.2.11 (Sep 18, 2025)
- 1.2.9 (Sep 15, 2025)
- 1.2.8 (Sep 02, 2025)
- 1.2.7 (Aug 28, 2025)
- 1.2.6 (Aug 18, 2025)
- 1.2.5 (Aug 10, 2025)
- 1.2.4 (Aug 07, 2025)
- 1.2.3 (Aug 04, 2025)
- 1.2.2 (Jul 24, 2025)
- 1.2.1 (Jul 17, 2025)
- 1.2.0 (Jul 17, 2025)
- 1.1.7 (Jul 15, 2025)
- 1.1.6 (Jul 10, 2025)
- 1.1.5 (Jun 30, 2025)
- 1.1.4 (Jun 25, 2025)
Extras: None
Dependencies: livekit-agents (>=1.5.12)
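pip normally enforces this constraint at install time, but in environments assembled by other means a quick runtime check can confirm it. The sketch below is a simplified comparison under stated assumptions: `release_tuple` and `meets_minimum` are hypothetical helpers, and only the leading numeric release segment is compared, so a pre-release such as 1.5.12rc1 would be treated like 1.5.12. A full PEP 440 comparison needs a dedicated library.

```python
import re


def release_tuple(version: str, width: int = 3) -> tuple[int, ...]:
    """Extract the leading numeric release segment, zero-padded to `width` parts."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = [int(part) for part in match.group(0).split(".")]
    parts += [0] * (width - len(parts))  # no-op when the version is already long enough
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "1.5.12") -> bool:
    """True if the numeric release of `installed` is at least `minimum`."""
    return release_tuple(installed) >= release_tuple(minimum)
```

The installed version string can be read with `importlib.metadata.version("livekit-agents")` from the standard library and passed to `meets_minimum`.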