🐱 catsu

A unified, batteries-included client for embedding APIs that actually works.

The world of embedding API clients is broken.

- Everyone defaults to OpenAI's client for embeddings, even though it wasn't designed for that purpose.
- Provider-specific libraries (VoyageAI, Cohere, etc.) are inconsistent, poorly maintained, or outright broken.
- Universal clients like LiteLLM don't focus on embeddings. They rely on native client libraries and inherit all their problems.
- Every provider has different capabilities (some support dimension changes, others don't), with no standardized way to discover what's available.
- Most clients lack basic features like retry logic, proper error handling, and usage tracking.

Catsu fixes this. It's a high-performance, unified client built specifically for embeddings with:

🎯 A clean, consistent API across all providers
🔄 Built-in retry logic with exponential backoff
💰 Automatic usage and cost tracking
📚 Rich model metadata and capability discovery
⚡ Rust core with Python bindings for maximum performance

Installation

pip install catsu

Quick Start

from catsu import Client

# Create client (reads API keys from environment)
client = Client()

# Generate embeddings
response = client.embed(
    "openai:text-embedding-3-small",
    ["Hello, world!", "How are you?"]
)

print(f"Dimensions: {response.dimensions}")
print(f"Tokens used: {response.usage.tokens}")
print(f"Embedding: {response.embeddings[0][:5]}")

Async Support

import asyncio
from catsu import Client

async def main():
    client = Client()
    response = await client.aembed(
        "openai:text-embedding-3-small",
        "Hello, async world!"
    )
    print(response.embeddings[0][:5])

asyncio.run(main())
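
When many texts need embedding, several async calls can run concurrently. The sketch below shows the general pattern with `asyncio.gather` and a semaphore that caps the number of in-flight calls. `fake_aembed` is a hypothetical stand-in for `client.aembed` so the example runs without API keys, and `embed_many` and `max_concurrency` are illustrative names, not part of catsu. Whether a single catsu `Client` can be shared safely across concurrent calls is an assumption to verify before using this pattern.

```python
import asyncio


async def fake_aembed(model: str, text: str) -> list[float]:
    """Hypothetical stand-in for client.aembed; returns a dummy vector."""
    await asyncio.sleep(0.01)  # simulate network latency
    return [float(len(text)), 0.0, 0.0]


async def embed_many(model: str, texts: list[str], max_concurrency: int = 4) -> list[list[float]]:
    """Embed texts concurrently, with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def embed_one(text: str) -> list[float]:
        async with semaphore:
            return await fake_aembed(model, text)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(embed_one(t) for t in texts))


if __name__ == "__main__":
    vectors = asyncio.run(embed_many("openai:text-embedding-3-small", ["a", "bb", "ccc"]))
    print(vectors)
```

Capping concurrency keeps a large batch from hitting provider rate limits all at once.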

With Options

response = client.embed(
    "openai:text-embedding-3-small",
    ["Search query"],
    input_type="query",  # "query" or "document"
    dimensions=256,      # output dimensions (if supported)
)

Model Catalog

# List all available models
models = client.list_models()

# Filter by provider
openai_models = client.list_models("openai")
for m in openai_models:
    print(f"{m.name}: {m.dimensions} dims, ${m.cost_per_million_tokens}/M tokens")
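
The `cost_per_million_tokens` field makes it easy to estimate spend before sending a large batch. A minimal sketch of the arithmetic follows; `estimate_cost` is a hypothetical helper, not part of catsu, and the price used is illustrative rather than a quoted rate.

```python
def estimate_cost(tokens: int, cost_per_million_tokens: float) -> float:
    """Return the estimated cost in dollars for a given token count."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return tokens / 1_000_000 * cost_per_million_tokens


# Example: 250,000 tokens at an illustrative $0.02 per million tokens
print(f"${estimate_cost(250_000, 0.02):.4f}")  # $0.0050
```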

Configuration

client = Client(
    max_retries=5,   # Default: 3
    timeout=60,      # Default: 30 seconds
)
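
For readers unfamiliar with the retry behavior that `max_retries` controls, here is a generic sketch of retrying with exponential backoff. It is not catsu's actual implementation (that lives in the Rust core). The function name, the delay values, and the jitter factor are all assumptions chosen for illustration.

```python
import random
import time


def retry_with_backoff(func, max_retries=3, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt (capped, plus jitter) and retry."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # A little random jitter keeps many clients from retrying in lockstep
            sleep(delay + random.uniform(0, delay * 0.1))


# Usage: a function that fails twice, then succeeds
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(retry_with_backoff(flaky, base_delay=0.01))  # ok
```

A production version would normally retry only on transient errors (timeouts, rate limits, 5xx responses) rather than on every exception.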

NumPy Integration

# Convert embeddings to numpy array
arr = response.to_numpy()
print(arr.shape)  # (2, 1536) for the two-input Quick Start response

Context Manager

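import asyncio
from catsu import Client

# Sync
with Client() as client:
    response = client.embed("openai:text-embedding-3-small", "Hello!")

# Async ("async with" is only valid inside a coroutine)
async def main():
    async with Client() as client:
        response = await client.aembed("openai:text-embedding-3-small", "Hello!")

asyncio.run(main())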