openreward 0.1.125


pip install openreward

  Latest version

Released: May 21, 2026

Project Links

Meta
Author: GR Inc
Requires Python: >=3.11

Classifiers

OpenReward Python SDK

PyPI version Python 3.11+ Docs

The official Python SDK for OpenReward — a platform for building, hosting, and training on RL environments for language models.

The SDK has two complementary roles:

  • Build environments — define evaluation tasks, expose tools, and serve them via a standards-compliant API that can be deployed on the OpenReward platform.
  • Train agents — connect to any environment (local or hosted), run agent loops, and log rollouts with rewards back to OpenReward.

Installation

pip install openreward

For environments that process documents (PDF, DOCX, Excel, PowerPoint):

pip install "openreward[tools]"

Requires Python 3.11+.

Core concepts

Environment

An Environment subclass defines a benchmark or task distribution. Implement three required methods:

Method Purpose
list_splits() Return split names, e.g. ["train", "test"]
list_tasks(split) Return a deterministically ordered list of task dicts
get_prompt() Return the task instructions as a list of TextBlock / ImageBlock

Actions are defined as async methods decorated with @tool. Each tool receives a Pydantic model as input and returns a ToolOutput.

ToolOutput

Every tool returns a ToolOutput containing:

  • blocks — a list of TextBlock or ImageBlock results
  • reward — optional float reward signal
  • finished — whether the episode is complete
  • metadata — optional arbitrary metadata

Server

Server wraps one or more Environment classes in a FastAPI app and exposes the Open Reward Standard API over HTTP with SSE streaming.

Key endpoints:

Endpoint Description
POST /create Spawn a new environment session
POST /{env}/call Execute a tool (streamed via SSE)
GET /{env}/prompt Get the current task prompt
GET /{env}/tools List available tools
POST /{env}/tasks List all tasks for a split

Sandboxes

Environments that need isolated compute (e.g. code execution) can spin up Docker containers via the sandbox API using SandboxSettings. Containers are managed automatically — started in setup() and torn down in teardown().

Toolsets

Group reusable tools into Toolset classes and compose them across environments via the toolsets class attribute.

Rollout logging

Log agent trajectories with reward signals back to OpenReward for analysis and training. The client's rollout API supports normalized message types as well as raw outputs from Anthropic, OpenAI, and Google GenAI SDKs.

CLI

The orwd CLI helps you scaffold and create environments.

Scaffold a new environment locally

# Minimal environment
orwd init my-env

# Environment with a Docker sandbox for code execution
orwd init my-env --template sandbox

Create an environment on OpenReward

Registers a new environment under your account (requires OPENREWARD_API_KEY):

orwd create my-env --description "A short description of my environment"

By default the environment is created under your personal namespace. To create it under an organisation you are a member of, pass --namespace:

orwd create my-env --description "A short description" --namespace my-org

Pass --private to make the environment private:

orwd create my-env --description "A short description" --private

Deploying to OpenReward

  1. Push your environment to a GitHub repository.
  2. Connect the repository in the OpenReward dashboard.
  3. Configure compute resources (CPU, memory, scaling).
  4. Every push to the connected branch triggers an automatic build and deployment.

Your environment is then accessible to any agent via the OpenReward API using the username/environment-name namespace.

Environment variables

Variable Description
OPENREWARD_API_KEY API key for authentication
OPENREWARD_URL Override base URL (default: https://openreward.ai)
OPENREWARD_USE_STRUCTURED_LOGS Set to 1 for JSON logging (recommended in production)
OPENREWARD_ROLLOUT_LOGGING_FORMAT pretty or structured for rollout log output

Documentation

Full documentation, guides, and examples are at docs.openreward.ai.

License

Apache 2.0

0.1.125 May 21, 2026
0.1.124 May 20, 2026
0.1.123 May 19, 2026
0.1.123.dev1 May 19, 2026
0.1.122 May 19, 2026
0.1.122.dev1 May 19, 2026
0.1.121 May 18, 2026
0.1.121.dev0 May 14, 2026
0.1.120 May 11, 2026
0.1.119 May 11, 2026
0.1.118 May 10, 2026
0.1.117 May 10, 2026
0.1.116 May 10, 2026
0.1.115 May 08, 2026
0.1.115.dev1 May 08, 2026
0.1.114 May 08, 2026
0.1.114.dev1 May 08, 2026
0.1.113 May 08, 2026
0.1.112 May 05, 2026
0.1.111 May 05, 2026
0.1.110 May 05, 2026
0.1.109 Apr 28, 2026
0.1.108 Apr 28, 2026
0.1.107 Apr 28, 2026
0.1.106 Apr 24, 2026
0.1.105 Apr 23, 2026
0.1.104 Apr 22, 2026
0.1.103 Apr 22, 2026
0.1.102 Apr 22, 2026
0.1.101 Apr 22, 2026
0.1.101.dev2 Apr 22, 2026
0.1.100 Apr 22, 2026
0.1.99 Apr 21, 2026
0.1.98 Apr 20, 2026
0.1.97 Apr 15, 2026
0.1.96 Apr 15, 2026
0.1.96.dev2 Apr 15, 2026
0.1.96.dev1 Apr 14, 2026
0.1.96.dev0 Apr 14, 2026
0.1.95 Apr 14, 2026
0.1.95.dev0 Apr 14, 2026
0.1.94 Apr 12, 2026
0.1.93 Apr 10, 2026
0.1.93.dev0 Apr 09, 2026
0.1.92 Apr 08, 2026
0.1.91 Apr 07, 2026
0.1.90 Apr 03, 2026
0.1.89 Apr 01, 2026
0.1.89.dev1 Apr 03, 2026
0.1.88 Mar 31, 2026
0.1.87 Mar 30, 2026
0.1.86 Mar 23, 2026
0.1.85 Mar 23, 2026
0.1.84 Mar 23, 2026
0.1.83 Mar 22, 2026
0.1.82 Mar 22, 2026
0.1.81 Mar 20, 2026
0.1.80 Mar 20, 2026
0.1.79 Mar 19, 2026
0.1.78 Mar 19, 2026
0.1.77 Mar 19, 2026
0.1.76 Mar 19, 2026
0.1.75 Mar 18, 2026
0.1.74 Mar 18, 2026
0.1.73 Mar 18, 2026
0.1.72 Mar 18, 2026
0.1.71 Mar 17, 2026
0.1.70 Mar 17, 2026
0.1.69 Mar 17, 2026
0.1.68 Mar 17, 2026
0.1.67 Mar 17, 2026
0.1.66 Mar 17, 2026
0.1.65 Mar 17, 2026
0.1.64 Mar 16, 2026
0.1.63 Mar 16, 2026
0.1.62 Mar 16, 2026
0.1.61 Mar 16, 2026
0.1.60 Mar 13, 2026
0.1.59 Mar 12, 2026
0.1.58 Mar 12, 2026
0.1.57 Mar 12, 2026
0.1.56 Mar 11, 2026
0.1.56.dev0 Mar 11, 2026
0.1.55 Mar 11, 2026
0.1.54 Mar 11, 2026
0.1.53 Mar 11, 2026
0.1.53.dev0 Mar 11, 2026
0.1.52 Mar 11, 2026
0.1.51 Mar 11, 2026
0.1.50 Mar 09, 2026
0.1.49 Mar 07, 2026
0.1.48 Mar 06, 2026
0.1.47 Mar 06, 2026
0.1.46 Mar 05, 2026
0.1.45 Mar 05, 2026
0.1.44 Mar 05, 2026
0.1.43 Mar 05, 2026
0.1.42 Mar 05, 2026
0.1.41 Mar 05, 2026
0.1.40 Mar 05, 2026
0.1.39 Mar 05, 2026
0.1.38 Mar 04, 2026
0.1.37 Mar 04, 2026
0.1.36 Mar 03, 2026
0.1.35 Mar 03, 2026
0.1.34 Mar 03, 2026
0.1.33 Mar 02, 2026
0.1.32 Mar 01, 2026
0.1.31 Feb 28, 2026
0.1.30 Feb 27, 2026
0.1.29 Feb 26, 2026
0.1.28 Feb 26, 2026
0.1.27 Feb 23, 2026
0.1.26 Feb 20, 2026
0.1.25 Feb 19, 2026
0.1.22 Feb 02, 2026
0.1.21 Feb 01, 2026
0.1.19 Jan 21, 2026
0.1.18 Jan 18, 2026
0.1.17 Jan 18, 2026
0.1.16 Jan 17, 2026
0.1.14 Jan 16, 2026
0.1.13 Jan 15, 2026
0.1.11 Jan 11, 2026
0.1.10 Jan 10, 2026
0.1.9 Jan 09, 2026
0.1.7 Jan 01, 2026
0.1.6 Dec 19, 2025
0.1.5 Dec 18, 2025
0.1.4 Dec 18, 2025
0.1.3 Dec 08, 2025
0.1.2 Dec 08, 2025
0.1.1 Dec 08, 2025
0.1.0 Dec 08, 2025
0.0.1 Aug 11, 2025

Wheel compatibility matrix

Platform Python 3
any

Files in release