The official Python client for Ollama.
Requires Python: >=3.8
Ollama Python Library
The Ollama Python library provides the easiest way to integrate Python 3.8+ projects with Ollama.
Prerequisites
- Ollama should be installed and running
- Pull a model to use with the library: ollama pull <model>, e.g. ollama pull gemma3
  - See Ollama.com for more information on the models available.
Install
pip install ollama
Usage
from ollama import chat
from ollama import ChatResponse
response: ChatResponse = chat(model='gemma3', messages=[
    {
        'role': 'user',
        'content': 'Why is the sky blue?',
    },
])
print(response['message']['content'])
# or access fields directly from the response object
print(response.message.content)
See _types.py for more information on the response types.
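A chat call only sees the messages passed to it, so a multi-turn conversation has to resend earlier turns. The following is a minimal sketch of one way to keep that history. The helper names `ask` and `demo` are hypothetical and not part of the library, and `chat_fn` is assumed to behave like `ollama.chat` as shown above.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, str]


def ask(chat_fn: Callable[..., Any], model: str, history: List[Message], question: str) -> str:
    """Send `question` together with prior turns and record the reply in `history`."""
    history.append({'role': 'user', 'content': question})
    response = chat_fn(model=model, messages=history)
    reply = response['message']['content']
    # Store the assistant turn so the next call sees the whole conversation.
    history.append({'role': 'assistant', 'content': reply})
    return reply


def demo() -> None:
    """Hypothetical usage; needs the ollama package and a running Ollama server."""
    from ollama import chat

    history: List[Message] = []
    print(ask(chat, 'gemma3', history, 'Why is the sky blue?'))
    print(ask(chat, 'gemma3', history, 'Explain that to a five-year-old.'))
```

Passing the chat function in as an argument keeps the helper usable with either the module-level chat or a custom client's chat method.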
Streaming responses
Response streaming can be enabled by setting stream=True.
from ollama import chat
stream = chat(
    model='gemma3',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
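When the full reply is needed after streaming (for logging or for appending to a conversation), the chunks can be accumulated while they are printed. This is a small sketch under the assumption that every chunk exposes its text at ['message']['content'], as in the loop above. `collect_stream` and `demo` are hypothetical helper names.

```python
from typing import Any, Iterable


def collect_stream(chunks: Iterable[Any], echo: bool = False) -> str:
    """Join the text of all streamed chunks, optionally printing as they arrive."""
    parts = []
    for chunk in chunks:
        text = chunk['message']['content']
        if echo:
            print(text, end='', flush=True)
        parts.append(text)
    return ''.join(parts)


def demo() -> str:
    """Hypothetical usage; needs the ollama package and a running Ollama server."""
    from ollama import chat

    stream = chat(
        model='gemma3',
        messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
        stream=True,
    )
    return collect_stream(stream, echo=True)
```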
Custom client
A custom client can be created by instantiating Client or AsyncClient from ollama.
All extra keyword arguments are passed into the httpx.Client.
from ollama import Client
client = Client(
    host='http://localhost:11434',
    headers={'x-some-header': 'some-value'}
)
response = client.chat(model='gemma3', messages=[
    {
        'role': 'user',
        'content': 'Why is the sky blue?',
    },
])
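Because extra keyword arguments are forwarded to httpx.Client, settings such as a request timeout can be supplied alongside host and headers. The sketch below gathers those options in one place. The function names, the MY_OLLAMA_URL environment variable, and the bearer-token header are illustrative assumptions, not conventions defined by the library.

```python
import os
from typing import Any, Dict, Optional


def client_options(host: Optional[str] = None,
                   token: Optional[str] = None,
                   timeout: float = 30.0) -> Dict[str, Any]:
    """Build keyword arguments for Client or AsyncClient."""
    options: Dict[str, Any] = {
        # MY_OLLAMA_URL is a made-up variable name used only for this example.
        'host': host or os.environ.get('MY_OLLAMA_URL', 'http://localhost:11434'),
        # Not an Ollama-specific option: forwarded to httpx.Client as an extra keyword.
        'timeout': timeout,
    }
    if token:
        # Assumes a proxy in front of Ollama that expects a bearer token.
        options['headers'] = {'Authorization': f'Bearer {token}'}
    return options


def demo():
    """Hypothetical usage; needs the ollama package and a running Ollama server."""
    from ollama import Client

    client = Client(**client_options(timeout=60.0))
    return client.chat(model='gemma3', messages=[
        {'role': 'user', 'content': 'Why is the sky blue?'},
    ])
```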
Async client
The AsyncClient class is used to make asynchronous requests. It can be configured with the same fields as the Client class.
import asyncio
from ollama import AsyncClient
async def chat():
    message = {'role': 'user', 'content': 'Why is the sky blue?'}
    response = await AsyncClient().chat(model='gemma3', messages=[message])
asyncio.run(chat())
Setting stream=True modifies functions to return a Python asynchronous generator:
import asyncio
from ollama import AsyncClient
async def chat():
    message = {'role': 'user', 'content': 'Why is the sky blue?'}
    async for part in await AsyncClient().chat(model='gemma3', messages=[message], stream=True):
        print(part['message']['content'], end='', flush=True)
asyncio.run(chat())
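One reason to reach for the async client is to issue several requests without waiting for each in turn. The following sketch uses asyncio.gather for that. `ask_all` and `demo` are hypothetical names, and whether the server actually processes the requests in parallel depends on its configuration.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


async def ask_all(chat_fn: Callable[..., Awaitable[Any]], model: str, questions: List[str]) -> List[str]:
    """Send every question as its own chat request and return replies in input order."""

    async def ask_one(question: str) -> str:
        response = await chat_fn(model=model, messages=[{'role': 'user', 'content': question}])
        return response['message']['content']

    # gather() returns results in the order the awaitables were passed in.
    return list(await asyncio.gather(*(ask_one(q) for q in questions)))


async def demo() -> List[str]:
    """Hypothetical usage; needs the ollama package and a running Ollama server."""
    from ollama import AsyncClient

    client = AsyncClient()
    return await ask_all(client.chat, 'gemma3', ['Why is the sky blue?', 'Why is grass green?'])
```

With a server running, asyncio.run(demo()) would return one reply per question.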
API
The Ollama Python library's API is designed around the Ollama REST API.
Chat
ollama.chat(model='gemma3', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])
Generate
ollama.generate(model='gemma3', prompt='Why is the sky blue?')
List
ollama.list()
Show
ollama.show('gemma3')
Create
ollama.create(model='example', from_='gemma3', system="You are Mario from Super Mario Bros.")
Copy
ollama.copy('gemma3', 'user/gemma3')
Delete
ollama.delete('gemma3')
Pull
ollama.pull('gemma3')
Push
ollama.push('user/gemma3')
Embed
ollama.embed(model='gemma3', input='The sky is blue because of Rayleigh scattering')
Embed (batch)
ollama.embed(model='gemma3', input=['The sky is blue because of Rayleigh scattering', 'Grass is green because of chlorophyll'])
Ps
ollama.ps()
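A common use of the embed calls listed above is comparing two texts by the cosine similarity of their vectors. The sketch below shows that computation in plain Python. It assumes the embed response carries one vector per input under the 'embeddings' key, and `cosine_similarity` and `demo` are hypothetical helpers rather than library functions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError('vectors must have the same length')
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Similarity is undefined for a zero vector; this sketch reports 0.0.
        return 0.0
    return dot / (norm_a * norm_b)


def demo() -> float:
    """Hypothetical usage; needs the ollama package and a running Ollama server."""
    import ollama

    response = ollama.embed(model='gemma3', input=[
        'The sky is blue because of Rayleigh scattering',
        'Grass is green because of chlorophyll',
    ])
    # Assumption: 'embeddings' holds one vector per input string, in order.
    first, second = response['embeddings']
    return cosine_similarity(first, second)
```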
Errors
Errors are raised if requests return an error status or if an error is detected while streaming.
import ollama

model = 'does-not-yet-exist'
try:
    ollama.chat(model)
except ollama.ResponseError as e:
    print('Error:', e.error)
    # A 404 status means the model is not available locally, so pull it.
    if e.status_code == 404:
        ollama.pull(model)
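The pull-on-404 pattern above can be wrapped so the request is retried once after the model has been pulled. This is a minimal sketch and not something the library provides. `chat_with_auto_pull` and `demo` are hypothetical names, and the chat function, pull function, and error type are passed in so the helper does not depend on a particular client.

```python
from typing import Any, Callable, List, Type


def chat_with_auto_pull(model: str,
                        messages: List[dict],
                        chat_fn: Callable[..., Any],
                        pull_fn: Callable[[str], Any],
                        error_type: Type[Exception]) -> Any:
    """Call chat_fn; on a 404 error pull the model and retry exactly once."""
    try:
        return chat_fn(model=model, messages=messages)
    except error_type as e:
        # Only a missing model is handled here; other errors propagate unchanged.
        if getattr(e, 'status_code', None) != 404:
            raise
        pull_fn(model)
        return chat_fn(model=model, messages=messages)


def demo() -> Any:
    """Hypothetical usage; needs the ollama package and a running Ollama server."""
    import ollama

    return chat_with_auto_pull(
        'gemma3',
        [{'role': 'user', 'content': 'Why is the sky blue?'}],
        ollama.chat,
        ollama.pull,
        ollama.ResponseError,
    )
```

Retrying only once keeps a failed pull from turning into an endless loop.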
0.6.1
Nov 13, 2025
0.6.0
Sep 24, 2025
0.5.4
Sep 16, 2025
0.5.3
Aug 07, 2025
0.5.2
Aug 05, 2025
0.5.1
May 30, 2025
0.5.0
May 30, 2025
0.4.9
May 27, 2025
0.4.8
Apr 16, 2025
0.4.7
Jan 21, 2025
0.4.6
Jan 14, 2025
0.4.5
Dec 29, 2024
0.4.4
Dec 08, 2024
0.4.3
Dec 06, 2024
0.4.2
Nov 28, 2024
0.4.1
Nov 24, 2024
0.4.0
Nov 21, 2024
0.3.3
Sep 09, 2024
0.3.2
Aug 27, 2024
0.3.1
Jul 29, 2024
0.3.0
Jul 18, 2024
0.2.1
Jun 05, 2024
0.2.0
May 10, 2024
0.1.9
Apr 26, 2024
0.1.8
Mar 27, 2024
0.1.7
Mar 01, 2024
0.1.6
Feb 02, 2024
0.1.5
Jan 30, 2024
0.1.4
Jan 23, 2024
0.1.3
Jan 19, 2024
0.1.2
Jan 16, 2024
0.1.0
Jan 12, 2024
0.0.1
Jan 12, 2024
0.0.0
Jan 12, 2024