Official Python SDK for ScrapeGraph AI API
Meta
Author: ScrapeGraph AI
Requires Python: >=3.12
Classifiers
- Development Status :: 4 - Beta
- Intended Audience :: Developers
- License :: OSI Approved :: MIT License
- Programming Language :: Python :: 3
- Programming Language :: Python :: 3.12
- Programming Language :: Python :: 3.13
- Topic :: Internet :: WWW/HTTP :: Indexing/Search
- Topic :: Software Development :: Libraries :: Python Modules
- Typing :: Typed
ScrapeGraphAI Python SDK
Official Python SDK for the ScrapeGraphAI API.
Install
pip install scrapegraph-py
# or
uv add scrapegraph-py
Quick Start
from scrapegraph_py import ScrapeGraphAI
# reads SGAI_API_KEY from env, or pass explicitly: ScrapeGraphAI(api_key="...")
sgai = ScrapeGraphAI()
result = sgai.scrape("https://example.com")
if result.status == "success":
    print(result.data["results"]["markdown"]["data"])
else:
    print(result.error)
Every method returns an ApiResult[T] instead of raising, so there are no exceptions to catch:
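from dataclasses import dataclass
from typing import Generic, Literal, TypeVar
T = TypeVar("T")
@dataclass
class ApiResult(Generic[T]):
    status: Literal["success", "error"]
    data: T | None       # response payload when status == "success"
    error: str | None    # error message when status == "error"
    elapsed_ms: int      # request duration in milliseconds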
API
scrape
Scrape a webpage in one or more formats (markdown, html, screenshot, json, etc.).
from scrapegraph_py import (
    ScrapeGraphAI, FetchConfig,
    MarkdownFormatConfig, ScreenshotFormatConfig, JsonFormatConfig,
)
sgai = ScrapeGraphAI()
res = sgai.scrape(
    "https://example.com",
    formats=[
        MarkdownFormatConfig(mode="reader"),
        ScreenshotFormatConfig(full_page=True, width=1440, height=900),
        JsonFormatConfig(prompt="Extract product info"),
    ],
    content_type="text/html",  # optional, auto-detected
    fetch_config=FetchConfig(  # optional
        mode="js",  # "auto" | "fast" | "js"
        stealth=True,
        timeout=30000,
        wait=2000,
        scrolls=3,
        headers={"Accept-Language": "en"},
        cookies={"session": "abc"},
        country="us",
    ),
)
Formats:
- markdown: clean markdown (modes: normal, reader, prune)
- html: raw HTML (modes: normal, reader, prune)
- links: all links on the page
- images: all image URLs
- summary: AI-generated summary
- json: structured extraction with prompt/schema
- branding: brand colors, typography, logos
- screenshot: page screenshot (full_page, width, height, quality)
extract
Extract structured data from a URL, HTML, or markdown using AI.
from scrapegraph_py import ScrapeGraphAI
sgai = ScrapeGraphAI()
res = sgai.extract(
    prompt="Extract product names and prices",
    url="https://example.com",
    schema={"type": "object", "properties": {...}},  # optional
    mode="reader",  # optional
    # Or pass html/markdown directly instead of url
)
search
Search the web and optionally extract structured data.
from scrapegraph_py import ScrapeGraphAI
sgai = ScrapeGraphAI()
res = sgai.search(
    "best programming languages 2024",
    num_results=5,  # 1-20, default 3
    format="markdown",  # "markdown" | "html"
    prompt="Extract key points",  # optional, for AI extraction
    schema={...},  # optional
    time_range="past_week",  # optional
    location_geo_code="us",  # optional
)
crawl
Crawl a website and its linked pages.
from scrapegraph_py import ScrapeGraphAI, MarkdownFormatConfig
sgai = ScrapeGraphAI()
# Start a crawl
start = sgai.crawl.start(
    "https://example.com",
    formats=[MarkdownFormatConfig()],
    max_pages=50,
    max_depth=2,
    max_links_per_page=10,
    include_patterns=["/blog/*"],
    exclude_patterns=["/admin/*"],
)
crawl_id = start.data["id"]
# Check status
status = sgai.crawl.get(crawl_id)
# Control
sgai.crawl.stop(crawl_id)
sgai.crawl.resume(crawl_id)
sgai.crawl.delete(crawl_id)
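A crawl runs in the background, so a common pattern is to poll sgai.crawl.get() until the job finishes. In the minimal sketch below, the fetch and completion checks are passed in as functions so the code runs without the SDK. `poll_until_done` is hypothetical. The "completed" status string in the demo is an assumption, not a documented field value.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until_done(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> T:
    """Call fetch() every interval_s seconds until is_done(value) is true.

    Raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        value = fetch()
        if is_done(value):
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still not done after {timeout_s} seconds")
        time.sleep(interval_s)


# Demo with a fake fetcher standing in for sgai.crawl.get(crawl_id).
states = iter(["running", "running", "completed"])
final = poll_until_done(lambda: next(states), lambda s: s == "completed", interval_s=0)
print(final)  # completed
```

With the SDK, fetch would be lambda: sgai.crawl.get(crawl_id). is_done should also stop when r.status == "error", since a failed request will never turn into a finished crawl.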
monitor
Monitor a webpage for changes on a schedule.
from scrapegraph_py import ScrapeGraphAI, MarkdownFormatConfig
sgai = ScrapeGraphAI()
# Create a monitor
mon = sgai.monitor.create(
    "https://example.com",
    "0 * * * *",  # cron expression: minute 0 of every hour
    name="Price Monitor",
    formats=[MarkdownFormatConfig()],
    webhook_url="https://...",  # optional
)
# Manage monitors
cron_id = "your-monitor-id"  # placeholder: ID of an existing monitor
sgai.monitor.list()
sgai.monitor.get(cron_id)
sgai.monitor.update(cron_id, interval="0 */6 * * *")
sgai.monitor.pause(cron_id)
sgai.monitor.resume(cron_id)
sgai.monitor.delete(cron_id)
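Both schedules above are five-field cron expressions (minute, hour, day of month, month, day of week). "0 * * * *" fires at minute 0 of every hour, and "0 */6 * * *" fires every six hours. The minimal sketch below expands each field into the values it matches, which is handy for sanity-checking an interval before you create a monitor. `expand_cron_field` and `expand_cron` are hypothetical helpers. They support only `*`, `*/n`, single values, ranges `a-b` (optionally with `/n`), and comma lists, and they do not compute actual run times.

```python
def expand_cron_field(field: str, low: int, high: int) -> list[int]:
    """Expand one cron field into the sorted values it matches within [low, high].

    Supports "*", "*/n", "a", "a-b", "a-b/n", and comma-separated lists of these.
    """
    values: set[int] = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if step < 1:
            raise ValueError(f"invalid step in {part!r}")
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-", 1)
            start, end = int(start_text), int(end_text)
        elif step_text:
            raise ValueError(f"a step needs '*' or a range: {part!r}")
        else:
            start = end = int(base)
        if not low <= start <= end <= high:
            raise ValueError(f"{part!r} is outside {low}-{high}")
        values.update(range(start, end + 1, step))
    return sorted(values)


# minute, hour, day of month, month, day of week (0 = Sunday; 7 is not accepted here)
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand_cron(expr: str) -> list[list[int]]:
    """Expand a five-field cron expression into one list of matching values per field."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {expr!r}")
    return [expand_cron_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES)]


minutes, hours, *_ = expand_cron("0 */6 * * *")
print(minutes, hours)  # [0] [0, 6, 12, 18]
```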
history
Fetch request history.
from scrapegraph_py import ScrapeGraphAI
sgai = ScrapeGraphAI()
history = sgai.history.list(
    service="scrape",  # optional filter
    page=1,
    limit=20,
)
entry = sgai.history.get("request-id")
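history.list() is paginated with page and limit. The minimal sketch below walks every page. It assumes, without documentation above, that each page yields a list of entries and that a page shorter than limit is the last one. `iter_history` is hypothetical, and it takes the page fetcher as a function so the demo runs without the API.

```python
from typing import Any, Callable, Iterator


def iter_history(
    fetch_page: Callable[[int, int], list[Any]],
    limit: int = 20,
    max_pages: int = 1000,
) -> Iterator[Any]:
    """Yield entries page by page (pages are 1-based) until a short or empty page."""
    for page in range(1, max_pages + 1):
        entries = fetch_page(page, limit)
        yield from entries
        if len(entries) < limit:
            return


# Fake backend with 45 entries standing in for sgai.history.list(page=..., limit=...).
ALL_ENTRIES = [f"request-{i}" for i in range(45)]


def fake_fetch(page: int, limit: int) -> list[str]:
    start = (page - 1) * limit
    return ALL_ENTRIES[start:start + limit]


entries = list(iter_history(fake_fetch, limit=20))
print(len(entries))  # 45
```

With the SDK, fetch_page would call sgai.history.list(page=p, limit=n) and pull the list of entries out of the returned ApiResult's data. The key that holds that list is not shown above.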
credits / health
from scrapegraph_py import ScrapeGraphAI
sgai = ScrapeGraphAI()
credits = sgai.credits()
# credits.data: {"remaining": 1000, "used": 500, "plan": "pro", "jobs": {"crawl": {...}, "monitor": {...}}}
health = sgai.health()
# health.data: {"status": "ok", "uptime": 12345}
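The credits payload reports how many credits remain. The minimal sketch below guards a large job on that number, using the field names from the example comment above. `ensure_credits` is hypothetical. Per-request pricing is not given here, so the `needed` value you pass in is your own estimate.

```python
from typing import Any


def ensure_credits(credits_data: dict[str, Any], needed: int) -> int:
    """Return the remaining balance, or raise if it is below `needed`."""
    remaining = int(credits_data.get("remaining", 0))
    if remaining < needed:
        raise RuntimeError(f"need {needed} credits, only {remaining} remaining")
    return remaining


# Sample payload shaped like the comment above (values are illustrative).
sample = {"remaining": 1000, "used": 500, "plan": "pro"}
print(ensure_credits(sample, needed=50))  # 1000
```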
Async Client
All methods have async equivalents via AsyncScrapeGraphAI:
import asyncio
from scrapegraph_py import AsyncScrapeGraphAI
async def main():
    async with AsyncScrapeGraphAI() as sgai:
        result = await sgai.scrape("https://example.com")
        if result.status == "success":
            print(result.data["results"]["markdown"]["data"])
        else:
            print(result.error)
asyncio.run(main())
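The async client makes it straightforward to scrape several URLs concurrently. The minimal sketch below uses asyncio.gather with a semaphore to cap the number of in-flight requests. `scrape_many` and the default limit of 5 are assumptions, since the SDK's rate limits are not stated here. A stand-in coroutine replaces `sgai.scrape` so the snippet runs on its own.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def scrape_many(
    scrape: Callable[[str], Awaitable[Any]],
    urls: list[str],
    max_concurrency: int = 5,
) -> list[Any]:
    """Run scrape(url) for every URL, at most max_concurrency at a time.

    Results come back in the same order as urls.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(url: str) -> Any:
        async with semaphore:
            return await scrape(url)

    return await asyncio.gather(*(one(u) for u in urls))


async def fake_scrape(url: str) -> str:
    # Stand-in for sgai.scrape(url); sleeps briefly instead of calling the API.
    await asyncio.sleep(0.01)
    return f"scraped {url}"


results = asyncio.run(
    scrape_many(fake_scrape, ["https://example.com/a", "https://example.com/b"])
)
print(results)  # ['scraped https://example.com/a', 'scraped https://example.com/b']
```

Inside an `async with AsyncScrapeGraphAI() as sgai:` block, you would pass sgai.scrape as the scrape argument. Each call returns an ApiResult instead of raising, so a failed URL shows up as a result with status "error" in its slot of the list.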
Async Extract
async with AsyncScrapeGraphAI() as sgai:
    res = await sgai.extract(
        prompt="Extract product names and prices",
        url="https://example.com",
    )
Async Search
async with AsyncScrapeGraphAI() as sgai:
    res = await sgai.search("best programming languages 2024", num_results=5)
Async Crawl
async with AsyncScrapeGraphAI() as sgai:
    start = await sgai.crawl.start("https://example.com", max_pages=50)
    status = await sgai.crawl.get(start.data["id"])
Async Monitor
async with AsyncScrapeGraphAI() as sgai:
    mon = await sgai.monitor.create(
        "https://example.com",
        "0 * * * *",
        name="Price Monitor",
    )
Examples
Sync Examples
| Service | Example | Description |
|---|---|---|
| scrape | scrape_basic.py | Basic markdown scraping |
| scrape | scrape_multi_format.py | Multiple formats |
| scrape | scrape_json_extraction.py | Structured JSON extraction |
| scrape | scrape_pdf.py | PDF document parsing |
| scrape | scrape_with_fetchconfig.py | JS rendering, stealth mode |
| extract | extract_basic.py | AI data extraction |
| extract | extract_with_schema.py | Extraction with JSON schema |
| search | search_basic.py | Web search |
| search | search_with_extraction.py | Search + AI extraction |
| crawl | crawl_basic.py | Start and monitor a crawl |
| crawl | crawl_with_formats.py | Crawl with formats |
| monitor | monitor_basic.py | Create a page monitor |
| monitor | monitor_with_webhook.py | Monitor with webhook |
| utilities | credits.py | Check credits and limits |
| utilities | health.py | API health check |
| utilities | history.py | Request history |
Async Examples
| Service | Example | Description |
|---|---|---|
| scrape | scrape_basic_async.py | Basic markdown scraping |
| scrape | scrape_multi_format_async.py | Multiple formats |
| scrape | scrape_json_extraction_async.py | Structured JSON extraction |
| scrape | scrape_pdf_async.py | PDF document parsing |
| scrape | scrape_with_fetchconfig_async.py | JS rendering, stealth mode |
| extract | extract_basic_async.py | AI data extraction |
| extract | extract_with_schema_async.py | Extraction with JSON schema |
| search | search_basic_async.py | Web search |
| search | search_with_extraction_async.py | Search + AI extraction |
| crawl | crawl_basic_async.py | Start and monitor a crawl |
| crawl | crawl_with_formats_async.py | Crawl with formats |
| monitor | monitor_basic_async.py | Create a page monitor |
| monitor | monitor_with_webhook_async.py | Monitor with webhook |
| utilities | credits_async.py | Check credits and limits |
| utilities | health_async.py | API health check |
| utilities | history_async.py | Request history |
Environment Variables
| Variable | Description | Default |
|---|---|---|
| SGAI_API_KEY | Your ScrapeGraphAI API key | none |
| SGAI_API_URL | Override the API base URL | https://v2-api.scrapegraphai.com/api |
| SGAI_DEBUG | Enable debug logging ("1") | off |
| SGAI_TIMEOUT | Request timeout in seconds | 120 |
Development
uv sync
uv run pytest tests/ # unit tests
uv run pytest tests/test_integration.py # live API tests (requires SGAI_API_KEY)
uv run ruff check . # lint
License
MIT - ScrapeGraphAI
Release History
2.1.0
Apr 21, 2026
2.0.1
Apr 21, 2026
1.47.0
Apr 18, 2026
1.46.0
Jan 26, 2026
1.45.0
Jan 23, 2026
1.44.1
Jan 17, 2026
1.44.0
Nov 28, 2025
1.43.0
Nov 26, 2025
1.42.0
Nov 21, 2025
1.41.1
Nov 14, 2025
1.41.0
Nov 04, 2025
1.40.0
Nov 04, 2025
1.39.0
Nov 03, 2025
1.38.0
Oct 23, 2025
1.37.0
Oct 23, 2025
1.36.0
Oct 16, 2025
1.35.0
Oct 15, 2025
1.34.0
Oct 08, 2025
1.33.0
Oct 06, 2025
1.32.0
Oct 06, 2025
1.31.0
Sep 17, 2025
1.30.0
Sep 17, 2025
1.29.0
Sep 16, 2025
1.28.0
Sep 16, 2025
1.27.0
Sep 14, 2025
1.26.0
Sep 11, 2025
1.25.1
Sep 08, 2025
1.25.0
Sep 08, 2025
1.24.0
Sep 03, 2025
1.23.0
Sep 01, 2025
1.22.0
Sep 01, 2025
1.21.0
Sep 01, 2025
1.20.0
Aug 19, 2025
1.19.0
Aug 18, 2025
1.18.2
Aug 06, 2025
1.18.1
Aug 06, 2025
1.18.0
Aug 05, 2025
1.17.0
Jul 30, 2025
1.16.0
Jul 21, 2025
1.15.0
Jul 18, 2025
1.14.2
Jul 12, 2025
1.14.1
Jul 08, 2025
1.14.0
Jul 08, 2025
1.12.2
Jul 08, 2025
1.12.1
Jul 08, 2025
1.12.0
Feb 05, 2025
1.11.0
Feb 03, 2025
1.11.0b1
Feb 03, 2025
1.10.2
Jan 22, 2025
1.10.1
Jan 22, 2025
1.10.0
Jan 16, 2025
1.9.0
Jan 08, 2025
1.9.0b7
Feb 03, 2025
1.9.0b6
Jan 08, 2025
1.9.0b5
Jan 03, 2025
1.9.0b3
Dec 10, 2024
1.9.0b2
Dec 10, 2024
1.9.0b1
Dec 10, 2024
1.8.1
Jul 08, 2025
1.8.0
Dec 08, 2024
1.7.0
Dec 05, 2024
1.7.0b1
Dec 05, 2024
1.6.0
Dec 05, 2024
1.6.0b1
Dec 05, 2024
1.5.0
Dec 04, 2024
1.5.0b1
Dec 05, 2024
1.4.3
Dec 03, 2024
1.4.3b3
Dec 05, 2024
1.4.3b2
Dec 05, 2024
1.4.3b1
Dec 03, 2024
1.4.2
Dec 02, 2024
1.4.1
Dec 02, 2024
1.4.0
Nov 30, 2024
1.3.0
Nov 30, 2024
1.2.2
Nov 29, 2024
1.2.1
Nov 29, 2024
1.2.0
Nov 28, 2024
1.1.0
Nov 28, 2024
1.0.0
Jul 02, 2025
0.0.3
Nov 20, 2024
0.0.2
Nov 20, 2024
0.0.1
Nov 09, 2024