spider-client 0.1.77


pip install spider-client

  Latest version

Released: Aug 29, 2025

Meta
Author: Spider

Classifiers

Development Status
  • 5 - Production/Stable

Intended Audience
  • Developers
  • Information Technology

Topic
  • Software Development :: Libraries :: Python Modules
  • Internet
  • Internet :: WWW/HTTP
  • Internet :: WWW/HTTP :: Indexing/Search

Operating System
  • OS Independent

Spider Cloud Python SDK

The Spider Cloud Python SDK offers a toolkit for straightforward website scraping and crawling at scale, plus utilities such as extracting links and taking screenshots, letting you collect data formatted for large language models (LLMs). It provides a user-friendly interface for seamless integration with the Spider Cloud API.

Installation

To install the Spider Cloud Python SDK, you can use pip:

pip install spider_client

Usage

  1. Get an API key from spider.cloud.
  2. Set the API key as an environment variable named SPIDER_API_KEY, or pass it as a parameter to the Spider class (see the sketch below for the environment-variable option).
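
A minimal sketch of the environment-variable option (the key below is a placeholder):

import os
from spider import Spider

# With SPIDER_API_KEY set in the environment, no api_key argument is needed.
os.environ['SPIDER_API_KEY'] = 'your_api_key'  # placeholder key
app = Spider()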

Here's an example of how to use the SDK:

from spider import Spider

# Initialize the Spider with your API key
app = Spider(api_key='your_api_key')

# Scrape a single URL
url = 'https://spider.cloud'
scraped_data = app.scrape_url(url)

# Crawl a website
crawler_params = {
    'limit': 1,
    'proxy_enabled': True,
    'metadata': False,
    'request': 'http'
}
crawl_result = app.crawl_url(url, params=crawler_params)

Scraping a URL

To scrape data from a single URL:

url = 'https://example.com'
scraped_data = app.scrape_url(url)
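
scrape_url also accepts a params dict; a hedged sketch requesting markdown output ('return_format' mirrors the Transform section below and is an assumption for this endpoint):

scraped_markdown = app.scrape_url(url, params={'return_format': 'markdown'})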

Crawling a Website

To automate crawling a website:

url = 'https://example.com'
crawl_params = {
    'limit': 200,
    'request': 'smart_mode'
}
crawl_result = app.crawl_url(url, params=crawl_params)
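
The shape of the response depends on the parameters used; a hedged sketch assuming a list of page objects carrying a 'url' field, the same field relied on in the streaming callback below:

# Hedged: iterate defensively, since the exact response shape may vary.
for page in crawl_result or []:
    print(page.get('url'))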

Crawl Streaming

Crawl the website and stream the results back in chunks to scale:

def handle_json(json_obj: dict) -> None:
    assert json_obj["url"] is not None

url = 'https://example.com'
crawl_params = {
    'limit': 200,
}
response = app.crawl_url(
    url,
    params=crawl_params,
    stream=True,
    callback=handle_json,
)
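
The callback fires once per streamed page object; a minimal sketch that collects pages for later processing, relying only on the callback signature shown above:

pages = []

def collect(json_obj: dict) -> None:
    # Append each streamed page object as-is.
    pages.append(json_obj)

app.crawl_url(url, params=crawl_params, stream=True, callback=collect)
print(f'Collected {len(pages)} pages')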

Search

Perform a search for websites to crawl or gather search results:

query = 'a sports website'
crawl_params = {
    'request': 'smart_mode',
    'search_limit': 5,
    'limit': 5,
    'fetch_page_content': True
}
crawl_result = app.search(query, params=crawl_params)
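
The structure of search results is not documented here; a hedged first step is to inspect the raw response before relying on specific fields:

# Hedged: print the raw response to discover its shape.
print(crawl_result)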

Retrieving Links from URL(s)

Extract all links from a specified URL:

url = 'https://example.com'
links = app.links(url)

Transform

Transform HTML to markdown or text, lightning fast:

data = [ { 'html': '<html><body><h1>Hello world</h1></body></html>' } ]
params = {
    'readability': False,
    'return_format': 'markdown',
}
result = app.transform(data, params=params)
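
Since the input is a list, multiple documents can presumably be converted in one call; each input dict carries its own 'html' payload, as in the example above:

data = [
    {'html': '<html><body><h1>Hello world</h1></body></html>'},
    {'html': '<html><body><p>Second document</p></body></html>'},
]
result = app.transform(data, params=params)
print(result)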

Taking Screenshots of URL(s)

Capture a screenshot of a given URL:

url = 'https://example.com'
screenshot = app.screenshot(url)

Extracting Contact Information

Extract contact details from a specified URL:

url = 'https://example.com'
contacts = app.extract_contacts(url)

Checking Crawl State

You can check the crawl state of the website:

url = 'https://example.com'
state = app.get_crawl_state(url)

Downloading files

You can download the results of a crawl:

url = 'https://example.com'
params = {
    'page': 0,
    'limit': 100,
    'expiresIn': 3600  # Optional, add if needed
}
stream = True

state = app.create_signed_url(url, params, stream)
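
The signed URL can then be fetched with any HTTP client; a hedged sketch using requests (a declared dependency), assuming the response exposes the location under a 'url' key:

import requests

# Hedged: the response shape is an assumption; adjust the key as needed.
signed = state.get('url') if isinstance(state, dict) else None
if signed:
    download = requests.get(signed, timeout=60)
    download.raise_for_status()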

Checking Available Credits

You can check the remaining credits on your account:

credits = app.get_credits()

Data Operations

The Spider client can interact with specific data tables to create, retrieve, and delete records.
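
For creating records, a hedged sketch assuming a data_post method symmetric with the data_get and data_delete calls shown below (verify the method name against the SDK):

# Hedged: 'data_post' and the payload shape are assumptions made for
# symmetry with data_get/data_delete.
table_name = 'websites'
payload = {'domain': 'www.example.com'}
response = app.data_post(table_name, payload)
print(response)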

Retrieve Data from a Table

To fetch data from a specified table by applying query parameters:

table_name = 'pages'
query_params = {'limit': 20 }
response = app.data_get(table_name, query_params)
print(response)

Delete Data from a Table

To delete data from a specified table based on certain conditions:

table_name = 'websites'
delete_params = {'domain': 'www.example.com'}
response = app.data_delete(table_name, delete_params)
print(response)

Streaming

If you need to stream the request, use the third parameter:

url = 'https://example.com'

crawler_params = {
    'limit': 1,
    'proxy_enabled': True,
    'metadata': False,
    'request': 'http'
}

links = app.links(url, crawler_params, True)

Content-Type

The following Content-Type headers are supported via the fourth parameter:

  1. application/json
  2. text/csv
  3. application/xml
  4. application/jsonl

url = 'https://example.com'

crawler_params = {
    'limit': 1,
    'proxy_enabled': True,
    'metadata': False,
    'request': 'http'
}

# stream json lines back to the client
crawl_result = app.crawl_url(url, crawler_params, True, "application/jsonl")

Error Handling

The SDK handles errors returned by the Spider Cloud API and raises appropriate exceptions. If an error occurs during a request, an exception will be raised with a descriptive error message.
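
The concrete exception classes are not listed here, so a minimal sketch uses a broad handler:

from spider import Spider

app = Spider(api_key='your_api_key')
try:
    data = app.scrape_url('https://spider.cloud')
except Exception as err:  # specific SDK exception types are not documented here
    print(f'Spider request failed: {err}')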

Contributing

Contributions to the Spider Cloud Python SDK are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository.

License

The Spider Cloud Python SDK is open-source and released under the MIT License.

Release history

0.1.77 Aug 29, 2025
0.1.75 Aug 21, 2025
0.1.74 Aug 20, 2025
0.1.73 Aug 20, 2025
0.1.72 Aug 14, 2025
0.1.71 Aug 03, 2025
0.1.70 Aug 03, 2025
0.1.69 Jul 20, 2025
0.1.68 Jul 20, 2025
0.1.67 Jul 20, 2025
0.1.65 Jul 17, 2025
0.1.63 Jul 14, 2025
0.1.62 Jul 14, 2025
0.1.61 Jul 14, 2025
0.1.60 Jul 14, 2025
0.1.59 Jul 14, 2025
0.1.58 Jul 14, 2025
0.1.57 Jul 14, 2025
0.1.56 Jul 12, 2025
0.1.55 Jul 12, 2025
0.1.54 Jul 12, 2025
0.1.53 Jul 10, 2025
0.1.52 Jul 10, 2025
0.1.38 Jun 19, 2025
0.1.37 Jun 08, 2025
0.1.36 May 18, 2025
0.1.35 May 18, 2025
0.1.34 May 05, 2025
0.1.33 May 05, 2025
0.1.32 Mar 26, 2025
0.1.31 Mar 25, 2025
0.1.30 Mar 25, 2025
0.1.28 Mar 12, 2025
0.1.27 Mar 10, 2025
0.1.26 Jan 21, 2025
0.1.25 Dec 12, 2024
0.1.24 Dec 05, 2024
0.1.23 Nov 07, 2024
0.1.22 Nov 04, 2024
0.0.72 Oct 11, 2024
0.0.71 Oct 09, 2024
0.0.70 Sep 14, 2024
0.0.69 Aug 27, 2024
0.0.68 Aug 20, 2024
0.0.67 Aug 09, 2024
0.0.66 Aug 08, 2024
0.0.65 Aug 03, 2024
0.0.64 Jul 24, 2024
0.0.63 Jul 24, 2024
0.0.62 Jul 23, 2024
0.0.61 Jul 23, 2024
0.0.60 Jul 23, 2024
0.0.59 Jul 21, 2024
0.0.58 Jul 14, 2024
0.0.57 Jul 14, 2024
0.0.56 Jul 13, 2024
0.0.55 Jul 13, 2024
0.0.54 Jul 13, 2024
0.0.53 Jul 09, 2024
0.0.52 Jul 07, 2024
0.0.51 Jul 06, 2024
0.0.50 Jul 06, 2024
0.0.49 Jul 06, 2024
0.0.48 Jul 04, 2024
0.0.47 Jul 04, 2024
0.0.46 Jul 04, 2024
0.0.45 Jul 04, 2024
0.0.44 Jul 04, 2024
0.0.43 Jul 02, 2024
0.0.42 Jul 02, 2024
0.0.41 Jul 02, 2024
0.0.40 Jul 02, 2024
0.0.39 Jul 01, 2024
0.0.38 Jul 01, 2024
0.0.37 Jul 01, 2024
0.0.36 Jun 30, 2024
0.0.35 Jun 30, 2024
0.0.34 Jun 30, 2024
0.0.33 Jun 30, 2024
0.0.32 Jun 30, 2024
0.0.31 Jun 30, 2024
0.0.30 Jun 30, 2024
0.0.29 Jun 30, 2024
0.0.28 Jun 30, 2024
0.0.27 Jun 18, 2024
0.0.26 Jun 18, 2024
0.0.25 Jun 15, 2024
0.0.24 Jun 07, 2024
0.0.23 Jun 07, 2024
0.0.22 May 27, 2024
0.0.21 May 13, 2024
0.0.20 May 13, 2024
0.0.11 Apr 29, 2024
0.0.10 Apr 26, 2024
0.0.9 Apr 25, 2024
0.0.8 Apr 25, 2024
0.0.7 Apr 25, 2024
0.0.6 Apr 23, 2024
0.0.5 Apr 23, 2024
0.0.4 Apr 23, 2024
0.0.3 Apr 22, 2024
0.0.2 Apr 22, 2024
0.0.1 Apr 22, 2024

Wheel compatibility matrix

Platform: any
Python: 3

Files in release

Extras: None
Dependencies: requests, ijson, tenacity, aiohttp