protego 0.6.0


pip install protego

  Latest version

Released: Jan 29, 2026


Meta
Author: Anubhav Patel
Requires Python: >=3.10

Classifiers

Development Status
  • 4 - Beta

Intended Audience
  • Developers

Operating System
  • OS Independent

Programming Language
  • Python
  • Python :: 3
  • Python :: 3.10
  • Python :: 3.11
  • Python :: 3.12
  • Python :: 3.13
  • Python :: 3.14
  • Python :: Implementation :: CPython
  • Python :: Implementation :: PyPy

Topic
  • Internet :: WWW/HTTP
  • Software Development :: Libraries :: Python Modules
Supported Python Versions CI

Protego is a pure-Python robots.txt parser with support for modern conventions.

Install

To install Protego, simply use pip:

pip install protego

Usage

>>> from protego import Protego
>>> robotstxt = """
... User-agent: *
... Disallow: /
... Allow: /about
... Allow: /account
... Disallow: /account/contact$
... Disallow: /account/*/profile
... Crawl-delay: 4
... Request-rate: 10/1m                 # 10 requests every 1 minute
...
... Sitemap: http://example.com/sitemap-index.xml
... Host: http://example.co.in
... """
>>> rp = Protego.parse(robotstxt)
>>> rp.can_fetch("http://example.com/profiles", "mybot")
False
>>> rp.can_fetch("http://example.com/about", "mybot")
True
>>> rp.can_fetch("http://example.com/account", "mybot")
True
>>> rp.can_fetch("http://example.com/account/myuser/profile", "mybot")
False
>>> rp.can_fetch("http://example.com/account/contact", "mybot")
False
>>> rp.crawl_delay("mybot")
4.0
>>> rp.request_rate("mybot")
RequestRate(requests=10, seconds=60, start_time=None, end_time=None)
>>> list(rp.sitemaps)
['http://example.com/sitemap-index.xml']
>>> rp.preferred_host
'http://example.co.in'

Using Protego with Requests:

>>> from protego import Protego
>>> import requests
>>> r = requests.get("https://google.com/robots.txt")
>>> rp = Protego.parse(r.text)
>>> rp.can_fetch("https://google.com/search", "mybot")
False
>>> rp.can_fetch("https://google.com/search/about", "mybot")
True
>>> list(rp.sitemaps)
['https://www.google.com/sitemap.xml']

Comparison

The following table compares Protego to the most popular robots.txt parsers implemented in Python or featuring Python bindings:

Protego

RobotFileParser

Reppy

Robotexclusionrulesparser

Implementation language

Python

Python

C++

Python

Reference specification

Google

Martijn Koster’s 1996 draft

Wildcard support

Length-based precedence

Performance

+40%

+1300%

-25%

API Reference

Class protego.Protego:

Properties

  • sitemaps {list_iterator} A list of sitemaps specified in robots.txt.

  • preferred_host {string} Preferred host specified in robots.txt.

Methods

  • parse(robotstxt_body) Parse robots.txt and return a new instance of protego.Protego.

  • can_fetch(url, user_agent) Return True if the user agent can fetch the URL, otherwise return False.

  • crawl_delay(user_agent) Return the crawl delay specified for the user agent as a float. If nothing is specified, return None.

  • request_rate(user_agent) Return the request rate specified for the user agent as a named tuple RequestRate(requests, seconds, start_time, end_time). If nothing is specified, return None.

  • visit_time(user_agent) Return the visit time specified for the user agent as a named tuple VisitTime(start_time, end_time). If nothing is specified, return None.

Wheel compatibility matrix

Platform Python 3
any

Files in release

No dependencies