# sql-redis

License: MIT. Supports Python 3.9–3.13.
A proof-of-concept SQL-to-Redis translator that converts SQL SELECT statements into Redis FT.SEARCH and FT.AGGREGATE commands.
## Status
This is an early POC demonstrating feasibility, not a production-ready library. The goal is to explore design decisions and validate the approach before committing to a full implementation.
## Quick Example

```python
from redis import Redis
from sql_redis import Translator
from sql_redis.schema import SchemaRegistry
from sql_redis.executor import Executor

client = Redis()
registry = SchemaRegistry(client)
registry.load_all()  # Loads index schemas from Redis
executor = Executor(client, registry)

# Simple query
result = executor.execute("""
    SELECT title, price
    FROM products
    WHERE category = 'electronics' AND price < 500
    ORDER BY price ASC
    LIMIT 10
""")
for row in result.rows:
    print(row["title"], row["price"])

# Vector search with params
result = executor.execute("""
    SELECT title, vector_distance(embedding, :vec) AS score
    FROM products
    LIMIT 5
""", params={"vec": vector_bytes})
```
## Design Decisions

### Why SQL instead of a pandas-like Python DSL?
We considered several interface options:
| Approach | Example | Trade-offs |
|---|---|---|
| SQL | `SELECT * FROM products WHERE price > 100` | Universal, well-understood, tooling exists |
| Pandas-like | `df[df.price > 100]` | Pythonic but limited to Python, no standard |
| Builder pattern | `query.select("*").where(price__gt=100)` | Type-safe but verbose, learning curve |
We chose SQL because:
- Universality — SQL is the lingua franca of data. Developers, analysts, and tools all speak it.
- No new DSL to learn — Users already know SQL. A pandas-like API requires learning our specific dialect.
- Tooling compatibility — SQL strings can be generated by ORMs, query builders, or AI assistants.
- Clear mapping — SQL semantics map reasonably well to RediSearch operations (SELECT→LOAD, WHERE→filter, GROUP BY→GROUPBY).
The downside is losing Python's type checking and IDE support, but for a query interface, the universality trade-off is worth it.
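To make the clause-by-clause mapping concrete, here is an illustrative sketch of how the SQL clauses above might land in an `FT.AGGREGATE` command. The function name and exact argument layout are hypothetical, not the library's actual API:

```python
# Illustrative only: how SQL clauses could map onto FT.AGGREGATE arguments.
# sketch_aggregate_args is a hypothetical helper, not part of sql-redis.
def sketch_aggregate_args(index: str) -> list:
    return [
        "FT.AGGREGATE", index,
        "@price:[-inf (500]",             # WHERE price < 500 -> query filter
        "LOAD", "2", "@title", "@price",  # SELECT title, price -> LOAD
        "SORTBY", "2", "@price", "ASC",   # ORDER BY price ASC -> SORTBY
        "LIMIT", "0", "10",               # LIMIT 10 -> LIMIT offset count
    ]

print(sketch_aggregate_args("products"))
```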
### Why sqlglot instead of writing a custom parser?
Options considered:
- Custom parser (regex, hand-rolled recursive descent)
- PLY/Lark (parser generators)
- sqlglot (production SQL parser)
- sqlparse (tokenizer, not a full parser)
We chose sqlglot because:
- Battle-tested — Used in production by companies like Tobiko (SQLMesh). Handles edge cases we'd miss.
- Full AST — Provides a complete abstract syntax tree, not just tokens. We can traverse and analyze queries properly.
- Dialect support — Handles SQL variations. Users can write MySQL-style or PostgreSQL-style queries.
- Active maintenance — Regular releases, responsive maintainers, good documentation.
The alternative was writing a custom parser, which would be error-prone and time-consuming for a POC. sqlglot lets us focus on the translation logic rather than parsing edge cases.
### Why schema-aware translation?
Redis field types determine query syntax:
| Field Type | Redis Syntax | Example |
|---|---|---|
| TEXT | `@field:term` | `@title:laptop` |
| NUMERIC | `@field:[min max]` | `@price:[100 500]` |
| TAG | `@field:{value}` | `@category:{books}` |
Without schema knowledge, we can't translate `category = 'books'` correctly — it could be `@category:books` (TEXT search) or `@category:{books}` (TAG exact match).
Our approach: The SchemaRegistry fetches index schemas via FT.INFO at startup. The translator uses this to generate correct syntax per field type.
This adds a Redis round-trip at initialization but ensures correct query generation.
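A minimal sketch of type-aware translation, assuming the schema is a simple `{field: type}` dict (the real SchemaRegistry populates this from `FT.INFO`); `translate_eq` is a hypothetical helper, not the library's API:

```python
# Sketch: generate per-type RediSearch syntax for an equality condition.
# translate_eq is illustrative; sql-redis's internals may differ.
def translate_eq(schema: dict, field: str, value) -> str:
    ftype = schema[field]
    if ftype == "TAG":
        return f"@{field}:{{{value}}}"         # TAG: exact match
    if ftype == "NUMERIC":
        return f"@{field}:[{value} {value}]"   # NUMERIC: point range
    return f"@{field}:{value}"                 # TEXT: full-text term

schema = {"category": "TAG", "price": "NUMERIC", "title": "TEXT"}
print(translate_eq(schema, "category", "books"))  # @category:{books}
print(translate_eq(schema, "price", 100))         # @price:[100 100]
print(translate_eq(schema, "title", "laptop"))    # @title:laptop
```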
### Architecture: Why this layered design?

```
SQL String
     ↓
┌─────────────────┐
│    SQLParser    │  Parse SQL → ParsedQuery dataclass
└────────┬────────┘
         ↓
┌─────────────────┐
│ SchemaRegistry  │  Load field types from Redis
└────────┬────────┘
         ↓
┌─────────────────┐
│    Analyzer     │  Classify conditions by field type
└────────┬────────┘
         ↓
┌─────────────────┐
│  QueryBuilder   │  Generate RediSearch syntax per type
└────────┬────────┘
         ↓
┌─────────────────┐
│   Translator    │  Orchestrate pipeline, build command
└────────┬────────┘
         ↓
┌─────────────────┐
│    Executor     │  Execute command, parse results
└────────┬────────┘
         ↓
QueryResult(rows, count)
```
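The layer boundaries can be sketched structurally: the parser's output knows nothing about Redis, and the translator's output knows nothing about SQL. Class and field names below are illustrative, not the library's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class ParsedQuery:      # SQLParser output: pure SQL facts, no Redis knowledge
    table: str
    columns: list
    conditions: list = field(default_factory=list)  # (field, value) pairs

@dataclass
class RedisCommand:     # Translator output: pure Redis, no SQL knowledge
    args: list

def translate(parsed: ParsedQuery, schema: dict) -> RedisCommand:
    # Analyzer + QueryBuilder collapsed into one step for brevity.
    query = " ".join(
        f"@{f}:{{{v}}}" if schema.get(f) == "TAG" else f"@{f}:{v}"
        for f, v in parsed.conditions
    ) or "*"
    return RedisCommand(["FT.SEARCH", parsed.table, query])

cmd = translate(
    ParsedQuery("products", ["title"], [("category", "books")]),
    {"category": "TAG"},
)
print(cmd.args)  # ['FT.SEARCH', 'products', '@category:{books}']
```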
#### Why separate components?
- Testability — Each layer has focused unit tests. 100% coverage is achievable because responsibilities are clear.
- Single responsibility — Parser doesn't know about Redis. QueryBuilder doesn't know about SQL. Changes are localized.
- Extensibility — Adding a new field type (e.g., GEO) means updating Analyzer and QueryBuilder, not rewriting everything.
#### Why not a single monolithic translator?
Early prototypes combined parsing and translation. This led to:
- Tests that required Redis connections for simple SQL parsing tests
- Difficulty testing edge cases in isolation
- Tangled code that was hard to modify
The layered approach emerged from TDD — writing tests first revealed natural boundaries.
## What's Implemented

- Basic SELECT with field selection
- WHERE with TEXT, NUMERIC, TAG field types
- Comparison operators: `=`, `!=`, `<`, `<=`, `>`, `>=`, `BETWEEN`, `IN`
- Boolean operators: `AND`, `OR`
- Aggregations: `COUNT`, `SUM`, `AVG`, `MIN`, `MAX`
- `GROUP BY` with multiple aggregations
- `ORDER BY` with ASC/DESC
- `LIMIT` and `OFFSET` pagination
- Computed fields: `price * 0.9 AS discounted`
- Vector KNN search: `vector_distance(field, :param)`
- Hybrid search (filters + vector)
- Full-text search: `LIKE 'prefix%'` (prefix), `fulltext(field, 'terms')` function
- GEO field queries with full operator support (see below)
- Date functions: `YEAR()`, `MONTH()`, `DAY()`, `DATE_FORMAT()`, etc. (see below)
## What's Not Implemented (Yet...)
- JOINs (Redis doesn't support cross-index joins)
- Subqueries
- HAVING clause
- DISTINCT
- Index creation from SQL (CREATE INDEX)
## DATE/DATETIME Handling
Redis does not have a native DATE field type. Dates are stored as NUMERIC fields with Unix timestamps.
sql-redis automatically converts ISO 8601 date literals to Unix timestamps:
```sql
-- Date literal (automatically converted to timestamp 1704067200)
SELECT * FROM events WHERE created_at > '2024-01-01'

-- Datetime literal with time
SELECT * FROM events WHERE created_at > '2024-01-01T12:00:00'

-- Date range with BETWEEN
SELECT * FROM events WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31'

-- Multiple date conditions
SELECT * FROM events WHERE created_at > '2024-01-01' AND created_at < '2024-12-31'
```
Supported date formats:

- Date: `'2024-01-01'` (interpreted as midnight UTC)
- Datetime: `'2024-01-01T12:00:00'` or `'2024-01-01 12:00:00'`
- Datetime with timezone: `'2024-01-01T12:00:00Z'`, `'2024-01-01T12:00:00+00:00'`

Note: All dates without a timezone are interpreted as UTC. You can also use raw Unix timestamps if preferred:

```sql
SELECT * FROM events WHERE created_at > 1704067200
```
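The conversion described above can be sketched in a few lines of standard-library Python; `date_literal_to_timestamp` is an illustrative name, not the library's API:

```python
# Sketch: ISO 8601 date literal -> Unix timestamp, naive values assumed UTC.
from datetime import datetime, timezone

def date_literal_to_timestamp(literal: str) -> int:
    s = literal.replace("Z", "+00:00")  # fromisoformat rejects 'Z' before 3.11
    dt = datetime.fromisoformat(s)
    if dt.tzinfo is None:               # no timezone -> interpret as UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp())

print(date_literal_to_timestamp("2024-01-01"))           # 1704067200
print(date_literal_to_timestamp("2024-01-01T12:00:00Z")) # 1704110400
```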
### Date Functions
Extract date parts using SQL functions that map to Redis APPLY expressions:
| SQL Function | Redis Function | Description |
|---|---|---|
| `YEAR(field)` | `year(@field)` | Extract year (e.g., 2024) |
| `MONTH(field)` | `monthofyear(@field)` | Extract month (0-11) |
| `DAY(field)` | `dayofmonth(@field)` | Extract day of month (1-31) |
| `HOUR(field)` | `hour(@field)` | Round to hour |
| `MINUTE(field)` | `minute(@field)` | Round to minute |
| `DAYOFWEEK(field)` | `dayofweek(@field)` | Day of week (0=Sunday) |
| `DAYOFYEAR(field)` | `dayofyear(@field)` | Day of year (0-365) |
| `DATE_FORMAT(field, fmt)` | `timefmt(@field, fmt)` | Format timestamp |
Examples:

```sql
-- Extract year and month
SELECT name, YEAR(created_at) AS year, MONTH(created_at) AS month FROM events

-- Filter by year
SELECT name FROM events WHERE YEAR(created_at) = 2024

-- Group by date parts
SELECT YEAR(created_at) AS year, COUNT(*) FROM events GROUP BY year

-- Format dates
SELECT name, DATE_FORMAT(created_at, '%Y-%m-%d') AS date FROM events
```
Note: Redis's monthofyear() returns 0-11 (not 1-12), and dayofweek() returns 0 for Sunday.
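If you want calendar conventions on the client side, the 0-based results need shifting. A minimal sketch (helper names are illustrative):

```python
# Sketch: adapt Redis's 0-based date parts to calendar conventions.
def from_redis_month(m: int) -> int:
    return m + 1  # Redis monthofyear() 0-11 -> calendar month 1-12

def from_redis_dayofweek(d: int) -> str:
    # Redis dayofweek(): 0 = Sunday ... 6 = Saturday
    return ["Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday"][d]

print(from_redis_month(0))      # 1 (January)
print(from_redis_dayofweek(0))  # Sunday
```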
### Limitations

- `NOT YEAR(field) = 2024` is not supported (raises `ValueError`)
- `DATE_FORMAT()` is only supported in SELECT, not in WHERE (raises `ValueError`)
- Date functions combined with `OR` are not supported (raises `ValueError`)
## GEO Field Support
GEO fields are fully implemented with standard SQL-like syntax:
| Feature | Status |
|---|---|
| Coordinate order | ✅ POINT(lon, lat) — matches Redis native format |
| Default unit | ✅ Meters (m) — SQL standard |
| All operators | ✅ <, <=, >, >=, BETWEEN |
| Distance calculation | ✅ geo_distance() in SELECT clause |
| Combined filters | ✅ GEO + TEXT/TAG/NUMERIC |
### Coordinate Order: POINT(lon, lat)
Use longitude first, matching Redis's native GEO format:
```sql
-- San Francisco coordinates: lon=-122.4194, lat=37.7749
SELECT name FROM stores WHERE geo_distance(location, POINT(-122.4194, 37.7749)) < 5000
```
### Units
| Unit | Code | Example |
|---|---|---|
| Meters | `m` | `geo_distance(location, POINT(-122.4194, 37.7749)) < 5000` |
| Kilometers | `km` | `geo_distance(location, POINT(-122.4194, 37.7749), 'km') < 5` |
| Miles | `mi` | `geo_distance(location, POINT(-122.4194, 37.7749), 'mi') < 3` |
| Feet | `ft` | `geo_distance(location, POINT(-122.4194, 37.7749), 'ft') < 16400` |
Default is meters when no unit is specified.
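A radius predicate like the ones above can be expressed with RediSearch's geo query syntax `@field:[lon lat radius unit]`. A sketch of generating that clause (the function name is illustrative, not the library's API):

```python
# Sketch: geo_distance(field, POINT(lon, lat), unit) < radius
# -> RediSearch geo clause @field:[lon lat radius unit]
def geo_radius_clause(field, lon, lat, radius, unit="m"):
    if unit not in {"m", "km", "mi", "ft"}:
        raise ValueError(f"unsupported unit: {unit}")
    return f"@{field}:[{lon} {lat} {radius} {unit}]"

# geo_distance(location, POINT(-122.4194, 37.7749)) < 5000
print(geo_radius_clause("location", -122.4194, 37.7749, 5000))
# geo_distance(location, POINT(-122.4194, 37.7749), 'km') < 5
print(geo_radius_clause("location", -122.4194, 37.7749, 5, "km"))
```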
### Operators
All comparison operators are supported:
```sql
-- Less than (uses optimized GEOFILTER)
SELECT name FROM stores WHERE geo_distance(location, POINT(-122.4194, 37.7749)) < 5000

-- Less than or equal (uses optimized GEOFILTER)
SELECT name FROM stores WHERE geo_distance(location, POINT(-122.4194, 37.7749)) <= 5000

-- Greater than (uses FT.AGGREGATE with FILTER)
SELECT name FROM stores WHERE geo_distance(location, POINT(-122.4194, 37.7749)) > 100000

-- Greater than or equal (uses FT.AGGREGATE with FILTER)
SELECT name FROM stores WHERE geo_distance(location, POINT(-122.4194, 37.7749)) >= 100000

-- Between (uses FT.AGGREGATE with FILTER)
SELECT name FROM stores WHERE geo_distance(location, POINT(-122.4194, 37.7749), 'km') BETWEEN 10 AND 100
```
### Distance Calculation in SELECT
Calculate distances for all results using geo_distance() in the SELECT clause:
```sql
-- Get distance to each store (returns meters)
SELECT name, geo_distance(location, POINT(-122.4194, 37.7749)) AS distance
FROM stores

-- With explicit unit
SELECT name, geo_distance(location, POINT(-122.4194, 37.7749), 'km') AS distance_km
FROM stores
```
### Combined Filters
Combine GEO filters with other field types:
```sql
-- GEO + TAG filter
SELECT name FROM stores
WHERE category = 'retail' AND geo_distance(location, POINT(-122.4194, 37.7749)) < 5000

-- GEO + NUMERIC filter
SELECT name FROM stores
WHERE rating >= 4.0 AND geo_distance(location, POINT(-122.4194, 37.7749), 'mi') < 10

-- GEO + TEXT filter
SELECT name FROM stores
WHERE name = 'Downtown' AND geo_distance(location, POINT(-122.4194, 37.7749)) < 10000
```
## Development

```shell
# Install dependencies
uv sync --all-extras

# Run tests (requires Docker for testcontainers)
uv run pytest

# Run with coverage
uv run pytest --cov=sql_redis --cov-report=html
```
## Testing Philosophy
This project uses strict TDD with 100% test coverage as a hard requirement. The approach:
- Write failing tests first — Define expected behavior before implementation
- One test at a time — Implement just enough to pass each test
- No untestable code — If we can't test it, we don't write it
- Integration tests mirror raw Redis — `test_sql_queries.py` verifies that SQL produces the same results as the equivalent `FT.AGGREGATE` commands in `test_redis_queries.py`
Coverage is enforced in CI. Pragmas (`# pragma: no cover`) are forbidden — if code can't be tested, it shouldn't exist.