sqlframe · Python Simple Repository Browser

Warning: This is a tech preview of github.com/simple-repository. The service is unsupported and may be removed at any time.

Project Links

Meta

Author: Ryan Eakman

Requires Python: >=3.10

Classifiers

Development Status

5 - Production/Stable

Intended Audience

Developers
Science/Research

License

OSI Approved :: MIT License

Operating System

OS Independent

Programming Language

SQL
Python :: 3 :: Only

SQLFrame implements the PySpark DataFrame API in order to enable running transformation pipelines directly on database engines - no Spark clusters or dependencies required.

SQLFrame currently supports the following engines:

BigQuery
Databricks
DuckDB
Postgres
Snowflake
Spark
GizmoSQL (community maintained)
Redshift (community maintained)

SQLFrame also has a "Standalone" session that be used to generate SQL without any connection to a database engine.

Standalone

SQLFrame is great for:

Users who want a DataFrame API that leverages the full power of their engine to do the processing
Users who want to run PySpark code quickly locally without the overhead of starting a Spark session
Users who want a SQL representation of their DataFrame code for debugging or sharing with others
Users who want to run PySpark DataFrame code without the complexity of using Spark for processing

Installation

# BigQuery
pip install "sqlframe[bigquery]"
# Databricks
pip install "sqlframe[databricks]"
# DuckDB
pip install "sqlframe[duckdb]"
# Postgres
pip install "sqlframe[postgres]"
# Snowflake
pip install "sqlframe[snowflake]"
# Spark
pip install "sqlframe[spark]"
# Redshift (in development)
pip install "sqlframe[redshift]"
# Standalone
pip install sqlframe
# Or from conda-forge
conda install -c conda-forge sqlframe

See specific engine documentation for additional setup instructions.

Configuration

SQLFrame generates consistently accurate yet complex SQL for engine execution. However, when using df.sql(optimize=True), it produces more human-readable SQL. For details on how to configure this output and leverage OpenAI to enhance the SQL, see Generated SQL Configuration.

SQLFrame by default uses the Spark dialect for input and output. This can be changed to make SQLFrame feel more like a native DataFrame API for the engine you are using. See Input and Output Dialect Configuration.

Activating SQLFrame

SQLFrame can either replace pyspark imports or be used alongside them. To replace pyspark imports, use the activate function to set the engine to use.

from sqlframe import activate

# Activate SQLFrame to run directly on DuckDB
activate(engine="duckdb")

from pyspark.sql import SparkSession
session = SparkSession.builder.getOrCreate()

SQLFrame can also be directly imported which both maintains pyspark imports but also allows for a more engine-native DataFrame API:

from sqlframe.duckdb import DuckDBSession

session = DuckDBSession.builder.getOrCreate()

Example Usage

from sqlframe import activate

# Activate SQLFrame to run directly on BigQuery
activate(engine="bigquery")

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql import Window

session = SparkSession.builder.getOrCreate()
table_path = '"bigquery-public-data".samples.natality'
# Top 5 years with the greatest year-over-year % change in new families with single child
df = (
  session.table(table_path)
  .where(F.col("ever_born") == 1)
  .groupBy("year")
  .agg(F.count("*").alias("num_single_child_families"))
  .withColumn(
    "last_year_num_single_child_families",
    F.lag(F.col("num_single_child_families"), 1).over(Window.orderBy("year"))
  )
  .withColumn(
    "percent_change",
    (F.col("num_single_child_families") - F.col("last_year_num_single_child_families"))
    / F.col("last_year_num_single_child_families")
  )
  .orderBy(F.abs(F.col("percent_change")).desc())
  .select(
    F.col("year").alias("year"),
    F.format_number("num_single_child_families", 0).alias("new families single child"),
    F.format_number(F.col("percent_change") * 100, 2).alias("percent change"),
  )
  .limit(5)
)

>>> df.sql(optimize=True)
WITH `t94228` AS (
  SELECT
    `natality`.`year` AS `year`,
    COUNT(*) AS `num_single_child_families`
  FROM `bigquery-public-data`.`samples`.`natality` AS `natality`
  WHERE
    `natality`.`ever_born` = 1
  GROUP BY
    `natality`.`year`
), `t39093` AS (
  SELECT
    `t94228`.`year` AS `year`,
    `t94228`.`num_single_child_families` AS `num_single_child_families`,
    LAG(`t94228`.`num_single_child_families`, 1) OVER (ORDER BY `t94228`.`year`) AS `last_year_num_single_child_families`
  FROM `t94228` AS `t94228`
)
SELECT
  `t39093`.`year` AS `year`,
  FORMAT('%\'.0f', ROUND(CAST(`t39093`.`num_single_child_families` AS FLOAT64), 0)) AS `new families single child`,
  FORMAT('%\'.2f', ROUND(CAST((((`t39093`.`num_single_child_families` - `t39093`.`last_year_num_single_child_families`) / `t39093`.`last_year_num_single_child_families`) * 100) AS FLOAT64), 2)) AS `percent change`
FROM `t39093` AS `t39093`
ORDER BY
  ABS(`percent_change`) DESC
LIMIT 5

>>> df.show()
+------+---------------------------+----------------+
| year | new families single child | percent change |
+------+---------------------------+----------------+
| 1989 |         1,650,246         |     25.02      |
| 1974 |          783,448          |     14.49      |
| 1977 |         1,057,379         |     11.38      |
| 1985 |         1,308,476         |     11.15      |
| 1975 |          868,985          |     10.92      |
+------+---------------------------+----------------+

4.2.0 May 20, 2026

4.1.0 Mar 18, 2026

4.0.0 Mar 17, 2026

3.49.0 Mar 15, 2026

3.48.0 Mar 07, 2026

3.47.0 Feb 28, 2026

3.46.2 Feb 04, 2026

3.46.1 Feb 02, 2026

3.46.0 Feb 01, 2026

3.45.0 Jan 21, 2026

3.44.1 Jan 10, 2026

3.44.0 Jan 01, 2026

3.43.8 Nov 01, 2025

3.43.7 Oct 24, 2025

3.43.6 Oct 12, 2025

3.43.5 Oct 10, 2025

3.43.4 Oct 08, 2025

3.43.3 Sep 30, 2025

3.43.2 Sep 24, 2025

3.43.1 Sep 19, 2025

3.43.0 Sep 18, 2025

3.42.0 Sep 16, 2025

3.41.0 Sep 15, 2025

3.40.2 Sep 10, 2025

3.40.1 Sep 08, 2025

3.40.0 Sep 07, 2025

3.39.4 Aug 28, 2025

3.39.3 Aug 27, 2025

3.39.2 Aug 21, 2025

3.39.1 Aug 16, 2025

3.39.0 Aug 16, 2025

3.38.2 Aug 07, 2025

3.38.1 Aug 07, 2025

3.38.0 Jul 26, 2025

3.37.0 Jul 19, 2025

3.36.3 Jul 06, 2025

3.36.2 Jul 05, 2025

3.36.1 Jun 30, 2025

3.36.0 Jun 29, 2025

3.35.1 Jun 06, 2025

3.35.0 May 31, 2025

3.34.0 May 27, 2025

3.33.1 May 25, 2025

3.33.0 May 16, 2025

3.32.1 May 12, 2025

3.32.0 May 11, 2025

3.31.4 May 10, 2025

3.31.3 Apr 30, 2025

3.31.2 Apr 26, 2025

3.31.1 Apr 25, 2025

3.31.0 Apr 23, 2025

3.30.0 Apr 21, 2025

3.29.1 Apr 16, 2025

3.29.0 Apr 06, 2025

3.28.2 Apr 05, 2025

3.28.1 Apr 04, 2025

3.28.0 Mar 28, 2025

3.27.1 Mar 27, 2025

3.27.0 Mar 26, 2025

3.26.0 Mar 22, 2025

3.25.0 Mar 22, 2025

3.24.1 Mar 12, 2025

3.24.0 Mar 11, 2025

3.23.0 Mar 06, 2025

3.22.1 Feb 28, 2025

3.22.0 Feb 18, 2025

3.21.1 Feb 16, 2025

3.21.0 Feb 15, 2025

3.20.0 Feb 15, 2025

3.19.0 Feb 11, 2025

3.18.1 Feb 07, 2025

3.18.0 Feb 06, 2025

3.17.1 Feb 05, 2025

3.17.0 Feb 02, 2025

3.16.0 Feb 01, 2025

3.15.1 Jan 30, 2025

3.15.0 Jan 29, 2025

3.14.2 Jan 25, 2025

3.14.1 Jan 24, 2025

3.14.0 Jan 22, 2025

3.13.4 Jan 19, 2025

3.13.3 Jan 18, 2025

3.13.2 Jan 18, 2025

3.13.1 Jan 11, 2025

3.13.0 Dec 29, 2024

3.12.0 Dec 27, 2024

3.11.0 Dec 24, 2024

3.10.1 Dec 18, 2024

3.10.0 Dec 16, 2024

3.9.3 Dec 15, 2024

3.9.2 Nov 30, 2024

3.9.1 Nov 29, 2024

3.9.0 Nov 27, 2024

3.8.2 Nov 20, 2024

3.8.1 Nov 20, 2024

3.8.0 Nov 19, 2024

3.7.0 Nov 04, 2024

3.6.0 Oct 29, 2024

3.5.0 Oct 18, 2024

3.4.1 Oct 13, 2024

3.4.0 Oct 05, 2024

3.3.1 Sep 22, 2024

3.3.0 Sep 15, 2024

3.2.0 Aug 31, 2024

3.1.1 Aug 27, 2024

3.1.0 Aug 27, 2024

3.0.0 Aug 25, 2024

2.4.0 Aug 23, 2024

2.3.0 Aug 21, 2024

2.2.0 Aug 12, 2024

2.1.0 Aug 10, 2024

2.0.0 Jul 30, 2024

1.14.0 Jun 29, 2024

1.13.0 Jun 28, 2024

1.12.0 Jun 27, 2024

1.11.0 Jun 26, 2024

1.10.0 Jun 25, 2024

1.9.0 Jun 21, 2024

1.8.0 Jun 12, 2024

1.7.1 Jun 11, 2024

1.7.0 Jun 08, 2024

1.6.3 Jun 07, 2024

1.6.2 Jun 06, 2024

1.6.1 Jun 05, 2024

1.6.0 Jun 04, 2024

1.5.5 Jun 03, 2024

1.5.4 Jun 03, 2024

1.5.3 Jun 02, 2024

1.5.2 Jun 02, 2024

1.5.1 Jun 02, 2024

1.5.0 Jun 02, 2024

1.4.0 May 30, 2024

1.3.0 May 28, 2024

1.2.0 May 25, 2024

1.1.3 May 24, 2024

1.1.2 May 23, 2024

1.1.1 May 23, 2024

1.1.0 May 22, 2024

1.0.0 May 21, 2024

0.1.dev3 May 18, 2024

0.0.3 May 19, 2024

0.0.2 May 18, 2024

Wheel compatibility matrix

Platform	Python 3
any

Files in release

sqlframe-4.2.0-py3-none-any.whl (216.7KiB)

sqlframe-4.2.0.tar.gz (28.4MiB)

Extras:

Dependencies:

prettytable (<4)

sqlglot (<30.9,>=28.0.0)

typing_extensions