An implementation of Wilkinson formulas.
Meta
Author: Matthew Wardrop
Requires Python: >=3.9
Classifiers
Development Status
- 5 - Production/Stable
Environment
- Console
Intended Audience
- Developers
- Information Technology
- Science/Research
License
- OSI Approved :: MIT License
Natural Language
- English
Operating System
- OS Independent
Programming Language
- Python :: 3.9
- Python :: 3.10
- Python :: 3.11
- Python :: 3.12
- Python :: 3.13
Topic
- Scientific/Engineering :: Mathematics
Formulaic is a high-performance implementation of Wilkinson formulas for Python.
- Documentation: https://matthewwardrop.github.io/formulaic
- Source Code: https://github.com/matthewwardrop/formulaic
- Issue tracker: https://github.com/matthewwardrop/formulaic/issues
It provides:
- high-performance dataframe to model-matrix conversions.
- support for reusing the encoding choices made while converting one dataset on other datasets.
- extensible formula parsing.
- extensible data input/output plugins, with implementations for:
  - input:
    - `pandas.DataFrame`
    - any dataframe representation supported by `narwhals`, including:
      - `pyarrow.Table`
      - `polars.DataFrame`
      - ...
  - output:
    - `pandas.DataFrame`
    - `numpy.ndarray`
    - `scipy.sparse.csc_matrix`
    - `narwhals` dataframe passthrough when using `narwhals` dataframes.
- support for symbolic differentiation of formulas (and hence model matrices).
- and much more.
Example code
```python
import pandas
from formulaic import Formula

df = pandas.DataFrame({
    'y': [0, 1, 2],
    'x': ['A', 'B', 'C'],
    'z': [0.3, 0.1, 0.2],
})

y, X = Formula('y ~ x + z').get_model_matrix(df)
```
y =

|   | y |
|---|---|
| 0 | 0 |
| 1 | 1 |
| 2 | 2 |

X =

|   | Intercept | x[T.B] | x[T.C] | z |
|---|---|---|---|---|
| 0 | 1.0 | 0 | 0 | 0.3 |
| 1 | 1.0 | 1 | 0 | 0.1 |
| 2 | 1.0 | 0 | 1 | 0.2 |
Note that the above can be shortened to:

```python
from formulaic import model_matrix

y, X = model_matrix('y ~ x + z', df)
```
Benchmarks
Formulaic typically outperforms R for both dense and sparse model matrices, and vastly outperforms patsy (the existing implementation for Python) for dense matrices (patsy does not support sparse model matrix output).

For more details, see here.
Related projects and prior art
- Patsy: a prior implementation of Wilkinson formulas for Python, which is widely used (e.g. in statsmodels). It has fantastic documentation (which helped bootstrap this project) and a rich array of features.
- StatsModels.jl `@formula`: The implementation of Wilkinson formulas for Julia.
- R Formulas: The implementation of Wilkinson formulas for R, which is thoroughly introduced here. [R itself is an implementation of S, in which formulas were first made popular.]
- The work that started it all: Wilkinson, G. N., and C. E. Rogers. Symbolic description of factorial models for analysis of variance. Journal of the Royal Statistical Society, Series C (Applied Statistics) 22(3), pp. 392–399, 1973.
Used by
Below are some of the projects that use Formulaic:
- Glum: High-performance Python GLMs with all the features.
- Lifelines: Survival analysis in Python.
- Linearmodels: Additional linear models including instrumental variable and panel data models that are missing from statsmodels.
- Pyfixest: Fast High-Dimensional Fixed Effects Regression in Python following fixest-syntax.
- Tabmat: Efficient matrix representations for working with tabular data.
- Add your project here!
Release history

| Version | Release date |
|---|---|
| 1.2.1 | Sep 21, 2025 |
| 1.2.0 | Jul 14, 2025 |
| 1.1.1 | Dec 20, 2024 |
| 1.1.0 | Dec 16, 2024 |
| 1.0.2 | Jul 12, 2024 |
| 1.0.1 | Dec 25, 2023 |
| 1.0.0 | Dec 25, 2023 |
| 0.6.6 | Oct 04, 2023 |
| 0.6.5 | Sep 25, 2023 |
| 0.6.4 | Jul 11, 2023 |
| 0.6.3 | Jun 26, 2023 |
| 0.6.2 | Jun 22, 2023 |
| 0.6.1 | May 03, 2023 |
| 0.6.0 | Apr 27, 2023 |
| 0.5.2 | Sep 18, 2022 |
| 0.5.1 | Sep 10, 2022 |
| 0.5.0 | Aug 29, 2022 |
| 0.4.0 | Aug 10, 2022 |
| 0.3.4 | May 01, 2022 |
| 0.3.3 | Apr 04, 2022 |
| 0.3.2 | Mar 16, 2022 |
| 0.3.1 | Mar 15, 2022 |
| 0.3.0 | Mar 14, 2022 |
| 0.2.4 | Jul 10, 2021 |
| 0.2.3 | Feb 05, 2021 |
| 0.2.2 | Feb 05, 2021 |
| 0.2.1 | Jan 22, 2021 |
| 0.2.0 | Jan 22, 2021 |
| 0.1.2 | Nov 06, 2019 |
| 0.1.1 | Oct 31, 2019 |
| 0.0.1 | Sep 03, 2019 |
