GroupBy operations for dask.array
Meta
Requires Python: >=3.10
Classifiers
Development Status
- 4 - Beta
License
- OSI Approved :: Apache Software License
Natural Language
- English
Operating System
- OS Independent
Programming Language
- Python
- Python :: 3.10
- Python :: 3.11
- Python :: 3.12
flox
This project explores strategies for fast GroupBy reductions with dask.array. It used to be called dask_groupby.
It was motivated by
- Dask Dataframe GroupBy blogpost
- numpy_groupies in Xarray issue
(See a presentation about this package, from the Pangeo Showcase).
Acknowledgements
This work was funded in part by
- NASA-ACCESS 80NSSC18M0156 "Community tools for analysis of NASA Earth Observing System Data in the Cloud" (PI J. Hamman, NCAR),
- NASA-OSTFL 80NSSC22K0345 "Enhancing analysis of NASA data with the open-source Python Xarray Library" (PIs Scott Henderson, University of Washington; Deepak Cherian, NCAR; Jessica Scheick, University of New Hampshire), and
- NCAR's Earth System Data Science Initiative.
It was also motivated by many discussions in the Pangeo community.
API
There are two main functions:
- flox.groupby_reduce(dask_array, by_dask_array, "mean"): "pure" dask array interface
- flox.xarray.xarray_reduce(xarray_object, by_dataarray, "mean"): "pure" xarray interface; though work is ongoing to integrate this package in xarray.
Implementation
See the documentation for details on the implementation.
Custom reductions
flox implements all common reductions provided by numpy_groupies in aggregations.py.
It also allows you to specify a custom Aggregation (again inspired by dask.dataframe),
though this might not be fully functional at the moment. See aggregations.py for examples.
import numpy as np
from flox.aggregations import Aggregation

mean = Aggregation(
    # name used for dask tasks
    name="mean",
    # operation to use for pure-numpy inputs
    numpy="mean",
    # blockwise reduction
    chunk=("sum", "count"),
    # combine intermediate results: sum the sums, sum the counts
    combine=("sum", "sum"),
    # generate final result as sum / count
    finalize=lambda sum_, count: sum_ / count,
    # Used when "reindexing" at combine-time
    fill_value=0,
    # Used when any member of `expected_groups` is not found
    final_fill_value=np.nan,
)
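To make the chunk / combine / finalize stages concrete, here is a standard-library-only sketch of a grouped mean computed block by block. It is an illustration of the general pattern, not flox's implementation: the helper names `chunk`, `combine`, and `finalize`, the sample `blocks`, and the dictionary-based bookkeeping are all assumptions made for this example.

```python
from collections import defaultdict


def chunk(values, labels):
    """Blockwise step: per-group partial sum and count for one block."""
    sums, counts = defaultdict(float), defaultdict(int)
    for value, group in zip(values, labels):
        sums[group] += value
        counts[group] += 1
    return dict(sums), dict(counts)


def combine(partials):
    """Combine step: sum the sums and sum the counts across blocks."""
    sums, counts = defaultdict(float), defaultdict(int)
    for block_sums, block_counts in partials:
        for group, s in block_sums.items():
            sums[group] += s
        for group, c in block_counts.items():
            counts[group] += c
    return dict(sums), dict(counts)


def finalize(sums, counts, expected_groups, final_fill_value=float("nan")):
    """Final step: sum / count, filling groups that never appeared."""
    return {
        group: sums[group] / counts[group] if counts.get(group) else final_fill_value
        for group in expected_groups
    }


# Two hypothetical blocks of (values, labels), standing in for array chunks.
blocks = [
    ([1.0, 2.0, 3.0], ["a", "a", "b"]),
    ([4.0, 5.0], ["b", "a"]),
]
partials = [chunk(values, labels) for values, labels in blocks]
sums, counts = combine(partials)
result = finalize(sums, counts, expected_groups=["a", "b", "c"])
print(result)
```

Group "c" never appears in the data, so it receives the final fill value, which mirrors the role `final_fill_value` plays in the Aggregation above.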
Release history
- 0.11.2 (Feb 26, 2026)
- 0.11.1 (Feb 09, 2026)
- 0.11.0 (Feb 04, 2026)
- 0.10.8 (Dec 15, 2025)
- 0.10.7 (Sep 25, 2025)
- 0.10.6 (Aug 18, 2025)
- 0.10.5 (Aug 15, 2025)
- 0.10.4 (May 27, 2025)
- 0.10.3 (Apr 09, 2025)
- 0.10.2 (Apr 05, 2025)
- 0.10.1 (Mar 25, 2025)
- 0.10.0 (Jan 24, 2025)
- 0.9.15 (Nov 12, 2024)
- 0.9.14 (Nov 05, 2024)
- 0.9.13 (Sep 21, 2024)
- 0.9.12 (Sep 17, 2024)
- 0.9.11 (Sep 08, 2024)
- 0.9.10 (Aug 14, 2024)
- 0.9.9 (Aug 02, 2024)
- 0.9.8 (May 29, 2024)
- 0.9.7 (May 08, 2024)
- 0.9.6 (Mar 27, 2024)
- 0.9.5 (Mar 19, 2024)
- 0.9.4 (Mar 16, 2024)
- 0.9.3 (Mar 13, 2024)
- 0.9.2 (Feb 08, 2024)
- 0.9.1 (Feb 07, 2024)
- 0.9.0 (Jan 23, 2024)
- 0.8.9 (Jan 13, 2024)
- 0.8.8 (Jan 13, 2024)
- 0.8.7 (Jan 10, 2024)
- 0.8.6 (Jan 06, 2024)
- 0.8.5 (Nov 30, 2023)
- 0.8.4 (Nov 30, 2023)
- 0.8.3 (Nov 24, 2023)
- 0.8.2 (Nov 09, 2023)
- 0.8.1 (Oct 15, 2023)
- 0.8.0 (Oct 15, 2023)
- 0.7.2 (May 11, 2023)
- 0.7.1 (May 08, 2023)
- 0.7.0 (May 05, 2023)
- 0.6.10 (Mar 26, 2023)
- 0.6.9 (Mar 22, 2023)
- 0.6.8 (Feb 13, 2023)
- 0.6.7 (Jan 17, 2023)
- 0.6.6 (Jan 14, 2023)
- 0.6.5 (Dec 06, 2022)
- 0.6.4 (Nov 29, 2022)
- 0.6.3 (Oct 27, 2022)
- 0.6.2 (Oct 25, 2022)
- 0.6.1 (Oct 17, 2022)
- 0.6.0 (Oct 12, 2022)
- 0.5.10 (Oct 07, 2022)
- 0.5.9 (Jul 11, 2022)
- 0.5.8 (Jul 02, 2022)
- 0.5.7 (Jun 30, 2022)
- 0.5.6 (Jun 24, 2022)
- 0.5.5 (Jun 05, 2022)
- 0.5.4 (Jun 02, 2022)
- 0.5.3 (May 17, 2022)
- 0.5.2 (May 17, 2022)
- 0.5.1 (May 10, 2022)
- 0.5.0 (May 02, 2022)
- 0.4.1 (Mar 31, 2022)
- 0.4.0 (Mar 16, 2022)
- 0.3.3 (Feb 16, 2022)
- 0.3.2 (Jan 05, 2022)
- 0.3.1 (Dec 30, 2021)
- 0.3.0 (Dec 28, 2021)
- 0.2.1 (Nov 20, 2021)
- 0.2.0 (Nov 16, 2021)
Dependencies:
- pandas (>=1.5)
- packaging (>=21.3)
- numpy (>=1.22)
- numpy-groupies (>=0.9.19)
- toolz
- scipy (>=1.9)