dask 0.7.4


pip install dask==0.7.4

Meta
Author: Matthew Rocklin

Dask provides multi-core execution on larger-than-memory datasets using blocked algorithms and task scheduling. It maps high-level NumPy, Pandas, and list operations on large datasets onto many operations on small in-memory datasets, recorded as task graphs. It then executes these graphs in parallel on a single machine. Dask lets us use traditional NumPy, Pandas, and list programming while operating on inconveniently large data in a small amount of space.

  • dask is a specification to describe task dependency graphs.

  • dask.array is a drop-in NumPy replacement (for a subset of NumPy) that encodes blocked algorithms in dask dependency graphs.

  • dask.bag encodes blocked algorithms on Python lists of arbitrary Python objects.

  • dask.dataframe encodes blocked algorithms on Pandas DataFrames.

  • dask.async is a shared-memory asynchronous scheduler that efficiently executes dask dependency graphs on multiple cores.
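
As a rough illustration of what "blocked algorithm" means in the list above, the following plain-Python sketch computes a mean by reducing fixed-size blocks one at a time and then combining the per-block results. It does not use dask; the helper names `iter_blocks` and `blocked_mean` are hypothetical and exist only for this example.

```python
from itertools import islice


def iter_blocks(iterable, block_size):
    """Yield successive lists of at most block_size items."""
    iterator = iter(iterable)
    while True:
        block = list(islice(iterator, block_size))
        if not block:
            return
        yield block


def blocked_mean(iterable, block_size=1000):
    """Compute a mean while holding only one block in memory at a time."""
    total = 0.0
    count = 0
    for block in iter_blocks(iterable, block_size):
        # Each block is small enough to reduce in memory.
        total += sum(block)
        count += len(block)
    if count == 0:
        raise ValueError("cannot take the mean of an empty sequence")
    return total / count


result = blocked_mean(range(1, 100001), block_size=1000)
```

Dask builds on the same idea, but records the per-block operations as tasks in a graph so that they can be executed in parallel.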

See full documentation at http://dask.pydata.org or read developer-focused blogposts about dask’s development.

Use dask.array

Dask.array implements a NumPy clone on larger-than-memory datasets using multiple cores.

>>> import dask.array as da

>>> x = da.random.normal(10, 0.1, size=(100000, 100000), chunks=(1000, 1000))

>>> x.mean(axis=0)[:3].compute()
array([ 10.00026926,  10.0000592 ,  10.00038236])

Use dask.dataframe

Dask.dataframe implements a Pandas clone on larger-than-memory datasets using multiple cores.

>>> import dask.dataframe as dd
>>> df = dd.read_csv('nyc-taxi-*.csv.gz')

>>> g = df.groupby('medallion')
>>> g.trip_time_in_secs.mean().head(5)
medallion
0531373C01FD1416769E34F5525B54C8     795.875026
867D18559D9D2941173AD7A0F3B33E77     924.187954
BD34A40EDD5DC5368B0501F704E952E7     717.966875
5A47679B2C90EA16E47F772B9823CE51     763.005149
89CE71B8514E7674F1C662296809DDF6     869.274052
Name: trip_time_in_secs, dtype: float64

Use dask.bag

Dask.bag implements a large collection of Python objects, mimicking the toolz interface.

>>> import dask.bag as db
>>> import json
>>> b = db.from_filenames('2014-*.json.gz').map(json.loads)

>>> alices = b.filter(lambda d: d['name'] == 'Alice')
>>> alices.take(3)
({'name': 'Alice', 'city': 'LA',  'balance': 100},
 {'name': 'Alice', 'city': 'LA',  'balance': 200},
 {'name': 'Alice', 'city': 'NYC', 'balance': 300})

>>> dict(alices.pluck('city').frequencies())
{'LA': 10000, 'NYC': 20000, ...}
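
For intuition about what the bag pipeline above computes, here is a sketch of the same filter, pluck, and frequencies steps on a small in-memory list, using only the standard library. The records are made up for illustration, and this is not how dask.bag itself is implemented.

```python
from collections import Counter

# Hypothetical records standing in for the parsed JSON lines.
records = [
    {'name': 'Alice', 'city': 'LA', 'balance': 100},
    {'name': 'Bob', 'city': 'NYC', 'balance': 150},
    {'name': 'Alice', 'city': 'NYC', 'balance': 300},
    {'name': 'Alice', 'city': 'LA', 'balance': 200},
]

# filter: keep only the records whose name is 'Alice'
alices = [d for d in records if d['name'] == 'Alice']

# pluck: pull one field out of every record
cities = [d['city'] for d in alices]

# frequencies: count how often each value occurs
city_counts = dict(Counter(cities))
```

The bag version expresses the same steps, but over data that may not fit in memory at once.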

Use Dask Graphs

Dask.array, dask.dataframe, and dask.bag are thin layers on top of dask graphs, which represent computational task graphs of regular Python functions on regular Python objects.

As an example consider the following simple program:

def inc(i):
    return i + 1

def add(a, b):
    return a + b

x = 1
y = inc(x)
z = add(y, 10)

We encode this computation as a dask graph in the following way:

d = {'x': 1,
     'y': (inc, 'x'),
     'z': (add, 'y', 10)}

A dask graph is just a dictionary of tuples where the first element of the tuple is a function and the rest are the arguments for that function. While this representation of the computation above may be less aesthetically pleasing, it may now be analyzed, optimized, and computed by other Python code, not just the Python interpreter.
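
To make "analyzed and computed by other Python code" concrete, the sketch below walks the dictionary from the example with a small recursive evaluator and a dependency lookup. The helpers `is_task`, `is_key`, `dependencies`, and `toy_get` are hypothetical names written for this illustration; this is a toy, not dask's scheduler, and it ignores features of the real specification such as nested tasks.

```python
def inc(i):
    return i + 1


def add(a, b):
    return a + b


d = {'x': 1,
     'y': (inc, 'x'),
     'z': (add, 'y', 10)}


def is_task(value):
    """A task is a tuple whose first element is callable."""
    return isinstance(value, tuple) and len(value) > 0 and callable(value[0])


def is_key(graph, arg):
    """True if arg names another entry in the graph."""
    try:
        return arg in graph
    except TypeError:  # unhashable arguments cannot be keys
        return False


def dependencies(graph, key):
    """Analysis: return the set of keys that `key` directly depends on."""
    value = graph[key]
    if not is_task(value):
        return set()
    return {arg for arg in value[1:] if is_key(graph, arg)}


def toy_get(graph, key, cache=None):
    """Computation: evaluate `key`, computing its dependencies first."""
    if cache is None:
        cache = {}
    if key in cache:
        return cache[key]
    value = graph[key]
    if is_task(value):
        func, args = value[0], value[1:]
        # Arguments that name other keys are evaluated; the rest are literals.
        resolved = [toy_get(graph, arg, cache) if is_key(graph, arg) else arg
                    for arg in args]
        value = func(*resolved)
    cache[key] = value
    return value


z = toy_get(d, 'z')
z_deps = dependencies(d, 'z')
```

Because the graph is plain data, the same dictionary can be handed to a different evaluator, for example one that runs independent tasks on several cores.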

[Figure: A simple dask dictionary]

Install

Dask is easily installable through your favorite Python package manager:

conda install dask

or

pip install dask[array]
or
pip install dask[bag]
or
pip install dask[dataframe]
or
pip install dask[complete]

Dependencies

dask.core supports Python 2.6+ and Python 3.3+ with a common codebase. It is pure Python and requires no dependencies beyond the standard library, which makes it a lightweight dependency.

dask.array depends on numpy.

dask.bag depends on toolz and dill.
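
One way to check which of the optional dependencies listed above are importable in the current environment is sketched below. It uses only the standard library's `importlib.util.find_spec`, which requires Python 3.4 or later and so is newer than the oldest interpreters named above; the helper name `available` is hypothetical.

```python
from importlib.util import find_spec


def available(module_names):
    """Map each top-level module name to True if it can be imported."""
    return {name: find_spec(name) is not None for name in module_names}


# Optional dependencies named in this section.
status = available(['numpy', 'toolz', 'dill'])
for name, ok in sorted(status.items()):
    print('{0}: {1}'.format(name, 'installed' if ok else 'missing'))
```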

Examples

Dask examples are available in the following repository: https://github.com/blaze/dask-examples.

You can also find them on Anaconda.org: https://notebooks.anaconda.org/dask/.

LICENSE

New BSD. See License File.

2025.5.1 May 20, 2025
2025.5.0 May 13, 2025
2025.4.1 Apr 25, 2025
2025.4.0 Apr 22, 2025
2025.3.0 Mar 21, 2025
2025.2.0 Feb 13, 2025
2025.1.0 Jan 17, 2025
2024.12.1 Dec 17, 2024
2024.12.0 Dec 03, 2024
2024.11.2 Nov 13, 2024
2024.11.1 Nov 11, 2024
2024.11.0 Nov 08, 2024
2024.10.0 Oct 17, 2024
2024.9.1 Sep 28, 2024
2024.9.0 Sep 13, 2024
2024.8.2 Aug 30, 2024
2024.8.1 Aug 16, 2024
2024.8.0 Aug 06, 2024
2024.7.1 Jul 20, 2024
2024.7.0 Jul 05, 2024
2024.6.2 Jun 20, 2024
2024.6.1 Jun 19, 2024
2024.6.0 Jun 14, 2024
2024.5.2 May 31, 2024
2024.5.1 May 17, 2024
2024.5.0 May 03, 2024
2024.4.2 Apr 19, 2024
2024.4.1 Apr 04, 2024
2024.4.0 Apr 01, 2024
2024.3.1 Mar 15, 2024
2024.3.0 Mar 12, 2024
2024.2.1 Feb 23, 2024
2024.2.0 Feb 09, 2024
2024.1.1 Jan 26, 2024
2024.1.0 Jan 12, 2024
2023.12.1 Dec 15, 2023
2023.12.0 Dec 01, 2023
2023.11.0 Nov 10, 2023
2023.10.1 Oct 27, 2023
2023.10.0 Oct 14, 2023
2023.9.3 Sep 29, 2023
2023.9.2 Sep 15, 2023
2023.9.1 Sep 06, 2023
2023.9.0 Sep 01, 2023
2023.8.1 Aug 18, 2023
2023.8.0 Aug 04, 2023
2023.7.1 Jul 20, 2023
2023.7.0 Jul 07, 2023
2023.6.1 Jun 26, 2023
2023.6.0 Jun 09, 2023
2023.5.1 May 26, 2023
2023.5.0 May 12, 2023
2023.4.1 Apr 28, 2023
2023.4.0 Apr 14, 2023
2023.3.2 Mar 24, 2023
2023.3.1 Mar 10, 2023
2023.3.0 Mar 01, 2023
2023.2.1 Feb 24, 2023
2023.2.0 Feb 10, 2023
2023.1.1 Jan 27, 2023
2023.1.0 Jan 13, 2023
2022.12.1 Dec 16, 2022
2022.12.0 Dec 02, 2022
2022.11.1 Nov 18, 2022
2022.11.0 Nov 15, 2022
2022.10.2 Oct 31, 2022
2022.10.1 Oct 28, 2022
2022.10.0 Oct 14, 2022
2022.9.2 Sep 30, 2022
2022.9.1 Sep 16, 2022
2022.9.0 Sep 02, 2022
2022.8.1 Aug 19, 2022
2022.8.0 Aug 05, 2022
2022.7.1 Jul 22, 2022
2022.7.0 Jul 08, 2022
2022.6.1 Jun 24, 2022
2022.6.0 Jun 10, 2022
2022.5.2 May 26, 2022
2022.5.1 May 24, 2022
2022.5.0 May 02, 2022
2022.4.2 Apr 29, 2022
2022.4.1 Apr 15, 2022
2022.4.0 Apr 01, 2022
2022.3.0 Mar 18, 2022
2022.2.1 Feb 25, 2022
2022.2.0 Feb 11, 2022
2022.1.1 Jan 28, 2022
2022.1.0 Jan 14, 2022
2021.12.0 Dec 10, 2021
2021.11.2 Nov 19, 2021
2021.11.1 Nov 08, 2021
2021.11.0 Nov 05, 2021
2021.10.0 Oct 22, 2021
2021.9.1 Sep 21, 2021
2021.9.0 Sep 03, 2021
2021.8.1 Aug 20, 2021
2021.8.0 Aug 13, 2021
2021.7.2 Jul 30, 2021
2021.7.1 Jul 23, 2021
2021.7.0 Jul 09, 2021
2021.6.2 Jun 22, 2021
2021.6.1 Jun 18, 2021
2021.6.0 Jun 04, 2021
2021.5.1 May 28, 2021
2021.5.0 May 14, 2021
2021.4.1 Apr 23, 2021
2021.4.0 Apr 02, 2021
2021.3.1 Mar 26, 2021
2021.3.0 Mar 05, 2021
2021.2.0 Feb 05, 2021
2021.1.1 Jan 22, 2021
2021.1.0 Jan 15, 2021
2020.12.0 Dec 11, 2020
2.30.0 Oct 06, 2020
2.29.0 Oct 02, 2020
2.28.0 Sep 26, 2020
2.27.0 Sep 19, 2020
2.26.0 Sep 11, 2020
2.25.0 Aug 28, 2020
2.24.0 Aug 22, 2020
2.23.0 Aug 14, 2020
2.22.0 Jul 31, 2020
2.21.0 Jul 17, 2020
2.20.0 Jul 03, 2020
2.19.0 Jun 19, 2020
2.18.1 Jun 10, 2020
2.18.0 Jun 06, 2020
2.17.2 May 28, 2020
2.17.1 May 28, 2020
2.17.0 May 27, 2020
2.16.0 May 08, 2020
2.15.0 Apr 25, 2020
2.14.0 Apr 03, 2020
2.13.0 Mar 25, 2020
2.12.0 Mar 06, 2020
2.11.0 Feb 19, 2020
2.10.1 Jan 30, 2020
2.10.0 Jan 28, 2020
2.9.2 Jan 16, 2020
2.9.1 Dec 27, 2019
2.9.0 Dec 06, 2019
2.8.1 Nov 23, 2019
2.8.0 Nov 14, 2019
2.7.0 Nov 08, 2019
2.6.0 Oct 16, 2019
2.5.2 Oct 04, 2019
2.5.0 Sep 27, 2019
2.4.0 Sep 13, 2019
2.3.0 Aug 16, 2019
2.2.0 Jul 31, 2019
2.1.0 Jul 08, 2019
2.0.0 Jun 25, 2019
1.2.2 May 08, 2019
1.2.1 Apr 29, 2019
1.2.0 Apr 12, 2019
1.1.5 Mar 29, 2019
1.1.4 Mar 09, 2019
1.1.3 Mar 01, 2019
1.1.2 Feb 25, 2019
1.1.1 Jan 31, 2019
1.1.0 Jan 18, 2019
1.0.0 Nov 28, 2018
0.20.2 Nov 15, 2018
0.20.1 Nov 09, 2018
0.20.0 Oct 26, 2018
0.19.4 Oct 09, 2018
0.19.3 Oct 05, 2018
0.19.2 Sep 17, 2018
0.19.1 Sep 06, 2018
0.19.0 Aug 30, 2018
0.18.2 Jul 23, 2018
0.18.1 Jun 22, 2018
0.18.0 Jun 15, 2018
0.17.5 May 16, 2018
0.17.4 May 03, 2018
0.17.3 May 02, 2018
0.17.2 Mar 21, 2018
0.17.1 Feb 22, 2018
0.17.0 Feb 09, 2018
0.16.1 Jan 09, 2018
0.16.0 Nov 17, 2017
0.15.4 Oct 07, 2017
0.15.3 Sep 24, 2017
0.15.2 Aug 26, 2017
0.15.1 Jul 08, 2017
0.15.0 Jun 11, 2017
0.14.3 May 05, 2017
0.14.2 May 03, 2017
0.14.1 Mar 22, 2017
0.14.0 Feb 24, 2017
0.13.0 Jan 02, 2017
0.13.0rc1 Dec 30, 2016
0.12.0 Nov 04, 2016
0.11.1 Oct 07, 2016
0.11.0 Aug 18, 2016
0.10.2 Jul 26, 2016
0.10.1 Jul 11, 2016
0.10.0 Jun 13, 2016
0.9.0 May 10, 2016
0.8.2 Apr 13, 2016
0.8.1 Mar 11, 2016
0.8.0 Feb 16, 2016
0.7.6 Jan 05, 2016
0.7.5 Oct 25, 2015
0.7.4 Oct 22, 2015
0.7.3 Sep 25, 2015
0.7.2 Sep 25, 2015
0.7.1 Sep 04, 2015
0.7.0 Aug 14, 2015
0.6.1 Jul 22, 2015
0.6.0 Jun 30, 2015
0.5.0 May 15, 2015
0.4.0 Apr 20, 2015
0.3.0 Mar 05, 2015
0.2.6 Feb 17, 2015
0.2.5 Feb 17, 2015
0.2.4 Feb 17, 2015
0.2.3 Feb 17, 2015
0.2.2 Feb 17, 2015
0.2.1 Feb 14, 2015
0.2.0 Jan 29, 2015
0.0rc0 Sep 25, 2015
No dependencies