zarr 0.4.0


pip install zarr==0.4.0


Meta
Author: Alistair Miles

Classifiers

Development Status
  • 2 - Pre-Alpha

Intended Audience
  • Developers
  • Information Technology
  • Science/Research

License
  • OSI Approved :: MIT License

Programming Language
  • Python :: 3
  • Python :: 3.4
  • Python :: 3.5

Topic
  • Software Development :: Libraries :: Python Modules

Operating System
  • Unix

A minimal implementation of chunked, compressed, N-dimensional arrays for Python.


Installation

Installation requires NumPy and Cython to be pre-installed. zarr can currently only be installed on Linux.

Install from PyPI:

$ pip install -U zarr

Install from GitHub:

$ pip install -U git+https://github.com/alimanfoo/zarr.git@master

Status

Experimental, proof-of-concept. This is alpha-quality software. Things may break, change or disappear without warning.

Bug reports and suggestions welcome.

Design goals

  • Chunking in multiple dimensions

  • Resize any dimension

  • Concurrent reads

  • Concurrent writes

  • Release the GIL during compression and decompression
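
The last goal is what makes threaded workloads scale. As a rough illustration of the idea (using the stdlib zlib codec as a stand-in for the Blosc codec zarr actually uses; both release the GIL while compressing, so worker threads make progress in parallel):

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Eight highly compressible "chunks" of 100 kB each.
chunks = [bytes([i]) * 100_000 for i in range(8)]

# Compress chunks concurrently; zlib releases the GIL during compression,
# so the threads are not serialised by the interpreter.
with ThreadPoolExecutor(max_workers=4) as pool:
    compressed = list(pool.map(zlib.compress, chunks))

# Round-trip check: decompression recovers the original chunks.
restored = [zlib.decompress(c) for c in compressed]
assert restored == chunks
```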

Usage

Create an array:

>>> import numpy as np
>>> import zarr
>>> z = zarr.empty(shape=(10000, 1000), dtype='i4', chunks=(1000, 100))
>>> z
zarr.ext.SynchronizedArray((10000, 1000), int32, chunks=(1000, 100))
  cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
  nbytes: 38.1M; cbytes: 0; initialized: 0/100
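
The "initialized: 0/100" line counts chunks: the chunk grid is the ceiling division of the array shape by the chunk shape.

```python
from math import ceil

shape = (10000, 1000)
chunks = (1000, 100)

# Number of chunks along each dimension (ceiling division).
grid = tuple(ceil(s / c) for s, c in zip(shape, chunks))
n_chunks = grid[0] * grid[1]
assert grid == (10, 10) and n_chunks == 100  # matches "initialized: 0/100"
```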

Fill it with some data:

>>> z[:] = np.arange(10000000, dtype='i4').reshape(10000, 1000)
>>> z
zarr.ext.SynchronizedArray((10000, 1000), int32, chunks=(1000, 100))
  cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
  nbytes: 38.1M; cbytes: 2.0M; ratio: 19.3; initialized: 100/100
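
The reported sizes follow directly from the array's shape and dtype; as a quick check (nbytes is shown in mebibytes, and ratio is uncompressed over compressed size):

```python
# Uncompressed size: 10000 * 1000 int32 values at 4 bytes each.
nbytes = 10000 * 1000 * 4                  # 40,000,000 bytes
assert round(nbytes / 2**20, 1) == 38.1    # reported as "nbytes: 38.1M"

# At the reported ratio of 19.3, the compressed size is about 2.0M.
assert round(nbytes / 19.3 / 2**20, 1) == 2.0
```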

Obtain a NumPy array by slicing:

>>> z[:]
array([[      0,       1,       2, ...,     997,     998,     999],
       [   1000,    1001,    1002, ...,    1997,    1998,    1999],
       [   2000,    2001,    2002, ...,    2997,    2998,    2999],
       ...,
       [9997000, 9997001, 9997002, ..., 9997997, 9997998, 9997999],
       [9998000, 9998001, 9998002, ..., 9998997, 9998998, 9998999],
       [9999000, 9999001, 9999002, ..., 9999997, 9999998, 9999999]], dtype=int32)
>>> z[:100]
array([[    0,     1,     2, ...,   997,   998,   999],
       [ 1000,  1001,  1002, ...,  1997,  1998,  1999],
       [ 2000,  2001,  2002, ...,  2997,  2998,  2999],
       ...,
       [97000, 97001, 97002, ..., 97997, 97998, 97999],
       [98000, 98001, 98002, ..., 98997, 98998, 98999],
       [99000, 99001, 99002, ..., 99997, 99998, 99999]], dtype=int32)
>>> z[:, :100]
array([[      0,       1,       2, ...,      97,      98,      99],
       [   1000,    1001,    1002, ...,    1097,    1098,    1099],
       [   2000,    2001,    2002, ...,    2097,    2098,    2099],
       ...,
       [9997000, 9997001, 9997002, ..., 9997097, 9997098, 9997099],
       [9998000, 9998001, 9998002, ..., 9998097, 9998098, 9998099],
       [9999000, 9999001, 9999002, ..., 9999097, 9999098, 9999099]], dtype=int32)
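
A slice only needs the chunks that overlap the selection to be decompressed. A hypothetical sketch of that index arithmetic (touched_chunks is illustrative, not part of the zarr API):

```python
def touched_chunks(start, stop, chunk_len):
    """Chunk indices overlapping the half-open range [start, stop)."""
    return list(range(start // chunk_len, (stop - 1) // chunk_len + 1))

# z[:100] on the first axis (chunk length 1000) touches one row of chunks,
# while z[:] (stop=10000) touches all ten.
assert touched_chunks(0, 100, 1000) == [0]
assert touched_chunks(0, 10000, 1000) == list(range(10))
```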

Resize the array and add more data:

>>> z.resize(20000, 1000)
>>> z
zarr.ext.SynchronizedArray((20000, 1000), int32, chunks=(1000, 100))
  cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
  nbytes: 76.3M; cbytes: 2.0M; ratio: 38.5; initialized: 100/200
>>> z[10000:, :] = np.arange(10000000, dtype='i4').reshape(10000, 1000)
>>> z
zarr.ext.SynchronizedArray((20000, 1000), int32, chunks=(1000, 100))
  cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
  nbytes: 76.3M; cbytes: 4.0M; ratio: 19.3; initialized: 200/200

For convenience, an append() method is also available, which can be used to append data along any axis:

>>> a = np.arange(10000000, dtype='i4').reshape(10000, 1000)
>>> z = zarr.array(a, chunks=(1000, 100))
>>> z.append(a+a)
>>> z
zarr.ext.SynchronizedArray((20000, 1000), int32, chunks=(1000, 100))
  cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
  nbytes: 76.3M; cbytes: 3.6M; ratio: 21.2; initialized: 200/200
>>> z.append(np.vstack([a, a]), axis=1)
>>> z
zarr.ext.SynchronizedArray((20000, 2000), int32, chunks=(1000, 100))
  cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
  nbytes: 152.6M; cbytes: 7.6M; ratio: 20.2; initialized: 400/400
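
The resulting shapes follow from simple arithmetic on the append axis; a hypothetical sketch (appended_shape is illustrative, not zarr API):

```python
def appended_shape(shape, data_shape, axis=0):
    """Shape after appending a block of data_shape along the given axis."""
    new = list(shape)
    new[axis] += data_shape[axis]
    return tuple(new)

# Appending a (10000, 1000) block to a (10000, 1000) array grows axis 0:
assert appended_shape((10000, 1000), (10000, 1000), axis=0) == (20000, 1000)
# Appending a (20000, 1000) block along axis 1 grows the second dimension:
assert appended_shape((20000, 1000), (20000, 1000), axis=1) == (20000, 2000)
```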

Persistence

Create a persistent array (data stored on disk):

>>> path = 'example.zarr'
>>> z = zarr.open(path, mode='w', shape=(10000, 1000), dtype='i4', chunks=(1000, 100))
>>> z[:] = np.arange(10000000, dtype='i4').reshape(10000, 1000)
>>> z
zarr.ext.SynchronizedPersistentArray((10000, 1000), int32, chunks=(1000, 100))
  cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
  nbytes: 38.1M; cbytes: 2.0M; ratio: 19.3; initialized: 100/100
  mode: w; path: example.zarr

There is no need to close a persistent array. Data are automatically flushed to disk.

If you’re working with really big arrays, try the ‘lazy’ option:

>>> path = 'big.zarr'
>>> z = zarr.open(path, mode='w', shape=(1e8, 1e7), dtype='i4', chunks=(1000, 1000), lazy=True)
>>> z
zarr.ext.SynchronizedLazyPersistentArray((100000000, 10000000), int32, chunks=(1000, 1000))
  cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
  nbytes: 3.6P; cbytes: 0; initialized: 0/1000000000
  mode: w; path: big.zarr
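
The numbers in that repr follow from the shape, chunks and dtype:

```python
from math import ceil

shape = (100_000_000, 10_000_000)
chunks = (1000, 1000)

# Chunk grid: 100,000 x 10,000 = one billion chunks.
n_chunks = ceil(shape[0] / chunks[0]) * ceil(shape[1] / chunks[1])
assert n_chunks == 1_000_000_000   # matches "initialized: 0/1000000000"

# Uncompressed size: 10**15 int32 values at 4 bytes each, in pebibytes.
nbytes = shape[0] * shape[1] * 4
assert round(nbytes / 2**50, 1) == 3.6   # reported as "nbytes: 3.6P"
```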

See the persistence documentation for more details of the file format.

Tuning

zarr is optimised for accessing and storing data in contiguous slices the same size as or larger than the chunks. It is not, and probably never will be, optimised for single-item access.

Chunk sizes >= 1M are generally good. The optimal chunk shape will depend on the correlation structure in your data.
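
As a quick back-of-envelope check of that guideline (assuming int32 data and a hypothetical (1000, 1000) chunk shape, not the chunks from the examples above):

```python
# Chunk size in bytes = number of elements per chunk * itemsize.
chunk_nbytes = 1000 * 1000 * 4     # (1000, 1000) chunks of int32
assert chunk_nbytes == 4_000_000   # comfortably over the 1M guideline
```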

zarr is designed for use in parallel computations working chunk-wise over data. Try it with dask.array. If using zarr in a multi-threaded program, set zarr to use blosc in contextual mode:

>>> zarr.set_blosc_options(use_context=True)
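
Chunk-wise access patterns like the ones dask.array generates can be sketched with simple slice arithmetic, so that each task reads whole chunks rather than parts of them (chunk_slices is illustrative, not part of zarr):

```python
def chunk_slices(length, chunk_len):
    """Yield slices covering [0, length) in chunk-aligned steps."""
    for start in range(0, length, chunk_len):
        yield slice(start, min(start + chunk_len, length))

# Ten chunk-aligned slices cover a 10000-long axis with 1000-long chunks.
slices = list(chunk_slices(10000, 1000))
assert len(slices) == 10
assert slices[0] == slice(0, 1000) and slices[-1] == slice(9000, 10000)
```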

Acknowledgments

zarr uses c-blosc internally for compression and decompression and borrows code heavily from bcolz.
