xarray-ms 0.2.1


pip install xarray-ms==0.2.1

Meta
Author: Simon Perkins
Requires Python: >=3.10,<4.0

Classifiers

Programming Language
  • Python :: 3
  • Python :: 3.10
  • Python :: 3.11
  • Python :: 3.12

xarray-ms presents a Measurement Set v4 (MSv4) view over CASA Measurement Sets (MSv2). It provides access to MSv2 data via the xarray API, allowing MSv4-compliant applications to be developed on well-understood MSv2 data.

>>> import xarray_ms
>>> from xarray.backends.api import open_datatree
>>> dt = open_datatree("/data/L795830_SB001_uv.MS/",
...                    chunks={"time": 2000, "baseline": 1000})
>>> dt
<xarray.DataTree>
Group: /
└── Group: /DATA_DESC_ID=0,FIELD_ID=0,OBSERVATION_ID=0
       Dimensions:                     (time: 28760, baseline: 2775, frequency: 16,
                                        polarization: 4, uvw_label: 3)
       Coordinates:
           antenna1_name               (baseline) object 22kB ...
           antenna2_name               (baseline) object 22kB ...
           baseline_id                 (baseline) int64 22kB ...
         * frequency                   (frequency) float64 128B 1.202e+08 ... 1.204e+08
         * polarization                (polarization) <U2 32B 'XX' 'XY' 'YX' 'YY'
         * time                        (time) float64 230kB 1.601e+09 ... 1.601e+09
       Dimensions without coordinates: baseline, uvw_label
       Data variables:
           EFFECTIVE_INTEGRATION_TIME  (time, baseline) float64 638MB ...
           FLAG                        (time, baseline, frequency, polarization) uint8 5GB ...
           TIME_CENTROID               (time, baseline) float64 638MB ...
           UVW                         (time, baseline, uvw_label) float64 2GB ...
           VISIBILITY                  (time, baseline, frequency, polarization) complex64 41GB ...
           WEIGHT                      (time, baseline, frequency, polarization) float32 20GB ...
       Attributes:
           version:              0.0.1
           creation_date:        2024-09-18T10:49:55.133908+00:00
           data_description_id:  0
    └── Group: /DATA_DESC_ID=0,FIELD_ID=0,OBSERVATION_ID=0/ANTENNA
            Dimensions:                 (antenna_name: 74,
                                         cartesian_pos_label/ellipsoid_pos_label: 3)
            Coordinates:
                baseline_antenna1_name  (baseline) object 22kB ...
                baseline_antenna2_name  (baseline) object 22kB ...
                baseline_id             (baseline) int64 22kB ...
              * frequency               (frequency) float64 128B 1.202e+08 1.202e+08 ... 1.204e+08
              * polarization            (polarization) <U2 32B 'XX' 'XY' 'YX' 'YY'
              * time                    (time) float64 230kB 1.601e+09 1.601e+09 ... 1.601e+09
              * antenna_name            (antenna_name) object 592B 'CS001HBA0' ... 'IE613HBA'
                mount                   (antenna_name) object 592B 'X-Y' 'X-Y' ... 'X-Y' 'X-Y'
                station                 (antenna_name) object 592B 'LOFAR' 'LOFAR' ... 'LOFAR'
            Dimensions without coordinates: cartesian_pos_label/ellipsoid_pos_label
            Data variables:
                ANTENNA_POSITION        (antenna_name, cartesian_pos_label/ellipsoid_pos_label) float64 2kB ...
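
As a rough illustration of what the chunks argument in the session above implies, the following sketch computes how many chunks each dimension is split into and the size of one full VISIBILITY chunk. It is plain arithmetic on the dimension sizes printed above, not a call into xarray-ms. The helper names chunk_counts and chunk_nbytes are hypothetical, and the sketch assumes that dimensions not listed in chunks (frequency and polarization) are kept whole, which may differ from what the backend actually chooses.

```python
import math

# Dimension sizes copied from the example output above.
dims = {"time": 28760, "baseline": 2775, "frequency": 16, "polarization": 4}
# Chunk sizes passed to open_datatree in the example above.
chunks = {"time": 2000, "baseline": 1000}


def chunk_counts(dims, chunks):
    """Number of chunks per dimension; unlisted dimensions are assumed unchunked."""
    return {d: math.ceil(n / chunks.get(d, n)) for d, n in dims.items()}


def chunk_nbytes(dims, chunks, itemsize):
    """Size in bytes of one full (non-edge) chunk."""
    return itemsize * math.prod(min(chunks.get(d, n), n) for d, n in dims.items())


counts = chunk_counts(dims, chunks)
vis_chunk = chunk_nbytes(dims, chunks, itemsize=8)  # complex64 is 8 bytes

print(counts)
print(f"{vis_chunk / 1e9:.3f} GB per full VISIBILITY chunk")
```

Under these assumptions, time is split into 15 chunks and baseline into 3, and a full VISIBILITY chunk occupies roughly 1 GB, compared with about 41 GB for the whole array.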

Measurement Set v4

NRAO/SKAO are developing a new xarray-based Measurement Set v4 specification. While there are many changes, some of the major highlights are:

  • xarray is used to define the specification.

  • MSv4 data consists of Datasets of ndarrays on a regular time-channel grid. MSv2 data is tabular and, while in many instances the time-channel grid is regular, this is not guaranteed, especially after MSv2 datasets have been transformed by various tasks.

xarray Datasets are self-describing and they are therefore easier to reason about and work with. Additionally, the regularity of data will make writing MSv4-based software less complex.
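
To make the contrast between tabular and gridded data concrete, here is a minimal sketch, in plain Python rather than xarray, that places MSv2-style rows onto a regular (time, baseline) grid and fills cells that have no row with NaN. The row layout and the helper name rows_to_grid are hypothetical, and this is not the algorithm used by xarray-ms or xradio.

```python
import math


def rows_to_grid(rows):
    """Pivot (time, antenna1, antenna2, value) rows onto a (time, baseline) grid.

    Cells with no matching row are filled with NaN.
    """
    times = sorted({t for t, _, _, _ in rows})
    baselines = sorted({(a1, a2) for _, a1, a2, _ in rows})
    t_index = {t: i for i, t in enumerate(times)}
    b_index = {b: i for i, b in enumerate(baselines)}

    # Start from a fully padded grid, then fill in the rows that exist.
    grid = [[math.nan] * len(baselines) for _ in times]
    for t, a1, a2, value in rows:
        grid[t_index[t]][b_index[(a1, a2)]] = value
    return times, baselines, grid


# Hypothetical irregular table: baseline (0, 2) has no row at time 2.0.
rows = [
    (1.0, 0, 1, 10.0),
    (1.0, 0, 2, 11.0),
    (2.0, 0, 1, 20.0),
]
times, baselines, grid = rows_to_grid(rows)
```

Once the data sits on such a grid, each value is addressed by its (time, baseline) index rather than by searching rows, which is the regularity the specification relies on.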

xradio

casangi/xradio provides a reference implementation that converts CASA v2 Measurement Sets to Zarr v4 Measurement Sets using the python-casacore package.

Why xarray-ms?

  • By developing against an MSv4 xarray view over MSv2 data, developers can build applications on well-understood data and then transition seamlessly to newer formats. Data can also be exported to newer formats (principally zarr) via xarray’s native I/O routines. In either case, the xarray view of the data looks the same to the software developer.

  • xarray-ms builds on xarray’s backend API. Implementing a formal CASA MSv2 backend has a number of benefits:

    • xarray’s internal I/O routines such as open_dataset and open_datatree can dispatch to the backend to load data.

    • Similarly xarray’s lazy loading mechanism dispatches through the backend.

    • Automatic access to any chunked array type supported by xarray, including, but not limited to, dask.

    • Arbitrary chunking along any xarray dimension.

  • xarray-ms uses arcae, a high-performance backend to CASA Tables implementing a subset of python-casacore’s interface.

  • Some limited support for irregular MSv2 data via padding.
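
The export path mentioned in the list above could look roughly like the following sketch. It assumes an xarray release in which DataTree provides a to_zarr method and that the zarr package is installed. The function name export_ms_to_zarr and its parameters are hypothetical, and some variables may need explicit encoding before they can be written, which is not handled here.

```python
def export_ms_to_zarr(ms_path, zarr_path, chunks=None):
    """Open an MSv2 through xarray's open_datatree and write the tree to zarr."""
    # Imports are local so that the sketch can be defined without the
    # optional packages installed.
    import xarray_ms  # noqa: F401  (imported as in the example session)
    from xarray.backends.api import open_datatree

    # chunks is forwarded unchanged, e.g. {"time": 2000, "baseline": 1000}.
    dt = open_datatree(ms_path, chunks=chunks)

    # Assumption: DataTree.to_zarr is available in the installed xarray.
    dt.to_zarr(zarr_path)
    return dt
```

A call would pass the Measurement Set path and a destination store, for example export_ms_to_zarr("/data/L795830_SB001_uv.MS/", "/data/out.zarr", chunks={"time": 2000, "baseline": 1000}), where the output path is only a placeholder.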

Work in Progress

The Measurement Set v4 specification is currently under active development. xarray-ms is also currently under active development and does not yet have feature parity with MSv4 or xradio. Most measures information and many secondary sub-tables are currently missing.

However, the most important parts of the MSv2 MAIN tables, as well as the ANTENNA, POLARIZATION and SPECTRAL_WINDOW sub-tables, are implemented and should be sufficient for basic algorithm development.

Wheel compatibility matrix

Platform Python 3
any

Dependencies:
arcae (<0.3.0,>=0.2.5)
cacheout (<0.17.0,>=0.16.0)
typing-extensions (<5.0.0,>=4.12.2)
xarray (<2025.0.0,>=2024.3.0)