Compression and codecs¶
A Zarr v3 codec chain has three stages: optional array→array codecs
(transpose), exactly one array→bytes serializer (bytes, or vlen-* for
variable-length types — inserted automatically), and any number of
bytes→bytes codecs (compressors and checksums).
Pass codec objects to zarr.create; order within a stage is preserved:
store = zarr.stores.MemoryStore();
z = zarr.create(store, [100 100], "double", ChunkShape=[50 50], Path="a", ...
Codecs={zarr.codecs.ZstdCodec(5), zarr.codecs.Crc32cCodec()});
z(:, :) = randn(100);
assert(isequal(size(z(:, :)), [100 100]))
Available codecs¶
| Codec | Constructor | Notes |
|---|---|---|
| zstd | zarr.codecs.ZstdCodec(level, checksum) |
levels −131072…22, default 0; zarr-python's default compressor |
| blosc | zarr.codecs.BloscCodec(cname=…, clevel=…, shuffle=…) |
cname: lz4, lz4hc, blosclz, zstd, zlib; shuffle: noshuffle/shuffle/bitshuffle; typesize auto-filled from the dtype |
| gzip | zarr.codecs.GzipCodec(level) |
levels 0–9, default 5; pure Java, no MEX needed |
| crc32c | zarr.codecs.Crc32cCodec() |
4-byte checksum, verified on read |
| transpose | zarr.codecs.TransposeCodec(order) |
0-based dimension permutation |
| bytes | zarr.codecs.BytesCodec(endian) |
serializer; "little" (default) or "big" |
zb = zarr.create(store, [64 64], "single", ChunkShape=[32 32], Path="b", ...
Codecs={zarr.codecs.BloscCodec(cname="lz4", clevel=9, shuffle="shuffle")});
p = single(peaks(64));
zb(:, :) = p;
assert(isequal(zb(:, :), p))
MEX codecs¶
zstd and blosc (and a fast crc32c) are implemented as small MEX
binaries. Toolbox installs include them prebuilt for Linux, Windows, and
Apple Silicon; source installs build them once:
>> run tools/build_mex.m % needs a C compiler + libzstd / libblosc
Everything else — including gzip — works without them; opening data that
needs a missing MEX raises zarr:MissingMex naming the codec. Note that
zarr-python's default compressor is zstd, so most Python-written v3 data
needs the MEX.
Column-major storage (Order="F")¶
Zarr chunks are C-order (row-major) by default; MATLAB memory is
column-major, so encoding/decoding involves a permute. Passing Order="F"
inserts a spec-standard transpose codec so chunks are stored column-major —
copy-free for MATLAB, still perfectly readable by zarr-python:
zf = zarr.create(store, [40 60], "double", ChunkShape=[20 30], ...
Path="forder", Order="F");
zf(:, :) = rand(40, 60);
assert(isequal(size(zf(1:10, 1:10)), [10 10]))
Performance¶
Measured on an M1 Mac (R2024b), 200 MB float64, 500×500 chunks
(tools/bench.m):
| config | write MB/s | read MB/s | stored MB |
|---|---|---|---|
| raw (bytes only) | 329 | 1237 | 200.0 |
| gzip-1 (Java) | 40 | 112 | 179.0 |
| zstd-3 (MEX) | 302 | 593 | 183.5 |
| blosc zstd-3 shuffle (MEX) | 315 | 889 | 151.4 |
| zstd-3 sharded | 395 | 557 | 183.5 |
Recommendation: blosc(zstd) with shuffle for numeric data — best compression and fastest decompression. Reserve gzip for environments where the MEX binaries cannot be used.