Compression

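Passing the optional compress argument compresses the resulting bytes. The available compressors are zlib and blosc.

Compression generally increases write time, although the effect depends on the compressor and the data: in the timings below, the zlib write is several times slower than the uncompressed write, while the blosc write is not.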

In [1]: import numpy as np
   ...: import pandas as pd

In [2]: from pandas_msgpack import to_msgpack, read_msgpack

In [3]: df = pd.DataFrame({'A': np.arange(100000),
   ...:                    'B': np.random.randn(100000),
   ...:                    'C': 'foo'})
   ...: 
In [4]: %timeit -n 1 -r 1 to_msgpack('uncompressed.msg', df)
1 loop, best of 1: 58 ms per loop
In [5]: %timeit -n 1 -r 1 to_msgpack('compressed_blosc.msg', df, compress='blosc')
1 loop, best of 1: 41.8 ms per loop
In [6]: %timeit -n 1 -r 1 to_msgpack('compressed_zlib.msg', df, compress='zlib')
1 loop, best of 1: 256 ms per loop

If the data was compressed, the compression is inferred automatically and the data is decompressed on reading.

In [7]: %timeit -n 1 -r 1 read_msgpack('uncompressed.msg')
1 loop, best of 1: 33.1 ms per loop
In [8]: %timeit -n 1 -r 1 read_msgpack('compressed_blosc.msg')
1 loop, best of 1: 35.6 ms per loop
In [9]: %timeit -n 1 -r 1 read_msgpack('compressed_zlib.msg')
1 loop, best of 1: 59.2 ms per loop
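
How the reader infers the compressor is not described here. One common approach, sketched below, is to record the compressor name inside the serialized data so that the reader needs no extra argument. This is an illustration only, built on the standard library's json and zlib modules; the pack and unpack helpers and the header layout are hypothetical and are not the pandas_msgpack file format.

```python
import json
import zlib


def pack(payload: bytes, compress=None) -> bytes:
    """Hypothetical writer: prefix the body with a header naming the compressor."""
    if compress == "zlib":
        body = zlib.compress(payload)
    elif compress is None:
        body = payload
    else:
        raise ValueError(f"unsupported compressor: {compress!r}")
    header = json.dumps({"compress": compress}).encode("utf-8")
    # Layout (an assumption for this sketch): 4-byte big-endian header
    # length, then the JSON header, then the (possibly compressed) body.
    return len(header).to_bytes(4, "big") + header + body


def unpack(blob: bytes) -> bytes:
    """Hypothetical reader: look up the compressor in the header, then decode."""
    header_len = int.from_bytes(blob[:4], "big")
    header = json.loads(blob[4:4 + header_len].decode("utf-8"))
    body = blob[4 + header_len:]
    if header["compress"] == "zlib":
        return zlib.decompress(body)
    return body


data = b"foo" * 100_000
# The caller of unpack never states whether the data was compressed.
assert unpack(pack(data)) == data
assert unpack(pack(data, compress="zlib")) == data
```

Because the header travels with the data, the read call looks the same for compressed and uncompressed input, which matches the read_msgpack calls above.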

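Compression can save storage space. In the listing below, the blosc file is about 41% smaller than the uncompressed file and the zlib file about 34% smaller.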

In [10]: !ls -ltr *.msg
-rw-r--r-- 1 docs docs 2000582 Apr  1 16:03 uncompressed.msg
-rw-r--r-- 1 docs docs 1187977 Apr  1 16:03 compressed_blosc.msg
-rw-r--r-- 1 docs docs 1320519 Apr  1 16:03 compressed_zlib.msg
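
How much space compression saves depends heavily on the data. The sketch below uses only the standard library's zlib and os modules, not pandas_msgpack, to compare a highly repetitive payload (loosely analogous to the constant 'foo' column) with high-entropy bytes (a rough stand-in for random values). The saving helper is a name made up for this example.

```python
import os
import zlib


def saving(raw: bytes) -> float:
    """Fraction of space saved by zlib (negative if the output grows)."""
    return 1 - len(zlib.compress(raw)) / len(raw)


repetitive = b"foo" * 100_000        # like a constant string column
random_bytes = os.urandom(300_000)   # high-entropy, effectively incompressible

print(f"repetitive: {saving(repetitive):.1%} saved")
print(f"random:     {saving(random_bytes):.1%} saved")
```

A frame that mixes both kinds of column, like the one above, would be expected to land somewhere between these two extremes.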