Accessing many dataframes on the same row and column

Can you convert the data files to binary format? In one simple test:

  • np.loadtxt() imported a 900×900 array in 0.264 sec (text format)
  • np.load() imported the binary version in 0.0012 sec (binary format)
  • 217x speedup

Parquet and Feather (both covered in the pandas docs) are other high-performance storage options, and Dask may help with managing the computations.

import numpy as np
from pathlib import Path
from time import perf_counter

data_dir = Path('../../../Downloads/RW-201912') 
text_file = 'RW_20191231-2350.asc'
bin_file = 'test.npy'

# 1. read text file
start = perf_counter()
with open(data_dir / text_file, 'rt') as handle:
    x = np.loadtxt(handle, skiprows=6)
elapsed = perf_counter() - start
print(x.shape, round(elapsed, 4))

# 2. write binary file
start = perf_counter()
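with open(data_dir / bin_file, 'wb') as fp:
    # int8 keeps the file small but only holds integers in -128..127;
    # use a wider dtype if the data can fall outside that range
    np.save(fp, x.astype(np.int8))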
elapsed = perf_counter() - start
print(round(elapsed, 4))

# 3. read binary file
start = perf_counter()
with open(data_dir / bin_file, 'rb') as handle:
    y = np.load(handle)
elapsed = perf_counter() - start
print(y.shape, round(elapsed, 4))

Output:

(900, 900) 0.2661 # read text file
0.0026            # write binary file
(900, 900) 0.0009 # read binary file
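
To apply this to the question in the title, one possible approach is to convert every text file to binary once and then read the same cell from each binary file. The sketch below is only an illustration under stated assumptions: the helper names `convert_to_npy` and `cell_across_files`, the temporary directory, and the tiny synthetic 4×5 grids are all made up for the example and are not part of the original data set. It uses `np.load(..., mmap_mode='r')` so that each file is memory-mapped rather than read in full.

```python
import tempfile
from pathlib import Path

import numpy as np


def convert_to_npy(text_path, skiprows=0):
    """Hypothetical helper: parse one text grid and save it alongside as .npy."""
    arr = np.loadtxt(text_path, skiprows=skiprows)
    bin_path = Path(text_path).with_suffix('.npy')
    np.save(bin_path, arr)
    return bin_path


def cell_across_files(bin_paths, row, col):
    """Hypothetical helper: collect the value at (row, col) from every .npy file."""
    values = []
    for path in bin_paths:
        # mmap_mode='r' maps the file so only the requested cell is touched
        arr = np.load(path, mmap_mode='r')
        values.append(arr[row, col])
    return np.array(values)


# Synthetic demo data (assumption): three small text grids in a temp directory,
# where grid i is filled with the value i.
tmp_dir = Path(tempfile.mkdtemp())
text_paths = []
for i in range(3):
    path = tmp_dir / f'grid_{i}.asc'
    np.savetxt(path, np.full((4, 5), i, dtype=float))
    text_paths.append(path)

# One-time conversion, then a fast lookup of the same cell in every file.
bin_paths = [convert_to_npy(p) for p in text_paths]
series = cell_across_files(bin_paths, row=2, col=3)
print(series)
```

For the real files, `skiprows=6` would match the header handling in the timing test above. The conversion cost is paid once, after which repeated lookups only pay the cheaper binary read.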
