Benchmark
These benchmarks show time spent and memory used by versus.compare() with the
default materialize="all" setting. The data comes from the Python
nycflights13 package (the weather table). For each size, rows are
sampled with replacement from the original table, and each input keeps 95% of
the sampled rows. In 4 of the 15 columns (temp, dewp, humid, wind_dir), 5% of values
differ between the two inputs. For the parquet scan case, the sampled tables are
written to parquet before running compare().
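The data preparation described above can be sketched roughly as follows. This is a minimal illustration and not the benchmark's actual script: the make_inputs helper, the row_id key column, the "add 1.0" perturbation, and the small synthetic stand-in for the weather table are all assumptions made so the example runs on its own.

```python
import numpy as np
import pandas as pd


def make_inputs(base, n_rows, changed_cols, keep_frac=0.95, diff_frac=0.05, seed=0):
    """Build two comparison inputs from `base` (hypothetical helper)."""
    rng = np.random.default_rng(seed)

    # Sample n_rows rows with replacement from the original table.
    idx = rng.integers(0, len(base), size=n_rows)
    sampled = base.iloc[idx].reset_index(drop=True)

    # Assumption: a synthetic unique key so rows can be matched across inputs.
    sampled.insert(0, "row_id", np.arange(n_rows))

    # Each side independently keeps 95% of the sampled rows.
    n_keep = int(n_rows * keep_frac)
    keep_a = np.sort(rng.choice(n_rows, size=n_keep, replace=False))
    keep_b = np.sort(rng.choice(n_rows, size=n_keep, replace=False))
    a = sampled.iloc[keep_a].reset_index(drop=True)
    b = sampled.iloc[keep_b].reset_index(drop=True)

    # In the selected columns, perturb about 5% of the values on one side.
    for col in changed_cols:
        mask = rng.random(len(b)) < diff_frac
        b.loc[mask, col] = b.loc[mask, col] + 1.0

    return a, b


# Synthetic stand-in for the weather table (assumption: float columns only).
_rng = np.random.default_rng(1)
base = pd.DataFrame(
    {
        "temp": _rng.normal(55.0, 18.0, 1_000),
        "dewp": _rng.normal(41.0, 19.0, 1_000),
        "humid": _rng.uniform(12.0, 100.0, 1_000),
        "wind_dir": _rng.integers(0, 36, 1_000) * 10.0,
        "pressure": _rng.normal(1018.0, 7.0, 1_000),
    }
)

a, b = make_inputs(base, 10_000, ["temp", "dewp", "humid", "wind_dir"])
```

With the real data, base would be the nycflights13 weather table, and for the parquet scan case each input would additionally be written to a parquet file before the comparison.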
Row sizes: 250k, 1M, 2M, 5M, 10M, 20M.
Benchmarks were run on a 2020 13-inch MacBook Pro (2.3 GHz quad-core Intel Core i7, 32 GB RAM).
Methods:
Pandas DataFrames: versus.compare() on pandas DataFrames held in memory.
DuckDB parquet scan: versus.compare() on DuckDB relations backed by parquet files.
Hover a point to see the exact value.