Benchmark#

These benchmarks show time spent and peak memory observed while running versus.compare() with the default materialize="all" setting.

The benchmark dataset starts from the Python nycflights13 weather table. Each benchmark size builds a deterministic sample with replacement from that table using a synthetic benchmark_id key. The left and right inputs each keep 95% of those keys, and 5% of the sampled keys that remain on both sides are mutated in 4 columns (temp, dewp, humid, wind_dir).

Synthetic benchmark sizes: 250k, 1M, 2M, 5M, 10M, 20M.

Each point is the median of 3 fresh-process runs with 30 seconds of cooldown between runs. Only the compare() call is timed; input generation happens before timing starts.

Benchmarks were run on MacBook Air with Apple M5 and 32 GB.

Methods:

  • versus.compare() on pandas DataFrames already held in memory.

  • versus.compare() on DuckDB relations backed by parquet files.

Hover a point to see the exact value.