compare

versus.compare(
    table_a: duckdb.DuckDBPyRelation | 'pandas.DataFrame' | 'polars.DataFrame',
    table_b: duckdb.DuckDBPyRelation | 'pandas.DataFrame' | 'polars.DataFrame',
    *,
    by: Sequence[str],
    allow_both_na: bool = True,
    coerce: bool = True,
    table_id: Tuple[str, str] = ('a', 'b'),
    con: duckdb.DuckDBPyConnection | None = None,
    materialize: Literal['all', 'summary', 'none'] = 'all',
) → Comparison

Compare two tables (DuckDB relations or pandas/polars DataFrames) by key columns.

Parameters:

table_a, table_b (DuckDBPyRelation, pandas.DataFrame, or polars.DataFrame): DuckDB relations or pandas/polars DataFrames to compare.
by (sequence of str): Column names that uniquely identify rows.
allow_both_na (bool, default True): Whether to treat NULL/NA values as equal when both sides are missing.
coerce (bool, default True): If True, allow DuckDB to coerce compatible types. If False, require exact type matches for shared columns.
table_id (tuple[str, str], default ("a", "b")): Labels used in outputs for the two tables.
con (duckdb.DuckDBPyConnection, optional): DuckDB connection used to register the inputs and run queries.
materialize ({"all", "summary", "none"}, default "all"): Controls which helper tables are materialized upfront.
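The effect of allow_both_na can be pictured with a small plain-Python sketch. This is only an illustration of the documented rule, not the library's implementation: `values_match` is a hypothetical helper, `None` stands in for NULL/NA, and the handling of a value that is missing on only one side is an assumption.

```python
from typing import Any


def values_match(a: Any, b: Any, allow_both_na: bool = True) -> bool:
    """Hypothetical helper: decide whether two cell values count as equal.

    None stands in for NULL/NA. This mirrors the documented meaning of
    allow_both_na only; it is not how versus evaluates comparisons.
    """
    if a is None and b is None:
        # Both sides missing: equal only when allow_both_na is True.
        return allow_both_na
    if a is None or b is None:
        # Assumption: a value missing on one side only counts as a difference.
        return False
    return a == b


print(values_match(None, None))                       # True
print(values_match(None, None, allow_both_na=False))  # False
print(values_match(1, None))                          # False
```

With the default allow_both_na=True, a pair of missing values would therefore not be reported as a value difference.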

Returns:

Comparison: Comparison object with summary relations and diff helpers.

Examples

>>> from versus import compare, examples
>>> comparison = compare(
...     examples.example_cars_a(),
...     examples.example_cars_b(),
...     by=["car"],
... )
>>> comparison.summary()
┌────────────────┬─────────┐
│   difference   │  found  │
│    varchar     │ boolean │
├────────────────┼─────────┤
│ value_diffs    │ true    │
│ unmatched_cols │ true    │
│ unmatched_rows │ true    │
│ type_diffs     │ false   │
└────────────────┴─────────┘
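To make three of the summary flags concrete, here is a rough plain-Python sketch of what a key-based comparison checks. It is an illustration under stated assumptions, not the versus implementation: `sketch_summary` and the sample rows are hypothetical, tables are modeled as lists of dicts, and type_diffs is left out because this sketch does not track column types.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def sketch_summary(rows_a: List[Row], rows_b: List[Row], by: Sequence[str]) -> Dict[str, bool]:
    """Hypothetical helper: approximate three summary flags for two tables."""

    def key(row: Row) -> Tuple[Any, ...]:
        # Build the row's identity from the key columns named in `by`.
        return tuple(row[name] for name in by)

    # Index each table by key; `by` is assumed to identify rows uniquely.
    index_a = {key(row): row for row in rows_a}
    index_b = {key(row): row for row in rows_b}

    # Collect the column names seen in each table.
    cols_a = set().union(*(row.keys() for row in rows_a))
    cols_b = set().union(*(row.keys() for row in rows_b))
    shared_cols = (cols_a & cols_b) - set(by)

    # A value difference is a shared column that differs on a matched key.
    matched_keys = index_a.keys() & index_b.keys()
    value_diffs = any(
        index_a[k].get(col) != index_b[k].get(col)
        for k in matched_keys
        for col in shared_cols
    )

    return {
        "value_diffs": value_diffs,
        "unmatched_cols": cols_a != cols_b,
        "unmatched_rows": index_a.keys() != index_b.keys(),
    }


# Made-up rows for illustration; these are not the bundled example datasets.
rows_a = [
    {"car": "Mazda", "mpg": 21.0, "wt": 2.6},
    {"car": "Fiat", "mpg": 32.4, "wt": 2.2},
]
rows_b = [
    {"car": "Mazda", "mpg": 22.0, "hp": 110},
    {"car": "Volvo", "mpg": 21.4, "hp": 109},
]
flags = sketch_summary(rows_a, rows_b, by=["car"])
print(flags)
```

For these made-up rows all three flags come out True: the shared key "Mazda" has different mpg values, each table has a column the other lacks, and each has a key the other lacks.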