compare

versus.compare(
    table_a: duckdb.DuckDBPyRelation | 'pandas.DataFrame' | 'polars.DataFrame',
    table_b: duckdb.DuckDBPyRelation | 'pandas.DataFrame' | 'polars.DataFrame',
    *,
    by: Sequence[str],
    allow_both_na: bool = True,
    coerce: bool = True,
    table_id: Tuple[str, str] = ('a', 'b'),
    con: duckdb.DuckDBPyConnection | None = None,
    materialize: Literal['all', 'summary', 'none'] = 'all',
) -> Comparison

Compare two tables (DuckDB relations or pandas/polars DataFrames) by key columns.

Parameters:
table_a, table_b : DuckDBPyRelation, pandas.DataFrame, or polars.DataFrame

DuckDB relations or pandas/polars DataFrames to compare.

by : sequence of str

Column names that uniquely identify rows.

allow_both_na : bool, default True

Whether to treat NULL/NA values as equal when both sides are missing.

coerce : bool, default True

If True, allow DuckDB to coerce compatible types. If False, require exact type matches for shared columns.

table_id : tuple[str, str], default ("a", "b")

Labels used in outputs for the two tables.

con : duckdb.DuckDBPyConnection, optional

DuckDB connection used to register the inputs and run queries.

materialize : {"all", "summary", "none"}, default "all"

Controls which helper tables are materialized upfront.
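
The effect of allow_both_na can be pictured with a small pure-Python sketch. This is not how versus implements the check (it runs its queries through DuckDB); the helpers is_missing and values_differ are hypothetical names used only for illustration, and the sketch assumes that with allow_both_na=False a pair of missing values is reported as a difference.

```python
import math


def is_missing(value):
    """Return True for None or a float NaN (stand-ins for NULL/NA)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def values_differ(a, b, allow_both_na=True):
    """Decide whether two cell values count as a difference (illustrative only)."""
    a_missing, b_missing = is_missing(a), is_missing(b)
    if a_missing and b_missing:
        # Both sides missing: treated as equal only when allow_both_na is True.
        return not allow_both_na
    if a_missing or b_missing:
        # Exactly one side missing: always a difference.
        return True
    return a != b
```

Under these assumptions, values_differ(None, None) reports no difference, while values_differ(None, None, allow_both_na=False) reports one. A value paired with a missing value differs in both modes.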

Returns:
Comparison

Comparison object with summary relations and diff helpers.

Examples

>>> from versus import compare, examples
>>> comparison = compare(
...     examples.example_cars_a(),
...     examples.example_cars_b(),
...     by=["car"],
... )
>>> comparison.summary()
┌────────────────┬─────────┐
│   difference   │  found  │
│    varchar     │ boolean │
├────────────────┼─────────┤
│ value_diffs    │ true    │
│ unmatched_cols │ true    │
│ unmatched_rows │ true    │
│ type_diffs     │ false   │
└────────────────┴─────────┘
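
To make the summary rows more concrete, here is a rough pure-Python sketch of what a key-based comparison looks for. It is an illustration under stated assumptions, not the versus implementation: sketch_compare is a hypothetical helper, the rows are made-up toy data rather than the example_cars tables, and the meaning given to each flag is inferred from its name. Type differences are left out.

```python
def sketch_compare(rows_a, rows_b, by):
    """Toy key-based comparison over lists of dicts (not the versus implementation)."""

    def key(row):
        # Build the row identifier from the key columns.
        return tuple(row[k] for k in by)

    index_a = {key(r): r for r in rows_a}
    index_b = {key(r): r for r in rows_b}

    cols_a = {c for r in rows_a for c in r}
    cols_b = {c for r in rows_b for c in r}
    shared = (cols_a & cols_b) - set(by)

    # Keys present in only one table.
    unmatched_rows = sorted(set(index_a) ^ set(index_b))
    # Columns present in only one table.
    unmatched_cols = sorted(cols_a ^ cols_b)
    # (key, column) pairs whose values differ for keys found in both tables.
    value_diffs = sorted(
        (k, c)
        for k in set(index_a) & set(index_b)
        for c in shared
        if index_a[k].get(c) != index_b[k].get(c)
    )
    return {
        "value_diffs": value_diffs,
        "unmatched_cols": unmatched_cols,
        "unmatched_rows": unmatched_rows,
    }


# Made-up toy rows, keyed by "car".
rows_a = [
    {"car": "x", "mpg": 14.3, "wt": 3.57},
    {"car": "y", "mpg": 22.8, "wt": 2.32},
]
rows_b = [
    {"car": "x", "mpg": 16.3, "hp": 245},
    {"car": "z", "mpg": 24.4, "hp": 62},
]

result = sketch_compare(rows_a, rows_b, by=["car"])
flags = {name: bool(found) for name, found in result.items()}
print(flags)
```

In this toy data, car "x" appears in both tables with a different mpg, "y" and "z" each appear in only one table, and wt and hp are each present on only one side, so all three flags come out true.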