compare
versus.compare(
    table_a: duckdb.DuckDBPyRelation | 'pandas.DataFrame' | 'polars.DataFrame',
    table_b: duckdb.DuckDBPyRelation | 'pandas.DataFrame' | 'polars.DataFrame',
    *,
    by: Sequence[str],
    allow_both_na: bool = True,
    coerce: bool = True,
    table_id: Tuple[str, str] = ('a', 'b'),
    con: duckdb.DuckDBPyConnection | None = None,
    materialize: Literal['all', 'summary', 'none'] = 'all',
)
Compare two tables by key columns. Each input may be a DuckDB relation or a pandas or polars DataFrame.
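Comparing "by key columns" means pairing each row of one table with the row of the other table that has the same values in the `by` columns. The code below is a minimal pure-Python sketch of that pairing step. It assumes each table is a list of dicts. `index_by_key` and `match_rows` are hypothetical helpers for illustration and are not part of versus. The library itself runs its comparison as DuckDB queries, and the rows below are toy data, not the contents of the bundled examples.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
Key = Tuple[Any, ...]


def index_by_key(rows: List[Row], by: Sequence[str]) -> Dict[Key, Row]:
    """Map each key tuple to its row, rejecting duplicate keys (hypothetical helper)."""
    index: Dict[Key, Row] = {}
    for row in rows:
        key = tuple(row[col] for col in by)
        if key in index:
            # The `by` columns must uniquely identify rows.
            raise ValueError(f"duplicate key {key!r}")
        index[key] = row
    return index


def match_rows(
    table_a: List[Row], table_b: List[Row], by: Sequence[str]
) -> Tuple[List[Key], List[Key], List[Key]]:
    """Split keys into (matched, only in a, only in b), in first-seen order."""
    a = index_by_key(table_a, by)
    b = index_by_key(table_b, by)
    matched = [k for k in a if k in b]
    only_a = [k for k in a if k not in b]
    only_b = [k for k in b if k not in a]
    return matched, only_a, only_b


# Toy rows for illustration only.
cars_a = [{"car": "Mazda RX4", "mpg": 21.0}, {"car": "Datsun 710", "mpg": 22.8}]
cars_b = [{"car": "Mazda RX4", "mpg": 21.5}, {"car": "Valiant", "mpg": 18.1}]
print(match_rows(cars_a, cars_b, by=["car"]))
# ([('Mazda RX4',)], [('Datsun 710',)], [('Valiant',)])
```

Rejecting duplicate keys up front reflects the requirement that `by` uniquely identifies rows. Without that check, a key could pair with several rows and "the matching row" would be ambiguous.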
Parameters:
- table_a, table_b : DuckDBPyRelation, pandas.DataFrame, or polars.DataFrame
  The two tables to compare, given as DuckDB relations or pandas/polars DataFrames.
- by : sequence of str
  Key column names. Together they must uniquely identify each row within each table.
- allow_both_na : bool, default True
  Whether to treat a value as equal when it is NULL/NA on both sides.
- coerce : bool, default True
  If True, allow DuckDB to coerce compatible types. If False, require exact type matches for shared columns.
- table_id : tuple[str, str], default ("a", "b")
  Labels that identify table_a and table_b in the outputs.
- con : duckdb.DuckDBPyConnection, optional
  DuckDB connection used to register the inputs and run queries.
- materialize : {"all", "summary", "none"}, default "all"
  Controls which helper tables are materialized up front.
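The `allow_both_na` flag decides whether a pair of missing values counts as a match. In SQL terms, this resembles the difference between `=` (where `NULL = NULL` yields NULL, not true) and DuckDB's null-safe `IS NOT DISTINCT FROM`. The sketch below illustrates the rule in pure Python. It assumes `None` and float NaN are the only missing markers, and it assumes a value missing on exactly one side is always a difference. `is_missing` and `values_equal` are hypothetical helpers, not part of the versus API.

```python
import math
from typing import Any


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def values_equal(x: Any, y: Any, allow_both_na: bool = True) -> bool:
    """Compare two cell values following the allow_both_na rule (hypothetical helper)."""
    x_na, y_na = is_missing(x), is_missing(y)
    if x_na and y_na:
        # Both sides missing: equal only when allow_both_na is True.
        return allow_both_na
    if x_na or y_na:
        # Exactly one side missing: assumed to always be a difference.
        return False
    return x == y


print(values_equal(None, float("nan")))               # True
print(values_equal(None, None, allow_both_na=False))  # False
print(values_equal(None, 3))                          # False
print(values_equal(3, 3.0))                           # True
```

Checking for missingness first matters because plain Python `==` is inconsistent here. `float("nan") == float("nan")` is False, while `None == None` is True.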
Returns:
- Comparison
  Comparison object with summary relations and diff helpers.
Examples
>>> from versus import compare, examples
>>> comparison = compare(
...     examples.example_cars_a(),
...     examples.example_cars_b(),
...     by=["car"],
... )
>>> comparison.summary()
┌────────────────┬─────────┐
│   difference   │  found  │
│    varchar     │ boolean │
├────────────────┼─────────┤
│ value_diffs    │ true    │
│ unmatched_cols │ true    │
│ unmatched_rows │ true    │
│ type_diffs     │ false   │
└────────────────┴─────────┘
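Each row of the summary table answers a yes/no question about the two inputs. The sketch below shows one plausible reading of the four categories, interpreting each from its name. It is not how versus computes them. It assumes each table is a list of dicts plus a column-to-type-name mapping. `summarize` is a hypothetical helper. It compares values with plain `!=`, so it ignores `allow_both_na` and `coerce`, and it skips the duplicate-key check. The inputs are toy data, not the contents of `examples.example_cars_a()`/`example_cars_b()`.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def summarize(
    rows_a: List[Row],
    rows_b: List[Row],
    types_a: Dict[str, str],
    types_b: Dict[str, str],
    by: Sequence[str],
) -> Dict[str, bool]:
    """Return one boolean per difference category (hypothetical, simplified)."""

    def key(row: Row) -> Tuple[Any, ...]:
        return tuple(row[c] for c in by)

    a = {key(r): r for r in rows_a}
    b = {key(r): r for r in rows_b}
    shared = [c for c in types_a if c in types_b]
    compared = [c for c in shared if c not in by]
    return {
        # A shared non-key column differs for some key present in both tables.
        "value_diffs": any(
            a[k][c] != b[k][c] for k in a.keys() & b.keys() for c in compared
        ),
        # Some column exists in only one table.
        "unmatched_cols": set(types_a) != set(types_b),
        # Some key exists in only one table.
        "unmatched_rows": a.keys() != b.keys(),
        # Some shared column has a different type name on each side.
        "type_diffs": any(types_a[c] != types_b[c] for c in shared),
    }


# Toy inputs for illustration only.
types_a = {"car": "VARCHAR", "mpg": "DOUBLE", "cyl": "INTEGER"}
types_b = {"car": "VARCHAR", "mpg": "DOUBLE"}
rows_a = [
    {"car": "Mazda RX4", "mpg": 21.0, "cyl": 6},
    {"car": "Datsun 710", "mpg": 22.8, "cyl": 4},
]
rows_b = [{"car": "Mazda RX4", "mpg": 21.5}]
print(summarize(rows_a, rows_b, types_a, types_b, by=["car"]))
# {'value_diffs': True, 'unmatched_cols': True, 'unmatched_rows': True, 'type_diffs': False}
```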