Check for duplicate rows: count_dupes (tablecompare)

count_dupes() returns the combinations of by values that occur in more than one row of .data, along with the number of rows for each combination.
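The idea behind count_dupes() can be sketched in base R: count the rows for every observed combination of the by columns, then keep only combinations with more than one row. This is a hypothetical helper (count_dupes_sketch), not the tablecompare implementation. It assumes by is a character vector of column names rather than a tidy-select expression, and it returns a plain data.frame rather than a data.table.

```r
# Hypothetical helper: a base-R sketch of the idea behind count_dupes(),
# not the tablecompare implementation. `by` is a character vector of
# column names (no tidy-select), and the result is a plain data.frame.
count_dupes_sketch <- function(data, by) {
  # Count rows for every observed combination of the `by` columns.
  # Note: aggregate() omits rows with NA in any `by` column.
  counts <- aggregate(
    list(n_rows = rep(1L, nrow(data))),
    by = data[by],
    FUN = length
  )
  # Keep only combinations that occur in more than one row
  dupes <- counts[counts$n_rows > 1, , drop = FALSE]
  # Sort by the `by` columns and reset row names for readability
  dupes <- dupes[do.call(order, dupes[by]), , drop = FALSE]
  rownames(dupes) <- NULL
  dupes
}

df <- read.table(text = "
x y z
1 6 1
2 6 2
3 7 3
3 7 4
4 3 5
4 3 6
", header = TRUE)

# Returns the (3, 7) and (4, 3) combinations, each with n_rows = 2
count_dupes_sketch(df, c("x", "y"))
```

Because only combinations seen more than once are kept, a result with zero rows means the by columns already identify each row uniquely.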

assert_unique() throws an error if any combination of by values occurs in more than one row of .data.

Usage

count_dupes(.data, by, setkey = FALSE)

assert_unique(.data, by, data_chr, by_chr)

Arguments

.data: A data frame or data.table.
by: tidy-select. Columns in .data whose combinations of values are checked for duplicates.
setkey: Logical. Should the output be keyed by the by columns? Defaults to FALSE.
data_chr: Optional character string. Manually specifies the name of .data shown in error messages. Useful when these functions are used as checks inside other functions.
by_chr: Optional character string. Manually specifies the name of by shown in error messages. Useful when these functions are used as checks inside other functions.
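To illustrate how data_chr and by_chr can make an error message more readable when a uniqueness check runs inside another function, here is a hypothetical base-R sketch (assert_unique_sketch). It is not the tablecompare implementation: by is assumed to be a character vector, and the default labels and message wording are invented for illustration.

```r
# Hypothetical helper: a base-R sketch of the kind of check assert_unique()
# performs, not the tablecompare implementation. `by` is a character vector,
# and the default labels and message wording are invented for illustration.
assert_unique_sketch <- function(data, by,
                                 data_chr = deparse(substitute(data)),
                                 by_chr = paste(by, collapse = ", ")) {
  # anyDuplicated() returns the index of the first row whose `by` values
  # repeat an earlier row, or 0 if every combination is unique
  if (anyDuplicated(data[by]) > 0) {
    stop(sprintf("%s has multiple rows for some combination of %s",
                 data_chr, by_chr), call. = FALSE)
  }
  invisible(NULL)
}

df <- read.table(text = "
x y z
1 6 1
2 6 2
3 7 3
3 7 4
4 3 5
4 3 6
", header = TRUE)

# Passes silently: every (x, z) combination is unique
assert_unique_sketch(df, c("x", "z"))

# Fails; custom labels replace the default names in the message
tryCatch(
  assert_unique_sketch(df, c("x", "y"), data_chr = "lookup",
                       by_chr = "key columns"),
  error = conditionMessage
)
```

Without the custom labels, the message would name the caller's argument expression (here "df"), which is less informative when the check is buried inside another function.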

Value

count_dupes(): A data.table containing the by columns, filtered to the combinations of values that occur in more than one row of .data, plus a column "n_rows" giving the number of rows in .data with that combination.
assert_unique(): No return value. Called for its side effect: throws an error if any combination of by values occurs in more than one row of .data.

Examples

df <- read.table(text = "
x y z
1 6 1
2 6 2
3 7 3
3 7 4
4 3 5
4 3 6
", header = TRUE)

count_dupes(df, c(x, y))
#>    x y n_rows
#> 1: 3 7      2
#> 2: 4 3      2

if (FALSE) {
# Throws an error: the (x, y) combinations (3, 7) and (4, 3) each occur twice
assert_unique(df, c(x, y))
}