This page describes the <tidy-select> argument modifier, which
indicates that the argument uses tidy selection, a sub-type of
tidy evaluation. If you've never heard of tidy evaluation before,
start with the practical introduction in
https://r4ds.hadley.nz/functions.html#data-frame-functions, then
read more about the underlying theory in
https://rlang.r-lib.org/reference/topic-data-mask.html.
Overview of selection features
tidyselect implements a DSL (domain-specific language) for selecting variables. It provides the following helpers:
- var1:var10: variables lying between var1 on the left and var10 on the right.
- starts_with("a"): names that start with "a".
- ends_with("z"): names that end with "z".
- contains("b"): names that contain "b".
- matches("x.y"): names that match the regular expression x.y.
- num_range("x", 1:4): names following the pattern x1, x2, ..., x4.
- all_of(vars)/any_of(vars): match names stored in the character vector vars. all_of(vars) will error if any of the variables aren't present; any_of(vars) will match just the variables that exist.
- everything(): all variables.
- last_col(): the furthest column on the right.
- where(is.numeric): all variables where is.numeric() returns TRUE.
As well as operators for combining those selections:
- !selection: only variables that don't match selection.
- selection1 & selection2: only variables included in both selection1 and selection2.
- selection1 | selection2: all variables that match either selection1 or selection2.
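The helpers and operators listed above can be tried out on a small data frame. The following is a minimal sketch, assuming dplyr is installed and using dplyr::select() as the tidy-select function; the data frame df and its column names are hypothetical and exist only for illustration.

```r
library(dplyr)

# Hypothetical data frame, used only to illustrate the selection language.
df <- data.frame(
  id    = 1:3,
  x1    = c(0.1, 0.2, 0.3),
  x2    = c(1.5, 2.5, 3.5),
  x3    = c(10, 20, 30),
  label = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Helpers
range_cols   <- names(select(df, x1:x3))                 # columns from x1 to x3
prefix_cols  <- names(select(df, starts_with("x")))      # names starting with "x"
numbered     <- names(select(df, num_range("x", 1:2)))   # x1 and x2
numeric_cols <- names(select(df, where(is.numeric)))     # predicate applied to each column
last         <- names(select(df, last_col()))            # right-most column

# Operators
not_numeric <- names(select(df, !where(is.numeric)))                # complement
both        <- names(select(df, starts_with("x") & ends_with("2"))) # intersection
either      <- names(select(df, id | contains("lab")))              # union
```

Wrapping each call in names() keeps the focus on which columns a selection picks, rather than on the data they contain.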
Key techniques
- If you want the user to supply a tidyselect specification in a function argument, you need to tunnel the selection through the function argument. This is done by embracing the function argument with {{ }}, e.g. unnest(df, {{ vars }}).
- If you have a character vector of column names, use all_of() or any_of(), depending on whether or not you want unknown variable names to cause an error, e.g. unnest(df, all_of(vars)), unnest(df, !any_of(vars)).
- To suppress R CMD check NOTEs about unknown variables, use "var" instead of var:
# has NOTE
df %>% select(x, y, z)
# no NOTE
df %>% select("x", "y", "z")
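The first two techniques can be sketched in the same way. This example assumes dplyr is installed and uses select() rather than unnest() so that it runs on a flat data frame; keep_cols() is a hypothetical wrapper, and df, wanted, and maybe are illustrative names that do not come from any package.

```r
library(dplyr)

# Hypothetical data frame for illustration.
df <- data.frame(a = 1:2, b = c("u", "v"), c = c(TRUE, FALSE),
                 stringsAsFactors = FALSE)

# Hypothetical wrapper: embracing `cols` with {{ }} tunnels the caller's
# tidyselect specification through to select().
keep_cols <- function(data, cols) {
  select(data, {{ cols }})
}

by_range     <- names(keep_cols(df, a:b))
by_predicate <- names(keep_cols(df, where(is.logical)))

# Column names stored in character vectors.
wanted <- c("a", "c")
maybe  <- c("a", "missing")   # "missing" is not a column of df

strict  <- names(select(df, all_of(wanted)))   # every name must exist
lenient <- names(select(df, any_of(maybe)))    # unknown names are ignored
dropped <- names(select(df, !any_of(maybe)))   # everything except the known names

# all_of() signals an error when a name is absent; capture it instead of stopping.
strict_result <- tryCatch(select(df, all_of(maybe)), error = function(e) e)
```

Choosing between all_of() and any_of() is mostly a question of whether a missing column should be treated as a bug in the caller (all_of()) or as an expected situation (any_of()).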