--- title: "Recoding Variables" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Recoding Variables} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(knitr) library(kableExtra) ``` ```{r setup, echo=FALSE} library(qacBase) df <- data.frame(sex=c(1,2,1,2,2,2), race=c("b", "w", "a", "b", "w", "h"), outcome=c("better", "worse", "same", "same", "better", "worse"), Q1=c(20, 30, 44, 15, 50, 99), Q2=c(15, 23, 18, 86, 99, 35), age=c(12, 20, 33, 55, 30, 100), rating =c(1,2,5,3,4,5)) ``` The `recodes()` functions makes it very easy to recode one or more variables in the your data frame. The format is

**`newdata <- recodes(olddata, variables, from values, to values)`** ### Original dataset Consider the following data set (below). Lets make the following changes. ```{r, echo=FALSE} kbl(df) %>% kable_styling(bootstrap_options = "striped", full_width = F, position = "left") ``` #### sex For `sex`, set 1 to "Male" and 2 to "Female". ```{r} df <- recodes(data=df, vars="sex", from=c(1,2), to=c("Male", "Female")) ``` ```{r, echo=FALSE} kbl(df) %>% kable_styling(bootstrap_options = "striped", full_width = F, position = "left") ``` #### race Recode `race` to "White" vs. "Other". ```{r} df <- recodes(data=df, vars="race", from=c("w", "b", "a", "h"), to=c("White", "Other", "Other", "Other")) ``` ```{r, echo=FALSE} kbl(df) %>% kable_styling(bootstrap_options = "striped", full_width = F, position = "left") ``` #### outcome Recode `outcome` to 1 (better) vs. 0 (not better). ```{r} df <- recodes(data=df, vars="outcome", from=c("better", "same", "worse"), to=c(1, 0, 0)) ``` ```{r, echo=FALSE} kbl(df) %>% kable_styling(bootstrap_options = "striped", full_width = F, position = "left") ``` #### Q1 and Q2 For `Q1` and `Q2` set values of 86 and 99 to missing. ```{r} df <- recodes(data=df, vars=c("Q1", "Q2"), from=c(86, 99), to=NA) ``` ```{r, echo=FALSE} kbl(df) %>% kable_styling(bootstrap_options = "striped", full_width = F, position = "left") ``` #### age For `age`, set values * less than 20 or greater than 90 to missing, * 20 <= age <= 30 to "Younger", * 30 < age <= 50 to "Middle Aged", and * 50 < age <= 90 to "Older". You can use expressions in your `from` fields. When they are `TRUE`, the corresponding `to` values will be applied. We will use the dollar sign (**\$**) to represent the variable (age in this case). The symbols ( **\|**, **\&** ) mean **OR** and **AND** respectively. ```{r} df <- recodes(data=df, vars="age", from=c("$ < 20 | $ > 90", "$ >= 20 & $ <= 30", "$ > 30 & $ <= 50", "$ > 50 & $ <= 90"), to=c(NA, "Younger", "Middle Aged", "Older")) ``` We can also write this as ```{r eval=FALSE} df <- recodes(data=df, vars="age", from=c("$ < 20", "$ <= 30", "$ <= 50", "$ <= 90", "$ > 90"), to= c(NA, "Younger", "Middle Aged", "Older", "NA")) ``` This works because once the age value for an observations meets a criteria that is `TRUE` (working left to right), it is recoded. It isn't changed again by later criteria in the same `recodes` statement. ```{r, echo=FALSE} kbl(df) %>% kable_styling(bootstrap_options = "striped", full_width = F, position = "left") ``` #### rating Finally, for the `rating` variable, reverse the scoring so that 1 to 5 becomes 5 to 1. ```{r} df <- recodes(data=df, vars="rating", from=1:5, to=5:1) ``` ```{r, echo=FALSE} kbl(df) %>% kable_styling(bootstrap_options = "striped", full_width = F, position = "left") ``` ### Note Remember that `recodes` returns a data frame, not a variable. * `df <- recodes(data=df, vars="rating", from=1:5, to=5:1)` **is correct**. * `df$rating <- recodes(data=df, vars="rating", from=1:5, to=5:1)` **is not**. This allows you to apply the same recoding scheme to more than one variable at a time (e.g., Q1 and Q2 above).