--- title: "Getting started with the qacBase package" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting started with the qacBase package} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(qacBase) library(dplyr) library(knitr) library(kableExtra) ```
![](duckhead.jpg)
The **qacBase** package contains functions and data sets designed to simplify data analyses and aid in the instruction of data science courses. The primary functions are described below. ## Preparing Data ```{r, echo=FALSE} d <- data.frame(Function=c("[**recodes**](../reference/recodes.html)", "[**standardize**](../reference/standardize.html)", "[**normalize**](../reference/normalize.html)"), Description=c("`recodes()` provides a simple way to recode the values of numeric, character, or factor variables. See the [vignette](recodes.html) for examples.", "`standardize()` transforms all the numeric variables in a data frame to same mean and standard deviation (mean=0 sd=1, by default), without modifying character, factor, or dummy coded variables.", "`normalize()` transforms all the numeric variables in a data frame to same range of values ([0, 1] by default). Again, character and factor variables are left unchanged.")) kbl(d) %>% kable_styling(bootstrap_options = "striped", full_width = F, position = "left") ``` ## Describing a data set ```{r, echo=FALSE} d <- data.frame(Function=c("[**contents**](../reference/contents.html)", "[**df_plot**](../reference/df_plot.html)", "[**barcharts**](../reference/barcharts.html)", "[**histograms**](../reference/histograms.html)", "[**densities**](../reference/densities.html)" ), Description=c("`contents()` provides a comprehensive description of a data frame. The output is **much** more detailed than that provided by the base `summary.data.frame()` function, and is easier to read and understand. This function should be your first stop when looking at a new dataset.", "`df_plot()` helps you visual a data frame. Variable are grouped by type (numeric, integer, character, factor, date) and color coded. The percent of missing data for each variable is also displayed, along with the total number of variables and cases.", "`barcharts()` provides bar charts of all the character or factor variables in a data frame, within a single graph.", "`histograms()` provides histograms of the all quantitative variables in a data frame, within a single graph.", "`densities()` provides density charts of all the quantitative variables in a data frame, within a single graph." )) kbl(d) %>% kable_styling(bootstrap_options = "striped", full_width = F, position = "left") ``` ## Exploratory data analysis ### Numeric variables ```{r, echo=FALSE} d <- data.frame( Function=c( "[**qstats**](../reference/qstats.html)", "[**univariate_plot**](../reference/univariate_plot.html)", "[**scatter**](../reference/scatter.html)", "[**cor_plot**](../reference/cor_plot.html)", "[**groupdiff**](../reference/groupdiff.html)", "[**mean_plot**](../reference/mean_plot.html)"), Description=c( "`qstats()` allows you to easily calculate any number of descriptive statistics (e.g., n, mean, sd) for a quantiatative variable. The results can be broken down by the levels of one of more categorical variables (groups). Any function that produces a single number can be used. See the [vignette](qstats.html) for examples.", "`univariate_plot()` provides a detailed visualization of the distribution of values in a quantiative variable. The graph contains a histrogram, jittered dot plot, density curve, and boxplot, Annotations provide statistics such as *n*, *mean*, *sd*, *median*, *min*, *max*, *skew*, and *outliers*.", "`scatter()` generates a scatter plot and line of best fit with 95% confidence interval displaying the relationship between two quantiative variables. Annotations include the slope, correlation coefficient (r), r-squared, and p_value. Oultiers (determinded by studentized residuals) are flagged. Optionally, marginal distributions (histograms, boxplots, density curves, violin plots) can be added to the margins of the plot.", "`corplot()` plots the correlations among numeric variables in a data frame. Variables can be sorted to place variables with similar correlation patterns together.", "`groupdiff()` compares groups on a quantitative outcome using either a parametric (ANOVA) or nonparametric (Kruskal-Wallis) test. Summary statistics, pair-wise group differences (post-hoc comparisons), and plots are provided.", "`mean_plot()` plots means and error bars for each level of a categorical variable. Interaction with a second categorical variable can also be added. Error ranges can represent standard deviations, standard errors, or confidence intervals. Each can be based on standard or robust statistics." )) kbl(d) %>% kable_styling(bootstrap_options = "striped", full_width = F, position = "left") ``` ### Categorical Variables ```{r, echo=FALSE} d <- data.frame(Function=c("[**tab**](../reference/tab.html)", "[**crosstab**](../reference/crosstab.html)" ), Description=c("`tab()` generates a frequency table and bar chart for a categorical variable. There are many options including sorting categories by frequency, adding cumulative frequencies and percents, and combining infrequent categories into an 'Other' category. See the [vignette](tab.html) for examples.", "`crosstab()` generates a two-way frequency table from two categorical variables. There are many options including cell, row, and column percents, plotting options, and a chi-square test of independence. See the [vignette](crosstab.html) for examples." )) kbl(d) %>% kable_styling(bootstrap_options = "striped", full_width = F, position = "left") ```