--- title: "Descriptive statistics by group" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Descriptive statistics by group} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE, warning=FALSE, message=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(qacBase) ``` ## Overview Getting summary statistics for a quantitative variable is a very common task in data analysis. Unfortunately, **R** makes it surprisingly difficult. The `qstats` function is an attempt to rectify the situation by making it simple to get any number of descriptive statistics for a numeric variable and to break these statistics down by the levels of one or more categorical variables (groups). The general format is **`qstats(data, variable, grouping variables, statistics, other options)`** Note that variable names do not have to be quoted. ## Using default statistics By default the *sample size*, *mean*, and *standard deviation* are provided. Let's take a look at fuel efficiencies for 11,914 automobiles in the `cardata` data frame. ```{r include=TRUE} # simple summary statistics qstats(cardata, highway_mpg) # summary statistics by vehicle_size qstats(cardata, highway_mpg, vehicle_size) # summary statistics by vehicle_size and drive type qstats(cardata, highway_mpg, vehicle_size, driven_wheels) ``` ## Specifying other statistics You can supply a statistics argument with the "stats" parameter. You can pass a single statistic, or multiple statistics as a vector of names. ```{r include=TRUE} # single statistic qstats(cardata, highway_mpg, vehicle_size, stats = "median") # multiple statistics qstats(cardata, highway_mpg, vehicle_size, stats = c("median", "min", "max")) ``` User-defined functions can also be used as a statistics. The only requirement is that the function returns a single number. ```{r include=TRUE} #custom statistics p25 <- function(x) quantile(x, probs=.25) p75 <- function(x) quantile(x, probs=.75) #calling the built in and custom statistics qstats(cardata, highway_mpg, vehicle_size, stats = c("min", "p25", "p75", "max")) ``` ## Other options Other options include * **na.rm** When TRUE, NAs are removed. Default is TRUE. * **digits** The number of decimal points to print. Default = 2. ```{r include=TRUE} qstats(cardata, highway_mpg, vehicle_size, stats=c("n", "mean","median","sd"), na.rm=FALSE, digits=2) ```