Package 'qacBase' reference manual

Title:	Functions to Facilitate Exploratory Data Analysis
Description:	Functions for descriptive statistics, data management, and data visualization.
Authors:	Kabacoff Robert [aut, cre], Barich Griffen [ctb], Jamrog Kelly [ctb], Kravchenko Elizaveta [ctb], Kuruvilla Jacob [ctb], Liu Lex [ctb], Nakamura Shota [ctb], Pham Kim [ctb], Rodriguez Belen [ctb], Ross Shane [ctb], Russo Chris [ctb], Corpuz Frederick [ctb], Juradat Nurah [ctb], Karp Harrison [ctb], Koech Kevin [ctb], Peters Anna [ctb], Shah Dhhyey [ctb], Stevenson Kenneth [ctb], Thomas-Franz Kaitlyn [ctb], Zheng Jiner [ctb], Aldarmaki Ahmed [ctb], Alneyadi Mohammed [ctb], Altai Chossis [ctb], Colorado Sofia [ctb], Northrop Blake [ctb], Peretz Shea [ctb], Qin Cher [ctb], Tuhabonye Emma [ctb], Wong Phillip [ctb]
Maintainer:	Kabacoff Robert <[email protected]>
License:	MIT + file LICENSE
Version:	1.0.3
Built:	2025-03-20 04:16:36 UTC
Source:	https://github.com/rkabacoff/qacbase

Barcharts

Description

Create barcharts for all categorical variables in a data frame.

Usage

barcharts(
  data,
  fill = "deepskyblue2",
  color = "grey30",
  labels = TRUE,
  sort = TRUE,
  maxcat = 20,
  abbrev = 20
)

Arguments

`data`	data frame
`fill`	fill color for bars
`color`	color for bar labels
`labels`	if `TRUE`, bars are labeled with percents
`sort`	if `TRUE`, bars are sorted by frequency
`maxcat`	numeric. barcharts with more than this number of bars will not be plotted.
`abbrev`	numeric. abbreviate bar labels to at most this character length.

Value

a ggplot graph

Examples

barcharts(cars74)

Automobile characteristics

Description

Cars dataset with features including make, model, year, engine, and other properties of the car used to predict its price.

Usage

cardata

Format

A data frame with 11914 rows and 16 variables. The variables are as follows:

make: car brand
model: model given by its brand
year: year of manufacture
engine_fuel_type: type of fuel required by its manufacturer
engine_hp: engine horsepower
engine_cylinders: number of cylinders
transmission_type: automatic vs. manual
driven_wheels: AWD, FWD, RWD
number_of_doors: Number of Doors
market_category: Luxury, Performance, Hatchback, etc.
vehicle_size: Compact, Midsize, Large
vehicle_style: Type of Vehicle: Sedan, SUV, Coupe, etc.
highway_mpg: highway miles per gallon
city_mpg: city miles per gallon
popularity: Popularity index
msrp: manufacturer's suggested retail price

Details

This package contains a detailed car dataset.

Source

Taken from Kaggle https://www.kaggle.com/CooperUnion/cardataset.

Examples

summary(cardata)

Motor Trend car road tests

Description

The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).

Usage

cars74

Format

A data frame with 32 rows and 12 variables. The variables are as follows:

auto: automobile make and model (from the mtcars row names)
mpg: Miles/(US) gallon
cyl: Number of cylinders
disp: Displacement (cu.in.)
hp: Gross horsepower
drat: Rear axle ratio
wt: Weight (1000 lbs)
qsec: 1/4 mile time
vs: Engine cylinder configuration
am: Transmission type
gear: Number of forward gears
carb: Number of carburetors

Details

This dataset is the mtcars dataset that comes with base R. However, cyl, vs, am, gear, and carb have been converted to factors, and the row names have been converted to the variable auto. Soren Heitmann has written a description of the variables.

Source

Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391-411.

Examples

summary(cars74)

Detailed description of a data frame

Description

contents provides a comprehensive description of a data frame, including summary statistics for both quantitative and categorical variables

Usage

contents(data, digits = 2, maxcat = 10, label_length = 20)

Arguments

`data`	a data frame
`digits`	number of decimal digits for statistics.
`maxcat`	maximum number of levels of a character/factor variable to print.
`label_length`	maximum length of factor level label to print. Longer labels will be truncated.

Details

Prints a comprehensive description of a data frame via several tables: a general summary table, plus tables that break down the quantitative and categorical variables.

Value

a list with 6 components:

dfname: name of data frame
nrow: number of rows
ncol: number of columns
overall: data frame of overall dataset characteristics
qvars: data frame with summary statistics for quantitative variables
cvars: data frame with summary statistics for categorical variables

Examples

contents(cars74)

Correlation matrix plot

Description

Create a correlation matrix for all quantitative variables in a data frame.

Usage

cor_plot(
  data,
  method = c("pearson", "kendall", "spearman"),
  sort = FALSE,
  axis_text_size = 12,
  number_text_size = 3,
  legend = FALSE
)

Arguments

`data`	data frame
`method`	a character string indicating which correlation coefficient is to be computed. One of "pearson" (default), "kendall", or "spearman".
`sort`	logical. If `TRUE`, reorder variables to place variables with similar correlation patterns together.
`axis_text_size`	size for axis labels (default=12).
`number_text_size`	size for correlation coefficient labels (default=3).
`legend`	logical, if TRUE the legend is displayed. (default=FALSE)

Details

The cor_plot function will only select quantitative variables from a data frame. Categorical variables are ignored. The correlation matrix is presented as a lower triangle matrix. Missing values are deleted in listwise fashion.
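
The selection and deletion rules above can be sketched in base R. This is only an illustration of the described behavior, not the package's implementation; the toy data frame `df` is made up for the example.

```r
# Illustrative sketch in base R; cor_plot itself wraps ggcorrplot.
df <- data.frame(
  a = c(1, 2, 3, 4, NA),
  b = c(2, 4, 5, 4, 5),
  c = c(5, 3, 2, 2, 1),
  g = factor(c("x", "y", "x", "y", "x"))  # categorical: dropped below
)

num <- df[sapply(df, is.numeric)]    # keep quantitative columns only
r <- cor(num,
         use = "complete.obs",       # listwise deletion of rows with any NA
         method = "pearson")
r[upper.tri(r)] <- NA                # keep the lower triangle (and diagonal)
round(r, 2)
```

Because deletion is listwise, the row with a missing `a` is dropped from every pairwise correlation, including the one between `b` and `c`.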

Value

a ggplot graph

Note

This function is a wrapper for the ggcorrplot function.

Examples

cor_plot(cars74)
cor_plot(cars74, sort=TRUE)

Two-way frequency table

Description

This function creates a two-way frequency table.

Usage

crosstab(
  data,
  rowvar,
  colvar,
  type = c("freq", "percent", "rowpercent", "colpercent"),
  total = TRUE,
  na.rm = TRUE,
  digits = 2,
  chisquare = FALSE,
  plot = FALSE
)

Arguments

`data`	data frame
`rowvar`	row factor (unquoted)
`colvar`	column factor (unquoted)
`type`	statistics to print. Options are `"freq"`, `"percent"`, `"rowpercent"`, or `"colpercent"` (for frequencies, cell percents, row percents, or column percents).
`total`	logical. if TRUE, includes total percents.
`na.rm`	logical. if TRUE, deletes cases with missing values.
`digits`	number of decimal digits to report for percents.
`chisquare`	logical. If `TRUE` perform a chi-square test of independence
`plot`	logical. If `TRUE` generate stacked bar chart.

Details

Given a data frame, a row factor, a column factor, and a type (frequencies, cell percents, row percents, or column percents) the function provides the requested cross-tabulation.

If na.rm = FALSE, a level labeled <NA> is added. If total = TRUE, a level labeled Total is added. If chisquare = TRUE, a chi-square test of independence is performed.
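
For orientation, the four table types correspond roughly to the following base R computations. This is a sketch of the underlying arithmetic only; the rounding to 2 digits mirrors the default `digits`, and nothing here claims to be how crosstab is written internally.

```r
# Base R equivalents of the four table types (illustrative only).
tbl <- table(cyl = mtcars$cyl, gear = mtcars$gear)       # "freq"

cell_pct <- round(100 * prop.table(tbl), 2)              # "percent"
row_pct  <- round(100 * prop.table(tbl, margin = 1), 2)  # "rowpercent"
col_pct  <- round(100 * prop.table(tbl, margin = 2), 2)  # "colpercent"

addmargins(tbl)   # frequencies with a Sum row and column (compare total = TRUE)

# Chi-square test of independence; R warns here because several
# expected cell counts are small.
chisq.test(tbl)
```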

Value

If plot=TRUE, returns a ggplot2 graph. Otherwise, the function returns a list with 7 components:

table (table). Table of frequencies or percents
type (character). Type of table to print
total (logical). If TRUE, print row and/or column totals
digits (numeric). number of digits to print
rowname (character). Row variable name
colname (character). Column variable name
chisquare (character). If chisquare=TRUE, contains the results of the Chi-square test. NULL otherwise.

Examples

# print frequencies
crosstab(mtcars, cyl, gear)

# print cell percents
crosstab(cardata, vehicle_size, driven_wheels, type="percent")

# stacked bar chart of frequencies
crosstab(cardata, vehicle_size, driven_wheels,
         plot=TRUE)

# column percents with a chi-square test
crosstab(cardata, driven_wheels, vehicle_size,
         type="colpercent", plot=TRUE, chisquare=TRUE)

Density plots

Description

Create density plots for all quantitative variables in a data frame.

Usage

densities(data, fill = "deepskyblue2", adjust = 1)

Arguments

`data`	data frame
`fill`	fill color for density plots
`adjust`	a factor multiplied by the smoothing bandwidth. See details.

Details

The densities function will only plot quantitative variables from a data frame. Categorical variables are ignored.

The adjust parameter multiplies the smoothing bandwidth. For example, adjust = 2 will make the density plots twice as smooth, while adjust = 1/2 will make them half as smooth (i.e., twice as spiky).
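
The effect of `adjust` can be checked directly with `stats::density`, which uses the same convention. The sketch below assumes that densities forwards `adjust` to a kernel density estimator in this way; that is an assumption about the internals, not something stated above.

```r
# Compare the bandwidth actually used under three adjust settings.
x  <- mtcars$mpg
d1 <- density(x)                 # default bandwidth (adjust = 1)
d2 <- density(x, adjust = 2)     # bandwidth doubled: smoother curve
d3 <- density(x, adjust = 1/2)   # bandwidth halved: spikier curve

c(default = d1$bw, doubled = d2$bw, halved = d3$bw)
```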

Value

a ggplot graph

Examples

densities(cars74)

densities(cars74, adjust=2)

densities(cars74, adjust=1/2)

Visualize a data frame

Description

df_plot visualizes the variables in a data frame.

Usage

df_plot(data)

Arguments

`data`	a data frame.

Details

For each variable, the plot displays

type (numeric, integer, factor, ordered factor, logical, or date)
percent of available (and missing) cases

Variables are sorted by type and the total number of variables and cases are printed in the caption.
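
The per-variable quantities shown in the plot can be computed by hand. The following is a minimal base R sketch with a made-up data frame; the column names `variable`, `type`, and `pct_available` are chosen for the example and are not part of the package.

```r
# Type and percent of available cases for each variable (illustrative).
df <- data.frame(
  n = c(1.5, NA, 3, 4),
  f = factor(c("a", "b", NA, NA)),
  l = c(TRUE, FALSE, TRUE, TRUE)
)

info <- data.frame(
  variable      = names(df),
  type          = sapply(df, function(v) class(v)[1]),
  pct_available = 100 * colMeans(!is.na(df))   # 100 minus this is percent missing
)
info[order(info$type), ]                       # sorted by type, as in the plot
```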

Value

a ggplot2 graph

Examples

df_plot(cars74)

Test of group differences

Description

One-way analysis (ANOVA or Kruskal-Wallis Test) with post-hoc comparisons and plots

Usage

groupdiff(
  data,
  y,
  x,
  method = c("anova", "kw"),
  digits = 2,
  horizontal = FALSE,
  posthoc = FALSE
)

Arguments

`data`	a data frame.
`y`	a numeric response variable
`x`	a categorical explanatory variable. It will be coerced to a factor.
`method`	character. Either `"anova"`, or `"kw"` (see details).
`digits`	Number of significant digits to print.
`horizontal`	logical. If `TRUE`, boxplots are plotted horizontally.
`posthoc`	logical. If `TRUE`, perform pairwise post-hoc comparisons (TukeyHSD for ANOVA and Conover Test for Kruskal-Wallis). The default is `FALSE`. This test will only be performed if there are 3 or more levels for `x`.

Details

The groupdiff function performs one of two analyses:

anova: A one-way analysis of variance, with TukeyHSD post-hoc comparisons.
kw: A Kruskal Wallis Rank Sum Test, with Conover Test post-hoc comparisons.

In each case, summary statistics and grouped boxplots are provided. In the parametric case, the statistics are n, mean, and standard deviation. In the nonparametric case, the statistics are n, median, and median absolute deviation. If posthoc = TRUE, pairwise comparisons are superimposed on the boxplots. Groups that share a letter are not significantly different at the .05 level, controlling for multiple comparisons.
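
The two analyses correspond to standard base R tests. The sketch below shows the omnibus tests and summary statistics with base functions only; it is not the package's code, and the Conover post-hoc test is omitted because base R does not provide it.

```r
# x is treated as a factor, as groupdiff does.
d <- transform(mtcars, gear = factor(gear))

# Parametric route: one-way ANOVA with Tukey HSD comparisons
fit <- aov(hp ~ gear, data = d)
summary(fit)
TukeyHSD(fit)
aggregate(hp ~ gear, data = d,
          FUN = function(v) c(n = length(v), mean = mean(v), sd = sd(v)))

# Nonparametric route: Kruskal-Wallis rank sum test
kw <- kruskal.test(hp ~ gear, data = d)
kw
aggregate(hp ~ gear, data = d,
          FUN = function(v) c(n = length(v), median = median(v), mad = mad(v)))
```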

Value

a list with 3 components:

result: omnibus test
summarystats: summary statistics
plot: ggplot2 graph

Examples

# parametric analysis
groupdiff(cars74, hp, gear)

# nonparametric analysis
groupdiff(cardata, popularity, vehicle_style, posthoc=TRUE,
          method="kw", horizontal=TRUE)

Histograms

Description

Create histograms for all quantitative variables in a data frame.

Usage

histograms(data, fill = "deepskyblue2", color = "white", bins = 30)

Arguments

`data`	data frame
`fill`	fill color for histogram bars
`color`	border color for histogram bars
`bins`	number of bins (bars) for the histograms

Details

The histograms function will only plot quantitative variables from a data frame. Categorical variables are ignored.

Value

a ggplot graph

Examples

histograms(cars74)
histograms(cars74, bins=15, fill="darkred")

List object sizes and types

Description

lso lists object sizes and types.

Usage

lso(
  pos = 1,
  pattern,
  order.by = "Size",
  decreasing = TRUE,
  head = TRUE,
  n = 10
)

Arguments

`pos`	a number specifying the environment as a position in the search list.
`pattern`	an optional regular expression. Only names matching pattern are returned. glob2rx can be used to convert wildcard patterns to regular expressions.
`order.by`	column to sort the list by. Values are `"Type"`, `"Size"`, `"Rows"`, and `"Columns"`.
`decreasing`	logical. If `FALSE`, the list is sorted in ascending order.
`head`	logical. Should output be limited to `n` lines?
`n`	if `head = TRUE`, the number of rows to display.

Details

This function lists the sizes and types of all objects in an environment. By default, the list describes the objects in the current environment, presented in descending order by object size and reported in megabytes (Mb).
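
The general idea can be sketched with `ls` and `object.size`. This is a simplified illustration that reports only Type and Size; the scratch environment `env` and the conversion from bytes via 1024^2 are assumptions made for the example.

```r
# A scratch environment so the example does not depend on the workspace.
env <- new.env()
assign("small", 1:10, envir = env)
assign("big", matrix(0, 200, 50), envir = env)

nms <- ls(env)
sizes <- data.frame(
  Type = sapply(nms, function(n) class(get(n, envir = env))[1]),
  Size = sapply(nms, function(n) {
    as.numeric(object.size(get(n, envir = env))) / 1024^2   # bytes to Mb
  }),
  row.names = nms
)
sorted <- sizes[order(sizes$Size, decreasing = TRUE), ]     # largest first
sorted
```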

Value

a data.frame with four columns (Type, Size, Rows, Columns) and object names as row names.

Author(s)

Based on postings by Petr Pikal and David Hinds to the r-help list in 2004, and modified by Dirk Eddelbuettel, Patrick McCann, and Rob Kabacoff.

References

https://stackoverflow.com/questions/1358003/tricks-to-manage-the-available-memory-in-an-r-session/.

Examples

data(cardata)
data(cars74)
lso()

Mean plot with error bars

Description

Plots group means with error bars. Error bars can be standard deviations, standard errors, or confidence intervals. Optionally, plots can be based on robust statistics.

Usage

mean_plot(
  data,
  y,
  x,
  by,
  pointsize = 2,
  dodge = 0.2,
  lines = TRUE,
  width = 0.2,
  error_type = c("se", "sd", "ci"),
  percent = 0.95,
  robust = FALSE
)

Arguments

`data`	a data frame.
`y`	a numeric response variable.
`x`	a categorical explanatory variable.
`by`	a second categorical explanatory variable (optional).
`pointsize`	numeric. Point size (default = 2).
`dodge`	numeric. If a `by` variable is included, points and error bars are dodged by this amount in order to avoid overlap (default = 0.2).
`lines`	logical. If `TRUE`, group means are connected.
`width`	numeric. Width of the error bars (default = 0.2). Set to 0 to produce pointranges instead of error bars.
`error_type`	character. Error bars represent either standard deviations (`"sd"`), standard errors of the means (`"se"`), or confidence intervals (`"ci"`). The default is the standard error.
`percent`	numeric. if `error_type = "ci"`, this indicates the size of the confidence interval. The default is `0.95` or a 95 percent confidence interval for the mean.
`robust`	logical. If `TRUE`, the means, standard deviations, standard errors, and confidence intervals are based on robust statistics. See `Details`. The default is `FALSE`.

Details

Robust statistics are based on deciles, the nine values that divide the response variable into 10 equal groups (where each group contains roughly the same fraction of cases). The robust mean is the mean of these nine decile values. The robust standard deviation is the sample standard deviation of the nine decile values. The standard error and confidence interval are calculated in the normal way, but use the robust mean and standard deviation in their calculations. See Abu-Shawiesh et al (2022).
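
A rough base R sketch of the decile-based statistics follows. Several details are assumptions rather than documented behavior: the default quantile algorithm of `quantile`, a standard error of sd / sqrt(n), and a t-based interval with n - 1 degrees of freedom.

```r
y <- mtcars$mpg
n <- length(y)

deciles <- quantile(y, probs = seq(0.1, 0.9, by = 0.1))  # nine decile values
r_mean  <- mean(deciles)       # robust mean
r_sd    <- sd(deciles)         # robust standard deviation
r_se    <- r_sd / sqrt(n)      # standard error built from the robust sd (assumed form)

# 95 percent confidence interval, t-based (assumed form)
tcrit <- qt(0.975, df = n - 1)
c(lower = r_mean - tcrit * r_se,
  mean  = r_mean,
  upper = r_mean + tcrit * r_se)
```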

Value

a ggplot2 graph.

References

Ahmed Abu-Shawiesh, M., Sinsomboonthong, J., & Kibria, B. (2022). A modified robust confidence interval for the population mean of distribution based on deciles. Statistics in Transition, vol. 23 (1).

Examples

mean_plot(cars74, mpg, cyl)
mean_plot(cars74, mpg, cyl, am)
mean_plot(cars74, mpg, cyl, am, 
          error_type = "ci", percent = 0.9,
          width = 0, lines = FALSE, robust = TRUE)

Normalize numeric variables

Description

Normalize the numeric variables in a data frame

Usage

normalize(data, new_min = 0, new_max = 1)

Arguments

`data`	a data frame.
`new_min`	minimum for the transformed variables.
`new_max`	maximum for the transformed variables.

Details

normalize transforms all the numeric variables in a data frame to have the same minimum and maximum values. By default, this will be a minimum of 0 and maximum of 1. Character variables and factors are left unchanged.
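
The transformation described here is ordinary min-max rescaling. A minimal base R sketch is shown below; `rescale` is a hypothetical helper written for this example and is not a function exported by the package.

```r
# Hypothetical helper: map x linearly onto [new_min, new_max].
rescale <- function(x, new_min = 0, new_max = 1) {
  lo <- min(x, na.rm = TRUE)
  hi <- max(x, na.rm = TRUE)
  (x - lo) / (hi - lo) * (new_max - new_min) + new_min
}

df <- data.frame(a = c(2, 4, 6), b = c(10, 20, 40), g = c("x", "y", "z"))
is_num <- sapply(df, is.numeric)           # character and factor columns are skipped
df[is_num] <- lapply(df[is_num], rescale)
df
```

A constant column would divide by zero in this sketch; how normalize handles that case is not stated here.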

Value

a data frame

Note

Use this function to transform variables into a given range. The default is [0, 1], but [-1, 1], [0, 100], or any other range is permissible.

Examples

head(cars74)

cars74_st <- normalize(cars74)
head(cars74_st)

Get help on a package

Description

phelp provides help on an installed package.

Usage

phelp(pckg)

Arguments

`pckg`	The name of a package

Details

This function provides help on an installed package. The package does not have to be loaded. The package name does not need to be entered with quotes.

Value

No return value, called for side effects.

Examples

phelp(stats)

Plot a crosstab object

Description

This function plots the results of a calculated two-way frequency table.

Usage

## S3 method for class 'crosstab'
plot(x, size = 3.5, ...)

Arguments

`x`	An object of class `crosstab`
`size`	numeric. Size of bar text labels.
`...`	not currently used.

Value

a ggplot2 graph

Examples

tbl <- crosstab(cars74, cyl, gear, type = "freq")
plot(tbl)

tbl <- crosstab(cars74, cyl, gear, type = "colpercent")
plot(tbl)

Plot a tab object

Description

Plot a frequency or cumulative frequency table

Usage

## S3 method for class 'tab'
plot(x, fill = "deepskyblue2", size = 3.5, ...)

Arguments

`x`	An object of class `tab`
`fill`	Fill color for bars
`size`	numeric. Size of bar text labels.
`...`	Parameters passed to a function

Value

a ggplot2 graph

Examples

tbl1 <- tab(cars74, carb)
plot(tbl1)

tbl2 <- tab(cars74, carb, sort = TRUE)
plot(tbl2)

tbl3 <- tab(cars74, carb, cum=TRUE)
plot(tbl3)

Print a contents object

Description

print.contents prints the results of the contents function.

Usage

## S3 method for class 'contents'
print(x, ...)

Arguments

`x`	an object of class `contents`
`...`	not used.

Value

No return value, called for side effects.

Examples

testdata <- data.frame(height=c(4, 5, 3, 2, 100),
                       weight=c(39, 88, NA, 15, -2),
                       names=c("Bill","Dean", "Sam", NA, "Jane"),
                       race=c('b', 'w', 'w', 'o', 'b'))

x <- contents(testdata)
print(x)

Print a crosstab object

Description

This function prints the results of a calculated two-way frequency table.

Usage

## S3 method for class 'crosstab'
print(x, ...)

Arguments

`x`	An object of class `crosstab`
`...`	not currently used.

Value

No return value, called for side effects

Examples

mycrosstab <- crosstab(mtcars, cyl, gear, type = "freq", digits = 2)
print(mycrosstab)

mycrosstab <- crosstab(mtcars, cyl, gear, type = "rowpercent", digits = 3)
print(mycrosstab)

Print a tab object

Description

Print the results of calculating a frequency table

Usage

## S3 method for class 'tab'
print(x, ...)

Arguments

`x`	An object of class `tab`
`...`	Parameters passed to the print function

Value

No return value, called for side effects

Examples

frequency <- tab(cardata, make, sort = TRUE, na.rm = FALSE)
print(frequency)

Summary statistics for a quantitative variable

Description

This function provides descriptive statistics for a quantitative variable alone or separately by groups. Any function that returns a single numeric value can be used.

Usage

qstats(data, x, ..., stats = c("n", "mean", "sd"), na.rm = TRUE, digits = 2)

Arguments

`data`	data frame
`x`	numeric variable in data (unquoted)
`...`	list of grouping variables
`stats`	statistics to calculate (any function that produces a numeric value), Default: `c("n", "mean", "sd")`
`na.rm`	if `TRUE`, delete cases with missing values on x and/or grouping variables, Default: `TRUE`
`digits`	number of decimal digits to print, Default: 2

Value

a data frame, where columns are grouping variables (optional) and statistics
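
The idea of applying statistics by name within groups can be sketched in base R with `split` and `match.fun`. This is an illustration only; `fun_names` is a made-up variable, and `"length"` stands in for the package's `"n"` statistic.

```r
fun_names <- c("length", "mean", "sd")       # any function returning one number
groups <- split(mtcars$mpg, mtcars$am)       # one numeric vector per group

res <- t(sapply(groups, function(v) {
  sapply(fun_names, function(f) match.fun(f)(v))
}))
round(res, 2)                                # rows are groups, columns are statistics
```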

Examples

# If no keyword arguments are provided, default values are used
qstats(mtcars, mpg, am, gear)

# You can supply as many (or no) grouping variables as needed
qstats(mtcars, mpg)

qstats(mtcars, mpg, am, cyl)

# You can specify your own functions (e.g., median,
# median absolute deviation, minimum, maximum)
qstats(mtcars, mpg, am, gear,
       stats = c("median", "mad", "min", "max"))

R Colors

Description

Plot a grid of R colors and their associated names

Usage

rcolors(color = NULL, cex = 0.6)

Arguments

`color`	character. A text string used to search for specific color variations (see Examples).
`cex`	numeric. text size for color labels.

Details

By default rcolors plots the basic 502 distinct colors provided by the colors function. If a color name or part of a name is provided, only colors with matching names are plotted.
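
The name matching can be approximated with `grep` on the output of `colors`. The sketch assumes simple substring matching, which is a guess at how rcolors filters names rather than documented behavior.

```r
all_cols <- colors(distinct = TRUE)             # built-in color names, duplicates removed
blues <- grep("blue", all_cols, value = TRUE)   # names containing "blue"
head(blues)
length(blues)
```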

Value

No return value, called for side effects

References

This function is adapted from code published by Karl W. Broman.

Examples

rcolors()
rcolors("blue")
rcolors("red")
rcolors("dark")

Recode one or more variables

Description

recodes recodes the values of one or more variables in a data frame

Usage

recodes(data, vars, from, to)

Arguments

`data`	a data frame.
`vars`	character vector of variable names.
`from`	a vector of values or conditions (see Details).
`to`	a vector of replacement values.

Details

For each variable in the vars parameter, values are checked against the list of values in the from vector. If a value matches, it is replaced with the corresponding entry in the to vector.
Once a given observation's value matches a from value, it is recoded. That particular observation will not be recoded again by that recodes() statement (i.e., no chaining).
One or more values in the from vector can be an expression, using the dollar sign ($) to represent the variable being recoded. If the expression evaluates to TRUE, the corresponding to value is returned.
If the number of values in the to vector is less than the from vector, the values are recycled. This lets you convert several values to a single outcome value (e.g., NA).
If the to values are numeric, the resulting recoded variable will be numeric. If the variable being recoded is a factor and the to values are character values, the resulting variable will remain a factor. If the variable being recoded is a character variable and the to values are character values, the resulting variable will remain a character variable.
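
The matching, recycling, and no-chaining rules for plain values can be sketched with `match`. The helper `recode_vec` is hypothetical and written only for this example; it does not cover expressions that use the dollar sign, and it is not the package's implementation.

```r
# Hypothetical helper illustrating value-to-value recoding of one vector.
recode_vec <- function(x, from, to) {
  to  <- rep_len(to, length(from))   # recycle `to` to the length of `from`
  idx <- match(x, from)              # position of the first matching `from` value
  hit <- !is.na(idx)
  out <- x
  out[hit] <- to[idx[hit]]           # each element is replaced at most once
  out
}

# No chaining: 1 becomes 2 and 2 becomes 3, but 1 never becomes 3.
recode_vec(c(1, 2, 3), from = c(1, 2), to = c(2, 3))

# Recycling: several values mapped to a single outcome (NA).
recode_vec(c(0, 5, 99), from = c(0, 99), to = NA)
```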

Value

a data frame

Note

See the vignette for detailed examples.

Examples

df <- data.frame(x = c(1, 5, 7, 3, 0),
                 y = c(9, 0, 5, 9, 2),
                 z = c(1, 1, 2, 2, 1)
                 )
df <- recodes(df, 
              vars = c("x", "y"), 
              from = 0, to = NA)
df <- recodes(df, 
              vars = "z", 
              from = c(1, 2), to = c("pass", "fail"))

Scatterplot

Description

Create a scatter plot between two quantitative variables.

Usage

scatter(
  data,
  x,
  y,
  outlier = 3,
  alpha = 1,
  digits = 3,
  title,
  margin = "none",
  stats = TRUE,
  point_color = "deepskyblue2",
  outlier_color = "violetred1",
  line_color = "grey30",
  margin_color = "deepskyblue2"
)

Arguments

`data`	data frame
`x`	quantitative predictor variable
`y`	quantitative response variable
`outlier`	number. Observations with studentized residuals larger than this value are flagged. If set to 0, observations are not flagged.
`alpha`	Transparency of data points. A numeric value between 0 (completely transparent) and 1 (completely opaque).
`digits`	Number of significant digits in displayed statistics.
`title`	Optional title.
`margin`	Marginal plots. If specified, parameter can be `histogram`, `boxplot`, `violin`, or `density`. Will add these features to the top and right margin of the graph.
`stats`	logical. If `TRUE`, the slope, correlation, and correlation squared (expressed as a percentage) for the regression line are printed on the subtitle line.
`point_color`	Color used for points.
`outlier_color`	Color used to identify outliers (see the `outlier` parameter).
`line_color`	Color for regression line.
`margin_color`	Fill color for margin boxplots, density plots, or histograms.

Details

The scatter function generates a scatterplot between two quantitative variables, along with a line of best fit and a 95% confidence interval. By default, regression statistics (b, r, r2, p) are printed and outliers (observations with studentized residuals > 3) are flagged. Optionally, variable distributions (histograms, boxplots, violin plots, density plots) can be added to the plot margins.
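
The reported statistics and the outlier rule can be reproduced approximately in base R. The sketch assumes a simple linear model fitted with `lm`, externally studentized residuals from `rstudent`, and flagging on absolute values; these are assumptions about the internals, not documented details.

```r
fit <- lm(mpg ~ hp, data = mtcars)

b  <- unname(coef(fit)["hp"])                        # slope
r  <- cor(mtcars$hp, mtcars$mpg)                     # correlation
r2 <- 100 * r^2                                      # correlation squared, as a percentage
p  <- summary(fit)$coefficients["hp", "Pr(>|t|)"]    # p-value for the slope

res     <- rstudent(fit)                             # studentized residuals
flagged <- which(abs(res) > 3)                       # default cutoff, outlier = 3

signif(c(b = b, r = r, r2 = r2, p = p), 3)
flagged
```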

Value

a ggplot2 graph

Note

Variable names do not have to be quoted.

Examples

scatter(cars74, hp, mpg)
scatter(cars74, wt, hp)
p <- scatter(ggplot2::mpg, displ, hwy,
        margin="histogram",
        title="Engine Displacement vs. Highway Mileage")
plot(p)

Skewness

Description

Calculate the skewness of a numeric variable

Usage

skewness(x, na.rm = TRUE)

Arguments

`x`	numeric vector.
`na.rm`	if `TRUE`, delete missing values.

Value

a number

Examples

skewness(mtcars$mpg)

Standardize numeric variables

Description

Standardize the numeric variables in a data frame

Usage

standardize(data, mean = 0, sd = 1, include_dummy = FALSE)

Arguments

`data`	a data frame.
`mean`	mean of the transformed variables.
`sd`	standard deviation of the transformed variables.
`include_dummy`	logical. If `TRUE`, transform dummy coded (0,1) variables.

Details

standardize transforms all the numeric variables in a data frame to have the same mean and standard deviation. By default, this will be a mean of 0 and standard deviation of 1. Character variables and factors are left unchanged. By default, dummy coded variables are also left unchanged. Use include_dummy=TRUE to transform these variables as well.
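
The transformation is a z-score followed by rescaling to the requested mean and standard deviation. A minimal base R sketch follows; `z_scale` and `is_dummy` are hypothetical helpers written for this example, and the dummy-variable test shown is an assumption about what counts as dummy coded.

```r
# Hypothetical helpers, not package functions.
z_scale <- function(x, new_mean = 0, new_sd = 1) {
  z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  z * new_sd + new_mean
}
is_dummy <- function(x) all(x[!is.na(x)] %in% c(0, 1))

df <- data.frame(score = c(10, 20, 30, 40),
                 flag  = c(0, 1, 1, 0),       # dummy coded: left unchanged
                 g     = c("a", "b", "a", "b"))

target <- sapply(df, function(v) is.numeric(v) && !is_dummy(v))
df[target] <- lapply(df[target], z_scale, new_mean = 50, new_sd = 10)
df
```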

Value

a data frame

Examples

head(cars74)

cars74_st <- standardize(cars74)
head(cars74_st)

Frequency distribution for a categorical variable

Description

Function to calculate frequency distributions for categorical variables

Usage

tab(
  data,
  x,
  sort = FALSE,
  maxcat = NULL,
  minp = NULL,
  na.rm = FALSE,
  total = FALSE,
  digits = 2,
  cum = FALSE,
  plot = FALSE
)

Arguments

`data`	A dataframe
`x`	A factor variable in the data frame.
`sort`	logical. Sort levels from high to low.
`maxcat`	Maximum number of categories to be included. Smaller categories will be combined into an "Other" category.
`minp`	Minimum proportion for a category to be included. Categories representing smaller proportions will be combined into an "Other" category. maxcat and minp cannot both be specified.
`na.rm`	logical. Removes missing values when TRUE.
`total`	logical. Include a total category when TRUE.
`digits`	Number of digits the percents should be rounded to.
`cum`	logical. If `TRUE`, include cumulative counts and percents. In this case `total` will be set to `FALSE`.
`plot`	logical. If `TRUE`, generate bar chart rather than a frequency table.

Details

The function tab will calculate the frequency distribution for a categorical variable and output a data frame with three columns: level, n, percent.
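
The three-column layout can be built from `table` in base R. The sketch below is illustrative only; the extra `cum_n` column and the sorting step are included to show roughly what `sort = TRUE` and `cum = TRUE` add.

```r
counts <- table(mtcars$carb, useNA = "ifany")   # keeps NA as a level when present

freq <- data.frame(
  level   = names(counts),
  n       = as.vector(counts),
  percent = round(100 * as.vector(counts) / sum(counts), 2)
)

freq <- freq[order(freq$n, decreasing = TRUE), ]   # compare sort = TRUE
freq$cum_n <- cumsum(freq$n)                       # compare cum = TRUE
freq
```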

Value

If plot = TRUE return a ggplot2 bar chart. Otherwise return a data frame.

Examples

tab(cars74, carb)
tab(cars74, carb, plot=TRUE)
tab(cars74, carb, sort=TRUE)
tab(cars74, carb, sort=TRUE, plot=TRUE)
tab(cars74, carb, cum=TRUE)
tab(cars74, carb, cum=TRUE, plot=TRUE)

Time spent watching television - 2017

Description

This is a data set detailing TV usage on days surveyed as determined by the 2017 American Time Use Survey. The data set includes demographic information, as well as details regarding employment and family makeup, where applicable. Information on days surveyed, as well as whether the day is a holiday, is also included.

Usage

tv

Format

A data frame with 10,223 rows and 21 variables. The variables are as follows:

id: ID of respondent
weight: ATUS final weight
youngest_child: Age of the youngest child in the household that is less than 18 years old (if applicable). Range: 1-17; if no child in household: NA
age: Age of respondent
sex: Sex of respondent
job: Status of employment of the respondent. Direct transcription from original codebook: 1 = Employed, at work, 2 = Employed, absent, 3 = Unemployed, on layoff, 4 = Unemployed, looking, 5 = Not in the labor force.
m_job: The response to question, “in the last seven days did you have more than one job?” Returns NA if no job.
f_job: Does the respondent have a full time job or a part time job? (NA if no job)
educ: Are you enrolled in high school, college, or university? (NA if not currently enrolled)
educ2: If yes to educ, are you enrolled in high school or upper schooling? (NA if not currently enrolled)
partner: Presence of the respondent's spouse or unmarried partner in the household: 1 = Spouse present, 2 = Unmarried partner present, 3 = No spouse/unmarried partner present
pr_job: Answer to the question, “does your partner have a job?” (NA if not applicable)
salary: Weekly earnings at the respondent’s main job, two decimals implied
children: Number of children under 18 in the household
pr_job_f: Part time/full time job status of partner, if applicable (NA if partner unemployed or no partner)
job_hours: Total hours usually worked per week (-4: Hours vary)
day: Day of the week about which the respondent was interviewed (Monday through Friday)
holiday: Notes if the respondent was interviewed on a holiday
elder_care: Total time spent providing elder care that day by the respondent, in minutes
child_time: Total time spent during diary day providing secondary childcare for household children younger than 13, in minutes
tv: Minutes spent watching TV
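
Several of these codings usually need handling before analysis. The sketch below shows one possible approach on a toy data frame whose values are made up for illustration (they are not rows from `tv`). The derived column name `salary_dollars`, the choice to treat `-4` as missing, and the use of `weight` in a weighted mean are all assumptions, not steps prescribed by the package.

```r
# Toy rows with MADE-UP values, shaped like four of the columns described above.
toy <- data.frame(
  weight    = c(2, 1, 1),
  salary    = c(125000, 48050, NA),  # two decimals implied
  job_hours = c(40, -4, NA),         # -4 means "hours vary"
  tv        = c(60, 120, 0)
)

# Restore the implied decimals (hypothetical column name)
toy$salary_dollars <- toy$salary / 100

# Assumption: treat the "hours vary" code as missing so it does not
# distort numeric summaries
toy$job_hours[which(toy$job_hours == -4)] <- NA

# Survey-weighted mean of TV minutes using the weight column
wmean_tv <- weighted.mean(toy$tv, w = toy$weight)
print(toy)
print(wmean_tv)
```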

Details

For details on the variable coding, see the codebook at https://www.bls.gov/tus/atusintcodebk17.pdf. The data come from the American Time Use Survey, made available by the Bureau of Labor Statistics at https://www.bls.gov/tus/datafiles_2017.htm.

Examples

summary(tv)

hist(tv$tv, col="skyblue")

Univariate plot

Description

Generates a descriptive graph for a quantitative variable.

Usage

univariate_plot(
  data,
  x,
  bins = 30,
  fill = "deepskyblue",
  pointcolor = "black",
  density = TRUE,
  densitycolor = "grey",
  alpha = 0.2,
  seed = 1234
)

Arguments

`data`	a data frame.
`x`	a variable name (without quotes).
`bins`	number of histogram bins.
`fill`	fill color for the histogram and boxplot.
`pointcolor`	point color for the jitter plot.
`density`	logical. Plot a filled density curve over the histogram. (default=TRUE)
`densitycolor`	fill color for density curve.
`alpha`	Alpha transparency (0-1) for the density curve and jittered points.
`seed`	pseudorandom number seed for jittered plot.
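
The role of `seed` can be seen in a small base-R sketch: jitter is random noise, so fixing the seed makes the point positions repeatable. Here `jitter_with_seed()` is a hypothetical helper and base `jitter()` is a stand-in; the package draws its points with ggplot2, so the exact mechanism may differ.

```r
# jitter_with_seed() is a hypothetical helper, not part of qacBase.
jitter_with_seed <- function(x, seed = 1234) {
  set.seed(seed)   # fix the random number stream
  jitter(x)        # add a small amount of random noise to each value
}

a <- jitter_with_seed(mtcars$mpg)
b <- jitter_with_seed(mtcars$mpg)
d <- jitter_with_seed(mtcars$mpg, seed = 1)

print(identical(a, b))   # same seed, same jittered values
print(identical(a, d))   # different seed, different noise
```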

Details

univariate_plot generates a plot containing three graphs: a histogram (with an optional density curve), a horizontal jittered point plot, and a horizontal box plot. The subtitle contains descriptive statistics, including the mean, standard deviation, median, minimum, maximum, and skew.
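
The subtitle statistics can be approximated in base R as follows. This is a sketch under assumptions: `describe_num()` is a hypothetical helper, and skew is computed here as the moment-based coefficient m3 / m2^1.5, which may not be the estimator the package uses.

```r
# describe_num() is a hypothetical helper, not part of qacBase.
describe_num <- function(x) {
  x  <- x[!is.na(x)]            # drop missing values first
  m  <- mean(x)
  m2 <- mean((x - m)^2)         # second central moment
  m3 <- mean((x - m)^3)         # third central moment
  c(mean   = m,
    sd     = sd(x),
    median = median(x),
    min    = min(x),
    max    = max(x),
    skew   = m3 / m2^1.5)       # assumption: moment-based skewness
}

stats <- round(describe_num(mtcars$mpg), 2)
print(stats)
```

A positive `skew` value indicates a longer right tail, which for `mpg` should match the shape of the histogram.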

Value

a ggplot2 graph

Note

The graphs are created with ggplot2 and then assembled into a single plot through the patchwork package. Missing values are deleted.

Examples

univariate_plot(mtcars, mpg)
univariate_plot(cardata, city_mpg, fill="lightsteelblue",
                pointcolor="lightsteelblue", densitycolor="lightpink",
                alpha=.6)
