2 R packages

Good Software Engineering Practice for R Packages

Liming

August 1, 2024

Introduction

What you know already

  • Packages provide a mechanism for loading optional code, data and documentation as needed.
  • A library is a directory into which packages are installed.
  • install.packages() and R CMD INSTALL are used to install packages into the library.
  • library() is used to load and attach packages from the library.
    • attach means that the package is put in your search list (objects in the package can be used directly).
  • Remember that package ≠ library!
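
To make the package/library distinction concrete, here is a minimal base R sketch. It uses the stats package only because it ships with every R installation; the variable names are illustrative.

```r
# Libraries are directories; .libPaths() lists the ones R searches.
lib_dirs <- .libPaths()
print(lib_dirs)

# Packages are what is installed inside those directories.
installed <- rownames(installed.packages())
print(head(installed))

# library() loads and attaches a package, which puts it on the search path.
library(stats)
print(search())

# Once attached, exported objects can be used without the `stats::` prefix.
print(sd(c(1, 2, 3, 4)))
```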

What we want to talk about now

  • How to write, build, test, and check your own package 😊
  • How to do this in a good and sustainable way.
  • Give tips and tricks based on practical experience.

Contents of a package

How is a package structured?

Package source = directory with files and subdirectories.

  • Mandatory in practice (formally, only DESCRIPTION and NAMESPACE are required):
    • DESCRIPTION
    • NAMESPACE
    • R
    • man
  • Typically also:
    • data
    • inst
    • src
    • tests
    • vignettes
    • NEWS
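
As a rough illustration of this structure, the sketch below sets up a bare-bones package source directory by hand in a temporary folder. The package name mypackage and the field values are placeholders, and a real package would need more DESCRIPTION fields.

```r
# Create a throwaway package source directory (hypothetical name "mypackage").
pkg_dir <- file.path(tempdir(), "mypackage")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg_dir, "man"), showWarnings = FALSE)

# DESCRIPTION holds the package metadata in "Field: value" format.
writeLines(
  c(
    "Package: mypackage",
    "Title: What the Package Does",
    "Version: 0.0.1"
  ),
  file.path(pkg_dir, "DESCRIPTION")
)

# NAMESPACE lists what is exported and imported; left empty for now.
file.create(file.path(pkg_dir, "NAMESPACE"))

# R/ contains the code files.
writeLines(
  "my_sum <- function(x, y) x + y",
  file.path(pkg_dir, "R", "my_sum.R")
)

# Show the resulting structure.
print(list.files(pkg_dir, recursive = TRUE, include.dirs = TRUE))
```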

How to get started quickly

Once upon a time, developers would set up this structure manually. 🥱

Nowadays, it is super fast with:

  • usethis::create_package()
  • RStudio > File > New Project > New Directory > R Package
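
A minimal sketch of the usethis route, assuming usethis is installed. The temporary location and the name mypackage are placeholders; in practice you would pick a permanent folder.

```r
# Hypothetical location and name for the new package.
pkg_path <- file.path(tempdir(), "mypackage")

if (requireNamespace("usethis", quietly = TRUE)) {
  # Sets up the basic package files; `open = FALSE` keeps the current
  # session where it is instead of opening the new project.
  usethis::create_package(pkg_path, open = FALSE)
  print(list.files(pkg_path))
}
```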

DESCRIPTION File

  • Package: Choose the name of your package.
    • Not unimportant!
    • Check CRAN whether your name is still available.
  • Title: Add a Title for Your Package. (Title Case)
  • Version: Start with a low package version.
    • Major.Minor.Patch syntax
  • Authors@R: Add authors and maintainer.
  • Description: Like an abstract, including references.
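
Two of these fields map directly onto base R helpers, which the following sketch illustrates. The names and e-mail address are placeholders.

```r
# Major.Minor.Patch versions compare component-wise, not as text.
v_old <- package_version("0.9.0")
v_new <- package_version("0.10.0")
print(v_new > v_old)

# Authors@R takes person() objects; the role "cre" marks the maintainer.
authors <- c(
  person("Jane", "Doe", email = "jane.doe@example.com", role = c("aut", "cre")),
  person("John", "Smith", role = "ctb")
)
print(authors)
```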

DESCRIPTION File (cont’d)

  • License: Important for open sourcing.
    • Consider permissive licenses such as Apache and MIT.
  • Depends:
    • Which R version users need to have at least.
    • Ideally don’t put any package here.
    • Packages listed here are loaded and attached when users call library() on your package.
  • Imports: Packages from which you import functions, methods, or classes.
  • Suggests: Packages for documentation processing (roxygen2), running examples, tests (testthat), vignettes.
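
Putting the fields together, a DESCRIPTION file might look roughly like the one written below. All values are placeholders for a hypothetical package; the sketch writes the file to a temporary directory and parses it back with read.dcf() to show that it is plain "Field: value" text.

```r
# Field values for a hypothetical package; every value is a placeholder.
desc_lines <- c(
  "Package: mypackage",
  "Title: Sum Two Numbers",
  "Version: 0.0.1",
  "Authors@R: person(\"Jane\", \"Doe\", email = \"jane.doe@example.com\", role = c(\"aut\", \"cre\"))",
  "Description: Provides a small example function that sums two numbers.",
  "License: Apache License (>= 2)",
  "Depends: R (>= 4.1.0)",
  "Imports: stats",
  "Suggests: roxygen2, testthat (>= 3.0.0)"
)

desc_file <- file.path(tempdir(), "DESCRIPTION")
writeLines(desc_lines, desc_file)

# read.dcf() parses the file into a character matrix with one column per field.
parsed <- read.dcf(desc_file)
print(parsed[1, c("Package", "Version", "Imports")])
```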

R Folder

  • Only contains R code files (recommended to use .R suffix)
    • Can create a file with usethis::use_r("filename")
  • Assigns R objects, i.e. mostly functions, but could also be constant variables, data sets, etc.
  • Should not have any side effects, e.g. avoid require(), options() etc.
  • If certain code needs to be sourced first, put the following at the top of the file (roxygen2 will then update the Collate field of DESCRIPTION automatically).
#' @include dependency.R
NULL
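
One possible way to avoid the side effects mentioned above is sketched here. The function names are made up for illustration, and jsonlite merely stands in for any optional dependency.

```r
# Instead of require(), check for an optional package and call it with `::`.
to_json_if_available <- function(x) {
  if (!requireNamespace("jsonlite", quietly = TRUE)) {
    stop("Package 'jsonlite' is needed for this function.", call. = FALSE)
  }
  jsonlite::toJSON(x)
}

# If an option must be changed, do it inside the function and restore it on exit.
format_pi <- function(digits = 3) {
  old <- options(digits = digits)
  on.exit(options(old), add = TRUE)
  format(pi)
}

before <- getOption("digits")
print(format_pi(3))

# The user's setting is unchanged after the call.
print(identical(getOption("digits"), before))
```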

NAMESPACE File

  • Defines the namespace of the package, to work with R’s namespace management system
  • Namespace directives in this file allow you to specify:
    • which objects are exported to users and other packages
    • which are imported from other packages
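
As a sketch of how such directives typically arise when roxygen2 is used: the hypothetical function below is tagged for export and imports median() from stats. The directives in the trailing comment are what roxygen2 would be expected to write into NAMESPACE.

```r
#' Median-Centred Values
#'
#' @param x numeric vector.
#'
#' @return `x` minus its median.
#' @importFrom stats median
#' @export
center_median <- function(x) {
  x - median(x)
}

# Expected NAMESPACE entries after documenting the package:
#   export(center_median)
#   importFrom(stats,median)

print(center_median(c(1, 2, 6)))
```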

NAMESPACE File (cont’d)

  • Controls the search strategy for variables:
    1. Local (in the function body etc.)
    2. Package namespace
    3. Imports
    4. Base namespace
    5. Normal search() path
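
This chain can be inspected from a running session. The sketch below walks up the enclosing environments of a function from the stats package, which is used only as a convenient example.

```r
# A function from a package has the package namespace as its environment.
ns <- environment(stats::sd)
print(environmentName(ns))

# Walk up the enclosing environments from the namespace.
imports <- parent.env(ns)       # imports of the package
base_ns <- parent.env(imports)  # base namespace
global <- parent.env(base_ns)   # global environment, where search() starts

print(environmentName(imports))
print(environmentName(base_ns))
print(environmentName(global))
```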

man Folder

  • Contains documentation files for the objects in the package in the .Rd format
    • The syntax is somewhat similar to LaTeX
  • All user level objects should be documented
  • Internal objects don’t need to be documented, but may be (and I would recommend it)
  • Once upon a time, developers would set up these .Rd files and the NAMESPACE manually. 🥱
  • Fortunately, nowadays we have roxygen2! 🚀

roxygen2 to the Rescue!

  • We can include the documentation source directly in the R script, on top of the objects we are documenting
  • Syntax is composed of special comments #' and special macros preceded with @
  • In RStudio, running Build > More > Document will render the .Rd files and the NAMESPACE file for you
  • Get started with usethis::use_roxygen_md()
  • Inside a function, click Code > Insert Roxygen Skeleton

Setting up roxygen2 in your project

roxygen2 Source

R/my_sum.R:

#' My Summation Function
#'
#' This is my first function and it sums two numbers.
#'
#' @param x first summand.
#' @param y second summand.
#'
#' @return The sum of `x` and `y`.
#' @export
#' 
#' @note This function is a bit boring but that is ok.
#' @seealso [Arithmetic] for an easier way.
#'
#' @examples
#' my_sum(1, 2)
my_sum <- function(x, y) {
  x + y
}

roxygen2 Output

man/my_sum.Rd:

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/my_sum.R
\name{my_sum}
\alias{my_sum}
\title{My Summation Function}
\usage{
my_sum(x, y)
}
\arguments{
\item{x}{first summand.}

\item{y}{second summand.}
}
\value{
The sum of \code{x} and \code{y}.
}
\description{
This is my first function and it sums two numbers.
}
\note{
This function is a bit boring but that is ok.
}
\examples{
my_sum(1, 2)
}
\seealso{
\link{Arithmetic} for an easier way.
}

roxygen2 Output (cont’d)

NAMESPACE:

# Generated by roxygen2: do not edit by hand

export(my_sum)

tests Folder

  • Here we store the unit tests covering the functionality of the package
  • Get started with usethis::use_testthat() and usethis::use_test() and populate the tests/testthat folder with unit tests
  • Rarely, tests cannot be run within the testthat framework; these can then go into R scripts directly in the tests directory
  • We will look at unit tests in detail later
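
For a first impression, a unit test for the my_sum() function from earlier could look roughly like this. The file name tests/testthat/test-my_sum.R is an assumption, and my_sum() is repeated here only so that the snippet runs on its own; inside a package the test file would not define it.

```r
# Repeated from R/my_sum.R so that this snippet is self-contained.
my_sum <- function(x, y) {
  x + y
}

# Hypothetical content of tests/testthat/test-my_sum.R.
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("my_sum adds two numbers", {
    testthat::expect_equal(my_sum(1, 2), 3)
    testthat::expect_equal(my_sum(-1, 1), 0)
  })
}
```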

data Folder

  • For (example) data that you ship in your package to the user
    • Get started with usethis::use_data()
    • Note: Usually we use lazy data loading, so no data() call is needed before using the data
  • If you generate the example data, save the R script, too
    • Put that into data-raw folder, start with usethis::use_data_raw()
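
A data-raw script might look roughly like the sketch below. The data set example_scores and its columns are invented for illustration. The usethis call is shown as a comment because it only works inside a package project; the save() call writes a comparable .rda file to a temporary directory instead.

```r
# Hypothetical content of data-raw/example_scores.R.
set.seed(123)
example_scores <- data.frame(
  id = seq_len(10),
  score = round(rnorm(10, mean = 50, sd = 10), 1)
)

# Inside the package project, this would store data/example_scores.rda:
# usethis::use_data(example_scores, overwrite = TRUE)

# Roughly comparable base R call, written to a temporary directory here.
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
save(example_scores, file = file.path(data_dir, "example_scores.rda"))
```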

inst Folder

  • Contents will be copied recursively to installation directory
    • Be careful not to interfere with standard folder names
  • For data that is used by functions in the package itself
    • Those would typically go into inst/extdata folder
    • Load with system.file("path/file", package = "mypackage")
  • CITATION: For custom citation() output
    • Create it with usethis::use_citation()
  • inst/doc can contain documentation files (typically pdf)
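
The behaviour of system.file() can be tried out with any installed package. The sketch below uses stats, and the missing file name is made up on purpose.

```r
# Every installed package has a DESCRIPTION file at its top level.
found <- system.file("DESCRIPTION", package = "stats")
print(found)

# A file that does not exist yields an empty string rather than an error.
missing_file <- system.file("extdata", "no_such_file.csv", package = "stats")
print(nzchar(missing_file))

# `mustWork = TRUE` turns a missing file into an error, which is often safer.
result <- tryCatch(
  system.file("extdata", "no_such_file.csv", package = "stats", mustWork = TRUE),
  error = function(e) "error raised"
)
print(result)
```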

src Folder

  • Contains sources and headers for any code that needs compilation
  • Should ideally contain only a single language
    • That said, because R itself uses both, C and Fortran can be combined, and mixing C, C++ and Fortran usually works with the OS native compilers
  • Much more complex to write and maintain than an R only package
  • Typically only makes sense for
    • wrapping existing libraries for use in R
    • speeding up complex computations - starting point: Rcpp::Rcpp.package.skeleton()

vignettes Folder

  • Special case of documentation files (pdf or html) created by compiling source files
  • Package users don’t need to recompile the vignettes - they are delivered with the package
  • Start a new vignette with usethis::use_vignette()
    • Starts an Rmd vignette, compiled with knitr
  • Important for the user to understand the high-level ideas
  • Complements function-level documentation from our roxygen2 chunks

NEWS File

  • Lists user-visible changes worth mentioning
  • In each new release, add items at the top under the version they refer to
  • Don’t discard old items: leave them in the file after the newer items
  • Start one with usethis::use_news_md()

Building the package

Documenting the package

  • The first step is to produce the documentation files and NAMESPACE
  • In RStudio: Build > More > Document

Checking the package

  • R comes with a pre-defined check command for packages: “the R package checker” aka R CMD check
  • About 22 checks are run (so quite a lot), including things like:
    • can the package be installed
    • is the code syntax ok
    • is the documentation complete
    • do the tests run
    • do the examples run
  • In RStudio: Build > Check
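
A console sketch of the check step, assuming devtools is installed. The path and the tarball name in the comments are placeholders.

```r
# Hypothetical path to the package source directory.
pkg_path <- "path/to/mypackage"

if (requireNamespace("devtools", quietly = TRUE) && dir.exists(pkg_path)) {
  # Builds the package and runs R CMD check on it.
  results <- devtools::check(pkg_path)
  print(results)
}

# Command-line equivalent, run from the parent directory of the package
# (the tarball name depends on the Package and Version fields):
#   R CMD build mypackage
#   R CMD check mypackage_0.0.1.tar.gz
```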

Building the package

  • The R package folder can be compressed into a single package file
  • Typically, we only build the “source” package manually
    • In RStudio: Build > More > Build Source Package
    • Makes it easy to share the package with others and submit to CRAN
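
The source package can also be built from the console. Again this is a sketch assuming devtools is installed, with a placeholder path.

```r
# Hypothetical path to the package source directory.
pkg_path <- "path/to/mypackage"

if (requireNamespace("devtools", quietly = TRUE) && dir.exists(pkg_path)) {
  # Creates the source tarball and returns its file path.
  tarball <- devtools::build(pkg_path)
  print(tarball)
}
```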

Installing the package

  • R comes with a pre-defined install command for packages: R CMD INSTALL
  • In RStudio: Build > Install
  • Note: During development it is usually sufficient to use Build > More > Load All
    • Runs devtools::load_all()
    • Roughly simulates what happens when the package is installed and loaded
    • Unexported objects and helpers under tests will also be available
    • Key: much faster!
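
The two routes side by side, as a console sketch assuming devtools is installed. The path is a placeholder, so nothing runs unless it points to a real package.

```r
# Hypothetical path to the package source directory.
pkg_path <- "path/to/mypackage"

if (requireNamespace("devtools", quietly = TRUE) && dir.exists(pkg_path)) {
  # Fast development loop: load the code straight from the source directory.
  devtools::load_all(pkg_path)

  # Full installation into the library, comparable to R CMD INSTALL.
  devtools::install(pkg_path)
}
```

During development you would normally call only load_all() after each change and reserve install() for when the package is needed in other sessions.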

Exercise

Let’s try this out now 😊

  1. Set up a new R package with a fancy name
  2. Fill out the DESCRIPTION file
  3. Include a new function
  4. Add roxygen documentation
  5. Export the function to the namespace
  6. Produce the package documentation
  7. Run checks
  8. Build the package

References

License information