Good Software Engineering Practice for R Packages
July 31, 2025
From an idea to a production-grade R package
Example scenario: in your daily work, you notice that you need certain one-off scripts again and again.
The idea of creating an R package was born because you understood that copying and pasting R scripts is inefficient. On top of that, you want to share your helpful R functions with colleagues and the world…
Photo CC0 by ELEVATE on pexels.com
Photo CC0 by Chevanon Photography on pexels.com
Bad practice!
Why?
Cost distribution among software process activities
Origin of errors in system development
Boehm, B. (1981). Software Engineering Economics. Prentice Hall.
Invest time in
… but in many cases the workflow must be workable for a single developer or a small team.
Photo CC0 by Kateryna Babaieva on pexels.com
Let’s assume that you used some lines of code to create simulated data in multiple projects:
Idea: put the code into a package
Obligation level | Key word | Description |
---|---|---|
Duty | shall | “must have” |
Desire | should | “nice to have” |
Intention | will | “optional” |
Purpose and Scope
The R package simulatr shall enable the creation of reproducible fake data.
Package Requirements
simulatr shall provide a function to generate normally distributed random data for two independent groups. The function shall allow flexible definition of the sample size, mean, and standard deviation per group. The reproducibility of the simulated data shall be ensured via an optional seed. It should be possible to print the function result. A graphical presentation of the simulated data will also be possible.
Useful formats / tools for design docs:
UML Diagram
R package programming
One-off script as starting point:
Refactored script:
Almost all functions, arguments, and objects should be self-explanatory due to their names.
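The two scripts themselves are not reproduced here. As a rough, hypothetical illustration of the kind of refactoring meant, the sketch below contrasts a terse one-off script with a version that uses self-explanatory names. All object names and values are assumptions chosen to match the later examples, not the original scripts.

```r
# Hypothetical one-off script: short names and magic numbers
set.seed(123)
d <- data.frame(
    g = c(rep(1, 50), rep(2, 50)),
    v = c(rnorm(50, 5, 3), rnorm(50, 7, 4))
)

# Hypothetical refactored script: named inputs, same simulated data
n1 <- 50
n2 <- 50
mean1 <- 5
mean2 <- 7
sd1 <- 3
sd2 <- 4

set.seed(123)
simulatedData <- data.frame(
    group = c(rep(1, n1), rep(2, n2)),
    values = c(
        rnorm(n = n1, mean = mean1, sd = sd1),
        rnorm(n = n2, mean = mean2, sd = sd2)
    )
)
```

Both versions draw the same random numbers because the seed and the order of the `rnorm()` calls are identical; only the readability differs.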
Define the result as a list and assign it a class:
getSimulatedTwoArmMeans <- function(n1, n2, mean1, mean2, sd1, sd2) {
    result <- list(n1 = n1, n2 = n2,
        mean1 = mean1, mean2 = mean2, sd1 = sd1, sd2 = sd2)
    result$data <- data.frame(
        group = c(rep(1, n1), rep(2, n2)),
        values = c(
            rnorm(n = n1, mean = mean1, sd = sd1),
            rnorm(n = n2, mean = mean2, sd = sd2)
        )
    )
    # set the class attribute
    result <- structure(result, class = "SimulationResult")
    return(result)
}
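The package requirements ask for reproducibility via an optional seed, which the function above does not yet take. One possible way to add it is sketched below. The `seed` argument with a `NULL` default is an assumption for illustration, not necessarily how the package implements it.

```r
# Sketch only: the `seed` argument and its NULL default are assumptions
getSimulatedTwoArmMeans <- function(n1, n2, mean1, mean2, sd1, sd2, seed = NULL) {
    # Fix the random number stream only when a seed is supplied
    if (!is.null(seed)) {
        set.seed(seed)
    }
    result <- list(n1 = n1, n2 = n2,
        mean1 = mean1, mean2 = mean2, sd1 = sd1, sd2 = sd2)
    result$data <- data.frame(
        group = c(rep(1, n1), rep(2, n2)),
        values = c(
            rnorm(n = n1, mean = mean1, sd = sd1),
            rnorm(n = n2, mean = mean2, sd = sd2)
        )
    )
    structure(result, class = "SimulationResult")
}

# Two calls with the same seed yield identical simulated data
first <- getSimulatedTwoArmMeans(n1 = 50, n2 = 50, mean1 = 5,
    mean2 = 7, sd1 = 3, sd2 = 4, seed = 123)
second <- getSimulatedTwoArmMeans(n1 = 50, n2 = 50, mean1 = 5,
    mean2 = 7, sd1 = 3, sd2 = 4, seed = 123)
```

Note that `set.seed()` changes the global random number state of the session, which is a side effect worth documenting for users.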
The output is impractical, e.g., we need to scroll down:
$n1
[1] 50
$n2
[1] 50
$mean1
[1] 5
$mean2
[1] 7
$sd1
[1] 3
$sd2
[1] 4
$data
group values
1 1 7.9913524
2 1 2.4433099
3 1 4.8248363
4 1 4.3003821
5 1 5.1070738
6 1 5.6821700
7 1 6.9767241
8 1 3.5878576
9 1 7.3006997
10 1 7.3315869
11 1 0.3552069
12 1 5.7443710
13 1 4.5599642
14 1 6.6151279
15 1 7.8710057
16 1 -0.6704843
17 1 5.0435107
18 1 4.3993856
19 1 1.3970932
20 1 7.1525637
21 1 8.0186925
22 1 5.6764564
23 1 3.3590526
24 1 5.8097090
25 1 6.0043209
26 1 2.9758671
27 1 9.1309671
28 1 6.1951981
29 1 9.5007511
30 1 5.3578726
31 1 3.5172697
32 1 5.6763330
33 1 8.7859767
34 1 5.1906407
35 1 6.6614119
36 1 5.5707242
37 1 2.1098313
38 1 3.4000577
39 1 5.8239890
40 1 6.6442456
41 1 7.0296165
42 1 9.8081476
43 1 6.5420327
44 1 4.0359417
45 1 8.1592040
46 1 4.2568957
47 1 5.9482276
48 1 9.3280873
49 1 2.8239449
50 1 8.4520435
51 2 6.2519630
52 2 10.6679844
53 2 6.5189243
54 2 4.3832382
55 2 9.5586722
56 2 6.1415727
57 2 6.4627028
58 2 5.2091507
59 2 9.2602250
60 2 7.4173883
61 2 8.1497441
62 2 13.4838278
63 2 8.3845988
64 2 1.9356994
65 2 6.1407534
66 2 8.0005619
67 2 11.7882685
68 2 5.2035962
69 2 9.3936924
70 2 5.0722108
71 2 6.7987329
72 2 11.1883661
73 2 7.1990932
74 2 10.0564793
75 2 14.0917182
76 2 0.2258156
77 2 1.9302200
78 2 1.7673350
79 2 8.1984231
80 2 6.6291737
81 2 7.4035736
82 2 14.3373141
83 2 9.9027858
84 2 5.4479080
85 2 7.1740732
86 2 7.9115043
87 2 1.6422530
88 2 2.8073269
89 2 3.6387938
90 2 12.3151242
91 2 9.9603405
92 2 7.7684747
93 2 2.6857278
94 2 13.6305369
95 2 14.2229172
96 2 7.9807683
97 2 9.9216758
98 2 8.0806395
99 2 4.3491484
100 2 10.4791571
attr(,"class")
[1] "SimulationResult"
Solution: implement generic function print
Generic function print:
#' @title
#' Print Simulation Result
#'
#' @description
#' Generic function to print a `SimulationResult` object.
#'
#' @param x a \code{SimulationResult} object to print.
#' @param ... further arguments passed to or from other methods.
#'
#' @examples
#' x <- getSimulatedTwoArmMeans(n1 = 50, n2 = 50, mean1 = 5,
#' mean2 = 7, sd1 = 3, sd2 = 4, seed = 123)
#' print(x)
#'
#' @export
$args
n1 n2 mean1 mean2 sd1 sd2
"50" "50" "5" "7" "3" "4"
$data
# A tibble: 100 × 2
group values
<dbl> <dbl>
1 1 7.99
2 1 2.44
3 1 4.82
4 1 4.30
5 1 5.11
6 1 5.68
7 1 6.98
8 1 3.59
9 1 7.30
10 1 7.33
# ℹ 90 more rows
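The method body that belongs to the roxygen2 header above is not shown. A minimal base-R sketch of such a `print` method follows. It is an assumption, not the original implementation: the compact output above was evidently produced with a tibble, whereas this sketch only prints the first ten rows via `head()`, so its output looks similar but not identical.

```r
# Stand-in object with the same structure as a SimulationResult
x <- structure(
    list(n1 = 50, n2 = 50, mean1 = 5, mean2 = 7, sd1 = 3, sd2 = 4,
        data = data.frame(
            group = c(rep(1, 50), rep(2, 50)),
            values = c(rnorm(50, 5, 3), rnorm(50, 7, 4))
        )),
    class = "SimulationResult"
)

# S3 method: called automatically by print() for SimulationResult objects
print.SimulationResult <- function(x, ...) {
    argumentNames <- c("n1", "n2", "mean1", "mean2", "sd1", "sd2")
    # Format each argument separately and keep the names
    args <- vapply(unclass(x)[argumentNames], format, character(1))
    print(list(args = args, data = utils::head(x$data, n = 10)), ...)
    # Report how many rows were left out, if any
    hiddenRows <- max(nrow(x$data) - 10, 0)
    if (hiddenRows > 0) {
        cat("# ...", hiddenRows, "more rows\n")
    }
    invisible(x)
}

print(x)
```

Returning `invisible(x)` follows the usual convention for print methods, so the object can still be used in a pipeline without being printed twice.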
pkgdown
pkgdown makes it quick and easy to build a website for your package; just use usethis::use_pkgdown() to get started.
The site is configured in the _pkgdown.yml file; keep its reference section updated with the names of the .Rd files.
_pkgdown.yml file:
---
url: https://openpharma.github.io/mmrm
template:
  bootstrap: 5
  params:
    ganalytics: UA-125641273-1
navbar:
  right:
    - icon: fa-github
      href: https://github.com/openpharma/mmrm
reference:
  - title: Package
    contents:
      - mmrm-package
  - title: Functions
    contents:
      - mmrm
      - fit_mmrm
      - mmrm_control
      - fit_single_optimizer
      - refit_multiple_optimizers
      - df_1d
      - df_md
      - component
gh-pages branch that stores the rendered website
Website is updated when the main branch is updated
Set up with usethis::use_pkgdown_github_pages()
Deploy manually with pkgdown::deploy_to_branch()
Photo CC0 by Pixabay on pexels.com
Add assertions to improve the usability and user experience
Tip on assertions
Use the package checkmate to validate input arguments.
Example:
Error in playWithAssertions(-1) : Assertion on ‘n1’ failed: Element 1 is not >= 1.
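The code behind this error message is not shown. A minimal sketch of what such a function could look like is given below, assuming the checkmate package is installed. The choice of `checkmate::assertInt()` with `lower = 1` is an assumption that fits the message above; other checkmate assertions would work in a similar way.

```r
# Sketch: requires the checkmate package to be installed
playWithAssertions <- function(n1) {
    # n1 must be a single integer-valued number that is at least 1
    checkmate::assertInt(n1, lower = 1)
    invisible(n1)
}

# A valid value passes silently
playWithAssertions(10)

# An invalid value stops with an informative error; captured here with try()
failed <- try(playWithAssertions(-1), silent = TRUE)
```

Placing assertions at the top of a function makes it fail early with a readable message instead of producing a cryptic error deeper in the code.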
Add three additional results:
Tip on creation time
Sys.time(), format(Sys.time(), '%B %d, %Y'), Sys.Date()
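The three calls from the tip return different kinds of objects, which matters when the creation time is stored in the result. A short sketch (the variable names are hypothetical):

```r
# Date-time of class POSIXct, including the time of day
creationTime <- Sys.time()

# Calendar date of class Date, without the time of day
creationDate <- Sys.Date()

# Character string with full month name, day, and four-digit year
creationLabel <- format(creationTime, "%B %d, %Y")
```

Storing the `POSIXct` value and formatting it only when printing keeps the full information available for later use.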
Add an additional result: t.test result
Add an optional alternative argument and pass it through to t.test:
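One possible solution for this exercise is sketched below. The element name `tTest` and the use of `match.arg()` to validate `alternative` are assumptions for illustration; the exercise leaves these details open.

```r
# Sketch: simulate two groups and attach a two-sample t-test result
getSimulatedTwoArmMeans <- function(n1, n2, mean1, mean2, sd1, sd2,
        alternative = c("two.sided", "less", "greater")) {
    # Restrict `alternative` to the values accepted by t.test()
    alternative <- match.arg(alternative)
    result <- list(n1 = n1, n2 = n2,
        mean1 = mean1, mean2 = mean2, sd1 = sd1, sd2 = sd2)
    values1 <- rnorm(n = n1, mean = mean1, sd = sd1)
    values2 <- rnorm(n = n2, mean = mean2, sd = sd2)
    result$data <- data.frame(
        group = c(rep(1, n1), rep(2, n2)),
        values = c(values1, values2)
    )
    # Pass the alternative hypothesis through to t.test()
    result$tTest <- stats::t.test(x = values1, y = values2,
        alternative = alternative)
    structure(result, class = "SimulationResult")
}

x <- getSimulatedTwoArmMeans(n1 = 50, n2 = 50, mean1 = 5,
    mean2 = 7, sd1 = 3, sd2 = 4, alternative = "less")
x$tTest$p.value
```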
Implement the generic functions print and plot.
Tip on print
Use the plot example function from above and extend it.
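The plot example function referred to in the tip is not included here. As a stand-in, the following base graphics sketch shows what a `plot` method for `SimulationResult` objects might look like. The choice of a boxplot and the axis labels are assumptions.

```r
# Stand-in object with the same structure as a SimulationResult
x <- structure(
    list(n1 = 50, n2 = 50, mean1 = 5, mean2 = 7, sd1 = 3, sd2 = 4,
        data = data.frame(
            group = c(rep(1, 50), rep(2, 50)),
            values = c(rnorm(50, 5, 3), rnorm(50, 7, 4))
        )),
    class = "SimulationResult"
)

# S3 method: called automatically by plot() for SimulationResult objects
plot.SimulationResult <- function(x, ...) {
    # One box per group; further arguments are passed on to boxplot()
    graphics::boxplot(values ~ group, data = x$data,
        xlab = "Group", ylab = "Simulated values", ...)
    invisible(x)
}

plot(x, main = "Simulated two-arm data")
```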
Optional extra tasks:
Implement the generic functions summary and cat
Implement the function kable known from the package knitr as generic. Tip: first define kable as generic
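The snippet from the tip is not shown. One common way to turn an existing function into an S3 generic is sketched below, assuming knitr is installed for the default method. The method bodies are illustrative assumptions.

```r
# Define kable as an S3 generic (masks knitr::kable if knitr is attached)
kable <- function(x, ...) UseMethod("kable")

# Fallback for all other classes: delegate to knitr::kable (requires knitr)
kable.default <- function(x, ...) knitr::kable(x, ...)

# Method for SimulationResult objects: render the simulated data as a table
kable.SimulationResult <- function(x, ...) {
    kable(x$data, ...)
}

# Stand-in object with the same structure as a SimulationResult
x <- structure(
    list(n1 = 3, n2 = 3, mean1 = 5, mean2 = 7, sd1 = 3, sd2 = 4,
        data = data.frame(
            group = c(rep(1, 3), rep(2, 3)),
            values = c(rnorm(3, 5, 3), rnorm(3, 7, 4))
        )),
    class = "SimulationResult"
)

# Only call the method when knitr is available
if (requireNamespace("knitr", quietly = TRUE)) {
    kable(x)
}
```

Because `x$data` is a plain data frame, the call inside the method dispatches to `kable.default` and therefore ends up in `knitr::kable`.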
Optional extra task:
Document your functions with Roxygen2
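As a starting point for this task, the simulation function could be documented along the following lines. The title, descriptions, and return text are assumptions written for illustration; only the function itself comes from the material above.

```r
#' @title
#' Simulate Two-Arm Means
#'
#' @description
#' Simulates normally distributed data for two independent groups.
#'
#' @param n1,n2 Sample size of group 1 and group 2.
#' @param mean1,mean2 Mean of group 1 and group 2.
#' @param sd1,sd2 Standard deviation of group 1 and group 2.
#'
#' @return A list of class \code{SimulationResult} containing the input
#'   arguments and a data frame \code{data} with the columns
#'   \code{group} and \code{values}.
#'
#' @examples
#' x <- getSimulatedTwoArmMeans(n1 = 50, n2 = 50, mean1 = 5,
#'     mean2 = 7, sd1 = 3, sd2 = 4)
#'
#' @export
getSimulatedTwoArmMeans <- function(n1, n2, mean1, mean2, sd1, sd2) {
    result <- list(n1 = n1, n2 = n2,
        mean1 = mean1, mean2 = mean2, sd1 = sd1, sd2 = sd2)
    result$data <- data.frame(
        group = c(rep(1, n1), rep(2, n2)),
        values = c(
            rnorm(n = n1, mean = mean1, sd = sd1),
            rnorm(n = n2, mean = mean2, sd = sd2)
        )
    )
    structure(result, class = "SimulationResult")
}
```

Running `devtools::document()` in the package directory then generates the corresponding `.Rd` file and updates the `NAMESPACE`.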