1 Introduction

Good Software Engineering Practice for R Packages

Shuang Li

July 31, 2025

Disclaimer




Any opinions expressed in this presentation and on the following slides are solely those of the presenter and not necessarily those of Roche.

Joe

  • Ph.D. in Statistics
  • Postdoc at the University of Oxford for 6 years; Data Scientist at Roche for the last 4 years; technical engineering lead for the NEST SME team; technical lead for auto-translation and slide automation initiatives at Roche
  • Multiple open-source packages on GitHub and CRAN, see this page for details
  • Feel free to connect on LinkedIn or GitHub

Shuang

  • Master's degree in Pharmacology from Fudan University
  • Senior Clinical Data Scientist at Roche for 7 years; led multiple data mart projects for secondary use of clinical data
  • Led and developed multiple internal R packages and Shiny apps for data review and data curation
  • Feel free to connect on GitHub

Zhenglin

  • Bachelor's degree in Biotechnology
  • Data engineer at Roche for 3 years; developed R packages and Shiny applications internally
  • Feel free to connect on GitHub

Chunyan

  • Master's degree in Epidemiology and Medical Statistics from Tongji University
  • Technical lead for multiple R projects at Roche; actively contributes to the development of automated tools and pipelines, with a focus on efficiency
  • Feel free to connect on GitHub

What you will learn today

  • Understand the basic structure of an R package
  • Create your own R package
  • Learn about and apply a professional development workflow
  • Learn and apply the fundamentals of quality control for R
  • Get a crash course in version control to stay organized
  • Try out modern collaboration techniques on GitHub.com

Program outline

9:30 - 9:40 Introduction and outline
9:40 - 10:10 R Package Syntax
10:10 - 10:50 Software Engineering Workflow
10:50 - 11:10 Tea Break
11:10 - 11:40 Package Quality
11:40 - 12:10 Collaboration via GitHub
12:10 - 12:30 Summary and Discussion

House-keeping

What you will need

  • GitHub.com (free) account
  • Recommended: posit.cloud
    • Free tier sufficient
    • Comes with everything installed
  • Alternative: local R development environment with
    • git
    • Rtools / R / RStudio IDE
  • Curiosity
  • Positive attitude 😄

What do we mean by GSWEP4R*?

  • Applying the concept of GxP to software engineering (SWE) with R
  • Improve quality of R code/packages, particularly in regulated environments, but not limited to them
  • Not a fixed term; we share our perspectives
  • Collection of best practices
  • Do not reinvent the wheel: learn from IT/open source space

Why care about GSWEP4R?

  • Move to / integration of R in pharma is a clear trend
  • R is a powerful yet complex ecosystem
    • Core component: R packages
    • Mature analysts: users & contributors
    • Deep understanding crucial, even to just assess quality
  • Analyses increasingly require complex scripts/programs
    \(\leadsto\) the line between programming and data analysis blurs
  • Value: de-risking use of R and efficiency gains

Start small - from script to package

  1. Encapsulate behavior (functions)
  2. Avoid global state/variables
  3. Adopt consistent coding style
  4. Document well
  5. Add test cases
  6. Version your code
  7. Share as ‘bundle’

\(\leadsto\) R package
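
A minimal sketch of steps 1, 2, 4 and 5 above, assuming a made-up helper: the function name `mean_change` and the example values are hypothetical and not part of the workshop material. The check at the end uses base R only; inside a package, such a check would typically move to tests/testthat/ and use testthat expectations.

```r
# Hypothetical example: function name and data are illustrative assumptions.

#' Mean change from baseline
#'
#' @param baseline Numeric vector of baseline values.
#' @param follow_up Numeric vector of follow-up values, same length as `baseline`.
#' @param na_rm Logical; drop missing differences before averaging?
#' @return A single numeric value: the mean of `follow_up - baseline`.
#' @export
mean_change <- function(baseline, follow_up, na_rm = FALSE) {
  # Everything the function needs arrives as an argument,
  # so the result does not depend on global variables.
  stopifnot(
    is.numeric(baseline),
    is.numeric(follow_up),
    length(baseline) == length(follow_up)
  )
  mean(follow_up - baseline, na.rm = na_rm)
}

# Test cases written with base R so the sketch runs without extra packages.
stopifnot(
  isTRUE(all.equal(mean_change(c(10, 12, 14), c(11, 14, 17)), 2)),
  is.na(mean_change(c(1, NA), c(2, 3))),
  isTRUE(all.equal(mean_change(c(1, NA), c(2, 3), na_rm = TRUE), 1))
)
```

The `#'` comments follow the roxygen2 convention, so the same file could later produce the help page once the function is moved into a package's R/ directory.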

The R package ecosystem - huge success

GxP + R =

  • Core infrastructure packages only through industry
  • Quality, burden sharing: open-source pharmaverse and others
  • Open methodological packages can de-risk innovative methods
  • R packages make (statistical/methodological) code
    • testable (with documented evidence thereof, 21 CFR Part 11)
    • reusable
    • shareable
    • easier to document

Questions, comments?

License information