1 Introduction

Good Software Engineering Practice for R Packages

Liming

August 1, 2024

Disclaimer




Any opinions expressed in this presentation and on the following slides are solely those of the presenter and not necessarily those of Roche.

Liming

  • Master in Biostatistics from Fudan University
  • Statistical Programmer at Roche for 5 years; technical engineering lead for the chevron team in the NEST project
  • Member of the ASA BIOP working group on software engineering, openstatsware
  • Author of multiple open-source R packages, including mmrm, sasr and RobinCar2
  • Feel free to connect on GitHub

Joe

  • Ph.D. in Statistics
  • Postdoc at the University of Oxford for 6 years; Data Scientist at Roche for the last 4 years; technical engineering lead for the NEST SME team; technical lead for auto-translation and slide automation initiatives at Roche
  • Multiple open-source packages on GitHub and CRAN, see this page for details
  • Feel free to connect on LinkedIn or GitHub

What you will learn today

  • Understand the basic structure of an R package
  • Create your own R package
  • Learn about & apply a professional development workflow
  • Learn & apply fundamentals of quality control for R
  • Get a crash course in version control to stay organized
  • Try out modern collaboration techniques on GitHub.com

Program outline

13:30 - 13:45 Introduction and outline
13:45 - 14:15 R Package Syntax
14:15 - 15:00 Software Engineering Workflow
15:00 - 15:15 Tea Break
15:15 - 16:00 Package Quality
16:00 - 17:00 Collaboration via GitHub
17:00 - 17:30 Summary and Discussion

House-keeping

What you will need

  • GitHub.com (free) account
  • Recommended: posit.cloud
    • Free tier sufficient
    • Comes with everything installed
    • Alternative: local R development environment with
      • git
      • Rtools/R/RStudio IDE
  • Curiosity
  • Positive attitude 😄

What do we mean by GSWEP4R (Good Software Engineering Practice for R)?

  • Applying the concept of GxP to software engineering (SWE) with R
  • Improve the quality of R code/packages, particularly in regulated environments, but not limited to them
  • Not a fixed term; we share our perspectives
  • Collection of best practices
  • Do not reinvent the wheel: learn from the IT/open-source space

Why care about GSWEP4R?

  • Move to / integration of R in pharma is a clear trend
  • R is a powerful yet complex ecosystem
    • Core component: R packages
    • Mature analysts: users & contributors
    • Deep understanding crucial, even to just assess quality
  • Analyses increasingly require complex scripts/programs
    \(\leadsto\) line between programming and data analysis blurs
  • Value: de-risking use of R and efficiency gains

Start small - from script to package

  1. Encapsulate behavior (functions)
  2. Avoid global state/variables
  3. Adopt consistent coding style
  4. Document well
  5. Add test cases
  6. Version your code
  7. Share as ‘bundle’

\(\leadsto\) R package
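
The first steps above can be sketched in a few lines of base R. This is only an illustration: the function name `snr` and its behavior are hypothetical and not taken from any package mentioned here. It shows script-style code that depends on a global variable being wrapped into a documented function, followed by minimal test cases. In a real package the tests would typically live in `tests/testthat/` and use the testthat package; plain `stopifnot()` is used here to keep the sketch self-contained.

```r
# Script-style starting point (relies on a global variable `x`):
#   x <- c(1.2, 3.4, NA, 5.6)
#   result <- mean(x, na.rm = TRUE) / sd(x, na.rm = TRUE)

#' Signal-to-noise ratio of a numeric vector (hypothetical example)
#'
#' @param x A numeric vector; missing values are dropped.
#' @return A single numeric value: the mean divided by the standard deviation.
snr <- function(x) {
  # Validate input instead of silently relying on global state
  stopifnot(is.numeric(x))
  x <- x[!is.na(x)]
  mean(x) / sd(x)
}

# Minimal test cases: mean(1, 2, 3) is 2 and sd(1, 2, 3) is 1
stopifnot(isTRUE(all.equal(snr(c(1, 2, 3)), 2)))
stopifnot(isTRUE(all.equal(snr(c(1, 2, 3, NA)), 2)))
```

The roxygen-style `#'` comments are the kind of documentation that can later be turned into help pages once the function is moved into a package's `R/` directory.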

The R package ecosystem - huge success

GxP + R =

  • Core infrastructure packages only through industry
  • Quality, burden sharing: open-source pharmaverse and others
  • Open methodological packages can de-risk innovative methods
  • R packages make (statistical/methodological) code
    • testable (with documented evidence thereof, 21 CFR Part 11)
    • reusable
    • shareable
    • easier to document

Questions, comments?

License information