Practical #6
Advanced Statistical Programming using R — R Packages
Quiz
Before starting, work through this QUIZ to check your understanding of the concepts covered in this week’s lecture on R packages. Find a full solution R file HERE.
Overview
This is an online, drop-in consultation practical session. Lisa will be available between 2 and 4 pm on this Zoom link. Please wait in the waiting room until you’re invited in to ask your questions.
In this practical you will build a complete R data package from scratch: munichvisitors. By the end you will have a package that ships a cleaned dataset, a documented plot function, and — if you tackle the extension — a vignette.
The practical is structured into six parts:
- Create the package skeleton: scaffold, DESCRIPTION, README
- Add data: download, clean, and export the museums dataset
- Add a function: write, document, and test plot_museums()
- Add functions for a second dataset: extend the package with monthly statistics from a separate open data source
- Extension (vignette): write a how-to guide for your package
- Reflection log: record what you learned this week
At the end you will also find a keyboard shortcuts reference for the package development loop, and notes on using LLMs to scaffold and document packages.
Solutions are available for parts 1–3. Parts 4, 5, and 6 are left for you to complete. Solution suggestions for Part 4 will be available next week.
Part 1: Create the Package Skeleton
Exercise 1.1: Initialise the package
Use usethis::create_package() to scaffold a new package called munichvisitors in a location of your choice. This will open a new RStudio project.
Run in the R console:
usethis::create_package("munichvisitors")
RStudio should open the new project. The folder structure will look like:
munichvisitors/
├── DESCRIPTION
├── NAMESPACE
├── R/
├── munichvisitors.Rproj
├── .gitignore
└── .Rbuildignore
Exercise 1.2: Edit the DESCRIPTION file
Open DESCRIPTION and fill in the fields below. Use your own name in Authors@R.
Package: munichvisitors
Title: Monthly Visitor Counts for Munich Museums
Version: 0.0.0.9000
Authors@R:
    person("First", "Last", , "you@example.com", role = c("aut", "cre"))
Description: Provides tidy monthly visitor statistics for Munich's museums
    from Munich Open Data (Statistisches Amt München), with a helper
    plot function for exploring trends over time.
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.2
LazyData: true
Imports:
    dplyr,
    ggplot2,
    scales
Suggests:
    janitor,
    readr,
    testthat (>= 3.0.0),
    knitr,
    rmarkdown
Config/testthat/edition: 3
VignetteBuilder: knitr
RoxygenNote should match your installed version of roxygen2. You can check with packageVersion("roxygen2") in the console. devtools::document() will update it automatically, so don’t worry if it changes.
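To confirm that the file parses the way you expect, you can read it back with base R. The sketch below writes a cut-down DESCRIPTION to a temporary file purely for illustration; the helper name `split_deps()` is hypothetical and not part of any package. Note that the continuation lines under Imports and Suggests are indented, which the DESCRIPTION format requires.

```r
# Write a cut-down DESCRIPTION to a temporary file (illustration only).
desc_path <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: munichvisitors",
  "Version: 0.0.0.9000",
  "Imports:",
  "    dplyr,",
  "    ggplot2,",
  "    scales",
  "Suggests:",
  "    janitor,",
  "    readr"
), desc_path)

# read.dcf() parses the file into a one-row character matrix,
# with one column per field.
desc <- read.dcf(desc_path)

# Hypothetical helper: split a comma-separated dependency field
# into a vector of package names.
split_deps <- function(field) {
  trimws(strsplit(field, ",")[[1]])
}

imports <- split_deps(desc[1, "Imports"])
suggests <- split_deps(desc[1, "Suggests"])
print(imports)
print(suggests)
```

Inside your own project you would point `read.dcf()` at the real file instead, for example `read.dcf("DESCRIPTION")`.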
Add the MIT licence by running in the R console (inside the new munichvisitors project):
usethis::use_mit_license()
Exercise 1.3: Write a README
Create a README template by running in the R console:
usethis::use_readme_rmd()
Open README.Rmd and write a short README following this structure (from R Packages (2e)):
- One paragraph describing what the package does
- Installation instructions: devtools::install_github("your-username/munichvisitors")
- A brief overview of what is included
- A short example showing how to use it — come back and fill this in after Exercise 3.1, once you have a working function to show
Knit README.Rmd to produce README.md — GitHub renders the .md version on your repository page. Re-knit whenever you update it.
---
output: github_document
---
# munichvisitors
The `munichvisitors` package provides tidy monthly visitor counts for
Munich's public museums, sourced from Munich Open Data (Statistisches
Amt München). It includes a ready-to-use plot function so you can
explore trends with a single line of code.
## Installation
``` r
# install.packages("devtools")
devtools::install_github("your-username/munichvisitors")
```
## What's included
- `museum_visitors` — a data frame of monthly visitor counts per museum,
with year-on-year comparison figures
- `plot_museums()` — a line chart of visitor trends over time
## Example
``` r
library(munichvisitors)
plot_museums()
```
## Data source
Landeshauptstadt München (2017). Monatszahlen Museen. Statistisches Amt München.
Lizenz: [Datenlizenz Deutschland Namensnennung 2.0](https://www.govdata.de/dl-de/by-2-0).
<https://datengartln.de/datasets/detail/bfb4a286-bea5-4bfe-82ce-b9bd354284a5/>
Part 2: Add Data
The dataset is monthly visitor counts for Munich’s museums from Munich Open Data. You can download the CSV directly from:
👉 https://datengartln.de/datasets/detail/bfb4a286-bea5-4bfe-82ce-b9bd354284a5/
Exercise 2.1: Set up the data-raw folder
Run in the R console:
usethis::use_data_raw("museum-visitors")
This creates data-raw/museum-visitors.R and opens it automatically. Write your download-and-clean script in that file.
Exercise 2.2: Write the data-raw script
Fill in data-raw/museum-visitors.R to:
- Download the CSV from the URL below
- Clean the column names with janitor::clean_names()
- Save the cleaned object with usethis::use_data()
The direct CSV download URL is:
url <- paste0(
  "https://opendata.muenchen.de/dataset/bfb4a286-bea5-4bfe-82ce-b9bd354284a5/",
  "resource/6c6a809e-91ee-4f3e-9268-a8b7bc38311c/download/",
  "monatszahlen2603_museen_16_03_26.csv"
)
You will need readr and janitor to write the data-raw script. Since they are only used to build the data (not in any package function), declare them as suggested packages rather than imports:
usethis::use_package("readr", type = "Suggests")
usethis::use_package("janitor", type = "Suggests")
Write the following in data-raw/museum-visitors.R (an R script, not the console):
## code to prepare `museum_visitors` dataset goes here
url <- paste0(
  "https://opendata.muenchen.de/dataset/bfb4a286-bea5-4bfe-82ce-b9bd354284a5/",
  "resource/6c6a809e-91ee-4f3e-9268-a8b7bc38311c/download/",
  "monatszahlen2603_museen_16_03_26.csv"
)
museum_visitors_raw <- readr::read_csv(url)
museum_visitors <- museum_visitors_raw |>
  janitor::clean_names() # WERT → wert, AUSPRAEGUNG → auspraegung, …
usethis::use_data(museum_visitors, overwrite = TRUE)
Run the entire script (not just the last line) to download and save the data. Use Ctrl+Shift+Enter to run all, or click Source in the top right of the RStudio editor pane.
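If you are curious what the name-cleaning step does, the sketch below approximates it in base R. It is only a rough stand-in: janitor::clean_names() handles more cases (for example duplicate names). The raw header names are an assumption, taken to be upper-case versions of the cleaned names used in this practical, and `to_snake()` is a hypothetical helper.

```r
# Header names as they might appear in the raw CSV (assumed, not
# read from the real file).
raw_names <- c("MONATSZAHL", "AUSPRAEGUNG", "JAHR", "MONAT", "WERT",
               "VERAEND_VORMONAT_PROZENT")

# Rough base-R approximation of snake_case cleaning:
# lower-case, replace runs of non-alphanumerics with "_",
# then trim any leading or trailing "_".
to_snake <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)
  gsub("^_+|_+$", "", x)
}

clean <- to_snake(raw_names)
print(clean)
```

In the real script, keep using janitor::clean_names(); the sketch is only meant to make the transformation visible.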
Exercise 2.3: Check the data loaded correctly
Reload the package and inspect the dataset in the R console:
devtools::load_all() # Ctrl+Shift+L
museum_visitors
dplyr::glimpse(museum_visitors)
What columns does the cleaned dataset have? Note down the column names, because you will need them for the next exercise.
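While inspecting the data, it may help to look at the values in monat. Judging from the documentation and solution in this practical, the column mixes YYYYMM codes with "Summe" rows that hold yearly totals. The sketch below shows one way to separate the two and turn the codes into dates, using a toy data frame whose values are made up for illustration; check the real data before relying on this layout.

```r
# Toy stand-in for museum_visitors (values are made up).
toy <- data.frame(
  auspraegung = "Museum A",
  jahr = 2023,
  monat = c("202301", "202302", "Summe"),
  wert = c(10, 12, 22),
  stringsAsFactors = FALSE
)

# Flag the yearly summary rows.
is_total <- toy$monat == "Summe"

# Monthly rows: append a day so the YYYYMM code parses as a Date.
monthly <- toy[!is_total, ]
monthly$date <- as.Date(paste0(monthly$monat, "01"), format = "%Y%m%d")

# Yearly rows: one total per museum and year.
yearly <- toy[is_total, ]

print(monthly)
print(yearly)
```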
Exercise 2.4: Document the dataset
Data objects need their own documentation. By convention, create R/data.R by running in the R console:
usethis::use_r("data")
Write a roxygen2 documentation block for museum_visitors. Include:
- A title and description
- @format describing the data frame and each column
- @source with the open data URL
- The dataset name as a string at the end (not a function call)
Write the following in R/data.R (an R script):
#' Monthly visitor counts for Munich museums
#'
#' Monthly visitor statistics for Munich's public museums, sourced from
#' Munich Open Data (Statistisches Amt München). Data covers all major
#' municipal museums with year-on-year comparison figures.
#'
#' @format A data frame with one row per museum per month:
#' \describe{
#' \item{monatszahl}{Category label (always "Besucher*innen")}
#' \item{auspraegung}{Museum name}
#' \item{jahr}{Year}
#' \item{monat}{Year-month code (YYYYMM format), or "Summe" for the yearly total row}
#' \item{wert}{Visitor count for that month}
#' \item{vorjahreswert}{Visitor count in the same month of the prior year}
#' \item{veraend_vormonat_prozent}{Percentage change vs. previous month}
#' \item{veraend_vorjahresmonat_prozent}{Percentage change vs. same month prior year}
#' \item{zwoelf_monate_mittelwert}{12-month rolling average}
#' }
#' @source Landeshauptstadt München (2017). Monatszahlen Museen.
#' Statistisches Amt München. Lizenz: Datenlizenz Deutschland
#' Namensnennung 2.0 (dl-by-de).
#' <https://datengartln.de/datasets/detail/bfb4a286-bea5-4bfe-82ce-b9bd354284a5/>
"museum_visitors"
Then run in the R console:
devtools::document() # Ctrl+Shift+D
?museum_visitors # check the rendered help page
Part 3: Add a Function
Exercise 3.1: Create the plot function
Create a new R file for the function by running in the R console:
usethis::use_r("plot_museums")
Write a function plot_museums() that produces a line chart of monthly visitor counts, with one line per museum (auspraegung), using the museum_visitors dataset that is bundled with the package.
Use ggplot2:: prefixes throughout — never library(ggplot2) inside a package. Declare all dependencies in the R console:
usethis::use_package("ggplot2")
usethis::use_package("dplyr")
usethis::use_package("scales")
Write the following in R/plot_museums.R (an R script):
#' Annual visitor counts per Munich museum
#'
#' Plots annual visitor totals for each museum in the bundled
#' `museum_visitors` dataset, using the yearly summary rows
#' (where `monat == "Summe"`).
#'
#' @return A `ggplot2` plot object.
#' @export
#' @examples
#' plot_museums()
plot_museums <- function() {
  museum_visitors |>
    dplyr::filter(monat == "Summe") |>
    ggplot2::ggplot(ggplot2::aes(x = jahr, y = wert, colour = auspraegung)) +
    ggplot2::geom_line() +
    ggplot2::labs(x = "Year", y = "Visitors", colour = "Museum") +
    ggplot2::ggtitle("Annual Visitors to Museums in Munich") +
    ggplot2::scale_y_continuous(labels = scales::label_number())
}
The function uses dplyr and scales as well as ggplot2. If you have not declared them yet, run in the R console:
usethis::use_package("dplyr")
usethis::use_package("scales")
And add the variable names used in the function to R/utils.R:
utils::globalVariables(c("museum_visitors", "monat", "wert", "auspraegung", "jahr"))
Exercise 3.2: Document and export
Add a roxygen2 block above plot_museums() (see solution above). Then run in the R console to generate the help files and check the function is accessible:
devtools::document() # Ctrl+Shift+D
devtools::load_all() # Ctrl+Shift+L
plot_museums()
?plot_museums
For a function to be accessible after installing the package, it must have @export in its roxygen2 block. Without it, devtools::document() will not add it to NAMESPACE and users will not be able to call it.
Exercise 3.3: Run devtools::check()
Run in the R console:
devtools::check() # Ctrl+Shift+E
Read through any WARNINGs or NOTEs. A clean check shows 0 errors, 0 warnings, 0 notes. Common issues at this stage:
- Missing @export tag
- Functions called without the package:: prefix
- Undeclared imports in DESCRIPTION
If you see no visible binding for global variable 'monat' (or similar), create R/utils.R (run usethis::use_r("utils") in the R console) and add:
utils::globalVariables(c("museum_visitors", "monat", "wert", "auspraegung", "jahr"))
This tells R CMD check that these names are intentionally used as unquoted column names (tidy evaluation).
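An alternative you may come across is the `.data` pronoun, which makes each column reference explicit instead of listing names in globalVariables(). The sketch below is a variant, not the solution used in this practical: `plot_toy()` and the toy data frame are hypothetical, and the values are made up. It assumes dplyr and ggplot2 are installed. Inside a package you would also need to import the pronoun (for example with a `#' @importFrom rlang .data` tag and rlang declared in DESCRIPTION).

```r
# Toy stand-in for museum_visitors (values are made up).
toy <- data.frame(
  auspraegung = rep(c("Museum A", "Museum B"), each = 2),
  jahr = rep(c(2022, 2023), times = 2),
  monat = "Summe",
  wert = c(100, 120, 80, 95)
)

# Hypothetical variant of the plot function: the data is passed in as
# an argument and every column is referenced through .data.
plot_toy <- function(data) {
  data |>
    dplyr::filter(.data$monat == "Summe") |>
    ggplot2::ggplot(ggplot2::aes(
      x = .data$jahr,
      y = .data$wert,
      colour = .data$auspraegung
    )) +
    ggplot2::geom_line()
}

p <- plot_toy(toy)
```

Because this variant takes the data as an argument, it also avoids referring to the bundled dataset by its bare name; the globalVariables() approach above remains perfectly acceptable for this practical.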
Exercise 3.4: Add a test
Set up testing and create a test file by running in the R console:
usethis::use_testthat()
usethis::use_test("plot_museums")
Write at least two expect_* assertions in tests/testthat/test-plot_museums.R (an R script):
test_that("plot_museums returns a ggplot object", {
  result <- plot_museums()
  expect_s3_class(result, "gg")
})
test_that("museum_visitors has expected columns", {
  expect_true(all(c("auspraegung", "jahr", "monat", "wert") %in%
    names(museum_visitors)))
})
Run the tests in the R console:
devtools::test() # Ctrl+Shift+T
Part 4: Add Functions for a Second Dataset
Munich Open Data publishes monthly statistics across many domains. Browse the catalogue here:
👉 https://opendata.muenchen.de/dataset/?tags=Monatszahlen
Choose a dataset that interests you — for example, monthly library loans, traffic counts, or another cultural indicator. You do not need to include the data in the package; a function that downloads and returns it on demand is fine.
Design and implement at least one new function for your chosen dataset. Your function should:
- Have a clear, descriptive name (use a verb: get_, plot_, summarise_)
- Accept at least one argument (e.g. a year range, a category filter, or a plot type)
- Be documented with a full roxygen2 block (@param, @return, @export, @examples)
- Have any external packages it uses declared in DESCRIPTION
Before writing the function body, write the call you wish existed. Ask: what does the user want to do? What do they need to specify? What should be returned? Once the interface feels right, filling in the body is easier (outside-in design from the lecture).
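As a small illustration of this outside-in approach, the sketch below starts from the call and only then fills in a body. Everything here is hypothetical: the function name `filter_years()`, its arguments, and the toy data are made up, and it assumes your chosen dataset has a numeric jahr column like the museums data.

```r
# Step 1: write the call you wish existed (as a comment, before any code):
#   filter_years(stats, from = 2021, to = 2022)

# Step 2: write a body that satisfies that call.
filter_years <- function(data, from = -Inf, to = Inf) {
  # Validate inputs early so misuse fails with a clear error.
  stopifnot(is.data.frame(data), "jahr" %in% names(data))
  stopifnot(is.numeric(from), is.numeric(to), from <= to)
  # Keep only the rows whose year falls inside the requested range.
  data[data$jahr >= from & data$jahr <= to, , drop = FALSE]
}

# Toy input; values are made up for illustration.
stats <- data.frame(jahr = c(2020, 2021, 2022, 2023), wert = c(5, 7, 9, 11))
recent <- filter_years(stats, from = 2021, to = 2022)
print(recent)
```

Defaulting `from` and `to` to infinite bounds means a call without a year range returns everything, which keeps the simplest call short.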
After writing your function:
- Document it: run devtools::document() in the R console
- Reload and try it: run devtools::load_all() in the R console
- Write at least one test: run usethis::use_test("your-function-name") in the R console, then write assertions in the created R script
- Run the full check: run devtools::check() in the R console
Part 5 (Bonus Task): Vignette
A vignette is a long-form how-to guide that walks a new user through a complete workflow using your package. Unlike a help page (?function), a vignette tells a story.
Exercise 5.1: Create a vignette
Run in the R console:
usethis::use_vignette("munichvisitors")
This creates vignettes/munichvisitors.Rmd. Write your vignette taking the perspective of someone who has never used the package before.
A good vignette covers:
- Why this package exists and what problem it solves
- How to install the package
- The main dataset: what it contains and how to access it
- The main function(s): what they do and how to call them
- At least one worked example with rendered output
Check out the dplyr vignettes for examples of well-written package vignettes. Notice how they lead with a motivating problem, not with function documentation.
Exercise 5.2: Build and preview
Run in the R console:
devtools::build_vignettes()
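vignette("munichvisitors", package = "munichvisitors")
Note that vignette() only finds vignettes of an installed package. If it reports that the vignette cannot be found, install the package with devtools::install(build_vignettes = TRUE) and try again.
Part 6: Reflection Log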
Take a few minutes to add this week’s entry to your reflection log. Then commit and push.
Keyboard Shortcuts
These shortcuts cover the most common steps of the package development loop. Learn them — they will save you a lot of time.
| Action | Windows / Linux | Mac |
|---|---|---|
| devtools::load_all() (reload the package) | Ctrl+Shift+L | Cmd+Shift+L |
| devtools::document() (regenerate help files) | Ctrl+Shift+D | Cmd+Shift+D |
| devtools::test() (run tests) | Ctrl+Shift+T | Cmd+Shift+T |
| devtools::check() (full R CMD check) | Ctrl+Shift+E | Cmd+Shift+E |
| Insert a pipe \|> | Ctrl+Shift+M | Cmd+Shift+M |
| Insert a roxygen2 skeleton | Code → Insert Roxygen Skeleton | Code → Insert Roxygen Skeleton |
Using LLMs for Package Development
LLMs are genuinely useful at several steps of package development — but they require careful review. From the lecture:
- Be specific about the interface: tell the LLM exactly what functions you want, what arguments they take, and what they return. Vague prompts produce vague APIs.
- Check DESCRIPTION: LLMs often add unnecessary Imports. Every dependency is a liability your users inherit.
- Check NAMESPACE: missing @export tags mean functions are silently unavailable; extra exports expose internal helpers.
- Review code style: LLMs mix styles. Enforce consistency with styler::style_pkg() after generation.
- Run devtools::check(): treat any NOTE or WARNING as a bug, not a suggestion.
Suggested workflow: write the interface yourself (function names, arguments, return values), then ask the LLM to fill in the body and generate the roxygen2 block. Review both carefully before committing.
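One way to make the DESCRIPTION review concrete is to compare the declared Imports with the `pkg::` prefixes that actually appear in your code. The sketch below is a crude heuristic in base R, not an official tool: `used_packages()` is a hypothetical helper, the source lines and the extra stringr entry are made up, and it ignores packages used via @importFrom as well as prefixes that only occur in comments or strings.

```r
# Toy package source lines (stand-in for the files in R/).
code <- c(
  "plot_museums <- function() {",
  "  museum_visitors |>",
  "    dplyr::filter(monat == \"Summe\") |>",
  "    ggplot2::ggplot(ggplot2::aes(x = jahr, y = wert))",
  "}"
)

# Packages listed under Imports, with a deliberately unused extra
# (stringr) and one that this toy code never calls (scales).
declared <- c("dplyr", "ggplot2", "scales", "stringr")

# Hypothetical helper: collect every name that appears directly
# before "::" in the source text.
used_packages <- function(lines) {
  pattern <- "[A-Za-z][A-Za-z0-9.]*(?=::)"
  hits <- regmatches(lines, gregexpr(pattern, lines, perl = TRUE))
  sort(unique(unlist(hits)))
}

used <- used_packages(code)
unused <- setdiff(declared, used)
print(used)
print(unused)
```

In a real project you could build `code` from your own files, for example with `unlist(lapply(list.files("R", full.names = TRUE), readLines))`, and treat anything in `unused` as a prompt for a manual look rather than a verdict.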
Resources
R Package Development
- R Packages (2e) — Hadley Wickham & Jenny Bryan; the definitive reference
- Introduction to R Packages — N.J. Tierney — a step-by-step walkthrough used in preparing this practical
- usethis documentation
- devtools documentation
- roxygen2 documentation
Data sources used in this practical
- Munich Museums open data — the museums CSV
- Munich Open Data — Monatszahlen catalogue — further monthly statistics for Part 4
Testing
LLMs and package development
- JOSS guidance on AI-assisted code — how to acknowledge LLM usage in published software