Practical #3
Advanced Statistical Programming using R - Debugging
Quiz
Before starting, work through this QUIZ to check your understanding of the concepts covered in this week’s lecture on debugging and on using LLMs (large language models) in a statistical programming workflow.
General Remarks
Two practicals in, you now have a small but growing code base of things that can — and will — break. This session is about what to do when they do, plus your first hands-on session with an LLM assistant as part of the workflow.
We’ll be using the rAI learning space by aihorizon R&D in this course — a web-based platform that gives you access to several state-of-the-art LLMs through one interface, including OpenAI’s GPT family (GPT-5.2, GPT-4o, o3-mini), Microsoft’s Phi and MAI models, and locally hosted models. You can pick the model that fits each task.
- We have arranged a free premium subscription for the class at least until the end of the semester. That’s free access to state-of-the-art models that would otherwise cost around €20/month each. Use it for this course, for other courses, or for personal projects — it’s yours to use however you like.
- Using it is optional, but strongly recommended. We’ll use it in the lectures and practicals going forward.
- There is a consent form and an intro survey on the platform when you first log in. Please read the form before ticking through it, and complete the survey before using the platform.
- If you’ve never used one of these tools before, don’t worry: we’ll walk you through the setup step by step, and most of the practical today is about learning to use it well for statistical programming.
Exercise 0: Set up the rAI learning space by aihorizon R&D.
This is your first time using the rAI learning space, and the habits you form here (system prompt, how you prompt, how you read output) will shape how useful the tool is for the rest of the course.
0.1 Log in, consent, and survey
Navigate to the rAI learning space platform (link on the course Moodle page) and create an account as instructed. Your account comes with a premium subscription for the duration of the course.
On first login you will see a consent form. Read it carefully and tick your choice. Either answer is fine — it has no effect on your grade, your access to the platform, or anything you do in this class.
Complete the intro survey on the platform. It asks about your prior experience with LLMs and your expectations for the semester.
0.2 Configure your system prompt
In the platform’s Model Settings (on the left), find the System prompt field. A system prompt is a standing instruction that the model sees before every message you send — think of it as persistent preferences, not a one-off request.
Start from the template below and edit it to reflect how you actually work. At minimum, change the code style section to match your own preferences (e.g. `%>%` vs `|>`, tidyverse vs base vs `data.table`). If you have package preferences from Practical #2 (e.g. `here` for paths, `readr` over base `read.csv`), put them here.
You are helping me with an advanced statistical programming course in R.
Code style:
- Use the tidyverse (dplyr, tidyr, ggplot2, purrr) where possible.
- Use the native pipe |> rather than %>%.
- Follow the tidyverse style guide: snake_case, <- for assignment, two-space indent.
- Prefer small, pure functions over long scripts.
- For file paths use the `here` package.
When I ask for help debugging:
- First explain what the error message means, in plain English.
- Then suggest where to look in my code, BEFORE proposing a fix.
- Only write a full rewrite if I ask for one.
General:
- Do not apologise. Do not pad with filler. If you are uncertain, say so.
- If I paste in code, assume it is real code I care about — do not silently
rename variables or change the pipe style.
- Save the system prompt by clicking Model Settings again. Be aware that the System prompt (and the other model settings) are not saved when you reload or close the page, so copy and paste your prompt somewhere safe.
- BONUS: Find out what changing the temperature value (or any of the other model settings) does. Where would you set the temperature when e.g. writing code, proofreading an email, or creating a game for your next meeting with friends?
0.3 Calibrate: does it follow your prompt?
Start a new conversation. Ask the model to generate a small function — something like:
Write an R function that takes a numeric vector and returns a tibble with the mean, median, and standard deviation. Include one worked example.
Read the reply carefully. Does it follow your style preferences? (Pipe style, naming, assignment operator.) If not, tweak your system prompt and try again. This calibration loop is important — a system prompt that the model ignores is worse than no system prompt, because it gives you false confidence.
When you’re satisfied with the output, save the session.
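For reference, a reply that follows the template’s style preferences might look roughly like the sketch below (a hypothetical answer, not the unique correct one; it uses a plain `data.frame` instead of a tibble so it runs with no extra packages):

```r
# Summarise a numeric vector: mean, median, and standard deviation.
summarise_vector <- function(x) {
  data.frame(
    mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE)
  )
}

# Worked example
summarise_vector(c(1, 2, 3, 4, 100))
```

Check whether details like these — the snake_case name, `<-` for assignment, `na.rm` handling, the worked example — match what your system prompt asked for.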
0.4 Revisit a previous conversation
Reload your page and navigate to the History tab (on the left).
Locate your previous session and click the Duplicate icon (next to the trash bin). This restores your conversation so you can continue writing. Note that the Model settings are reset to their defaults, so they need to be updated manually.
0.5 A ground rule for the rest of the session
For every LLM interaction you have today, follow this protocol:
- Read the code yourself first (at least 30 seconds).
- Write down what you think is wrong before you prompt.
- Ask for an explanation, not a rewrite on the first turn.
This is not busywork — it’s the difference between using the LLM as a tool and being used by it.
Exercise 1: Using the LLM deliberately
The goal here is to practice how to prompt, not just whether to prompt. Work with the system prompt you configured in Exercise 0 and follow the protocol from 0.5.
- Copy the following buggy snippet into a new R script. Do not run it yet.
library(palmerpenguins)
library(dplyr)
mean_mass_by <- function(data, group_var) {
data |>
group_by(group_var) |>
summarise(mean_mass = mean(body_mass_g, na.rm = TRUE))
}
mean_mass_by(penguins, species)

Before asking the LLM anything, read the code carefully and write down (on paper or in a comment) what you expect it to do and what you think might go wrong. This 30-second pause is the single most effective debugging habit you can build.
Now run the code. Copy the full error message.
Go back to the platform, start a new conversation and check that your model settings are set as you want them to be. Ask the LLM to explain the error message, not to fix it. A good prompt template:
Here is my code: [code]. Here is the error: [error]. Please explain what R is complaining about, and point me to the line where the problem originates. Do not rewrite my code yet.

Once you understand why it fails, fix the function yourself. Then ask the LLM to review your fix.
This is a non-standard evaluation (NSE) bug. Inside the function, `group_by(group_var)` does not substitute the column you passed in — dplyr’s data masking treats `group_var` itself as the expression to evaluate, so the `species` you supplied never reaches `group_by()`. Fix with the embrace operator `{{ }}`, which tells dplyr to use the column stored in the argument:

mean_mass_by <- function(data, group_var) {
  data |>
    group_by({{ group_var }}) |>
    summarise(mean_mass = mean(body_mass_g, na.rm = TRUE))
}
mean_mass_by(penguins, species)
- Reflect for a minute with your neighbour: what did the LLM get right? Did it hallucinate anything? Would you have found the bug faster without it?
Exercise 2: Locating errors in R code
When R throws an error, the message alone often isn’t enough — you need to know where in the call stack it came from.
- Copy the following into a new R script and run it. You should get an error.
library(palmerpenguins)
library(dplyr)
summarise_species <- function(data) {
data |>
group_by(species) |>
summarise(mean_mass = mean_body_mass(body_mass_g))
}
mean_body_mass <- function(x) {
mean(x, na.rm = TREU) # typo!
}
summarise_species(penguins)

Read the error message first. dplyr errors are short but structured — each line tells you something different. Before you touch anything else, work through it line by line: where is the error, and what is it?
Now run `rlang::last_trace()` (or `traceback()` if you prefer the older format). You will see a long stack — most of it is dplyr internals you can ignore. Scan bottom-up for frames that name your own functions. Which is the innermost frame that belongs to code you wrote?
Read the error message line by line:
Error in `summarise()`:
ℹ In argument: `mean_mass = mean_body_mass(body_mass_g)`.
ℹ In group 1: `species = Adelie`.
Caused by error in `mean_body_mass()`:
! object 'TREU' not found
Run `rlang::last_trace()` to see where the error occurred.
Each line adds one piece of information:
- `Error in summarise()` — the dplyr verb where it surfaced.
- `In argument: ...` — which argument of `summarise()` was being computed.
- `In group 1: species = Adelie` — which group was being processed.
- `Caused by error in mean_body_mass()` — your helper function.
- `object 'TREU' not found` — the actual R-level problem.
- `Run rlang::last_trace() ...` — dplyr tells you what to do next.
You already know the bug is in mean_body_mass() before running last_trace(). The backtrace just confirms the exact line.
The backtrace. `rlang::last_trace()` shows ~14 frames. Most are dplyr internals (`summarise.grouped_df`, `summarise_cols`, `map`, `lapply`, `mask$eval_all_summarise`, …) — safe to skip. The frames that matter are:

- Frame 1: `summarise_species(penguins)` — your top-level call.
- Frame 11: `mean_body_mass(body_mass_g)` — your function, innermost. This is where the bug lives.
- Frames 12–14: `mean()` → `mean.default()` → `isTRUE(na.rm)` — base R tries to evaluate `TREU` and can’t find it.
Fix: replace TREU with TRUE on the na.rm line of mean_body_mass().
Lesson: the error message usually gets you close; the backtrace pins down the line. Scan for your own function names and stop at the innermost one.
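If you want to poke at this class of error outside an interactive session, you can capture it with base `tryCatch()` — a small sketch reproducing the typo in a minimal function (hypothetical, not part of the exercise):

```r
# The same typo, in a minimal function outside dplyr.
mean_body_mass <- function(x) {
  mean(x, na.rm = TREU)  # typo on purpose
}

# Capture the condition object instead of letting it abort the script.
err <- tryCatch(mean_body_mass(c(1, 2, NA)), error = function(e) e)
conditionMessage(err)  # "object 'TREU' not found"
```

Capturing the condition object lets you inspect the message programmatically — handy when you want to test that bad input fails the way you expect.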
Replace `TREU` with a value that does exist but makes no sense, e.g. `na.rm = "banana"`. Re-run. How does the error message change? This is a classic “the error is not where you think” situation — the traceback still points to `mean_body_mass`, but the underlying message is now about argument types, coming from deeper inside `mean.default()`. Fix both bugs.
Exercise 3: browser()
browser() drops you into an interactive session inside a running function, so you can inspect variables at the moment things go wrong. The six commands you need are:
| Command | Effect |
|---|---|
| `n` | Run the next line |
| `s` | Step into a function call on the current line |
| `f` | Finish the current loop / function |
| `c` | Continue until the next `browser()` or the end |
| `Q` | Quit the debugger |
| `where` | Print the call stack |
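As a minimal sketch of where the call goes (an illustrative function, not part of the exercise): `browser()` only pauses in an interactive session; in a non-interactive script it prints a message and execution continues.

```r
inspect_me <- function(x) {
  browser()  # interactive session: pause here; non-interactive: print and continue
  x + 1
}

inspect_me(1)
```

In the RStudio console this drops you to the `Browse[1]>` prompt, where the commands in the table above apply.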
- Consider this buggy recursive factorial:
my_factorial <- function(n) {
if (n == 1) return(1)
return(n * my_factorial(n - 1))
}
my_factorial(5) # returns 120 — correct
my_factorial(0) # hangs / errors
my_factorial(3.5) # also wrong

- Insert `browser()` as the first line of the function body and call `my_factorial(0)`. Use `n` to step through. At each step, check the value of `n`. What is happening?
The base case is n == 1, but with n = 0 we never hit it — we recurse to -1, -2, … and either blow the stack or (with 3.5) never reach an integer. Fix by broadening the base case:
my_factorial <- function(n) {
stopifnot(n >= 0, n == as.integer(n))
if (n <= 1) return(1)
return(n * my_factorial(n - 1))
}

Note that inserting a `stopifnot()` upfront is a bug-prevention tool as much as a debugging tool: it fails loudly on bad input instead of hanging silently.
- Remove the `browser()` call once you have fixed the function.
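You can confirm non-interactively that the guard fails loudly — a quick sketch, where the `tryCatch()` wrapper just captures the error message instead of aborting:

```r
my_factorial <- function(n) {
  stopifnot(n >= 0, n == as.integer(n))
  if (n <= 1) return(1)
  n * my_factorial(n - 1)
}

my_factorial(5)  # 120
tryCatch(my_factorial(3.5), error = function(e) conditionMessage(e))
# "n == as.integer(n) is not TRUE"
```

The error names the exact assertion that failed, which is far more useful than a stack overflow several thousand frames deep.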
Exercise 4: debug() and debugonce() (~10 min)
browser() requires editing the function body, which is inconvenient when the function lives in a package. debug() and debugonce() attach a debugger to a function from the outside.
- `debug(f)` — enter the debugger on every subsequent call to `f()`, until you run `undebug(f)`.
- `debugonce(f)` — enter the debugger on the next call to `f()`, then detach automatically.
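You can check whether a debugger is currently attached with base R’s `isdebugged()` — a small sketch with a toy function (note that `isdebugged()` reports the persistent `debug()` flag, not the one-shot `debugonce()` flag):

```r
square <- function(x) x^2

debug(square)
isdebugged(square)  # TRUE  — every call to square() will now enter the debugger
undebug(square)
isdebugged(square)  # FALSE — detached again
```

This is handy when you have lost track of which functions you flagged during a long debugging session.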
- Take the following (silently) buggy function:
standardise <- function(x) {
(x - mean(x)) / sd(x)
}
standardise(c(1, 2, 3, 4, 5)) # fine
standardise(c(1, 2, NA, 4, 5)) # returns all NAs — why?

- Run `debugonce(standardise)` and then `standardise(c(1, 2, NA, 4, 5))`. Step through with `n`. At each line, print `x`, `mean(x)`, and `sd(x)`. Which value is the NA coming from?
mean(x) and sd(x) both return NA by default when x contains NA, which propagates to the result. Fix with na.rm = TRUE:
standardise <- function(x) {
(x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

Note that `debugonce()` detaches itself after one call — try running `standardise()` again and you will not be dropped into the debugger. Compare with `debug(standardise)`, which would keep re-entering until you call `undebug(standardise)`.
- When would you prefer `debug()` over `debugonce()`? Jot down one scenario.
Exercise 5: Debugging Quarto (~10 min)
Not every error comes from R. When quarto render fails, the first job is to work out which tool is complaining: R (your code), knitr (the engine that runs your code), Pandoc (the renderer), or LaTeX (only for PDF output). The console output usually tells you, but you have to read it carefully.
- Create a new file `broken.qmd` in your course folder and paste in:
---
title: "A broken document"
format: html
execute:
  echo: true
---

## First chunk

```{r}
library(palmerpenguins)
library(dplyr)
head(penguins)
```

## Second chunk

```{r}
penguins |>
  filter(species = "Adelie") |>
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point()
```

## Third chunk

```{r, echo=TREU}
1 + 1
```

- From the command line, run `quarto render broken.qmd`. You should see it fail. Before fixing anything, answer: does the error come from R, from knitr, or from Quarto/Pandoc? How can you tell?
There are three separate problems, each surfacing from a different layer of the stack. Chunks knit top-to-bottom, so you will encounter them in this order:
- R code error in the second chunk: `filter(species = "Adelie")` uses `=` (argument assignment) instead of `==` (equality). dplyr raises an error like “Problem while computing `..1 = species = "Adelie"`”. Fix: `filter(species == "Adelie")`.
- Missing library (still second chunk, revealed after you fix bug 1): `ggplot2` is not loaded, so `ggplot()` is not found. Add `library(ggplot2)` (or `library(tidyverse)`).
- Knitr chunk-option error in the third chunk: `echo=TREU` is evaluated by knitr before the chunk body runs. `TREU` is not a defined object, so knitr aborts with `object 'TREU' not found`. Fix: `echo=TRUE`.
The pedagogical point: fix one error, re-render, read the next error. The error message changes layer each time — a dplyr runtime error, then a namespace lookup error, then a knitr option-parse error. Learning to read which tool is complaining is half the skill.
Fix the errors one at a time, re-rendering after each fix. Notice how the error message changes as you peel back the layers.
Change `format: html` to `format: pdf` and try to render again. If you do not have a LaTeX distribution installed you will get a different class of error entirely, from LaTeX. Install one with `tinytex::install_tinytex()` only if you want to explore this; otherwise revert to `html`.
Exercise 6 (main activity): Create a debugging challenge (~10 min to start)
You will now create a small debugging challenge that a classmate will solve next week. This mirrors how bugs arrive in real life: someone else’s code, some context, and a vague sense that it doesn’t work.
Starting material: use the data-loading function and the ggplot2 wrapper function you wrote in Practical #2. They are realistic, non-trivial, and yours — exactly the kind of code that develops interesting bugs in the real world. You will break them on purpose.
Deliverable: a folder containing
- A `.qmd` or `.R` file with exactly three bugs planted in working code.
- A `README.md` describing what the code is supposed to do (but not revealing the bugs).
- A `solutions.md` file, committed in a separate branch called `solutions`, which lists each bug, the line it is on, and a one-sentence explanation of the fix.
Constraints on the bugs (one from each category):
| Category | Examples |
|---|---|
| A syntax / parse error | typo in a keyword, unmatched brace, bad chunk option |
| A runtime error | wrong argument type, missing `na.rm`, NSE mistake (`{{ }}` forgotten in a wrapper), off-by-one loop bound |
| A silent / logical error | code runs, but returns the wrong answer (wrong variable, wrong grouping, mean instead of median, = vs ==) |
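As a sketch of what a “silent” bug looks like in practice (a hypothetical example, not one you must plant), the code below runs without any error or warning yet answers a different question than intended:

```r
x <- c(2, 4, 6, 100)  # one extreme outlier

typical_buggy <- mean(x)    # runs fine, but the outlier drags it up to 28
typical_fixed <- median(x)  # the robust summary that was intended: 5
```

Silent bugs are the hardest category for your classmate, because nothing in the console will point at them — only the README’s description of intent can.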
Constraints on the code:
- Must use the `palmerpenguins` dataset (for continuity with Practicals #1–#2).
- Must include at least one user-defined function — ideally one of yours from Practical #2 (the data-loader or the ggplot2 wrapper).
- Must render (once fixed) as a Quarto HTML document.
- Keep it short: ≤ 40 lines of code.
Sketch your three bugs on paper first. Then start from your working code from Practical #2 and introduce the bugs — don’t write buggy code from scratch; it’s too easy to accidentally make it unfixable.
Initialise a git repo in the folder, commit the buggy version on `main`, and commit the fixed version + `solutions.md` on a `solutions` branch. Push to GitHub.

You will only have time in class today to plan your bugs and get the repo set up. Finish the challenge before next week’s practical — upload the link to Moodle by Thursday evening.
In next week’s practical you will be assigned a classmate’s repo. You will have 30 minutes to find and fix all three bugs using the tools from this session (error messages, `traceback()`, `browser()`, `debug()`, and — deliberately — the LLM). The author will then walk you through the intended solution.
The best challenges are realistic — bugs you might actually write — not obscure gotchas designed to humiliate your classmate. Aim for “oh, of course” moments, not “how was I supposed to know that” moments.
my-debugging-challenge/
├── README.md # context + what the code should do
├── analysis.qmd # buggy code, on main branch
└── solutions.md # on `solutions` branch only
A minimal README.md:
# Penguin bill-ratio analysis
This analysis computes the bill-length-to-bill-depth ratio for each penguin
species and plots the distribution. It is intended to render as an HTML
Quarto document.
There are exactly **three** bugs: one parse error, one runtime error, and
one silent logical error. Find and fix all three.
Wrap-up discussion
With ~10 minutes to go, we will reconvene briefly to discuss:
- Which debugging tool did you reach for first? Which did you reach for last? Why?
- Where in the workflow did the LLM help you? Where did it mislead you?
- Did anything about the rAI learning space change how you prompted, compared to how you’d normally use e.g. ChatGPT or Claude?
References
- Shannon Pileggi, Debugging in R (NHS-R 2023 workshop) — slides this lecture drew on.
- Jenny Bryan & Jim Hester, What They Forgot to Teach You About R, Chapter 11: Debugging R code.
- `rstats-wtf/wtf-debugging` — worked examples, many adapted in this practical.
- Hadley Wickham, R for Data Science (2e), Chapter 5: Workflow — getting help.