Advanced Statistical Programming using R

Week 5: Quarto & Collaborative Coding

2026-05-13

Announcements

Reminders

  1. Commit to your individual Git reflection logs
  2. Group formation
    • Register your group (by May 20) on Moodle
    • If you submitted the group matching form, check your email for your group members.

Practical directly after lecture today!

The practical will be directly after the lecture today in the same room (S004).

Syllabus

Part 1: Statistical Programming Foundations (W2–6)

  • W02: Scripts, Functions & Refactoring
  • W03: Debugging
  • W04: Version Control & Remotes
  • W05: Quarto websites & Collaborative Coding
  • W06: R Packages

Group project

Upcoming tasks

  • Register your group by May 20!
    • Group member names & LMU campus emails
    • GitHub repo (set up this week in lecture/practical)
  • Proposal submission deadline extended to start of W8 (Mon Jun 8)
    • Review the project guidelines
    • Start searching for data sources, and designing your dataset
    • Set up your project repository structure
      • Include a proposal document & start drafting
      • (optional) Use the project template: our-statprog2-project

Suggested Repository Structure

penguins-project/                  ← repo root = RStudio project                                          
  ├── penguins-project.Rproj                                                                                
  ├── README.md                      ← dataset, research questions, members                                 
  ├── CONTRIBUTING.md                ← contribution statement + AI disclosure                               
  ├── _quarto.yml                    ← renders the whole repo as a Quarto website                         
  │                                                                                                         
  ├── data/                                                                                               
  │   ├── raw/                       ← read-only; never edited by scripts                                   
  │   │   ├── penguins.csv                                                                                
  │   │   └── LICENCE.txt            ← data licence documentation                                           
  │   └── processed/                 ← output of cleaning scripts                                           
  │       └── penguins_clean.csv                                                                            
  │                                                                                                         
  ├── code/                                                                                               
  │   ├── 01_download.R              ← optional: script to fetch raw data                                   
  │   ├── 02_clean.R                                                                                        
  │   ├── 03_eda.R
  │   └── 04_analysis.R                                                                                     
  │                                                                                                       
  ├── docs/                          ← rendered Quarto outputs (auto-generated)
  │                                                                                                         
  ├── proposal.qmd                   ← due Jun 8
  ├── report.qmd                     ← final output (or index.qmd for website)                              
  └── .gitignore                                                                        

Proposal document

Note

Due Mon 8 Jun — submit link to your rendered webpage (not Quarto document) on Moodle

Proposal: Data & EDA

1. Dataset & research questions

  • Source, licence, and structure of your data
  • 1–3 research questions: Descriptive, Exploratory, Explanatory/causal, or Predictive
  • Context and target audience

2. Initial data analysis

  • glimpse() and summary statistics
  • Two plots: distributions + a relationship/trend
  • Notes on data quality issues

Proposal: Plan & Code

3. Analysis plan

How do you plan to answer your research questions?

  • What variables will you use?
  • What analytical methods seem appropriate?
  • What data cleaning steps will you need to perform?
  • What are the main uncertainties or risks?

4. Workflow organisation

  • Code style: Which style guide? How will you enforce it (e.g. styler, shared AI system prompt)?
  • Packages: Which packages will you use? Will you use renv (W9) to lock versions?
  • Git workflow: How will you divide work across branches? What is your branching strategy?
  • .gitignore: What will you exclude (data files, rendered outputs, secrets)?

Tip

You do not need answers yet — show that you have thought carefully about what is feasible with your data.

Last Week

  • Git History — browsing and restoring past versions
  • Git Ignore — keeping secrets and noise out of your repo
  • Git Remotes & GitHub — fork, clone, push, pull

Git History

Browsing the log

Goal                     Command
Compact history          git log --oneline
Commits on a file        git log -- <file>
Full diff of a commit    git show <hash>

Restoring a past version

# bring a file back to how it was at <hash>
git checkout <hash> -- code/analysis.R

# undo a whole commit (safe — adds a new commit)
git revert <hash>
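
A minimal sketch of both commands in a throwaway repository. The temporary folder, the file contents, the commit messages and the "Demo User" identity are all made up for illustration.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch main on any git version
git config user.name "Demo User"
git config user.email "demo@example.com"

mkdir code
echo 'x <- 1' > code/analysis.R
git add code/analysis.R
git commit -q -m "First version"
first=$(git rev-parse HEAD)

echo 'x <- 2' > code/analysis.R
git commit -q -am "Second version"

# bring the file back to how it was at the first commit (the change is staged, not committed)
git checkout "$first" -- code/analysis.R
restored=$(cat code/analysis.R)
git commit -q -m "Restore first version of analysis.R"

# undo that last commit by adding a new commit that reverses it
git revert --no-edit HEAD >/dev/null
after_revert=$(cat code/analysis.R)
n_commits=$(git rev-list --count HEAD)
echo "restored: $restored | after revert: $after_revert | commits: $n_commits"
```

Neither command deletes history: the log grows with every step, which is why both are safe on shared branches.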

Git Ignore & Remotes

.gitignore

  • Lists files/folders git should never track
  • One pattern per line (*.csv, _site/, .env)
  • Ignore secrets early → once committed, they live in history forever
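
One way to check that the patterns do what you expect, sketched in a temporary repository. The file names are hypothetical; git check-ignore and git status --porcelain are standard git commands.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q

# one pattern per line, as on the slide
printf '%s\n' '*.csv' '_site/' '.env' > .gitignore
mkdir _site
touch results.csv _site/index.html .env analysis.R

# git check-ignore prints each path that matches an ignore pattern
ignored=$(git check-ignore results.csv _site/index.html .env analysis.R || true)
# git status --porcelain lists what git would still offer to track
visible=$(git status --porcelain)

echo "ignored:"; echo "$ignored"
echo "still visible to git:"; echo "$visible"
```

Running git check-ignore -v <file> additionally reports which pattern matched, which helps when a file is ignored unexpectedly.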

The remote loop

GitHub  ──clone──▶  local      (once)
GitHub  ──pull───▶  local      (start of session)
local   ──push───▶  GitHub     (end of session)

This Week

  • Quarto Review & Websites
  • Collaborative coding: branches, stash, worktrees
  • Pull requests & merge conflicts

Quarto Review

From StatProg1

Review: Quarto Document

A .qmd file has three components:

---
title: "My Document"
format: html
---

## Section heading

Some _italic_ and **bold** text.


[Code chunk output: figure "A nice plot" (#fig-plot)]

  • YAML header — document metadata and format options (--- delimiters)
  • Markdown content — narrative text, headings, lists, links
  • Code chunks — executable R (or Python) between ```{r} and ```

Chunk options go on #| lines at the top of each chunk.

Review: Markdown Syntax

Text formatting

# Heading 1
## Heading 2

_italics_, **bold**, `inline code`

[linked text](https://example.com)

Structures

- unordered list item
1. ordered list item

| Col A | Col B |
|-------|-------|
| 1     | 2     |

![Caption](image.png)

Tip

Switch between Source and Visual mode in RStudio with Ctrl+Shift+F4 — Visual mode renders markdown live.

Review: Quarto CLI

How rendering works

.qmd  →  knitr (R) / jupyter (Python)  →  .md  →  pandoc  →  HTML / PDF / docx

Render to final output

quarto render document.qmd           # default format
quarto render document.qmd --to pdf  # specify format
quarto render                        # render whole project

Live preview while authoring

quarto preview document.qmd   # opens browser, auto-reloads on save
quarto preview                # preview whole project

Tip

Use quarto preview while writing — it re-renders on every save so you see changes instantly.

Quarto Websites

Webpage vs. Website

Webpage: report.html, a single HTML document
vs.
Website: Home | Report | About, plus a footer; multiple pages, shared nav & footer

What is the difference?

  • A webpage is a single document written in HTML.
  • A website is a collection of webpages that usually share a common navigation bar (or tabs) and possibly a common footer.

Why Quarto Websites?

For your analysis

  • Literate programming — prose, code, and output live in one document
  • Reproducibility — anyone can re-run the analysis from source
  • Version control friendly — plain text .qmd files diff and merge cleanly

For your audience

  • One URL — proposal, report, and data documentation all in one place
  • Shareable — no need to email files; link to the GitHub Pages site
  • Professional — looks like a real project, not a collection of scripts

Quarto projects

Quarto projects are directories that let you:

  • Render all (or some) .qmd files with one command: quarto render
  • Share YAML options across all documents
  • Control where rendered output goes (output-dir)
  • Freeze rendered output to avoid re-executing unchanged files

Tip

Quarto websites are a special type of Quarto project

_quarto.yml
project:
  type: website     # website, book, or default
  output-dir: docs  # where rendered files go

Anatomy of a Quarto Website

my-project/
├── _quarto.yml        ← project + website config
├── index.qmd          ← home page
├── report.qmd
├── data/
│   └── penguins.csv
├── _freeze/           ← cached computation results
└── docs/              ← rendered website (output-dir)
    ├── index.html
    ├── report.html
    └── site_libs/

_quarto.yml
project:
  type: website
  output-dir: docs
  render:
    - "index.qmd"
    - "report.qmd"

website:
  title: "Penguins Project"
  navbar:
    left:
      - href: index.qmd
        text: Home
      - href: report.qmd
        text: Report

format:
  html:
    toc: true
    code-fold: true

Creating a Quarto Website

There are three ways to start a Quarto website project:

  1. via CLI: quarto create project website <name> (or run quarto create project and choose website)
  2. via RStudio: New Project > … > Quarto Website
  3. adding a _quarto.yml to an existing repository
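
Option 3 can be sketched by hand as below. The folder name penguins-project and the placeholder page text are assumptions; the YAML keys are the ones shown on the previous slide. The script only writes files and does not call quarto.

```shell
set -eu
proj=$(mktemp -d)/penguins-project   # hypothetical project folder
mkdir -p "$proj" && cd "$proj"

# quoted 'EOF' keeps the shell from expanding anything inside the config
cat > _quarto.yml <<'EOF'
project:
  type: website
  output-dir: docs

website:
  title: "Penguins Project"
  navbar:
    left:
      - href: index.qmd
        text: Home
      - href: report.qmd
        text: Report
EOF

# every page named in the navbar needs a source file
for page in index report; do
  printf '%s\n' '---' "title: \"$page\"" '---' '' 'Placeholder text.' > "$page.qmd"
done

# list the navbar targets and count any that have no source file
missing=0
for target in $(sed -n 's/^ *- href: //p' _quarto.yml); do
  [ -f "$target" ] || missing=$((missing + 1))
done
echo "missing pages: $missing"
```

From here, quarto preview (or quarto render) in that folder should build the site into docs/, assuming Quarto is installed.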

Rendering Websites

Render to output-dir

quarto render              # render whole project

Output goes to docs/ (or whatever output-dir is set to).

Live preview while authoring

quarto preview   # opens browser, auto-reloads on save

Tip

Use quarto preview while writing — re-renders on every save.

Freeze — skip re-running code that hasn’t changed:

_quarto.yml
execute:
  freeze: auto    # re-run only if source changed

Customising Websites

Navigation under website: in _quarto.yml:

website:
  title: "My Project"
  navbar:
    left:
      - href: index.qmd
        text: Home
      - href: report.qmd
        text: Report

Extra website options under website::

website:
  search: true           # adds a search box to the navbar
  announcement:
    content: "Proposal due **8 Jun**"
    type: warning        # info, warning, or danger
    dismissable: true
  page-footer:
    center: "StatProg2 — LMU Munich"

See quarto.org/docs/websites/ and the Website Options Reference for the full list.

Sharing YAML options

You can share YAML options between files in the same project by specifying them in _quarto.yml. For example, you might want to set some project level code display and execution options:

Code display — under format: html:

_quarto.yml
format:
  html:
    code-fold: true        # collapse code into <details>
    code-summary: "Show code"
    code-overflow: scroll  # or wrap
    code-line-numbers: true
    code-copy: hover       # copy button on hover
    code-tools: true       # show/hide all code menu

Execution — under execute:

_quarto.yml
execute:
  echo: true       # show source code
  eval: true       # run code
  output: true     # include results
  warning: false   # suppress warnings
  error: false     # halt on errors
  cache: false     # don't cache by default
  freeze: auto     # re-run only if source changes

Basics of Publishing Websites

A website is just files

  • HTML, CSS, images, JavaScript files etc.
  • quarto render produces website files — e.g. in _site/ or docs/
  • On your laptop, quarto preview serves them locally at http://localhost:NNNN

To reach the world, you need a host

  • A server that is always on and reachable over the internet
  • A URL that points to it
  • The server delivers your files when someone visits the URL

GitHub Pages

GitHub Pages is a free hosting service: it serves the contents of a branch (or folder) in your repo as a website at https://<user>.github.io/<repo>/

Quarto Publish to GitHub Pages

What happens under the hood

  • Quarto renders the site locally
  • Creates (or updates) a gh-pages branch
  • Pushes only the rendered output there
  • GitHub Pages serves from gh-pages automatically
  • Your source .qmd files stay on main; gh-pages only ever contains rendered HTML.

Demo: Fork & Publish Group Project Website

  1. Fork the template repo: https://github.com/soda-lmu/our-statprog2-project
  2. Disconnect fork (via Settings > General > Danger Zone > Leave Fork Network)
  3. Clone locally
  4. Run quarto publish gh-pages
  5. Set GitHub Pages source branch (via Settings > Pages > Build and deployment)

Extension: Netlify Drag & Drop

Drag & drop deploy

  1. Go to app.netlify.com/drop
  2. Drag your rendered site folder (containing index.html) onto the page
  3. Create an account or log in to change the domain name: Site settings > Change site name

Or publish from the terminal:

quarto publish netlify

Parallel histories in Git

  • Git Branches
  • Git Stash
  • Git Worktree

Git Branches: Parallel Timelines

The metaphor

A branch is like writing two drafts of a chapter at the same time.

  • main is your published, working draft
  • A feature branch is a new draft forked at a specific page
  • You can switch between drafts at any time
  • When happy, you merge the feature draft back into main

The reality

A branch is just a named pointer to a commit.

  • Creating one is instant — no file copying
  • Switching is instant — git swaps your working directory
  • Deleting is safe once merged — the commits remain
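
The "named pointer" idea can be checked directly with git rev-parse, which prints the commit a name points to. This is a sketch in a temporary repository; the file contents and the "Demo User" identity are placeholders.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch main on any git version
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "penguins" > README.md
git add README.md
git commit -q -m "Initial commit"

# creating a branch only records a second name for the current commit
git branch feature-penguin-filter
main_hash=$(git rev-parse main)
feature_hash=$(git rev-parse feature-penguin-filter)
echo "main:    $main_hash"
echo "feature: $feature_hash"

# a new commit on the feature branch moves that pointer; main stays put
git switch -q feature-penguin-filter
echo "filter" >> README.md
git commit -q -am "Add species filter"
main_after=$(git rev-parse main)
feature_after=$(git rev-parse feature-penguin-filter)
echo "main after commit:    $main_after"
echo "feature after commit: $feature_after"
```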

Branches: Feature Workflow

Scenario: Adding a species filter without breaking working code on main.

gitGraph
   commit id: "Initial commit"
   commit id: "Add data cleaning"
   branch feature-penguin-filter
   checkout feature-penguin-filter
   commit id: "Add species filter"
   commit id: "Test filter logic"
   checkout main
   commit id: "Fix README typo"
  %%  merge feature-penguin-filter id: "Merge filter"

Tip

Name branches after what they do: add-penguin-filter, fix-missing-values, week4-reflection.

Branch Commands

Create & switch

git branch <name>        # create a branch
git switch <name>        # switch to it
git switch -c <name>     # create and switch (shortcut)

Inspect & tidy up

git branch               # list local branches
git branch -v            # with last commit message
git branch -d <name>     # delete (only if merged)
git branch -D <name>     # force-delete

Merging: Combining Histories

The metaphor

Merging is like combining two edited drafts of the same chapter.

  • Git compares both drafts to the common ancestor (the point they split)
  • Lines only one branch changed → accepted automatically
  • Lines both branches changed differently → conflict (you decide)

Two kinds of merge

  • Fast-forward: main hasn’t moved since the branch split; git just advances the pointer, no merge commit needed
  • Three-way merge: both branches have new commits; git creates a new merge commit joining the two histories

Merge: Fast Forward

Scenario: feature-penguin-filter is finished, and we didn’t edit main!

gitGraph
   commit id: "Initial commit"
   commit id: "Add data cleaning"
   branch feature-penguin-filter
   checkout feature-penguin-filter
   commit id: "Add species filter"
   commit id: "Test filter logic"
   checkout main
   commit id: "Fix README typo" type: REVERSE
   merge feature-penguin-filter id: "Fast-forward ✓"
   commit id: "Continue on main"

Merge: Feature Branch Workflow

Scenario: feature-penguin-filter is finished; bring it into main via a three-way merge.

gitGraph
   commit id: "Initial commit"
   commit id: "Add data cleaning"
   branch feature-penguin-filter
   checkout feature-penguin-filter
   commit id: "Add species filter"
   commit id: "Test filter logic"
   checkout main
   commit id: "Fix README typo"
   merge feature-penguin-filter id: "Merge filter"

main now contains every commit from both lines.
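
Both kinds of merge can be told apart by counting the parents of the resulting commit. The sketch below assumes a temporary repository with hypothetical branch names feature-a and feature-b; a fast-forward leaves an ordinary one-parent commit at the tip, a three-way merge creates a commit with two parents.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > data.txt
git add data.txt && git commit -q -m "Initial commit"

# case 1: main has not moved, so the merge is a fast-forward
git switch -q -c feature-a
echo "filter" > filter.R
git add filter.R && git commit -q -m "Add species filter"
git switch -q main
git merge -q feature-a
ff_parents=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))

# case 2: both branches gain a commit, so git has to create a merge commit
git switch -q -c feature-b
echo "plot" > plot.R
git add plot.R && git commit -q -m "Add plot"
git switch -q main
echo "readme" > README.md
git add README.md && git commit -q -m "Fix README typo"
git merge -q --no-edit feature-b
merge_parents=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))

echo "parents after fast-forward: $ff_parents"
echo "parents after three-way merge: $merge_parents"
```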

Merge Commands

Merge a branch

git switch main                    # be on the target branch
git merge feature-penguin-filter           # merge into current branch
git merge --no-ff feature-penguin-filter   # always create a merge commit
git merge --abort                          # cancel a merge in progress

Inspect merges

git log --oneline --graph              # visualise the branch history
git log --merges                       # show only merge commits
git branch -d feature-penguin-filter   # delete branch after merging

Tip

Always be on the target branch (e.g. main) before running git merge.

Git Stash: The Desk Drawer

The metaphor

You’re mid-task at your desk when your supervisor drops by with an urgent request.

  • You don’t want to commit half-done work
  • So you sweep everything into a desk drawer (stash)
  • Your desk is clear — you handle the interruption
  • When they leave, you pull everything back out exactly as it was

The gotcha

  • You can have multiple stashes — like multiple drawers
  • Unlabelled stashes are easy to forget
  • Use git stash push -m "description" to label them
  • git stash pop restores and removes the top stash
  • git stash apply restores but keeps it in the list

Stash: Mid-Task Branch Switch

Scenario: Half-way through a feature when a bug is reported on main.

feature-branch: uncommitted edits
  ──git stash──▶  stash: edits saved
  ──git switch main──▶  main: fix + commit
  ──git switch feature-branch──▶  ──git stash pop──▶  feature-branch: edits restored

Stash Commands

Save & restore

git stash                          # stash uncommitted changes to tracked files (add -u for untracked)
git stash push -m "my message"    # stash with a label
git stash pop                      # restore + remove top stash
git stash apply stash@{1}          # restore without removing

Inspect & clean up

git stash list               # show all stashes
git stash show stash@{0}     # diff of a stash
git stash drop stash@{0}     # delete one stash
git stash clear              # delete all stashes ⚠️
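
The mid-task scenario can be replayed end to end as a sketch. The repository is temporary, and the file contents and the stash label "WIP: add y" are made up.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo 'x <- 1' > analysis.R
git add analysis.R && git commit -q -m "Initial commit"
git switch -q -c feature-branch

# half-finished edit on the feature branch
echo 'y <- 2  # work in progress' >> analysis.R
git stash push -q -m "WIP: add y"
clean=$(git status --porcelain)     # empty string: the desk is clear
label=$(git stash list)

# handle the interruption on main
git switch -q main
echo 'notes' > README.md
git add README.md && git commit -q -m "Fix on main"

# return and pull the edits back out of the drawer
git switch -q feature-branch
git stash pop -q
restored=$(tail -n 1 analysis.R)
remaining=$(git stash list | wc -l)

echo "stash entry was: $label"
echo "restored line:   $restored"
```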

Tip

Always label stashes with -m — a stack of anonymous stashes is surprisingly easy to lose track of.

Git Worktree: Two Desks, One Filing Cabinet

The metaphor

Normally you have one desk (working directory) per repo.

git worktree sets up a second desk in a different folder — each desk has a different branch checked out, but both share the same filing cabinet (.git).

When stash isn’t enough

  • You need both branches open at once to compare outputs
  • The interruption will take hours, not minutes
  • You want to build or test a branch without touching your current work

Worktree: Hotfix While Feature Is Open

Scenario: feature-eda is half-done in ~/project/; an urgent fix is needed on main.

.git (shared)  ──checkout──────▶  ~/project/          feature-eda branch, EDA work in progress
.git (shared)  ──worktree add──▶  ~/project-hotfix/   main branch, hotfix committed ✓

Both folders are live — edit one without touching the other.

Worktree Commands

Set up & use

# check out an existing branch in a new folder
git worktree add ../project-hotfix main

# check out a new branch in a new folder
git worktree add -b hotfix-123 ../project-hotfix main

Inspect & clean up

git worktree list                      # show all active worktrees
git worktree remove ../project-hotfix  # remove when done
git worktree prune                     # clean up stale entries
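
A sketch of the hotfix scenario, using temporary folders in place of ~/project/ and ~/project-hotfix/. The file contents and commit messages are placeholders.

```shell
set -eu
base=$(mktemp -d)
mkdir "$base/project" && cd "$base/project"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > README.md
git add README.md && git commit -q -m "Initial commit"
git switch -q -c feature-eda          # main is now free to be checked out elsewhere

# second desk: main checked out in a sibling folder
git worktree add ../project-hotfix main >/dev/null 2>&1

# commit the hotfix there without touching the feature folder
(
  cd ../project-hotfix
  echo "hotfix" >> README.md
  git commit -q -am "Hotfix on main"
)

here=$(git rev-parse --abbrev-ref HEAD)
there=$(git -C ../project-hotfix rev-parse --abbrev-ref HEAD)
n_worktrees=$(git worktree list | wc -l)
echo "this folder: $here | hotfix folder: $there | worktrees: $n_worktrees"

# the folder goes away, the commit on main stays in the shared .git
git worktree remove ../project-hotfix
```

Note that git refuses to check out the same branch in two worktrees at once, which is why the sketch switches the first folder to feature-eda before adding the second.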

Tip

For most day-to-day interruptions, stash is enough. Worktrees shine when you need two branches open for an extended time.

Merging local & remote branches

Review: Remote commands

The flow

GitHub  ──clone──▶  local
GitHub  ◀──push───  local
GitHub  ──pull──▶   local

Connect & inspect

git remote -v                  # list remotes
git remote add origin <url>    # connect to GitHub

Download

git clone <url>    # copy a repo to your machine
git fetch          # download changes, don't merge
git pull           # fetch + merge

Upload

git push                    # push to tracked branch
git push -u origin main     # push & set upstream
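
The difference between fetch and pull can be seen without GitHub by letting a local bare repository play the role of the remote. This is a sketch under that assumption; the collaborator folders alice and bob are hypothetical.

```shell
set -eu
base=$(mktemp -d)
git init -q --bare "$base/origin.git"     # local stand-in for GitHub (assumption)
git --git-dir="$base/origin.git" symbolic-ref HEAD refs/heads/main

# first collaborator: create a commit and push it
mkdir "$base/alice" && cd "$base/alice"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Alice"
git config user.email "alice@example.com"
git remote add origin "$base/origin.git"
echo "v1" > README.md
git add README.md && git commit -q -m "First commit"
git push -q -u origin main

# second collaborator clones, then the first one pushes again
git clone -q "$base/origin.git" "$base/bob"
echo "v2" >> README.md
git commit -q -am "Second commit"
git push -q

cd "$base/bob"
git fetch -q                                   # download only: origin/main moves, main does not
local_after_fetch=$(git rev-parse main)
remote_after_fetch=$(git rev-parse origin/main)
git pull -q --ff-only                          # fetch + fast-forward merge
local_after_pull=$(git rev-parse main)

echo "after fetch: main=$local_after_fetch origin/main=$remote_after_fetch"
echo "after pull:  main=$local_after_pull"
```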

Merge Conflicts: Two Authors, One Sentence

The metaphor

Two co-authors both rewrote the same sentence in their own copies — now you need to decide which version (or a blend) to keep.

  • Git marks the collision with conflict markers in the file
  • Nothing is lost — both versions are shown
  • You make the editorial call, then tell git you’re done

When they happen

  • Two branches both edited the same lines of the same file
  • A git pull brings in remote changes that clash with local edits
  • A git merge joins histories with overlapping changes

Local vs. Remote Conflicts

Conflicts happen when the same lines are changed in two places before syncing:

If you edit a file on GitHub.com (online editor or Codespaces) while you also have local commits that change the same lines, git pull will flag a conflict.

gitGraph
   commit id: "shared base"
   branch origin/main
   checkout origin/main
   commit id: "edit on GitHub.com"
   checkout main
   commit id: "local edit"
   merge origin/main id: "CONFLICT ⚠️" type: HIGHLIGHT

Resolution is the same as for any merge conflict:

  1. git pull — git downloads and tries to merge
  2. Open the conflicting file — look for <<<<<<< markers
  3. Edit to keep the right version
  4. git add <file> then git commit

Demo: Local & Remote Changes

What happens when you:

  1. Navigate to the forked group project repo.
  2. Modify README.md on GitHub.com
  3. Commit changes directly to main
  4. Open your local repo in RStudio
  5. Make a different edit to README.md
  6. Commit changes directly to main
  7. Pull remote changes using git pull

Tip

What is different between the two “Commit changes directly to main” steps?

Viewing a Merge Conflict

Git inserts conflict markers directly into the file:

<<<<<<< HEAD
result <- mean(x, na.rm = TRUE)
=======
result <- median(x, na.rm = TRUE)
>>>>>>> feature-branch
  • <<<<<<< HEAD — your version (current branch)
  • ======= — divider between the two versions
  • >>>>>>> feature-branch — incoming version

Tip

In RStudio: conflicted files show up in the Git pane. Open the file and edit the marked regions by hand; editors such as VS Code add accept/reject buttons above each conflict.

Resolving Merge Conflicts

  1. Open the conflicting file
  2. Choose a version — or write a combined version
  3. Delete all three marker lines (<<<<<<<, =======, >>>>>>>)
  4. git add <file> — mark as resolved
  5. git commit — complete the merge
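
The five steps can be replayed in a throwaway repository. This sketch reuses the mean/median lines from the previous slide; the base line result <- sum(x) and the choice to keep the median version are arbitrary.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo 'result <- sum(x)' > analysis.R
git add analysis.R && git commit -q -m "Shared base"

git switch -q -c feature-branch
echo 'result <- median(x, na.rm = TRUE)' > analysis.R
git commit -q -am "Use median"

git switch -q main
echo 'result <- mean(x, na.rm = TRUE)' > analysis.R
git commit -q -am "Use mean"

# the merge stops with a conflict, so git exits non-zero
git merge feature-branch >/dev/null 2>&1 || echo "merge stopped: conflict"
markers=$(grep -cE '^(<<<<<<<|=======|>>>>>>>)' analysis.R)
echo "marker lines in file: $markers"

# steps 2-3: write the version to keep, which also removes the marker lines
echo 'result <- median(x, na.rm = TRUE)' > analysis.R
# steps 4-5: mark as resolved, then complete the merge
git add analysis.R
git commit -q --no-edit
parents=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
echo "parents of the merge commit: $parents"
```

If the merge was a mistake, git merge --abort before step 4 returns the repository to its state before the merge.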

Tip

Small, focused commits and frequent pulls keep conflicts small and easier to resolve.

Collaborative branching

Conflicts in Groups

Scenario: Two teammates both edit report.qmd directly on main at the same time.

gitGraph
   commit id: "shared base"
   branch teammate
   checkout teammate
   commit id: "adds plot section"
   checkout main
   commit id: "rewrites intro"
   merge teammate id: "CONFLICT ⚠️" type: HIGHLIGHT

Tip

Avoid mega-documents. The project template splits work into separate files (02_clean.R, 03_eda.R, report.qmd) — teammates own different files and conflicts become rare.

Branches for group work

DO

  • Give each feature or task its own branch
  • Keep main clean — only merge finished, working code
  • Each person works on their own branch
  • Delete branches once merged

DON’T

  • Let branches drift for weeks without merging
  • All work directly on main at the same time
  • Make one giant branch with unrelated changes
  • Ignore merge conflicts and force-push

Tip

Branching friction is productive — it forces you to think in small, independent tasks.

Pull requests keep parallel work organised

In a group project, multiple people work on different parts at the same time — pull requests are the mechanism for bringing that work together safely.

Without PRs

  • Everyone pushes directly to main
  • Changes land without review
  • Hard to tell who changed what, or why
  • One bad push breaks everyone’s work

With PRs

  • Each task lives on its own branch
  • Changes are reviewed before merging
  • Conflicts are caught early, in isolation
  • main stays stable and deployable

Tip

Think of a PR as a conversation: “here’s what I did, and here’s why” — it creates a permanent record alongside the code.

The pull request workflow

gitGraph
   commit id: "existing work"
   branch my-feature
   checkout my-feature
   commit id: "add EDA plot"
   commit id: "clean up code"
   checkout main
   merge my-feature id: "PR merged ✓"
   commit id: "continue on main"

  1. Create a local branch: git switch -c my-feature
  2. Make commits on that branch
  3. Push: git push -u origin my-feature
  4. Open a Pull Request on GitHub
  5. Collaborators review and comment
  6. Merge when approved

Pushing branches to GitHub

In order to open a pull request, you need to push your local branch to GitHub:

git push -u origin my-feature     # push and set upstream tracking
git push                           # subsequent pushes (upstream already set)

The -u flag links your local branch to origin/my-feature — after that, plain git push and git pull know where to go.
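
What -u records can be inspected with the @{upstream} shorthand. In this sketch a local bare repository stands in for GitHub (an assumption made so the example runs offline); file names and commit messages are placeholders.

```shell
set -eu
base=$(mktemp -d)
git init -q --bare "$base/origin.git"         # local stand-in for the GitHub repo (assumption)

mkdir "$base/local" && cd "$base/local"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$base/origin.git"

echo "base" > README.md
git add README.md && git commit -q -m "Existing work"
git push -q -u origin main

git switch -q -c my-feature
echo "plot" > eda.R
git add eda.R && git commit -q -m "Add EDA plot"
git push -q -u origin my-feature              # first push: sets upstream tracking

# the upstream that plain git push / git pull will use from now on
upstream=$(git rev-parse --abbrev-ref 'my-feature@{upstream}')
echo "my-feature tracks: $upstream"

echo "tidy" >> eda.R
git commit -q -am "Clean up code"
git push -q                                    # upstream already set, no arguments needed
```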

Tip

After pushing, GitHub shows a banner: “Compare & pull request” — click it to open a PR.

Summary

Quarto Websites

  • A Quarto project is a folder with _quarto.yml — it renders all .qmd files and shares options across them
  • A Quarto website adds shared navigation, search, and a footer; deploy to GitHub Pages with quarto publish gh-pages
  • Share code display options (code-fold, code-line-numbers) under format: html: and execution options (echo, freeze) under execute: in _quarto.yml

Quarto & Project Structure

  • A reproducible project keeps raw data read-only, separates code from outputs, and documents everything in .qmd files
  • _quarto.yml renders the whole repo as a website — proposal, report, and README in one place
  • Render early, render often — catch broken code before submission

Collaborative Coding with Git

Branches, stash & worktrees

  • Work on features in isolation — git switch -c <name>
  • Stash unfinished work to switch context — git stash / git stash pop
  • Use git worktree when you need two branches open at once

Pull requests & conflicts

  • Push your branch with git push -u origin <name>, then open a PR on GitHub
  • Collaborators review, comment, and approve before merging
  • Resolve conflicts by editing the regions between the <<<<<<< and >>>>>>> markers, then git add and git commit

Tip

Pull at the start of every session, push at the end — this keeps conflicts small.