Week 4: Version Control & Collaborative Coding
2026-05-07
Group formation activity
Date is now Jul 30 (previously July 29).
Oral exam will:
More details to be provided closer to the examination date.
In the hands-on project, you will:
Your dataset:
nycflights13)
At least one data table should contain:
- y (not binary 0/1)
- x_1 (can be time)
- x_2

Tip
Keep a log of data sources you explore! Even datasets you don’t use can be valuable discussion examples in the Oral Exam.
Part 1: Statistical Programming Foundations (W2–6)
Three condition types
| | Type | What to do |
|---|---|---|
| 🚨 | Error | Must fix |
| ⚠️ | Warning | Inspect output |
| 💬 | Message | Read it |
Troubleshooting steps
Minimal
Reproducible
- library() calls
- dput() for real data
- set.seed() if randomness is involved

Tip
Writing a reprex often finds the bug for you — making code self-contained reveals missing library() calls or stale variables.
| Goal | Tool |
|---|---|
| Locate where error occurred | traceback(), rlang::last_trace() |
| Pause and inspect inside function | browser() |
| Debug a package function | debugonce(fn), debug(fn) |
| Debug outside R (YAML, paths) | quarto render in terminal |
Motivation, commands so far, and how git works
The file naming trap
analysis_final.R
analysis_final_v2.R
analysis_FINAL_submit.R
analysis_FINAL_submit_fixed.R
analysis_use_this_one.R
The “what did I change?” problem
Tip
Version control replaces both problems with one system: a named, timestamped history of every change.
For your own work
For collaboration
working directory → staging area → repository
(your files) (git add) (git commit)
Set up & inspect
Stage & commit
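The two steps above can be sketched in the terminal as follows. This is a minimal illustration, not the exact commands from the practical: the temporary folder, the file name `code/analysis.R`, and the name and e-mail address are placeholders chosen for the example.

```shell
# Throwaway project folder so the sketch cannot touch real work
repo="$(mktemp -d)"
cd "$repo"

# Set up & inspect
git init -q                                   # turn the folder into a repository
git config user.name  "Example Student"       # hypothetical identity, stored in this repo only
git config user.email "student@example.com"
git status --short                            # prints nothing: no files yet

# Stage & commit
mkdir code
echo 'x <- 1' > code/analysis.R
git status --short                            # ?? code/   (untracked)
git add code/analysis.R                       # working directory -> staging area
git commit -q -m "Add analysis script"        # staging area -> repository
git log --oneline                             # one line per commit
```

Running `git status` between the steps is a cheap way to see which of the three areas a change currently sits in.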
Examples based on:
Tip
The RStudio Git History panel calls git log under the hood — you can also run git log --oneline in the terminal for a compact view.
Tip
Changes are shown as highlights: red for deletions and green for additions
Tip
RStudio is displaying the results of: `git checkout
Browse the log
Imagine you have commits A → B → C → D, where B removes a file. At D, you want to recover the missing file.
1. Checkout and stage the old file.
gitGraph:
  commit id: "A"
  commit id: "B: remove code/analysis.R"
  commit id: "C"
  commit id: "D (HEAD)"
  commit id: "git checkout A code/analysis.R" type: HIGHLIGHT
  commit id: "E: add analysis back"
Staged — still need git commit, or you can un-stage the file (e.g. for more editing) with git restore --staged <file>
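Option 1 can be tried out safely in a throwaway repository. The sketch below rebuilds the A → B → C → D history under assumed file contents (`notes.txt` and the identity are placeholders) and then restores the deleted file from commit A.

```shell
# Throwaway repository that recreates the A -> B -> C -> D history
repo="$(mktemp -d)"; cd "$repo"
git init -q
git config user.name  "Example Student"       # hypothetical, repo-local identity
git config user.email "student@example.com"

mkdir code
echo 'x <- 1' > code/analysis.R
git add . && git commit -q -m "A"
A="$(git rev-parse HEAD)"                     # remember commit A's hash
git rm -q code/analysis.R && git commit -q -m "B: remove code/analysis.R"
echo "notes" >  notes.txt && git add . && git commit -q -m "C"
echo "more"  >> notes.txt && git add . && git commit -q -m "D"

# Option 1: copy the file from commit A into the working directory AND the staging area
git checkout "$A" -- code/analysis.R
staged="$(git diff --cached --name-only)"     # what is staged right now
git status --short                            # A  code/analysis.R
git commit -q -m "E: add analysis back"       # the commit is still a manual step
```

The `--` separates the commit from the path, which avoids ambiguity if a branch and a file share a name.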
2. Apply patch to working directory
gitGraph:
  commit id: "A"
  commit id: "B: remove code/analysis.R"
  commit id: "C"
  commit id: "D (HEAD)"
  commit id: "git diff B A | git apply" type: HIGHLIGHT
  commit id: "E: add analysis back"
Unstaged — still need git add + git commit
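Option 2 can be sketched the same way. The repository, `notes.txt`, and the identity below are again placeholders; the point is that `git diff <B> <A>` produces the patch that turns B back into A, and `git apply` writes it to the working directory only.

```shell
# Throwaway repository that recreates the A -> B -> C -> D history
repo="$(mktemp -d)"; cd "$repo"
git init -q
git config user.name  "Example Student"       # hypothetical, repo-local identity
git config user.email "student@example.com"

mkdir code
echo 'x <- 1' > code/analysis.R
git add . && git commit -q -m "A"
A="$(git rev-parse HEAD)"
git rm -q code/analysis.R && git commit -q -m "B: remove code/analysis.R"
B="$(git rev-parse HEAD)"
echo "notes" >  notes.txt && git add . && git commit -q -m "C"
echo "more"  >> notes.txt && git add . && git commit -q -m "D"

# Option 2: apply the B -> A patch to the working directory only
git diff "$B" "$A" | git apply
staged="$(git diff --cached --name-only)"     # empty: nothing was staged
untracked="$(git status --porcelain)"         # ?? code/
git status --short

# Unstaged, so both remaining steps are manual
git add code/analysis.R
git commit -q -m "E: add analysis back"
```

Because the patch lands unstaged, this route leaves room to edit the restored file before it is committed.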
3. Revert the offending commit
git revert <hash>
gitGraph:
  commit id: "A"
  commit id: "B: remove code/analysis.R" type: REVERSE
  commit id: "C"
  commit id: "D (HEAD)"
  commit id: "git revert B" type: HIGHLIGHT
  commit id: "E: add analysis back"
Auto-commits — done!
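A sketch of option 3 in a throwaway repository follows. File contents and identity are placeholders, and `--no-edit` is used here only so the example runs without opening an editor for the commit message.

```shell
# Throwaway repository that recreates the A -> B -> C -> D history
repo="$(mktemp -d)"; cd "$repo"
git init -q
git config user.name  "Example Student"       # hypothetical, repo-local identity
git config user.email "student@example.com"

mkdir code
echo 'x <- 1' > code/analysis.R
git add . && git commit -q -m "A"
git rm -q code/analysis.R && git commit -q -m "B: remove code/analysis.R"
B="$(git rev-parse HEAD)"
echo "notes" >  notes.txt && git add . && git commit -q -m "C"
echo "more"  >> notes.txt && git add . && git commit -q -m "D"

# Option 3: create a new commit that undoes everything commit B did
git revert --no-edit "$B" > /dev/null
git log --oneline                             # newest entry is the revert commit
git status --short                            # prints nothing: already committed
```

This works cleanly here because C and D never touched `code/analysis.R`; if later commits had changed the same lines, git would stop and ask for the conflict to be resolved first.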
If we use the ‘Save As’ button in RStudio to restore the file, which option are we in?
- git diff <B> <A> | git apply
- git checkout <A> -- <file>
- git revert <B>
Too large or auto-generated
- _site/, docs/
- .o, .so

Machine-specific or sensitive
- .Rhistory, .RData, .Rproj.user/
- .DS_Store (macOS), Thumbs.db (Windows)
- .env files

Warning
Once a file is committed, its content lives in git history — even if you delete it later. Never commit secrets.
.gitignore: project level
A .gitignore file at the repo root tells git which files and folders to skip.
Syntax
# ignore a specific file
.RData
# ignore all files with this extension
*.csv
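# ignore everything inside a folder
# (use _site/* rather than _site/: git cannot re-include a file whose parent folder is itself ignored)
_site/*
# but track one file inside it
!_site/CNAME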
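To check what a set of patterns actually does, `git check-ignore` and `git status` can be run in a scratch repository. The sketch below uses made-up file names (`flights.csv`, `analysis.R`, `_site/index.html`). It writes the folder rule as `_site/*` because git cannot re-include a file when its parent directory itself is ignored, so a `!` exception only takes effect if the folder's contents, not the folder, are ignored.

```shell
# Scratch repository with a small .gitignore
repo="$(mktemp -d)"; cd "$repo"
git init -q
cat > .gitignore <<'EOF'
.RData
*.csv
_site/*
!_site/CNAME
EOF

# Hypothetical project files
mkdir _site
touch .RData flights.csv analysis.R _site/index.html _site/CNAME

# Ask git which rule (if any) matches a path
git check-ignore -v flights.csv               # reports the matching .gitignore line
git check-ignore -v _site/index.html

# Only non-ignored files are listed as untracked:
# .gitignore, _site/CNAME and analysis.R
git status --short --untracked-files=all
```

`git check-ignore` exits with status 0 when a path is ignored and 1 when it is not, which makes it handy in scripts.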
Typical R / Quarto .gitignore
# R session artefacts
.Rhistory
.RData
.Rproj.user/
# Quarto build output
/_site/
/.quarto/
# OS noise
.DS_Store
Thumbs.db
Tip
GitHub offers a ready-made R .gitignore when you create a new repo — always tick that box.
.gitignore: machine level
Some files you want to ignore everywhere, not just in one project.
Add OS and editor noise once, never think about it again:
# macOS
.DS_Store
.AppleDouble
# Windows
Thumbs.db
Desktop.ini
# RStudio
.Rproj.user/
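One way to register such a machine-level list is git's `core.excludesFile` setting. The sketch below is an assumption-laden illustration: the file name `.gitignore_global` is an arbitrary choice, and the first two lines point `HOME` at a throwaway folder purely so the example cannot modify a real `~/.gitconfig`. On your own machine you would leave those two lines out.

```shell
# Sandbox only: use a throwaway folder as HOME so the real ~/.gitconfig is untouched
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

# Hypothetical file name; any path works as long as core.excludesFile points to it
cat > "$HOME/.gitignore_global" <<'EOF'
# macOS
.DS_Store
.AppleDouble
# Windows
Thumbs.db
Desktop.ini
# RStudio
.Rproj.user/
EOF
git config --global core.excludesFile "$HOME/.gitignore_global"

# Any repository on this machine now honours the list, without its own .gitignore
repo="$(mktemp -d)"; cd "$repo"
git init -q
touch .DS_Store analysis.R
git status --short                            # ?? analysis.R
```

Collaborators do not inherit this file, so anything the whole team must ignore still belongs in the project-level .gitignore.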
Git stores snapshots, not diffs.
Each commit records:
commit c3d4e5f
parent b2c3d4e
author Cynthia Huang
date 2026-05-07
Add penguin species filter
↓
[snapshot of all files]
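The fields shown above can be inspected directly with `git cat-file`, a low-level command that prints a stored object. The sketch assumes a scratch repository with two small commits; the file name, messages, and identity are placeholders.

```shell
# Scratch repository with two commits
repo="$(mktemp -d)"; cd "$repo"
git init -q
git config user.name  "Example Student"       # hypothetical, repo-local identity
git config user.email "student@example.com"
echo 'x <- 1' >  analysis.R && git add . && git commit -q -m "Add analysis script"
echo 'y <- 2' >> analysis.R && git add . && git commit -q -m "Add penguin species filter"

# The commit object: tree (snapshot), parent, author, committer, message
git cat-file -p HEAD

# The snapshot the commit points to: one entry per file or folder
git cat-file -p 'HEAD^{tree}'

# The first commit has a tree but no parent line
git cat-file -p HEAD~1
```

Each commit names exactly one tree, and that tree lists every tracked file, which is what "snapshots, not diffs" means in practice.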
You don’t need to know how git works under the hood for this unit, but if you are curious:
So far, everything has lived on your machine:
working directory → staging area → local repository
(your files) (git add) (git commit)
What if you want to share your work with others, or work on multiple machines?
A remote repository is a related repository hosted on a server:
working directory → staging area → local repo → remote repo
(your files) (git add) (git commit) (git push)
↑
(git pull)
Tip
GitLab works the same way and is permitted in this course, but practicals and instructions will use GitHub.
You need a GitHub account
Authentication = proving you are who you say you are
GitHub needs to verify your identity before you can connect:
Tip
We recommend SSH, set up by following these instructions from LMU OSC. The full setup walkthrough is in this week’s practical.
A fork is your own copy of someone else’s repository — on GitHub’s servers.
original repo your fork
(soda-lmu/template) → (you/template)
↓
your machine (git clone)
Changes you make stay in your fork until you open a Pull Request to propose them back.
A clone is a local copy of a GitHub repository — on your machine.
GitHub repo
(you/template)
↓
your machine (git clone)
Changes you make stay local until you git push them back to GitHub.
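Cloning can be tried without a GitHub account by letting a local bare repository play the role of the server. That substitution is an assumption made for this sketch; with GitHub, the only difference is that the URL starts with `git@github.com:` or `https://github.com/`.

```shell
work="$(mktemp -d)"

# Stand-in for the GitHub repository (you/template): a bare repo has history but no working files
git init -q --bare "$work/template.git"

# Clone it; git warns that the repository is empty, which is expected here
git clone -q "$work/template.git" "$work/template" 2>/dev/null
cd "$work/template"

# The clone already knows where it came from
git remote -v                                 # origin <path> (fetch) and (push)
```

The remote is named `origin` by convention; nothing else in the clone is tied to that name.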
| | Fork | Clone |
|---|---|---|
| Lives on | GitHub | your machine |
| Connected to | original repo | a GitHub repo |
| Used for | contributing to others’ work | working locally |
Tip
For the group project: one member forks the template, the rest clone the fork. Everyone pushes to the same shared fork.
Navigate first to the repository you want to fork – e.g.:
🔗 github.com/soda-lmu/our-statprog2-project
Give your fork a name
Forking in progress
Notice the connection to the original repository
Clone to your machine – automatically sets the remote.
git clone <url> # copy a repo to your machine
Getting the URL
The day-to-day loop
GitHub ──clone──▶ local (once, to set up)
GitHub ──pull───▶ local (start of each session)
local ──push───▶ GitHub (end of each session)
- git pull: get your collaborators’ latest commits
- git add + git commit: save your work in logical chunks
- git push: share your commits with the team

Tip
Pull at the start of every session, push at the end. This keeps conflicts small and your team in sync.
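The loop can be rehearsed on one machine with two clones of the same shared repository. Everything named here is hypothetical: "Alice" and "Bob" are two team members, and a local bare repository stands in for the shared GitHub fork.

```shell
work="$(mktemp -d)"
git init -q --bare "$work/shared.git"         # stand-in for the shared GitHub fork

# Alice: clone once, then commit and push
git clone -q "$work/shared.git" "$work/alice" 2>/dev/null   # empty-repo warning hidden
cd "$work/alice"
git config user.name "Alice"; git config user.email "alice@example.com"
echo 'x <- 1' > analysis.R
git add analysis.R && git commit -q -m "Add analysis script"
git push -q -u origin HEAD                    # first push also sets the upstream

# Bob: clone once, commit, push at the end of his session
git clone -q "$work/shared.git" "$work/bob"
cd "$work/bob"
git config user.name "Bob"; git config user.email "bob@example.com"
echo 'y <- 2' >> analysis.R
git add analysis.R && git commit -q -m "Add y"
git push -q

# Alice, next session: pull first to receive Bob's commit
cd "$work/alice"
git pull -q
git log --oneline                             # now lists both commits
```

Had Alice committed before pulling, git would have needed to merge the two histories, which is why pulling first keeps things simple.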
What CLI commands is RStudio performing for you?
Let’s say you have a repository on your local machine, and you want to git push it to GitHub
You can’t push without a remote!

So we need to set up a remote repository to connect to!
GitHub shows you exactly what to run after creating an empty repo:
Tip
-u sets origin/main as the upstream; after this, plain git push works.
git remote -v confirms the connection; you should see origin pointing to your GitHub URL.
The GitHub CLI (gh) lets you create repos without leaving the terminal.
- init, add, commit: the core loop; each commit is a snapshot with a parent pointer
- git log to browse and restore past versions
- .gitignore

If you already have a local repo
-u sets origin/main as the upstream — after this, plain git push works.
Tip
git remote -v confirms the connection — you should see origin pointing to your GitHub URL.
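Connecting an existing local repository to a new, empty remote can be sketched as follows. A local bare repository stands in for the empty GitHub repo (an assumption for this example); on GitHub the URL would be the one shown on the new repository's page. The `README.md` file and the identity are placeholders.

```shell
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"         # stand-in for the empty GitHub repo

# An existing local repository with one commit and no remote
mkdir "$work/project" && cd "$work/project"
git init -q
git config user.name  "Example Student"       # hypothetical, repo-local identity
git config user.email "student@example.com"
echo '# My project' > README.md
git add README.md && git commit -q -m "Initial commit"
git remote -v                                 # prints nothing: no remote yet

# Connect and push
git remote add origin "$work/remote.git"      # name the remote "origin"
git branch -M main                            # make sure the branch is called main
git push -q -u origin main                    # -u records origin/main as the upstream

git remote -v                                 # origin now points at the remote
git push -q                                   # plain git push works from here on
```

`git branch -M main` is harmless if the branch is already called main, which is why it can be run unconditionally.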
- git pull: fetch and merge any remote changes first
- git push: send your commits to GitHub

Tip
Make it a habit: pull first, then push. This avoids most “rejected push” errors.