Practical #4

Advanced Statistical Programming using R — Version Control & Remotes

Author

Leonhard Kestel, Lisa Bondo Andersen, Cynthia Huang

Published

May 7, 2026

Quiz

Before starting, work through this QUIZ to check your understanding of the concepts covered in this week’s lecture on debugging and on using LLMs (large language models) in a statistical programming workflow.

General Remarks

Last week you practised finding and fixing bugs. This week the focus shifts to version control — keeping track of your work, understanding your project’s history, and connecting your local repository to GitHub.

The session builds from your local setup outward: first making sure your repository and .gitignore are in good shape, then connecting to GitHub and practicing the pull/fetch/push workflow you will use every day on the group project.

Important: What you need before you start

Check you have all three of these before the practical begins:

  1. Git installed — run git --version in a terminal. You should see a version number.
  2. A GitHub account — if you don’t have one yet, sign up at github.com now.
  3. Your debugging exercise from last week — the .qmd or .R file(s) you worked on in Practical #3, ideally already in a git-tracked folder.

If any of these is missing, ask your instructor at the start and we will sort it out before you move on.

Note: GUI alternatives to the CLI

All exercises below are written CLI-first. If you prefer a graphical interface, two options exist:

GitHub Desktop (desktop.github.com) is the recommended GUI. It handles authentication automatically, covers all the workflows in this practical, and is generally more reliable than the RStudio Git pane. Use File → Add Local Repository to connect an existing folder.

RStudio Git pane — RStudio has a built-in Git tab (top-right panel) that covers staging, committing, pushing, and pulling. It is only available when working inside an R Project (a folder with a .Rproj file), and can be unreliable for anything beyond basic commits. Where it applies, it is noted as an optional alternative in the exercises below.

Note: Git concepts explained via Google Drive

If you have used Google Drive, you already understand the core problem Git solves: backing up files, going back to older versions, and sharing work with others. Git does all of this — but more deliberately. Instead of syncing automatically in the background, you decide exactly when to save a version and what to call it. That extra control is what makes it so useful for code and research.

The key difference: Drive saves continuously and silently. Git saves only when you run git commit, and you write a short message explaining what changed and why. Your history stays readable — not a pile of auto-saves labelled “version 47”.

Figure: Git workflow, local and remote (two-row diagram).

Local workflow: working directory (files as you edit them) → git add → staging area (changes queued to save) → git commit → local repository (snapshots on your machine; browse with git log and git checkout).

With GitHub: working directory → git add → staging → git commit → local repository, which exchanges commits with the remote repository on GitHub via git push and git pull. git fetch downloads only; git pull downloads and merges. If you start from GitHub, git clone copies the repository and configures the remote in one step.

Key difference from Google Drive: Drive saves automatically. Git saves only when you run git commit, and you name every save.

Google Drive comparison:

Git | Google Drive equivalent
Working directory | Files in a local folder before uploading
git add + staging area | Ticking files to upload
git commit | Uploading with a label you wrote yourself
Local repository | Drive version history, stored on your own machine
GitHub | Google Drive, the cloud copy others can access
git push / pull / clone | Syncing between your machine and Drive
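If you would like to watch a file move through the three local stages before touching your own project, here is a minimal sketch in a throwaway folder. The temp folder, the file name notes.txt, the demo identity, and the commit message are all made up for illustration.

```shell
set -e
# Throwaway repo in a temp folder, so nothing here touches your real project
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q
# Demo identity for this shell session only (assumption: not your real name/email)
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

echo "first draft" > notes.txt              # working directory: file exists, Git is not tracking it
before_add=$(git status --porcelain)        # "?? notes.txt" marks an untracked file

git add notes.txt                           # staging area: queued for the next snapshot
after_add=$(git status --porcelain)         # "A  notes.txt" marks a staged new file

git commit -q -m "add first draft of notes" # local repository: snapshot saved with your message
after_commit=$(git status --porcelain)      # empty: nothing left to save
commits=$(git rev-list --count HEAD)        # number of snapshots so far

printf 'before add: %s\nafter add:  %s\ncommits:    %s\n' "$before_add" "$after_add" "$commits"
```

Nothing is saved to the history until the git commit line runs, which is the contrast with Drive's continuous saving.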

Resource for the whole session: Happy Git with R (Jenny Bryan). Every task below has a corresponding section there if you want more detail.


Exercise 0: Quick review and setup check

In Practical #1 you set up a local git repository and learned init, add, and commit. GitHub was optional. This exercise gets everyone to the same starting point: a local repo with your Practical #3 work committed, ready to push to GitHub in Exercise 1.

0.1 Check your global git config

Open a terminal and run:

git config --global user.name
git config --global user.email

You should see your name and email. These appear in every commit you make. If they are blank or wrong, set them now — use the same email address as your GitHub account, so that GitHub can link your commits to your profile:

git config --global user.name "Your Name"
git config --global user.email "you@example.com"

0.2 Get your practical from last week into a local git repo

Navigate to your Practical #3 folder in the terminal:

cd path/to/your/practical3-folder

Then run these two diagnostic commands:

git log --oneline   # do you have any commits?
git remote -v       # is a GitHub remote configured?
Tip

If git log opens a scrollable view and your prompt doesn’t return, press q to exit.

Find your situation in the table below and follow the corresponding steps.

git log | git remote -v | Situation | What to do
error / no commits | error or no output | No repo yet | See A below
shows commits | no output | Local only (most likely) | Nothing, move on to Exercise 1
shows commits | shows a GitHub URL | Already on GitHub | Commit any unsaved changes, push, then skip to Exercise 2
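The table can also be checked mechanically. The sketch below is one possible way to do it, not part of the official workflow: repo_situation is a hypothetical helper name, and it only tells you whether any remote is configured, not whether that remote is GitHub.

```shell
# Hypothetical helper: prints which row of the table a folder falls into
repo_situation() {
  dir=$1
  if ! git -C "$dir" rev-parse --verify -q HEAD >/dev/null 2>&1; then
    echo "no commits"      # not a repo, or a repo without commits: follow A
  elif [ -z "$(git -C "$dir" remote)" ]; then
    echo "local only"      # commits exist, no remote configured
  else
    echo "has remote"      # commits exist and at least one remote is configured
  fi
}

repo_situation .           # check the folder you are currently in
```

git rev-parse --verify HEAD fails both outside a repository and in a repository with no commits yet, which is why those two cases share one branch here.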

A — No repo yet: initialise one now:

git init
git add .
git commit -m "add debugging exercise"
Note: If your repo is in a messy state

If git status or git log produces errors, or you see mentions of merge conflicts or detached HEAD, the fastest fix is to start clean:

  1. Copy your .qmd/.R files somewhere safe outside the folder.
  2. Delete the .git folder: rm -rf .git
  3. Re-initialise: git init, git add ., git commit -m "initial commit"

This loses history, which is fine at this stage — the goal is to have something clean to push.


Exercise 1: Push to GitHub

1.1 Create a new repository on GitHub

  1. Go to github.com/new.
  2. Name the repository statprog-debugging (or similar).
  3. Set it to Public.
  4. Do not tick “Add a README”. Your local folder already has content, and a README created on GitHub would add a commit that your local history lacks, so your first push would be rejected.
  5. Click Create repository.

GitHub will show you a page of instructions. You want the block labelled “…or push an existing repository from the command line”.

1.2 Set up SSH authentication

Before you can push, GitHub needs to verify your identity. We recommend SSH keys — once set up, you never need to paste a password again.

Follow the step-by-step instructions at:
lmu-osc.github.io/Introduction-RStudio-Git-GitHub/SSH.html

The process has three steps:

  1. Generate an SSH key pair on your machine.
  2. Add the public key to your GitHub account.
  3. Tell Git to use SSH when connecting to GitHub.
Note: HTTPS + Personal Access Token (PAT) as an alternative

If SSH does not work for you today, HTTPS with a PAT is a valid fallback:

  1. GitHub → Settings → Developer Settings → Personal access tokens → Tokens (classic) → Generate new token.
  2. Tick the repo scope. Set expiry to at least end of semester.
  3. Copy the token immediately (it won’t be shown again) and paste it when Git asks for a password.

SSH is still preferred for the rest of the course — set it up before Exercise 2 if you can.

1.3 Verify your SSH connection

Once the SSH key is added to GitHub, run this in your terminal to confirm everything works:

ssh -T git@github.com

A successful response looks like:

Hi YOUR-USERNAME! You've successfully authenticated, but GitHub does not provide shell access.

If you see Permission denied (publickey), go back through the LMU OSC instructions — the most common cause is that the public key was not pasted correctly into GitHub, or a passphrase was set and needs to be entered.

Tip

ssh -T git@github.com is read-only — it does not change anything, it only checks whether authentication succeeds. Safe to run at any time.

1.4 Connect your local repo to GitHub and push

Copy the commands GitHub shows you — they will look something like:

git remote add origin git@github.com:YOUR-USERNAME/statprog-debugging.git
git branch -M main
git push -u origin main

Run them in your terminal from inside your project folder.

Tip

git remote add origin <url> registers GitHub as the remote named origin. The -u flag in git push -u origin main sets origin/main as the default upstream, so from now on plain git push and git pull work without any extra arguments.

You can confirm the connection at any time:

git remote -v

You should see origin listed twice (fetch and push) pointing to your GitHub URL.
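To see what these commands do without needing network access, you can rehearse them against a stand-in remote. In this sketch a bare repository in a local temp folder plays the role of GitHub; the folder names, the file analysis.R, and the demo identity are assumptions made for the demo.

```shell
set -e
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Stand-in for GitHub: a bare repository in a local folder (demo assumption)
git init -q --bare "$sandbox/fake-github.git"

# A local project with one commit
git init -q "$sandbox/project"
cd "$sandbox/project"
echo "x <- 1" > analysis.R
git add analysis.R
git commit -q -m "add analysis script"

# The same three commands as above, with a local path instead of the SSH URL
git remote add origin "$sandbox/fake-github.git"
git branch -M main
git push -q -u origin main

upstream=$(git rev-parse --abbrev-ref '@{upstream}')   # the branch plain push/pull now use
remote_lines=$(git remote -v | wc -l | tr -d ' ')      # one fetch line plus one push line
echo "upstream: $upstream"
git remote -v
```

The upstream value is what -u recorded; without it, a plain git push would ask you to name the remote and branch.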

1.5 Verify

Refresh your GitHub page — your files should appear. Check that the commit message is what you expect.


Exercise 2: Working with .gitignore

Not every file in your project folder should be tracked by Git. Generated output, R session artefacts, and sensitive files like API keys should be excluded. The .gitignore file tells Git which files and patterns to ignore.

2.1 See what Git currently sees

git status

Look at the untracked files list. Are there any files you would not want on GitHub — for example .RData, .Rhistory, or a _files/ folder?

2.2 Create or edit your .gitignore

From inside your project folder, create the file if it does not exist yet and open it for editing:

touch .gitignore      # creates an empty file if it doesn't exist yet
nano .gitignore       # opens it for editing in the terminal

In nano: type or paste your content, then save with Ctrl+O → Enter, and exit with Ctrl+X.

Note: Editing in RStudio instead

You can also open .gitignore in RStudio via File → Open File. Note that RStudio’s file browser hides dotfiles (files starting with .) by default — if you can’t see .gitignore there, use the terminal to edit it instead.

Add the following as a starting point for an R / Quarto project. Data files are commonly listed here too — they are often large, change infrequently, and can usually be re-downloaded or regenerated, so there is little value in tracking them with Git:

# R session artefacts
.Rhistory
.RData
.Rproj.user/

# Quarto build output
/_site/
/.quarto/
*_files/

# OS noise
.DS_Store
Thumbs.db

Add any other files or patterns that appeared in git status and should not be committed.

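Note: Ignore patterns, quick reference

Pattern | What it matches
.RData | A specific file by name
*.csv | All files with that extension
data/ | The entire data/ folder
data/* followed by !data/README.md | Everything inside data/ except README.md. The exception needs data/* rather than data/, because Git cannot re-include a file once its parent folder is ignored.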

A leading / anchors to the repo root: /_site/ only matches a top-level _site folder, not docs/_site/.
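The patterns are easy to try out in a throwaway repository. The sketch below uses a made-up project layout and lists what Git still sees once the patterns are in place. Note that the exception is written with data/* here: ignoring the folder itself with data/ would make the ! line ineffective.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q

# Hypothetical project layout, made up for this demo
mkdir data
touch .RData results.csv analysis.R data/raw.txt data/README.md

# One pattern of each kind, plus the exception
printf '%s\n' '.RData' '*.csv' 'data/*' '!data/README.md' > .gitignore

# -uall lists individual untracked files instead of collapsing folders
visible=$(git status --porcelain -uall | LC_ALL=C sort)
echo "$visible"
```

Only .gitignore, analysis.R, and data/README.md should remain in the list; the other files match a pattern and are hidden from git status.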

2.3 Verify that ignoring works

git status

Files matching your patterns should no longer appear in the untracked list.
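If a file is unexpectedly hidden or unexpectedly still listed, git check-ignore can tell you which rule is responsible. A small sketch, assuming a throwaway repository and made-up file names:

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q
printf '%s\n' '.Rhistory' '/_site/' '*_files/' > .gitignore

# Made-up files to test the patterns against
mkdir -p report_files _site docs/_site
touch report_files/figure.png _site/index.html docs/_site/index.html

# -v prints the file, line number, and pattern that caused the match
rule=$(git check-ignore -v report_files/figure.png)
echo "$rule"

# Exit status 0 means ignored, 1 means no pattern matches
if git check-ignore -q docs/_site/index.html; then
  nested_site="ignored"
else
  nested_site="not ignored"
fi
echo "docs/_site/index.html: $nested_site"
```

The nested folder is reported as not ignored because /_site/ is anchored to the repository root.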

2.4 Commit your .gitignore

git add .gitignore
git commit -m "add gitignore for R and Quarto"
git push
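Note: What if I already committed a file I want to ignore?

.gitignore only applies to untracked files. Adding a pattern keeps matching files from being picked up in the future, but a file that is already committed stays in the index and Git keeps tracking changes to it. To stop tracking a committed file: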

git rm --cached path/to/file
git commit -m "stop tracking sensitive file"

--cached removes the file from the index without deleting it from your local disk. After you push, it also disappears from the current version on GitHub, but earlier commits still contain it. For sensitive files like API keys: once a secret is in git history, treat it as compromised and regenerate it.
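The following sketch walks through that situation in a throwaway repository. The file name secrets.env and its content are invented for the demo. It shows that the file stays on disk, leaves the index, and is still readable from the earlier commit.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Hypothetical mistake: a secrets file gets committed
echo "API_KEY=not-a-real-key" > secrets.env
git add secrets.env
git commit -q -m "accidentally commit secrets"

# Ignore it from now on and remove it from the index only
echo "secrets.env" > .gitignore
git rm -q --cached secrets.env
git add .gitignore
git commit -q -m "stop tracking secrets.env"

tracked=$(git ls-files)                    # files Git tracks now
old=$(git show HEAD~1:secrets.env)         # the earlier commit still holds the content
echo "tracked now: $tracked"
echo "still in history: $old"
```

The last line is the reason a leaked key has to be regenerated: removing the file from the index does not rewrite earlier commits.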


Exercise 3: Exploring git history and restoring a past version

3.1 Browse the log

git log --oneline           # one line per commit
git log --oneline --graph   # with branch graph
git log -- myfile.qmd       # commits that touched one file

To inspect a specific commit:

git show <hash>                  # full diff for that commit
git show <hash>:practical3.qmd  # the file as it was at that commit
Note (optional): RStudio History panel

If you are working inside an R Project, open the Git tab → click the History button (clock icon). Click any commit to see its diff. Click a file in the lower pane to see what changed in that file specifically — additions in green, deletions in red.

3.2 Make a change you will want to undo

Edit your .qmd — delete a section or change a heading. Stage and commit:

git add .
git commit -m "deliberately break something for exercise 3"

Run git log --oneline and note the short hash of this commit and the one before it.

3.3 Restore a past version of the file

git checkout <hash> -- practical3.qmd   # use the hash of the commit BEFORE the break

This stages the restored file automatically — you can see this with git status. Commit it:

git commit -m "restore practical3.qmd to pre-break version"
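Here is the whole break-and-restore cycle as a minimal sketch in a throwaway repository, in case you want to rehearse it before doing it on your real file. The file content and the demo identity are made up; the commit messages follow the ones used in this exercise.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

printf 'Section 1\nSection 2\n' > practical3.qmd
git add practical3.qmd
git commit -q -m "add practical"
good=$(git rev-parse --short HEAD)          # hash of the commit BEFORE the break

echo "Section 1" > practical3.qmd           # break the file by dropping a section
git commit -q -am "deliberately break something for exercise 3"

git checkout "$good" -- practical3.qmd      # bring back the file as it was at $good
staged=$(git diff --cached --name-only)     # the restored file is already staged
git commit -q -m "restore practical3.qmd to pre-break version"

lines=$(wc -l < practical3.qmd | tr -d ' ')
commits=$(git rev-list --count HEAD)
echo "staged after checkout: $staged; lines: $lines; commits: $commits"
```

The history ends up with three commits: the broken version is still there, and the restore is recorded as a new commit on top of it.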
Note (optional): RStudio History panel

In the History panel, click the commit before the break → select the file in the lower pane → Save As → overwrite the current file. The file ends up with the same content as after git checkout <hash> -- <file>, but it is left unstaged, so you still need to git add and git commit afterwards.

Note: Other restoration options

git revert <hash> — creates a new commit that exactly undoes a specific past commit. Safe for shared repos because it adds to history rather than rewriting it.

git diff <hash-B> <hash-A> | git apply — applies the diff between two commits as unstaged changes, useful when you want to review before committing.
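For comparison, this is what git revert looks like in a throwaway repository. It is a sketch with invented file content; --no-edit is used so that Git accepts the default revert message instead of opening an editor.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

echo "Section 1" > practical3.qmd
git add practical3.qmd
git commit -q -m "add section 1"

echo "Section 2" >> practical3.qmd
git commit -q -am "add section 2"
unwanted=$(git rev-parse HEAD)              # the commit we want to undo

# New commit that applies the inverse of $unwanted; earlier commits stay untouched
git revert --no-edit "$unwanted" >/dev/null

content=$(cat practical3.qmd)
subject=$(git log -1 --format=%s)
commits=$(git rev-list --count HEAD)
echo "content: $content; last commit: $subject; commits: $commits"
```

Unlike git checkout <hash> -- <file>, which restores one file to an earlier state, revert undoes everything a single commit changed and commits the result in one step.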

3.4 Push

git push

Exercise 4: Clone, edit, and push

This exercise gets you comfortable with the clone → edit → commit → push cycle — the daily workflow once a repo already exists on GitHub.

  1. Get the SSH URL of your repository: green Code button on GitHub → SSH tab → copy.

  2. Navigate to a folder outside your existing project:

    cd ..

    cd .. moves you one level up in the folder structure. If you are not sure where you ended up, run pwd (Mac/Linux) or cd (Windows) to print your current location. Then clone:

    git clone git@github.com:YOUR-USERNAME/statprog-debugging.git statprog-copy
    cd statprog-copy
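  3. Add a short comment near the top of your file and save. In an .R script, use something like # cloned copy, practical 4. In a .qmd, use an HTML comment such as <!-- cloned copy, practical 4 --> below the YAML header, because a line starting with # outside a code chunk renders as a heading.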

  4. Stage, commit, and push:

    git add .
    git commit -m "add comment from practical 4"
    git push
    Note (optional): RStudio UI

    Tick the file checkbox in the Git pane → Commit → type your message → Commit → Push (upward arrow). Requires an R Project in the cloned folder.

  5. Navigate back to your original folder and pull:

    cd ..
    cd 03-debugging    # or whatever your original folder is called
    git pull

    Your comment should now appear there too.

    Note (optional): RStudio UI

    Click the Pull button (downward arrow) in the Git pane.
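The five steps above can be rehearsed offline. In this sketch a bare repository in a temp folder stands in for GitHub, and the folder names original and statprog-copy, the file analysis.R, and the demo identity are assumptions. The comment is appended to the end of an .R file rather than added at the top, purely to keep the script short.

```shell
set -e
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Stand-in for GitHub, with main as its default branch
remote="$sandbox/fake-github.git"
git init -q --bare "$remote"
git --git-dir="$remote" symbolic-ref HEAD refs/heads/main

# Original folder with one commit, pushed to the stand-in remote
git init -q "$sandbox/original"
cd "$sandbox/original"
echo "x <- 1" > analysis.R
git add analysis.R
git commit -q -m "add analysis script"
git branch -M main
git remote add origin "$remote"
git push -q -u origin main

# Second copy: clone, edit, commit, push
git clone -q "$remote" "$sandbox/statprog-copy"
cd "$sandbox/statprog-copy"
echo "# cloned copy, practical 4" >> analysis.R
git commit -q -am "add comment from practical 4"
git push -q

# Back in the original folder: pull brings the new commit in
cd "$sandbox/original"
git pull -q
last_line=$(tail -n 1 analysis.R)
commits=$(git rev-list --count HEAD)
echo "last line: $last_line; commits: $commits"
```

The clone needs no git remote add or -u: git clone configures origin and the upstream branch for you.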


Exercise 5: fetch, inspect, then pull

git pull is convenient but it does two things at once: it fetches new commits from the remote and immediately merges them into your local branch. Sometimes you want to see what has changed before merging — especially on a shared repository. That is what git fetch is for.

  1. Make a change on GitHub via the web interface: go to your repository, click your .qmd, click the pencil icon, add a comment line at the top, and commit.

  2. Back in your terminal, fetch without merging:

    git fetch origin

    Your local files are unchanged. Git has downloaded the new commits but not applied them.

  3. Inspect what came in:

    git log HEAD..origin/main --oneline   # commits on remote that you don't have yet
    git diff HEAD origin/main             # line-by-line diff between your local and remote
  4. Now merge the fetched changes:

    git merge origin/main

    Or equivalently, git pull does fetch + merge in one step. Use fetch + merge when you want to review first; use pull when you trust the remote and just want to sync.

Tip

With the upstream set in Exercise 1 and Git's default settings, git pull is shorthand for git fetch origin followed by git merge origin/main. On a solo project the difference rarely matters. On a shared repository, fetching first gives you a chance to see what your collaborators did before those changes land in your working files.
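To watch fetch and merge as two separate steps without editing anything on GitHub, you can simulate the remote change locally. This is a sketch under assumptions: a bare repository in a temp folder stands in for GitHub, a second clone named collaborator plays the role of the web edit, and all file names are invented.

```shell
set -e
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Stand-in for GitHub, with main as its default branch
remote="$sandbox/fake-github.git"
git init -q --bare "$remote"
git --git-dir="$remote" symbolic-ref HEAD refs/heads/main

# Your repository, pushed once
git init -q "$sandbox/mine"
cd "$sandbox/mine"
echo "x <- 1" > analysis.R
git add analysis.R
git commit -q -m "add analysis script"
git branch -M main
git remote add origin "$remote"
git push -q -u origin main

# A second clone plays the collaborator (or the edit made on the GitHub website)
git clone -q "$remote" "$sandbox/collaborator"
cd "$sandbox/collaborator"
echo "y <- 2" >> analysis.R
git commit -q -am "add y"
git push -q

# Back in your repository: fetch, inspect, then merge
cd "$sandbox/mine"
head_before=$(git rev-parse HEAD)
git fetch -q origin
head_after_fetch=$(git rev-parse HEAD)                              # same as before: fetch leaves your branch alone
incoming=$(git log HEAD..origin/main --oneline | wc -l | tr -d ' ') # commits waiting on the remote
git --no-pager diff --stat HEAD origin/main                         # summary of what would change
git merge -q origin/main
remaining=$(git log HEAD..origin/main --oneline | wc -l | tr -d ' ')
echo "incoming before merge: $incoming; after merge: $remaining"
```

Between the fetch and the merge, analysis.R in your folder still has one line, which is the window in which you can review the incoming work.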


Exercise 6: Reflection Log

  1. Take a few minutes to write this week’s reflection log.
  2. Commit and push your reflection log to GitHub.
Tip: What to write about

Some prompts: Did anything not work as expected? What was the most confusing part of connecting your local repo to GitHub? When do you think you will use git fetch instead of git pull?