Practical #1

Advanced Statistical Programming using R - Introduction

Author

Leonhard Kestel, Lisa Bondo Andersen, Cynthia Huang

Published

April 17, 2026

Quiz

Before starting, work through this QUIZ to check your understanding of the concepts covered in the lecture, as well as preliminary knowledge about R.

General Remarks

In this first practical session, we set up the tools used throughout the course, practise their basic usage, and recap some skills from StatProg 1. The learning objectives for this session are:

  1. Set up all the tools that will be used in this course:
    • R, RStudio, tidyverse installation
    • git, GitHub
    • Quarto (via R, and command line)
    • Command line
  2. Basic usage of these tools, and how they fit into a data science workflow
    • Rendering Quarto documents
    • Basic command line operations
    • Working with Git repositories
    • Working with (relative) paths
  3. Recap of skills from StatProg 1
    • Data wrangling (dplyr)
    • Data visualisation (ggplot2)

Exercise 0: Set up your tools

If you successfully finished the “Introduction to Statistical Software for the Statistics Minor” class, you should already have all the tools installed. If not, please follow the instructions below to set up your tools for this course.

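  1. Install R and RStudio (https://posit.co/download/rstudio-desktop/)
  2. Install the tidyverse packages in R:
install.packages("tidyverse")
  3. Install Git (https://git-scm.com/install)
  4. Set up your GitHub account (https://github.com)
  5. Install Quarto (https://quarto.org/docs/get-started/)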
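
Once everything is installed, a quick check from the terminal can confirm that each tool is reachable. The following is a minimal sketch for Mac/Linux (or Git Bash on Windows). The helper name `check_tool` is made up for this illustration, and the sketch assumes the installers added the tools to your PATH.

```bash
# check_tool is a hypothetical helper: it reports whether a command is on the PATH.
check_tool() {
  if command -v "$1" > /dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# Check the three command-line tools used in this course.
for tool in R git quarto; do
  check_tool "$tool"
done
```

If a tool is reported as missing, restarting the terminal or re-running the installer is usually the first thing to try. For Quarto, `quarto check` prints a more detailed report on the installation.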

Exercise 1: Command line basics

  1. Open a command line. You can do this either via RStudio (Terminal tab) or via your operating system’s terminal application (e.g. Terminal on Mac, Command Prompt or PowerShell on Windows).

If you are new to the command line, here are some basic commands to get you started:

| Description | Windows | Mac/Linux | Example (Mac/Linux) |
|-------------|---------|-----------|---------------------|
| Change directory | `cd <target-directory>` [1] | `cd <target-directory>` | `cd ~/Downloads` changes into the Downloads folder |
| Display current directory | `cd` [2] | `pwd` [3] | `pwd` displays your current location in the file system |
| List files in directory | `dir <target-directory>` [4] | `ls <target-directory>` [5] | `ls ~/Downloads` lists the files in the Downloads folder |
| Create a new directory | `mkdir <directory-name>` [6] | `mkdir <directory-name>` | `mkdir my-new-folder` creates a new folder called “my-new-folder” in the current directory |
| Remove a directory | `rmdir <directory-name>` [7] | `rmdir <directory-name>` | `rmdir my-new-folder` removes the folder “my-new-folder” (only if it is empty) |
| Create a new file | `type nul > <file-name>` | `touch <file-name>` | `touch my-file.txt` creates a new file called “my-file.txt” in the current directory |
| Display file contents | `type <file-name>` | `cat <file-name>` | `cat my-file.txt` displays the contents of the file “my-file.txt” |
| Remove a file | `del <file-name>` [8] | `rm <file-name>` [9] | `rm my-file.txt` removes the file “my-file.txt” |

Tip: Getting help with CLI commands

On Mac/Linux, man <command> opens the manual page for a command (e.g. man ls). Press q to exit. Many commands also accept a --help flag (e.g. git --help) for a shorter summary, although some do not (ls on macOS, for instance). On Windows, use <command> /? (e.g. dir /?).

  2. Using the command line, navigate to your home directory, create a new folder (directory) for this course, and navigate into it.

  3. Using the command line, create a new file called hello-world.qmd in that folder.

  4. Display the contents of that file in the command line (it should be empty). Add some content to the file (e.g. “Hello, world!”) using RStudio. Display the contents again in the command line to confirm that your changes were saved.

  5. Render the Quarto file from the command line with quarto render hello-world.qmd. The command creates an HTML file, which you can open in your browser.

  6. Alternatively, render the document using the “Render” button in RStudio.
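
The steps of this exercise could look roughly as follows on Mac/Linux. This is one possible walk-through, not the only solution: the folder name `statprog2` is an assumption (any name works), and the `echo` line merely stands in for editing the file in RStudio.

```bash
cd ~                      # go to your home directory
mkdir -p statprog2        # create the course folder (-p: no error if it already exists)
cd statprog2              # move into it
touch hello-world.qmd     # create the file if it does not exist yet
cat hello-world.qmd       # prints nothing for a newly created, empty file

# Stand-in for adding content in RStudio (this overwrites the file).
echo "Hello, world!" > hello-world.qmd
cat hello-world.qmd       # now shows the saved content

# Render to HTML; skipped if Quarto is not on the PATH.
if command -v quarto > /dev/null 2>&1; then
  quarto render hello-world.qmd
fi
```

On Windows Command Prompt, the table above gives the corresponding commands (`type nul > hello-world.qmd` instead of `touch`, `type` instead of `cat`).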

Exercise 2: Relative paths

For a deep dive into relative paths and project handling in R, see R4DS Chapter 6 and Q4S Chapter 4.

  1. Save this image in your course folder and load it in your Quarto document using a relative path.
     [Image: whole-game.png]
  • In Markdown, images are included like this:
![alt text](./<image-name>.png)
  • In R, you can include images using knitr::include_graphics() like this:
knitr::include_graphics("./<image-name>.png")
  2. Move the image to a new folder (e.g. create a new folder called “images” and move the image there) and update the path in your Quarto document accordingly. [10]

  3. Move the image outside of your course folder (e.g. to your desktop) and try to load it again using a relative path. [11]

  4. (Optional) Install the here package (see Q4S Chapter 4.7) and use it to load the image in your Quarto document. The here package provides a simple way to construct file paths relative to the root of your project, which can make your code more robust [12] and easier to read.
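
How the relative path changes as the image moves can be traced on the command line. The sketch below works in a temporary sandbox folder so that it does not touch your real course folder; the folder name `course` and the empty placeholder file are assumptions made for the illustration, and the paths are written as seen from the course folder (assumed to be where your Quarto document lives).

```bash
# Sandbox so the sketch does not touch your real course folder.
sandbox="$(mktemp -d)"
mkdir -p "$sandbox/course"
cd "$sandbox/course"
touch whole-game.png            # empty placeholder standing in for the saved image

# Step 1: image next to the document -> ./whole-game.png
ls ./whole-game.png

# Step 2: image in a subfolder -> ./images/whole-game.png
mkdir images
mv whole-game.png images/
ls ./images/whole-game.png

# Step 3: image one level above the course folder -> ../whole-game.png
mv images/whole-game.png ..
ls ../whole-game.png            # ".." refers to the parent of the current directory
```

Each `ls` call succeeds only if the relative path is correct from the current directory, which is the same check your document performs when it tries to load the image.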

Exercise 3: git repositories

Now, we will set up a git repository for this course and practise some basic git commands. A very handy cheat sheet for git commands is available at https://git-scm.com/cheat-sheet. Take a look at its first three sections (Getting Started, Preparing to Commit, and Making Commits).

  1. Initialize a new git repository in your course folder by running git init on the command line.

  2. Create a new file called README.md in your course folder and add some content to it (e.g. “This is my course repository for StatProg 2”). Save the file. [13]

Solution

You can create the file and add content using RStudio, or you can do it directly in the command line using the echo command:

echo "This is my course repository for StatProg 2" > README.md
  3. Use the git status command to see the status of your repository. You should see that the README.md file is untracked (i.e. it is not yet being tracked by git).

  4. Use the git add command to stage the README.md file for commit. Then use the git status command again to see that the file is now staged.

Solution
git add README.md
git status
  5. Use the git commit command to commit the staged file. Git opens a text editor in which you enter a commit message.
Solution

You can also provide the commit message directly in the command line using the -m flag:

git commit -m "Add README file"
  6. Use the git log command to see the commit history of your repository. You should see your commit with the message you entered.

  7. (Optional) Create a new repository on GitHub and push your local repository to GitHub. This will allow you to access your code from anywhere and collaborate with others.
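
Put together, the whole exercise might look like the sketch below. It runs in a temporary sandbox folder rather than your course folder. The name and e-mail address are placeholders, and `<user>` and `<repo>` in the commented-out lines stand for your own GitHub account and an empty repository you created on github.com beforehand.

```bash
# Sandbox repository so the sketch does not touch your real course folder.
repo="$(mktemp -d)"
cd "$repo"

git init                                   # create an empty repository
# Git needs an identity for commits; these values are placeholders.
git config user.name "Your Name"
git config user.email "you@example.com"

echo "This is my course repository for StatProg 2" > README.md
git status                                 # README.md is listed as untracked
git add README.md                          # stage the file
git commit -m "Add README file"            # record the staged snapshot
git log --oneline                          # one line per commit

# Optional: publish to GitHub (replace the placeholders first).
# git remote add origin https://github.com/<user>/<repo>.git
# git push -u origin HEAD
```

The `-u` flag links your local branch to the branch on GitHub, so later pushes only need `git push`.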

Exercise 4: Descriptive statistics and visualisation

  1. Load the penguins dataset from the palmerpenguins package and take a look at the data.
Solution
# install.packages("palmerpenguins")
library(palmerpenguins)
head(penguins)
# A tibble: 6 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
4 Adelie  Torgersen           NA            NA                  NA          NA
5 Adelie  Torgersen           36.7          19.3               193        3450
6 Adelie  Torgersen           39.3          20.6               190        3650
# ℹ 2 more variables: sex <fct>, year <int>
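  2. Filter the dataset to include only the Adelie species and create a scatter plot of bill length vs. bill depth using ggplot2. The result should look like this:

[Figure: scatter plot “Bill Length vs. Bill Depth for Adelie Penguins”]

Solution
library(palmerpenguins)  # provides the penguins data
library(tidyverse)       # provides filter() and ggplot()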
penguins %>%
  filter(species == "Adelie") %>%
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point() +
  labs(title = "Bill Length vs. Bill Depth for Adelie Penguins",
       x = "Bill Length (mm)",
       y = "Bill Depth (mm)")

  3. Calculate some descriptive statistics for the variable bill_length_mm (e.g. mean, median, standard deviation) using dplyr functions like summarise(). You can group the data by species or island to see how the statistics differ between groups.
Solution
penguins %>%
  group_by(species) %>%
  summarise(mean_bill_length = mean(bill_length_mm, na.rm = TRUE), 
            median_bill_length = median(bill_length_mm, na.rm = TRUE),
            sd_bill_length = sd(bill_length_mm, na.rm = TRUE))
# A tibble: 3 × 4
  species   mean_bill_length median_bill_length sd_bill_length
  <fct>                <dbl>              <dbl>          <dbl>
1 Adelie                38.8               38.8           2.66
2 Chinstrap             48.8               49.6           3.34
3 Gentoo                47.5               47.3           3.08
  4. Create a boxplot for each numeric variable (bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g) by species to visualise the distributions for each species. How is each species characterised by these features? [14]
Solution
penguins %>%
  pivot_longer(cols = c(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g), 
               names_to = "feature",
               values_to = "value") %>%
  ggplot(aes(x = species, y = value)) +
  geom_boxplot() +
  facet_wrap(~ feature, scales = "free_y") +
  labs(title = "Boxplots of Penguin Features by Species",
       x = "Species",
       y = "Value")

References

  • R4DS Chapter 1-10 - R for Data Science (2e) by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund
  • Q4S Chapter 4 - Quarto for Scientists by Nicholas J. Tierney

Footnotes

  1. Change Directory

  2. In Windows, the Change Directory command without arguments prints the current directory.

  3. Print Working Directory

  4. Directory

  5. List

  6. Make Directory

  7. Remove Directory

  8. Delete

  9. Remove

  10. This simulates a common workflow where you have different folders for different types of files (e.g. data, images, scripts, etc.) and need to update paths when you move files around.

  11. This is fragile: a relative path that climbs out of the project folder (e.g. ../../Desktop/<image-name>.png) depends on where the folder sits on your machine and breaks as soon as the project is moved or shared. Keep all your project files within a single folder structure and use relative paths to access them.

  12. Robust here means that the paths keep working on other machines, regardless of operating system and working directory.

  13. The README file is a common file in git repositories that provides an overview of the project and its contents.

  14. You can either create multiple plots for this task or try to visualise everything in one plot using faceting in ggplot2.

Source Code
---
title: "Practical #1"
subtitle: "Advanced Statistical Programming using R - Introduction"
author: "Leonhard Kestel, Lisa Bondo Andersen, Cynthia Huang"
date: "April 17, 2026"

format: 
  html:
    toc: true
    toc-depth: 1
    code-tools: true
    highlight-style: github

execute: 
  eval: false
  message: false
  warning: false
---

# Quiz
Before starting, work through this [QUIZ](quiz.qmd){target="_blank"} to check your understanding of the concepts covered in the lecture, as well as preliminary knowledge about R.

# General Remarks
In this first practical session, we set up the tools used throughout the course, practise their basic usage, and recap some skills from StatProg 1. The learning objectives for this session are:

1. Set up all the tools that will be used in this course:
   - R, RStudio, tidyverse installation
   - git, GitHub
   - Quarto (via R, and command line)
   - Command line
  <!-- - AI horizon -->
2. Basic usage of these tools, and how they fit into a data science workflow
   - Rendering Quarto documents
   - Basic command line operations
   - Working with Git repositories
   - Working with (relative) paths
3. Recap of skills from StatProg 1
   - Data wrangling (`dplyr`)
   - Data visualisation (`ggplot2`)


# Exercise 0: Set up your tools
If you successfully finished the "Introduction to Statistical Software for the Statistics Minor" class, you should already have all the tools installed. If not, please follow the instructions below to set up your tools for this course.

1. [Install R and RStudio](https://posit.co/download/rstudio-desktop/)
2. Install the tidyverse packages in R:
```r
install.packages("tidyverse")
```

3. [Install Git](https://git-scm.com/install)
4. [Set up your GitHub account](https://github.com)
5. [Install Quarto](https://quarto.org/docs/get-started/)

# Exercise 1: Command line basics

1. Open a command line. You can do this either via RStudio (Terminal tab) or via your operating system's terminal application (e.g. Terminal on Mac, Command Prompt or PowerShell on Windows).

If you are new to the command line, here are some basic commands to get you started:

| Description | Windows | Mac/Linux | Example (Mac/Linux) |
|-------------|---------|-----------|---------|
| Change directory | `cd <target-directory>`^[*Change Directory*] | `cd <target-directory>` | `cd ~/Downloads` changes into the Downloads folder |
| Display current directory | `cd`^[In Windows, the *Change Directory* command without arguments prints the current directory] | `pwd`^[*Print Working Directory*] | `pwd` displays your current location in the file system |
| List files in directory | `dir <target-directory>`^[*Directory*] | `ls <target-directory>`^[*List*] | `ls ~/Downloads` lists the files in the Downloads folder |
| Create a new directory | `mkdir <directory-name>`^[*Make Directory*] | `mkdir <directory-name>` | `mkdir my-new-folder` creates a new folder called "my-new-folder" in the current directory |
| Remove a directory | `rmdir <directory-name>`^[*Remove Directory*] | `rmdir <directory-name>` | `rmdir my-new-folder` removes the folder "my-new-folder" (only if it is empty) |
| Create a new file | `type nul > <file-name>` | `touch <file-name>` | `touch my-file.txt` creates a new file called "my-file.txt" in the current directory |
| Display file contents | `type <file-name>` | `cat <file-name>` | `cat my-file.txt` displays the contents of the file "my-file.txt" |
| Remove a file | `del <file-name>`^[*Delete*] | `rm <file-name>`^[*Remove*] | `rm my-file.txt` removes the file "my-file.txt" |

::: {.callout-tip}
## Getting help with CLI commands
On Mac/Linux, `man <command>` opens the manual page for a command (e.g. `man ls`). Press `q` to exit.
Many commands also accept a `--help` flag (e.g. `git --help`) for a shorter summary, although some do not (`ls` on macOS, for instance).
On Windows, use `<command> /?` (e.g. `dir /?`).
:::

2. Using the command line, navigate to your home directory, create a new folder (directory) for this course and navigate into it.

3. Using the command line, create a new file called `hello-world.qmd` in that folder.

4. Display the contents of that file in the command line (it should be empty). Add some content to the file (e.g. "Hello, world!") using RStudio. Display the contents again in the command line to confirm that your changes were saved.
   
5. Render the Quarto file from the command line with `quarto render hello-world.qmd`. The command creates an HTML file, which you can open in your browser.

6. Alternatively, render the document using the "Render" button in RStudio.

# Exercise 2: Relative paths
For a deep dive into relative paths and project handling in R, see [R4DS Chapter 6](https://r4ds.hadley.nz/workflow-scripts.html#projects) and [Q4S Chapter 4](https://qmd4sci.njtierney.com/workflow.html).

1. Save this image in your course folder and load it in your Quarto document using a relative path.
![alt text](./images/whole-game.png)

- In Markdown, images are included like this:
```markdown
![alt text](./<image-name>.png)
```
- In R, you can include images using `knitr::include_graphics()` like this:
```r
knitr::include_graphics("./<image-name>.png")
```

2. Move the image to a new folder (e.g. create a new folder called "images" and move the image there) and update the path in your Quarto document accordingly.^[This simulates a common workflow where you have different folders for different types of files (e.g. data, images, scripts, etc.) and need to update paths when you move files around.]

3. Move the image outside of your course folder (e.g. to your desktop) and try to load it again using a relative path.^[This is fragile: a relative path that climbs out of the project folder (e.g. `../../Desktop/<image-name>.png`) depends on where the folder sits on your machine and breaks as soon as the project is moved or shared. Keep all your project files within a single folder structure and use relative paths to access them.]

4. (Optional) Install the `here` package (see [Q4S Chapter 4.7](https://qmd4sci.njtierney.com/workflow.html#the-here-package)) and use it to load the image in your Quarto document. The `here` package provides a simple way to construct file paths relative to the root of your project, which can make your code more robust^[Robust here means that the paths keep working on other machines, regardless of operating system and working directory.] and easier to read.
   

# Exercise 3: git repositories
Now, we will set up a git repository for this course and practice some basic git commands. There is a very handy cheat sheet for git commands available [here](https://git-scm.com/cheat-sheet). Take a look at the first three sections of the cheat sheet (*Getting Started*, *Preparing to Commit*, and *Making Commits*).

1. Initialize a new git repository in your course folder by running `git init` on the command line.

2. Create a new file called `README.md` in your course folder and add some content to it (e.g. "This is my course repository for StatProg 2"). Save the file.^[The README file is a common file in git repositories that provides an overview of the project and its contents.]

::: {.callout-note title="Solution" collapse="true"}
You can create the file and add content using RStudio, or you can do it directly in the command line using the `echo` command:
```{bash}
echo "This is my course repository for StatProg 2" > README.md
```
:::

3. Use the `git status` command to see the status of your repository. You should see that the `README.md` file is untracked (i.e. it is not yet being tracked by git).

4. Use the `git add` command to stage the `README.md` file for commit. Then use the `git status` command again to see that the file is now staged.

::: {.callout-note title="Solution" collapse="true"}
```{bash}
git add README.md
git status
```
:::

5. Use the `git commit` command to commit the staged file. Git opens a text editor in which you enter a commit message.

::: {.callout-note title="Solution" collapse="true"}
You can also provide the commit message directly in the command line using the `-m` flag:
```{bash}
git commit -m "Add README file"
```
:::

6. Use the `git log` command to see the commit history of your repository. You should see your commit with the message you entered.

7. (Optional) Create a new repository on GitHub and push your local repository to GitHub. This will allow you to access your code from anywhere and collaborate with others.


# Exercise 4: Descriptive statistics and visualisation
1. Load the `penguins` dataset from the `palmerpenguins` package and take a look at the data.

::: {.callout-note title="Solution" collapse="true"}
```{r echo=TRUE, eval=TRUE}
# install.packages("palmerpenguins")
library(palmerpenguins)
head(penguins)
```
:::

2. Filter the dataset to include only the `Adelie` species and create a scatter plot of bill length vs. bill depth using `ggplot2`. The result should look like this:
```{r echo=FALSE, eval=TRUE}
library(palmerpenguins)
library(tidyverse)
penguins %>%
  filter(species == "Adelie") %>%
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point() +
  labs(title = "Bill Length vs. Bill Depth for Adelie Penguins",
       x = "Bill Length (mm)",
       y = "Bill Depth (mm)")
```

::: {.callout-note title="Solution" collapse="true"}
```{r echo=TRUE, eval=TRUE}
library(tidyverse)
penguins %>%
  filter(species == "Adelie") %>%
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point() +
  labs(title = "Bill Length vs. Bill Depth for Adelie Penguins",
       x = "Bill Length (mm)",
       y = "Bill Depth (mm)")
```
:::

3. Calculate some descriptive statistics for the variable `bill_length_mm` (e.g. mean, median, standard deviation) using `dplyr` functions like `summarise()`. You can group the data by species or island to see how the statistics differ between groups.

::: {.callout-note title="Solution" collapse="true"}
```{r echo=TRUE, eval=TRUE}
penguins %>%
  group_by(species) %>%
  summarise(mean_bill_length = mean(bill_length_mm, na.rm = TRUE), 
            median_bill_length = median(bill_length_mm, na.rm = TRUE),
            sd_bill_length = sd(bill_length_mm, na.rm = TRUE))
```
:::

4. Create a boxplot for each numeric variable (`bill_length_mm`, `bill_depth_mm`, `flipper_length_mm`, `body_mass_g`) by species to visualise the distributions for each species. How is each species characterised by these features?^[You can either create multiple plots for this task or try to visualise everything in one plot using *faceting* in ggplot2.]

::: {.callout-note title="Solution" collapse="true"}
```{r echo=TRUE, eval=TRUE}
penguins %>%
  pivot_longer(cols = c(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g), 
               names_to = "feature",
               values_to = "value") %>%
  ggplot(aes(x = species, y = value)) +
  geom_boxplot() +
  facet_wrap(~ feature, scales = "free_y") +
  labs(title = "Boxplots of Penguin Features by Species",
       x = "Species",
       y = "Value")
```
:::


# References
* [R4DS Chapter 1-10](https://r4ds.hadley.nz/) - R for Data Science (2e) by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund
* [Q4S Chapter 4](https://qmd4sci.njtierney.com) - Quarto for Scientists by Nicholas J. Tierney