Week 1: Introductions
2027-04-04

Dr. Cynthia Huang
Post-Doctoral Fellow
Research: Data quality, data visualisation, trustworthy data science

Leonhard Kestel
PhD student, SODA
Research: Neurosymbolic AI, algorithmic fairness

Lisa Bondo Andersen
PhD student, SODA
Research: Survey methodology, mouse movements
Turn to the person next to you and find out their:
You will introduce them to the group.
Solo: Individual reflection log
Collaborative: Group data analysis
Important
In an age of AI, critical thinking skills such as statistical interpretation and LLM evaluation are becoming more and more important relative to engineering skills for implementing statistical code.
Oral exam
More details to be provided closer to the examination date.
We assume you know the following from StatProg1, Stat1 & Stat2:
Useful sources for revising these topics:
In this course we will use:
- git
And numerous R packages, including tidyverse and palmerpenguins.
You will cover all these in this week’s practical.
What is statistical programming, and how does it relate to statistics and data science?
Four major influences act on data analysis today:
- The formal theories of statistics
- Accelerating developments in computers and display devices
- The challenge, in many fields, of more and ever larger bodies of data
- The emphasis on quantification in an ever wider variety of disciplines
Three broad categories characterize work in greater statistics:
- preparing data, including planning, collection, organization, and validation
- analysing data, by models or other summaries
- presenting data in written, graphical or other form.
– Chambers (1993)
R is a dialect of S:
R, S & tidy tools philosophy:
This course aims to support you in learning from data using statistical reasoning AND computational tools.
Statistical
Programming
In the hands-on project, you will:
More details can be found on the course website. We recommend thinking about your dataset early! The earlier you start looking, the more likely you are to find a dataset you will enjoy working on!
Your dataset:
nycflights13)
At least one data table should contain:
- y (not binary 0/1)
- x_1 (can be time)
- x_2
In coming lectures we will cover:
These slides are adapted from https://rcp.numbat.space/week3/#/etc5513-title
Version control systems are a category of software tools that help store and manage changes to source code (projects) over time. They can:
It is a very useful (actually essential!) tool for collaborating and for sharing open source resources.

Why?
Typically when you open your terminal, it will welcome you with a prompt that looks like this:
cynthia@computerid-macbook:~$
or, on macOS Catalina and later:
cynthia@computerid ~ %
On Windows it will contain the same elements but look like this:
cynthia@computerid-pc MINGW64 ~$
We will start writing commands after ~$ or ~ %, depending on the terminal version that you are using.
The commands that we are going to use are the same regardless of the terminal version you have.
pwd: print working directory or present working directory
cynthiah@computerid~ % pwd
/Users/cynthiah/Documents/Courses/StatProg2
pwd command: /Users/cynthiah/Documents/Courses/StatProg2
- / represents the root directory
- Users is the Users directory
- cynthiah refers to my directory or folder within the Users directory
ls lists the files inside the current directory
cynthiah@computerid Documents~ % ls Documents
Courses Research Teaching file.pdf example.txt
- Documents is an argument to the ls command.
- ls gives you a list of all the elements in a directory
- ls -a lists all the files, including hidden ones
Each Linux command (pwd, ls, …) has lots of options (flags) that can be added.
A reference list of Unix commands with their options can be found here
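As a minimal sketch of pwd and ls in action, the commands below run inside a throwaway directory. The directory and file names are hypothetical, and mkdir and touch (covered later in these slides) are used only to set up something to list.

```shell
# Work inside a throwaway directory so none of your real folders are touched.
# The names below are hypothetical, chosen only for illustration.
demo="$(mktemp -d)"
cd "$demo"

# Set up one folder, one visible file, and one hidden file
# (a name starting with . is hidden from a plain ls).
mkdir Courses
touch example.txt .hidden_notes

pwd          # prints the absolute path of the current directory
ls           # lists Courses and example.txt, but not .hidden_notes
ls -a        # also lists ., .. and .hidden_notes
ls Courses   # Courses is an argument: list that directory instead (empty here)

# Keep both listings in variables so they can be compared afterwards.
visible="$(ls)"
all="$(ls -a)"
```

Comparing the output of ls and ls -a in a folder of your own is a quick way to see which hidden files a project contains.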
We will cover the workings of git in a future class. For now you will practice in the practical:
- git init to initialise a folder as a Git repository (i.e. start tracking version history in this folder)
- git add filename adds a change in the working directory to the staging area (i.e. prepares it to be added to the version history)
- git commit -m "Message" captures a snapshot of the project’s currently staged changes along the timeline of the project’s history (-m = message for the commit)
cd: Change directory
pwd).
- cd command syntax is very simple: we just need to specify the directory that we want to navigate to
- Use the pwd command to confirm your current location
- A path starting with / is assumed to be absolute.
cd in practice!
- I am in Documents. I want to get to Documents/Research/DataVis
- cd Research means that we move into Research
- cd DataVis means that we move into DataVis
- . means the current directory, DataVis
- The ~ symbol is a shorthand for the user’s home directory and we can use it to form paths:
- If you are in the Downloads directory (/Users/John/Downloads), typing cd ~ will bring you to your home directory /Users/John!
- We are now in DataVis, or really Documents/Research/DataVis
- .. is shorthand for the parent of the current working directory
- cd .. means that we move into Research (1 directory up). That is, from DataVis back to Research
- cd ../../ means that we move up two directories: from DataVis to Documents
- mkdir Project1 Project2 means “make two new directories (folders) called Project1 and Project2”.
- mv moves files or folders: it takes two arguments, the first being the files or folders to move and the second being the path to move them to.
- cp copies files, groups of files, or directories. When copying a directory we need to use cp -r to copy all the directory contents.
- rm removes files and folders
- Removing a folder with rm requires the -r (recursive) flag
- touch example.qmd creates an empty file named example.qmd
An excellent summary of the commands that we will be using can be found here.
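The navigation and file commands above, together with the three git commands listed for the practical, can be tried safely in a scratch folder. This is only a sketch under stated assumptions: the sandbox location, the Project1_backup name, and the commit message are hypothetical, and the -c user.name / -c user.email options are included only so the commit works on a machine where git has no identity configured yet.

```shell
# Rebuild the folder layout from the slides inside a throwaway sandbox.
# The sandbox path is hypothetical; nothing in your real Documents is touched.
sandbox="$(mktemp -d)"
cd "$sandbox" || exit 1
mkdir Documents
cd Documents
mkdir Research
mkdir Research/DataVis

# Relative navigation: Documents -> Research -> DataVis
cd Research
cd DataVis
pwd            # the printed path ends in Documents/Research/DataVis

cd ..          # one directory up: back in Research
cd DataVis
cd ../../      # two directories up: back in Documents
here="$(pwd)"  # remember where Documents lives

# Create, copy, move, and remove files and folders.
mkdir Project1 Project2
touch example.qmd                # create an empty file
cp example.qmd Project1/         # copy a single file into Project1
cp -r Project1 Project1_backup   # -r copies a directory with its contents
mv example.qmd Project2/         # move the original file into Project2
rm -r Project1_backup            # -r is required to remove a directory

# Record the state of Project2 with the git commands from the practical.
cd Project2
git init -q                      # start tracking this folder
git add example.qmd              # stage the new file
# The -c options supply a placeholder identity for this one commit (assumption).
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "Add example.qmd"
git log --oneline                # shows the single commit just created
```

Running pwd after each cd, as suggested above, is the easiest way to check that a relative path took you where you expected.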