Quick Overview

Outline

R is a popular platform for manipulation, visualization and analysis of data and has a number of advantages over other statistical software packages. A wide community of users (including methodologists within official statistics) contribute to R, resulting in an enormous coverage of statistical procedures, including many that are not available in any other statistical program. Furthermore, it is highly flexible for programming and scripting purposes, for example when manipulating data or creating professional plots.

However, R lacks standard GUI menus, such as those in SPSS, from which to choose what statistical test to perform or which graph to create. As a consequence, R is initially more challenging to master. This course offers an introduction to statistical programming in R that you can build upon after the course.

The course starts at a very basic level and builds up gradually. No previous experience with R is required. It is very important to realize that you will not become a master R programmer in five days, but you will get the tools to start your journey. As with every other endeavor, training is the key. Luckily, experience tells us that you will become proficient with R relatively quickly.

We hope we will have some productive days together :-)

Course schedule

Time Topic
Monday
14.00-17.00 Datatypes in R and basic syntax
Tuesday
09.00-10.45 Introduction to survey sampling with R
11.15-13.00 Datatypes (lecture continued)
14.00-16.00 Datatypes (practicals)
Wednesday
09.00-10.45 Packages and reading external data
11.15-13.00 Data manipulation (pipes and dplyr)
14.00-16.00 Data manipulation (practicals)
Thursday
09.00-10.45 Data manipulation (continued)
11.15-13.00 Demo: Making maps
14.00-16.00 Summary statistics
Friday
09.00-10.15 Calibration for non-response
10.45-12.00 Wrapping up

How to prepare

Preparing your machine for the course

The steps below guide you through installing both R and the necessary additions.

System requirements

Bring a computer to the course and make sure that you have full write access and administrator rights to the machine. We will explore programming and compiling in this course, which means that you need full access to your machine. Some corporate laptops come with limited access for their users; we therefore advise you to bring a personal laptop computer, if you have one.

1. Install R

R can be obtained here. We won’t use R directly in the course, but rather call R through RStudio; R itself therefore still needs to be installed. If you are on a Windows machine, choose “Download R for Windows” and then “Binaries for base distribution. This is what you want to install R for the first time”. Follow the instructions and accept the default settings.

2. Install RStudio Desktop

RStudio is an Integrated Development Environment (IDE). It can be obtained as stand-alone software here. The free and open-source RStudio Desktop version is sufficient. Again, follow the instructions and accept the default settings.

3. Start RStudio and install the following packages.

For this course we will need a number of packages. Execute the following lines of code in the console window:

install.packages(c("tidyverse", "knitr", "rmarkdown", "haven",
                   "DBI", "devtools", "mice", "lubridate", "openxlsx",
                   "RSQLite"), 
                 dependencies = TRUE)

If you are not sure where to execute code, use the following figure to identify the console:

[Figure: the RStudio window, showing where the console is located]

Just copy and paste the installation command and press the return key. When asked

Do you want to install from sources the package which needs 
compilation? (Yes/no/cancel)

type Yes in the console and press the return key. This might seem tricky the first time, but installing packages will soon become second nature.
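If you want to confirm that the installation worked, the following is a minimal sketch of one way to check. It uses only base R and is not part of the official course instructions. The variable names (`course_pkgs`, `is_installed`, `missing_pkgs`) are chosen for illustration.

```r
# The packages listed in the installation command above
course_pkgs <- c("tidyverse", "knitr", "rmarkdown", "haven",
                 "DBI", "devtools", "mice", "lubridate", "openxlsx",
                 "RSQLite")

# Compare the list against the packages found in your R library
is_installed <- course_pkgs %in% rownames(installed.packages())
missing_pkgs <- course_pkgs[!is_installed]

# Report the result in the console
if (length(missing_pkgs) == 0) {
  message("All course packages are installed.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```

If any packages are reported as missing, running `install.packages()` again for just those names is usually enough.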

Monday

Materials for Monday

We adapt the course as we go. To ensure that you work with the latest iteration of the course materials, I advise all course participants to access the materials online at peterstoltze.github.io/R_Accra_2024.

Lecture presentation and exercises will be made available here. Download the files and put them in their own directory.

Lectures and practicals are in HTML format and open in a new browser pane. For some practicals there are supplementary files that you should download and store in a folder on your own computer. Sometimes the supplementary material includes a walkthrough (solution) to the exercise, but try to solve the practical by yourself first.

Tuesday

Materials for Tuesday

We adapt the course as we go. To ensure that you work with the latest iteration of the course materials, I advise all course participants to access the materials online at peterstoltze.github.io/R_Accra_2024.

Lecture presentation and exercises will be made available here. Download the files and put them in their own directory.

  • Part C: Introduction to sampling with R

Wednesday

Materials for Wednesday

We adapt the course as we go. To ensure that you work with the latest iteration of the course materials, I advise all course participants to access the materials online at peterstoltze.github.io/R_Accra_2024.

Lecture presentation and exercises will be made available here. Download the files and put them in their own directory.

Lectures and practicals are in HTML format and open in a new browser pane.

Thursday

Materials for Thursday

We adapt the course as we go. To ensure that you work with the latest iteration of the course materials, I advise all course participants to access the materials online at peterstoltze.github.io/R_Accra_2024.

Lecture presentation and exercises will be made available here. Download the files and put them in their own directory.

Friday

Materials for Friday

We adapt the course as we go. To ensure that you work with the latest iteration of the course materials, I advise all course participants to access the materials online at peterstoltze.github.io/R_Accra_2024.

Lecture presentation and exercises will be made available here. Download the files and put them in their own directory.

Further studies

What to do after the course

The following references are currently available for free, either as PDFs or as extensive webpages (written with RMarkdown and bookdown). They are all very useful and we highly recommend them.

  • R for Data Science: Written by Hadley Wickham and Garrett Grolemund, this book relies almost exclusively on the tidyverse approach to data analysis. Many highly effective tools will be right at your fingertips after reading this book and working through the many exercises.
  • Advanced R: Do you want to gain deeper knowledge of R and learn from one of the most influential R contributors? This one is for you!
  • Rmarkdown: A very thorough look at RMarkdown. No need to read everything, but if there is a subject you struggle with, take a look at the index.
  • ggplot2 - The book: The ultimate guide to ggplot2, written by Hadley Wickham, who is also the author of the package. It is quite long, but if you want to truly understand ggplot2, this is the place to start.

Happy Git and GitHub for the useR is a great introduction to version control using Git and GitHub together with RStudio. Written by Jenny Bryan in a very concise style. Version control is highly recommended as the backbone of a reproducible workflow!

I also recommend looking at Awesome official statistics software, which guides you to great resources for official statistics, organised by GSBPM.

Finally, we have to mention ChatGPT, which even in the free GPT-3.5 version is quite a capable R programmer! You can use ChatGPT to explain what a piece of R code is doing, or to suggest code for a certain task. But as in every other setting, the more precisely you are able to phrase the question, the better the answer will generally be. Of course, you need to validate the output from ChatGPT yourself, and no sensitive information should be handed over to ChatGPT.