R is a popular platform for manipulation, visualization and analysis of data and has a number of advantages over other statistical software packages. A wide community of users (including methodologists within official statistics) contribute to R, resulting in an enormous coverage of statistical procedures, including many that are not available in any other statistical program. Furthermore, it is highly flexible for programming and scripting purposes, for example when manipulating data or creating professional plots.
However, R lacks the standard GUI menus found in, for example, SPSS, from which you choose which statistical test to perform or which graph to create. As a consequence, R is initially more challenging to master. This course offers an introduction to statistical programming in R that you can build upon after the course.
The course starts at a very basic level and builds up gradually. No previous experience with R is required. It is important to realize that you will not become a master R programmer in five days, but you will get the tools to start your journey. As with every other endeavor, practice is the key. Luckily, experience tells us that you will become proficient with R relatively quickly.
We hope we will have some productive days together :-)
Time | Topic |
---|---|
Monday | |
14.00-17.00 | Datatypes in R and basic syntax |
Tuesday | |
09.00-10.45 | Introduction to survey sampling with R |
11.15-13.00 | Datatypes (lecture continued) |
14.00-16.00 | Datatypes (practicals) |
Wednesday | |
09.00-10.45 | Packages and reading external data |
11.15-13.00 | Data manipulation (pipes and dplyr) |
14.00-16.00 | Data manipulation (practicals) |
Thursday | |
09.00-10.45 | Data manipulation (continued) |
11.15-13.00 | Demo: Making maps |
14.00-16.00 | Summary statistics |
Friday | |
09.00-10.15 | Calibration for non-response |
10.45-12.00 | Wrapping up |
The steps below guide you through installing R as well as the necessary additions.
Bring a computer to the course and make sure that you have full write access and administrator rights to the machine. We will explore programming and compiling in this course, which means that you need full access to your machine. Some corporate laptops come with limited access for their users. We therefore advise you to bring a personal laptop, if you have one.
R
R can be obtained here. We won’t use R directly in the course, but rather call R through RStudio. RStudio relies on R, so R still needs to be installed. If you are on a Windows machine, choose “Download R for Windows” and then “Binaries for base distribution. This is what you want to install R for the first time”. Follow the instructions and accept default settings.
RStudio Desktop
RStudio is an Integrated Development Environment (IDE). It can be obtained as stand-alone software here. The free and open source RStudio Desktop version is sufficient. Again, follow the instructions and accept default settings.
For this course we will need a number of packages. Execute the following lines of code in the console window:
install.packages(c("tidyverse", "knitr", "rmarkdown", "haven",
"DBI", "devtools", "mice", "lubridate", "openxlsx",
"RSQLite"),
dependencies = TRUE)
If you are not sure where to execute code, use the following figure to identify the console:
Just copy and paste the installation command and press the return key. When asked, type Yes in the console and press the return key. This might seem tricky the first time, but installing packages will soon become second nature.
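Once the installation has finished, it can be reassuring to check that every package actually arrived. The following is a minimal sketch, not an official part of the course setup: the variable names `course_pkgs` and `not_installed` are made up for this illustration, and the package list is simply copied from the installation command above.

```r
# Packages requested in the installation command above
course_pkgs <- c("tidyverse", "knitr", "rmarkdown", "haven",
                 "DBI", "devtools", "mice", "lubridate", "openxlsx",
                 "RSQLite")

# Compare against the packages currently installed on this machine
not_installed <- setdiff(course_pkgs, rownames(installed.packages()))

if (length(not_installed) == 0) {
  message("All course packages are installed.")
} else {
  message("Still missing: ", paste(not_installed, collapse = ", "))
}
```

If any packages are reported as missing, running `install.packages()` again for just those names is usually enough.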
We adapt the course as we go. To ensure that you work with the latest iteration of the course materials, I advise all course participants to access the materials online at peterstoltze.github.io/R_Accra_2024.
Lecture presentation and exercises will be made available here. Download the files and put them in their own directory.
Lectures and practicals are in html format and open in a new browser pane. For some practicals there are supplementary files that you should download and store in a folder on your own computer. Sometimes the supplementary material includes a walkthrough (solution) to the exercise, but try to solve the practical by yourself first.
The following references are currently available for free, either as PDFs or as extensive webpages (written with RMarkdown and bookdown). They are all very useful and we highly recommend them.
ggplot, written by Hadley Wickham, who is also the author of the package. It is quite long, but if you want to truly understand it, this is the place to start.
Happy Git and GitHub for the useR is a great introduction to version control using Git and GitHub together with RStudio. Written by Jenny Bryan in a very concise style. Version control is highly recommended as the backbone of a reproducible workflow!
I also recommend looking at Awesome official statistics software, which guides you to great resources for official statistics, organised by GSBPM.
Finally, we have to mention ChatGPT, which even in the free GPT-3.5 version is quite a capable R programmer! You can use ChatGPT to explain what a piece of R code is doing, or to suggest code for a certain task. As in every other setting, the more precisely you phrase the question, the better the answer will generally be. Of course, you need to validate the output from ChatGPT yourself, and no sensitive information should be handed over to ChatGPT.