In this lecture we will look at:
- Packages
- Reading CSV, SAS and Excel-files
- Writing CSV and Excel
- Reading and writing to a database
- Creating functions
In this lecture we will look at:
base
mice
which can be used for Multivariate Imputation by Chained Equations (don’t worry about this, but is quite useful)install.packages("mice")
library(mice)
## Warning: package 'mice' was built under R version 4.4.2
boys[1:5,]
## age hgt wgt bmi hc gen phb tv reg ## 3 0.035 50.1 3.65 14.54 33.7 <NA> <NA> NA south ## 4 0.038 53.5 3.37 11.77 35.0 <NA> <NA> NA south ## 18 0.057 50.0 3.14 12.56 35.2 <NA> <NA> NA south ## 23 0.060 54.5 4.27 14.37 36.7 <NA> <NA> NA south ## 28 0.062 57.5 5.03 15.21 37.3 <NA> <NA> NA south
install.packages
install.packages("mice")
## Warning: package 'mice' is in use and will not be installed
library
.library(mice) boys[1:3,]
## age hgt wgt bmi hc gen phb tv reg ## 3 0.035 50.1 3.65 14.54 33.7 <NA> <NA> NA south ## 4 0.038 53.5 3.37 11.77 35.0 <NA> <NA> NA south ## 18 0.057 50.0 3.14 12.56 35.2 <NA> <NA> NA south
ls()
## character(0)
If for some reasons you want to add the boys data set to your environment, just do it like this:
data (boys)
Now it is there:
ls()
## [1] "boys"
?mice
help(package = "mice")
boys
-dataset without loading the boys.RData
boys[1:5,]
## age hgt wgt bmi hc gen phb tv reg ## 3 0.035 50.1 3.65 14.54 33.7 <NA> <NA> NA south ## 4 0.038 53.5 3.37 11.77 35.0 <NA> <NA> NA south ## 18 0.057 50.0 3.14 12.56 35.2 <NA> <NA> NA south ## 23 0.060 54.5 4.27 14.37 36.7 <NA> <NA> NA south ## 28 0.062 57.5 5.03 15.21 37.3 <NA> <NA> NA south
package::object
-notationmice::boys[1:3,]
## age hgt wgt bmi hc gen phb tv reg ## 3 0.035 50.1 3.65 14.54 33.7 <NA> <NA> NA south ## 4 0.038 53.5 3.37 11.77 35.0 <NA> <NA> NA south ## 18 0.057 50.0 3.14 12.56 35.2 <NA> <NA> NA south
browseVignettes("readr")
We will use these packages in this lecture
## For reading and writing files library(readr) library(haven) library(openxlsx)
## Warning: package 'openxlsx' was built under R version 4.4.2
## For working with databases library(DBI) library(RSQLite)
## Warning: package 'RSQLite' was built under R version 4.4.1
getwd
getwd()
## [1] "C:/Users/Morten"
file("example.txt")
file("C:/Users/Morten/example.txt")
As a rule of thumb:
boys.RData
.# CSV (using the package readr) readr::write_csv2(boys, "files/boys.csv") # Excel (using the package openxlsx) openxlsx::write.xlsx(boys, "files/boys.xlsx") # STATA (using the package haven) haven::write_dta(boys, "files/boys.dta")
# CSV boys_csv <- readr::read_csv2("files/boys.csv") # Excel boys_xlsx <- openxlsx::read.xlsx("files/boys.xlsx") # SAS boys_dta <- haven::read_dta("files/boys.dta")
write_csv2
/read_csv2
and not write_csv
/read_csv
.;
to separate values instead of ,
which can be problematic in our regionboys_csv[1:5,]
## # A tibble: 5 × 9 ## age hgt wgt bmi hc gen phb tv reg ## <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <chr> ## 1 0.035 50.1 3.65 14.5 33.7 <NA> <NA> NA south ## 2 0.038 53.5 3.37 11.8 35 <NA> <NA> NA south ## 3 0.057 50 3.14 12.6 35.2 <NA> <NA> NA south ## 4 0.06 54.5 4.27 14.4 36.7 <NA> <NA> NA south ## 5 0.062 57.5 5.03 15.2 37.3 <NA> <NA> NA south
factor
and it is cast as character
ROracle
)RPostgresql
)RMysql
)RSQLite
)DBI
-package to interact with SQLite
.db
and then a verb s.a. dbConnect
SQLite
as an example since we can run it ourselves on our computers.dbConnect
:library(DBI) con <- dbConnect(drv = RSQLite::SQLite(), dbname = "files/boys_database.sqlite") con
## <SQLiteConnection> ## Path: C:\data\08. International Consulting\ ...\files\boys_database.sqlite ## Extensions: TRUE
SQLite
.dbname
could be any valid pathnamedbWriteTable
dbWriteTable(con, name = "boys_table", value = boys)
dbListTables(con)
## [1] "boys_table"
dbReadTable
boys_from_db <- dbReadTable(con, "boys_table") boys_from_db[1:5,]
## age hgt wgt bmi hc gen phb tv reg ## 1 0.035 50.1 3.65 14.54 33.7 <NA> <NA> NA south ## 2 0.038 53.5 3.37 11.77 35.0 <NA> <NA> NA south ## 3 0.057 50.0 3.14 12.56 35.2 <NA> <NA> NA south ## 4 0.060 54.5 4.27 14.37 36.7 <NA> <NA> NA south ## 5 0.062 57.5 5.03 15.21 37.3 <NA> <NA> NA south
dbGetQuery
old_boys <- dbGetQuery(con, "SELECT * FROM BOYS_TABLE WHERE age > 20") old_boys[1:5,]
## age hgt wgt bmi hc gen phb tv reg ## 1 20.010 170.0 68.8 23.80 55.5 G5 P6 25 east ## 2 20.030 178.6 71.0 22.25 57.2 G5 P5 25 city ## 3 20.032 184.0 73.0 21.56 56.0 <NA> <NA> NA north ## 4 20.117 188.7 89.4 25.10 58.1 G5 P6 25 east ## 5 20.281 185.1 81.1 23.67 58.8 G5 P6 20 south
dbRemoveTable
dbRemoveTable(con, "boys_table")
dbDisconnect(con)
SQlite
is often not a good candidate for a database. When working with official statistics you want a more featurecomplete database running on a dedicated server.dbConnect
is the same no matter what kind of database-backend you use.