In this exercise we will learn to use packages and write data to/from external sources
RSQLite
using the
console and skim the console-output.To install packages we use the function
install.packages
install.packages("RSQLite")
There is a lot of text in the output, including the website where the package is fetched from and the path it is installed to. If the package depended on other packages they would also be shown here. If the package is already installed and up-to-date it will not be installed again, but a short information will be shown here.
dplyr
and read the output in
the console. What do you think it means?We load a packages with the function library
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
It informs us that dplyr is masking the functions filter
and lag
from the package stats
. This is
because there is a naming conflict. Both dplyr
and
stats
has functions by those names, so when you write
filter
in the console R
does not know which
one to choose. To avoid this the latest package you loaded masks the
others, so in this case when you write filter
it will use
the function from the dplyr
package.
To be helpful library
prints this informational message to
avoid confusion.
DBI
. If you
do not have it installed, do so. This might come in handy
laterbrowseVignettes("DBI")
TRUE
if you have
used R
before and FALSE
if you have not.
Assign the dataframe to a name you choose yourselvesTo create the dataframe use the function data.frame
.
Speak with your colleagues to get values for your dataframe.
personal_data <- data.frame(names = c("Morten", "Philip", "Karen", "Emil"),
shoe_size = c(45, 46, 41, 42),
used_r = c(TRUE, TRUE, TRUE, TRUE))
Here it is possible to make problems with the names. Spaces should be avoided and it is a convention to use snakecase (words seperated by _).
Use the function write.xlsx
from the package
openxlsx
.
library(openxlsx)
## Warning: package 'openxlsx' was built under R version 4.4.2
openxlsx::write.xlsx(personal_data,
"files/personal_file.xlsx")
Hopefully you have excel installed, else this exercise will is
difficult. But you can read the data into R using the function
read.xlsx
.
read.xlsx("files/personal_file.xlsx")
## names shoe_size used_r
## 1 Morten 45 TRUE
## 2 Philip 46 TRUE
## 3 Karen 41 TRUE
## 4 Emil 42 TRUE
As you can see the logical column has switched from logical to character. This is a classical problem when switching between formats. In this case it is because Excel doesn’t recognise R’s logical datatype. It is always a good idea to check your data when reading from external sources.
First you should find the path to the shared drive, we will determine
this name together before we start the exercises. This might not be
possible if you are working on your laptops. So work together with your
colleagues who got a desktop. For this example we will assume the path
is "H:/"
Once you have the path you need a unique name for your database.
Could for instance be your name and two numbers:
"peter23.sqlite"
.
The database will automatically be created when we try to connect to
it. Use the function dbConnect
to from the package
DBI
to open the connection:
library(DBI)
con <- dbConnect(drv = RSQLite::SQLite(),
dbname = "H:/peter23.sqlite")
To write our dataset to the database we use the function
dbWriteTable
. Check the help page for the function to see
the arguments.
dbWriteTable(conn = con,
name = "my_tablename",
value = personal_data)
dbWriteTable
just returns a logical value
TRUE
if it succeeded. But to make sure that the table with
our data has been created, use the function
dbListTables
dbListTables(con)
## [1] "my_tablename"
If the name of your table does not show up, something went wrong.
To read the full dataset we use the function
dbReadTable
. It takes two required arguments: A connection
to a database and the name of the table you want to read:
personal_data_from_db <- dbReadTable(conn = con,
name = "my_tablename")
Inspect the data either by printing it in the console or use the
View
-function. You will see that once again, the logical
column has changed datatype. This time to a numeric/integer.
This is done using the function dbDisconnect
which takes
your connection as an argument.
dbDisconnect(con)
If the other group named their database
"H:/anna93.sqlite"
then you can connect to it just like if
it was your own, so you can reuse your code from exercise 7:
con2 <- dbConnect(RSQLite::SQLite(),
dbname = "H:/anna93.sqlite")
dbListTables(con2)
## [1] "their_table_name"
dbWriteTable
to see how it is done. Close the connection
when you are done.The first thing you have to do is make your dataset look like theirs. They may have used different names for their columns, so start by reading their table and inspecting it using the tablename you found in exercise 11.
their_data <- dbReadTable(con2, "their_table_name")
View(their_data)
Now rename your dataframe so it matches theirs. This can be done
using the names
function
names(personal_data) <- c("Their", "column", "names") ##Remember R is casesensitive!
Now you should be able to add your dataset to their table:
dbWriteTable(conn = con2,
name = "their_table_name",
value = personal_data,
append = TRUE)
Disconnect from their database
dbDisconnect(con2)
Use the same code you made in exercise 7 and 9. If the other group successfully appended their dataset, you should see more rows than you did earlier.
End of practical