We'll be joining tables from the nycflights13 data set
library("nycflights13")
## Warning: package 'nycflights13' was built under R version 4.4.2
Select time_hour, flight, carrier and tailnum from
flights
and join with the planes
table. Check
that the new table has the same amount of rows as the
flights
table.
You'll need to use the functions select()
,
left_join()
and nrow()
data_flights_planes <- flights |>
select(time_hour, flight, carrier, tailnum) |>
left_join(planes, by = "tailnum")
nrow(data_flights_planes) == nrow(flights)
Choose planes with 1, 3 or 4 engines from planes
and use
semi_join()
to filter so that only flights with these
planes are left.
Choose the desired rows first by using filter()
and
%in%
planes_for_join <- planes |>
filter(engines %in% c(1, 3:4))
data_1_3_4_eng <- flights |>
semi_join(planes_for_join, by = "tailnum")
Some of the planes have a missing value in the year
variable. Remove flights with these planes by using
anti_join()
. Check that some flights have been removed.
Choose the planes to exclude first by using filter()
and
is.na()
.
planes_to_remove <- planes |>
filter(is.na(year))
data_planes_year <- flights |>
anti_join(planes_to_remove, by = "tailnum")
nrow(flights) > nrow(data_planes_year)
Find out if any flights in the data have been flying at a higher average speed than that plane's listed speed.
You'll need to create multiple new variables using 'air_time' (which
is in minutes) and 'distance' to be able to compare the miles per hour
of the flight to that plane's speed
.
data_flights_speed <- flights |>
left_join(planes, by = "tailnum") |>
mutate(air_time_hours = air_time / 60,
mph = distance / air_time_hours,
super_speed = mph > speed)
data_flights_speed |> count(super_speed)
Using the weather
data and the table you created in the
last exercise, find out the overall average wind speed for flights that
were faster than their listed plane speeds. Compare it to the average
wind speed for the slower flights.
Look under time_hour
in the documentation for the
flights
data set for an idea of how to join the
weather
data. Then use filter()
and
mean()
. The latter function needs a vector as its first
argument - you cannot pipe a table into it.
data_with_weather <- data_flights_speed |>
left_join(weather, by = c("origin", "time_hour"))
data_super_speed <- data_with_weather |> filter(super_speed == TRUE)
data_less_speed <- data_with_weather |> filter(super_speed == FALSE)
mean(data_super_speed$wind_speed, na.rm = TRUE)
mean(data_less_speed$wind_speed, na.rm = TRUE)