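When working with new data, it is a good idea to get an overview of the dataset first. The glimpse() function prints the number of rows and columns, followed by each column's name, type, and first few values: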
glimpse(flights)
## Rows: 336,776
## Columns: 19
## $ year <int> 2013, 2013, 2013, 2013, 2013, 2013,…
## $ month <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ day <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ dep_time <int> 517, 533, 542, 544, 554, 554, 555, …
## $ sched_dep_time <int> 515, 529, 540, 545, 600, 558, 600, …
## $ dep_delay <dbl> 2, 4, 2, -1, -6, -4, -5, -3, -3, -2…
## $ arr_time <int> 830, 850, 923, 1004, 812, 740, 913,…
## $ sched_arr_time <int> 819, 830, 850, 1022, 837, 728, 854,…
## $ arr_delay <dbl> 11, 20, 33, -18, -25, 12, 19, -14, …
## $ carrier <chr> "UA", "UA", "AA", "B6", "DL", "UA",…
## $ flight <int> 1545, 1714, 1141, 725, 461, 1696, 5…
## $ tailnum <chr> "N14228", "N24211", "N619AA", "N804…
## $ origin <chr> "EWR", "LGA", "JFK", "JFK", "LGA", …
## $ dest <chr> "IAH", "IAH", "MIA", "BQN", "ATL", …
## $ air_time <dbl> 227, 227, 160, 183, 116, 150, 158, …
## $ distance <dbl> 1400, 1416, 1089, 1576, 762, 719, 1…
## $ hour <dbl> 5, 5, 5, 5, 6, 5, 6, 6, 6, 6, 6, 6,…
## $ minute <dbl> 15, 29, 40, 45, 0, 58, 0, 0, 0, 0, …
## $ time_hour <dttm> 2013-01-01 05:00:00, 2013-01-01 05…
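
The overview above shows column types and sample values, but not how many values are missing. As a minimal sketch, the base-R helper below gathers the dimensions, the first class of each column, and the per-column count of missing values. The name `quick_overview` is hypothetical and not part of any package; the built-in `airquality` dataset is used only so the example runs without extra packages.

```r
# Hypothetical helper (not part of dplyr or base R): collects a few
# base-R summaries of a data frame into one list.
quick_overview <- function(df) {
  list(
    dims      = dim(df),                                   # c(rows, columns)
    types     = vapply(df, function(col) class(col)[1],    # first class of each column
                       character(1)),
    na_counts = colSums(is.na(df))                         # missing values per column
  )
}

# Example on a built-in dataset so the sketch is self-contained.
overview <- quick_overview(airquality)
overview$dims
overview$types
overview$na_counts
```

With the data from the text loaded, the same call would be `quick_overview(flights)`.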