gmapsdistance
How long does it take to get from point A to point B? We rarely travel in a straight line, so straight-line distance often isn’t a helpful measure; it is likely to underestimate the actual time and distance taken when using transport. We can use the Google Maps Application Programming Interface (API), wrapped in an R package called gmapsdistance, to query Google Maps from R. This is simpler than manually inputting addresses to Google Maps via the browser.
We can supply a sequence of origin and destination points and have all the results returned as an R list-class object for easy analysis. Distances (metres) and journey duration (seconds) are returned, having selected from three transport modes: on foot, by car, or by public transport. (Also by bicycle for parts of North America only.)
Note that the local authorities and schools mentioned here were chosen arbitrarily and randomly; they have no particular significance.
Install the packages in the form install.packages("gmapsdistance") and load with:
library(gmapsdistance) # for getting info from the API
library(dplyr) # for data manipulation and pipes (%>%)
We’ll read in secondary schools data from two local authorities (Cambridgeshire and Cumbria) and sample five schools from each. We need only the school’s name and location, and will use the postcode as our origin and destination point information. The data are from the Get Information About Schools (GIAS) service.
I have previously saved a version of the GIAS dataset as an RDS file, which is what is read in below. I had prepared this file using the janitor::clean_names() function to tidy the column names, hence the lower case and underscores.
set.seed(1337) # for reproducibility of our sample
# Read data from Get Information About Schools
# Randomly sample a couple of schools from a couple of local authorities
gias_raw <- readRDS("data/gias_raw.RDS")
gias <- gias_raw %>%
  dplyr::select( # select columns of interest
    urn, establishmentname, phaseofeducation_name, # school
    la_name, # local authority
    street, locality, address3, town, county_name, postcode, # address
    easting, northing # bng co-ordinates
  ) %>%
  dplyr::mutate(postcode = tolower(gsub(" ", "", postcode))) %>% # simplify
  dplyr::filter(
    phaseofeducation_name == "Secondary", # only secondaries
    la_name %in% c("Cambridgeshire", "Cumbria"), # only these two LAs
    !(is.na(postcode)) # remove where postcode == NA
  ) %>%
  dplyr::group_by(la_name) %>% # within each local authority
  dplyr::sample_n(5) %>% # randomly sample five schools
  dplyr::ungroup()
dplyr::glimpse(gias) # check out the data
## Observations: 10
## Variables: 12
## $ urn <int> 137547, 137527, 136580, 137305, 136992, ...
## $ establishmentname <chr> "Witchford Village College", "Melbourn V...
## $ phaseofeducation_name <chr> "Secondary", "Secondary", "Secondary", "...
## $ la_name <chr> "Cambridgeshire", "Cambridgeshire", "Cam...
## $ street <chr> "Manor Road", "The Moor", "Gibraltar Lan...
## $ locality <chr> "Witchford", "Melbourn", "Swavesey", NA,...
## $ address3 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
## $ town <chr> "Ely", "Royston", "Cambridge", "St Ives"...
## $ county_name <chr> "Cambridgeshire", "Hertfordshire", "Camb...
## $ postcode <chr> "cb62ja", "sg86ef", "cb244rs", "pe276rr"...
## $ easting <int> 550625, 538394, 536111, 530564, 519469, ...
## $ northing <int> 279276, 245177, 268180, 272046, 260798, ...
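The postcode simplification in the mutate() step above can be tried in isolation. This is a minimal sketch in base R; the input postcodes are written in an assumed printed form (with a space and in upper case) purely for illustration.

```r
# Illustrative postcodes in an assumed printed form (space, upper case)
postcodes <- c("CB6 2JA", "SG8 6EF")

# Remove all spaces, then convert to lower case
simplified <- tolower(gsub(" ", "", postcodes))

print(simplified)
```

Removing the spaces and standardising the case gives one consistent form of each postcode to use as the origin and destination values.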
Create separate vectors of postcodes for the two local authorities, then pass these to the Google Maps API with the gmapsdistance() function. The basic arguments to this function are the origin and destination (can be an address, postcode or latlong coordinates), the mode of travel (car, public transit, walking) and the return format (shape) of the data (one origin-destination pair per row, or a matrix). You can find more about the arguments by executing ?gmapsdistance.
# Vectors of postcodes for each local authority
cam_pcd <- gias %>%
  dplyr::filter(la_name == "Cambridgeshire") %>% # only schools in this LA
  dplyr::select(postcode) %>% # we just want postcode data
  dplyr::pull() # pull vector from dataframe
cum_pcd <- gias %>%
  dplyr::filter(la_name == "Cumbria") %>%
  dplyr::select(postcode) %>%
  dplyr::pull()
# Call the API
sch_distances <- gmapsdistance::gmapsdistance(
  origin = cam_pcd, # start point of journey
  destination = cum_pcd, # end point of journey
  mode = "driving", # driving time
  shape = "long" # format of output data (origin and destination as cols)
)
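With shape = "long", every origin is paired with every destination, one pair per row. The sketch below only mimics that layout offline with base R's expand.grid(); it is not how the package builds its output and it makes no API call. The postcodes are the ones that appear in the outputs in this post.

```r
# Postcodes as they appear in the outputs in this post
cam_pcd <- c("cb62ja", "sg86ef", "cb244rs", "pe276rr", "pe191lq")
cum_pcd <- c("ca13rq", "ca144eb", "ca79px", "la185ab", "ca156nt")

# All origin-destination combinations, with the origin varying fastest
pairs <- expand.grid(or = cam_pcd, de = cum_pcd, stringsAsFactors = FALSE)

nrow(pairs) # 5 origins x 5 destinations
head(pairs)
```

Five origins and five destinations give 25 pairs, so 25 rows are expected in each element of the returned list.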
The output is a list of three elements: Time, Distance and Status.
Each list element has three columns:
or: the origin point
de: the destination point
a value column for that element (Time, Distance or Status)
Let’s start with Status. Were all the requests actioned?
sch_distances$Status # isolate status element of returned list
## or de status
## 1 cb62ja ca13rq OK
## 2 sg86ef ca13rq OK
## 3 cb244rs ca13rq OK
## 4 pe276rr ca13rq OK
## 5 pe191lq ca13rq OK
## 6 cb62ja ca144eb OK
## 7 sg86ef ca144eb OK
## 8 cb244rs ca144eb OK
## 9 pe276rr ca144eb OK
## 10 pe191lq ca144eb OK
## 11 cb62ja ca79px OK
## 12 sg86ef ca79px OK
## 13 cb244rs ca79px OK
## 14 pe276rr ca79px OK
## 15 pe191lq ca79px OK
## 16 cb62ja la185ab OK
## 17 sg86ef la185ab OK
## 18 cb244rs la185ab OK
## 19 pe276rr la185ab OK
## 20 pe191lq la185ab OK
## 21 cb62ja ca156nt OK
## 22 sg86ef ca156nt OK
## 23 cb244rs ca156nt OK
## 24 pe276rr ca156nt OK
## 25 pe191lq ca156nt OK
We want the status OK, which indicates that there were no problems and the distances were collected with no errors. The PLACE_NOT_FOUND error is returned in the Status column when Google Maps can’t locate your origin or destination.
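With many pairs it is easier to check the statuses programmatically than by eye. The following is a minimal sketch on a made-up data frame (mock_status and its third row are invented for illustration); the column name status is assumed from the printed output above and may differ between package versions.

```r
# Made-up stand-in for sch_distances$Status, with one failed lookup
mock_status <- data.frame(
  or = c("cb62ja", "sg86ef", "cb244rs"),
  de = c("ca13rq", "ca13rq", "notapostcode"),
  status = c("OK", "OK", "PLACE_NOT_FOUND"),
  stringsAsFactors = FALSE
)

# TRUE only if every request came back with status OK
all_ok <- all(mock_status$status == "OK")

# Rows that need a closer look
problems <- mock_status[mock_status$status != "OK", ]

print(all_ok)
print(problems)
```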
So what were the distances between the locations in metres?
sch_distances$Distance # isolate distance (metres) element of returned list
## or de Distance
## 1 cb62ja ca13rq 405547
## 2 sg86ef ca13rq 423238
## 3 cb244rs ca13rq 399736
## 4 pe276rr ca13rq 388260
## 5 pe191lq ca13rq 393899
## 6 cb62ja ca144eb 435256
## 7 sg86ef ca144eb 452947
## 8 cb244rs ca144eb 429445
## 9 pe276rr ca144eb 417969
## 10 pe191lq ca144eb 423608
## 11 cb62ja ca79px 411224
## 12 sg86ef ca79px 428916
## 13 cb244rs ca79px 405414
## 14 pe276rr ca79px 393937
## 15 pe191lq ca79px 399576
## 16 cb62ja la185ab 442553
## 17 sg86ef la185ab 449368
## 18 cb244rs la185ab 425866
## 19 pe276rr la185ab 417476
## 20 pe191lq la185ab 416797
## 21 cb62ja ca156nt 436768
## 22 sg86ef ca156nt 454460
## 23 cb244rs ca156nt 430958
## 24 pe276rr ca156nt 419482
## 25 pe191lq ca156nt 425121
And how much time does this translate to, in seconds, when driving between the locations?
sch_distances$Time # isolate time (seconds) element of returned list
## or de Time
## 1 cb62ja ca13rq 16439
## 2 sg86ef ca13rq 16752
## 3 cb244rs ca13rq 15492
## 4 pe276rr ca13rq 15352
## 5 pe191lq ca13rq 15471
## 6 cb62ja ca144eb 17868
## 7 sg86ef ca144eb 18180
## 8 cb244rs ca144eb 16920
## 9 pe276rr ca144eb 16780
## 10 pe191lq ca144eb 16899
## 11 cb62ja ca79px 16967
## 12 sg86ef ca79px 17280
## 13 cb244rs ca79px 16020
## 14 pe276rr ca79px 15880
## 15 pe191lq ca79px 15999
## 16 cb62ja la185ab 17998
## 17 sg86ef la185ab 18121
## 18 cb244rs la185ab 16861
## 19 pe276rr la185ab 16701
## 20 pe191lq la185ab 16645
## 21 cb62ja ca156nt 18170
## 22 sg86ef ca156nt 18483
## 23 cb244rs ca156nt 17223
## 24 pe276rr ca156nt 17083
## 25 pe191lq ca156nt 17202
That’s great, but raw metres and seconds are still not super-friendly to read, especially over long distances.
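As an illustration, journey times in seconds can be turned into hours and minutes with integer division. This is a small base R sketch using two of the times from the output above; the variable names are made up for the example.

```r
# Two driving times (seconds) taken from the Time output above
seconds <- c(16439, 18483)

# Whole hours, then the whole minutes left over
hours <- seconds %/% 3600
minutes <- (seconds %% 3600) %/% 60

# Format as e.g. "4h 33m"
readable <- sprintf("%dh %02dm", as.integer(hours), as.integer(minutes))

print(readable)
```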
Let’s create a more meaningful table for our purposes. Let’s say we only care about the distances for now, so we’ll focus on that element of the list and join in the school names for both the origin and the destination from the Get Information About Schools data.
distance_info <- sch_distances$Distance %>% # to the distance data...
  dplyr::left_join(
    y = select(
      gias,
      establishmentname, postcode # join these columns from the GIAS data
    ),
    by = c("or" = "postcode") # match on postcode values (origin)
  ) %>%
  dplyr::left_join( # now join...
    y = select(
      gias, # from GIAS...
      establishmentname, postcode # these columns
    ),
    by = c("de" = "postcode"), # match on postcode values (destination)
    suffix = c("_or", "_de") # add col name suffixes for origin/destination
  )
dplyr::glimpse(distance_info) # inspect data
## Observations: 25
## Variables: 5
## $ or <chr> "cb62ja", "sg86ef", "cb244rs", "pe276rr",...
## $ de <chr> "ca13rq", "ca13rq", "ca13rq", "ca13rq", "...
## $ Distance <dbl> 405547, 423238, 399736, 388260, 393899, 4...
## $ establishmentname_or <chr> "Witchford Village College", "Melbourn Vi...
## $ establishmentname_de <chr> "Newman Catholic School", "Newman Catholi...
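The two joins attach a school name to each postcode, once for the origin and once for the destination. As an alternative sketch (not the approach used in this post), the same matching can be done in base R with a named lookup vector. The small data frames below are cut-down stand-ins built from values shown in the outputs above.

```r
# Cut-down stand-in for the GIAS data: school names and postcodes
schools <- data.frame(
  establishmentname = c(
    "Witchford Village College", "Melbourn Village College",
    "Newman Catholic School", "Netherhall School"
  ),
  postcode = c("cb62ja", "sg86ef", "ca13rq", "ca156nt"),
  stringsAsFactors = FALSE
)

# Cut-down stand-in for sch_distances$Distance
distances <- data.frame(
  or = c("cb62ja", "sg86ef"),
  de = c("ca13rq", "ca156nt"),
  Distance = c(405547, 454460),
  stringsAsFactors = FALSE
)

# Named vector: names are postcodes, values are school names
name_lookup <- setNames(schools$establishmentname, schools$postcode)

# Look up the name for each origin and destination postcode
distances$Origin <- unname(name_lookup[distances$or])
distances$Destination <- unname(name_lookup[distances$de])

print(distances)
```

A postcode missing from the lookup would give NA here, much as an unmatched row does in a left join.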
While we’re at it, we can look arbitrarily at the longest distances.
distance_info %>%
  dplyr::mutate( # create new columns
    Kilometres = round(Distance/1000, 1), # calculate km from m
    Miles = round(Kilometres * 0.621371, 1) # convert to miles
  ) %>%
  dplyr::select( # select columns to rename and retain
    Origin = establishmentname_or,
    Destination = establishmentname_de,
    `Kilometres`,
    `Miles`
  ) %>%
  dplyr::arrange(desc(Kilometres)) # arrange by longest distance first
## Origin Destination Kilometres Miles
## 1 Melbourn Village College Netherhall School 454.5 282.4
## 2 Melbourn Village College Workington Academy 452.9 281.4
## 3 Melbourn Village College Millom School 449.4 279.2
## 4 Witchford Village College Millom School 442.6 275.0
## 5 Witchford Village College Netherhall School 436.8 271.4
## 6 Witchford Village College Workington Academy 435.3 270.5
## 7 Swavesey Village College Netherhall School 431.0 267.8
## 8 Swavesey Village College Workington Academy 429.4 266.8
## 9 Melbourn Village College The Nelson Thomlinson School 428.9 266.5
## 10 Swavesey Village College Millom School 425.9 264.6
## 11 Longsands Academy Netherhall School 425.1 264.1
## 12 Longsands Academy Workington Academy 423.6 263.2
## 13 Melbourn Village College Newman Catholic School 423.2 263.0
## 14 St Ivo School Netherhall School 419.5 260.7
## 15 St Ivo School Workington Academy 418.0 259.7
## 16 St Ivo School Millom School 417.5 259.4
## 17 Longsands Academy Millom School 416.8 259.0
## 18 Witchford Village College The Nelson Thomlinson School 411.2 255.5
## 19 Witchford Village College Newman Catholic School 405.5 252.0
## 20 Swavesey Village College The Nelson Thomlinson School 405.4 251.9
## 21 Swavesey Village College Newman Catholic School 399.7 248.4
## 22 Longsands Academy The Nelson Thomlinson School 399.6 248.3
## 23 Longsands Academy Newman Catholic School 393.9 244.8
## 24 St Ivo School The Nelson Thomlinson School 393.9 244.8
## 25 St Ivo School Newman Catholic School 388.3 241.3
Manually inputting Melbourn Village College, Cambridgeshire, and Netherhall School, Cumbria, into Google Maps suggests 283 miles, which is close to the 282.4 miles in the gmapsdistance() output. Success!