1 Background

How long does it take to get from point A to point B? We rarely travel in a straight line, so straight-line distance often isn’t helpful: it is likely to underestimate the actual time and distance taken when using transport. We can use the Google Maps Application Programming Interface (API), wrapped in an R package called gmapsdistance, to query Google Maps from R. This is simpler than manually inputting addresses to Google Maps via the browser.

We can supply a sequence of origin and destination points and have all the results returned as an R list-class object for easy analysis. Distances (metres) and journey durations (seconds) are returned for one of three transport modes: on foot, by car, or by public transport. (Cycling is also available, but only for parts of North America.)

Information on the gmapsdistance package:

Note that the local authorities and schools mentioned here were chosen arbitrarily and randomly; they have no particular significance.

2 Prepare the workspace

Install each package with a call of the form install.packages("gmapsdistance"), then load them with:

library(gmapsdistance)  # for getting info from the API
library(dplyr)  # for data manipulation and pipes (%>%)

3 Get a school sample

We’ll read in secondary schools data from two local authorities (Cambridgeshire and Cumbria) and sample five from each. We need only the school’s name and location, and will use the postcode as our origin and destination point information. The data are from the Get Information About Schools service.

I have previously saved a version of the GIAS dataset as an RDS file, which is what is read in below. I had prepared this file using the janitor::clean_names() function to tidy the column names, hence the lower case and underscores.

set.seed(1337)  # for reproducibility of our sample

# Read data from Get Information About Schools
# Randomly sample a couple of schools from a couple of local authorities

gias_raw <- readRDS("data/gias_raw.RDS") 

gias <- gias_raw %>% 
  dplyr::select(  # select columns of interest
    urn, establishmentname, phaseofeducation_name,  # school
    la_name,  # local authority
    street, locality, address3, town, county_name, postcode,  # address
    easting, northing  # bng co-ordinates
  ) %>%
  dplyr::mutate(postcode = tolower(gsub(" ", "", postcode))) %>%  # simplify
  dplyr::filter(
    phaseofeducation_name == "Secondary",  # only secondaries
    la_name %in% c("Cambridgeshire", "Cumbria"),  # only these two LAs
    !(is.na(postcode))  # remove where postcode == NA
  ) %>% 
  dplyr::group_by(la_name) %>%  # within each local authority 
  dplyr::sample_n(5) %>%  # randomly sample five schools
  dplyr::ungroup()

dplyr::glimpse(gias)  # check out the data
## Observations: 10
## Variables: 12
## $ urn                   <int> 137547, 137527, 136580, 137305, 136992, ...
## $ establishmentname     <chr> "Witchford Village College", "Melbourn V...
## $ phaseofeducation_name <chr> "Secondary", "Secondary", "Secondary", "...
## $ la_name               <chr> "Cambridgeshire", "Cambridgeshire", "Cam...
## $ street                <chr> "Manor Road", "The Moor", "Gibraltar Lan...
## $ locality              <chr> "Witchford", "Melbourn", "Swavesey", NA,...
## $ address3              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
## $ town                  <chr> "Ely", "Royston", "Cambridge", "St Ives"...
## $ county_name           <chr> "Cambridgeshire", "Hertfordshire", "Camb...
## $ postcode              <chr> "cb62ja", "sg86ef", "cb244rs", "pe276rr"...
## $ easting               <int> 550625, 538394, 536111, 530564, 519469, ...
## $ northing              <int> 279276, 245177, 268180, 272046, 260798, ...
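
As a point of comparison for the road distances calculated later, the easting and northing columns can give a rough straight-line distance. This is a minimal sketch, assuming the coordinates are British National Grid values in metres (as the `bng co-ordinates` comment suggests) and that a flat Pythagorean distance is good enough for illustration. The two schools are the first two rows of the glimpse() output; the object names are made up for this example.

```r
# British National Grid coordinates (metres) for the first two sampled
# schools, copied from the glimpse() output above
witchford <- c(easting = 550625, northing = 279276)
melbourn  <- c(easting = 538394, northing = 245177)

# Straight-line (Euclidean) distance in metres via Pythagoras
straight_line_m <- sqrt(sum((witchford - melbourn)^2))

round(straight_line_m / 1000, 1)  # straight-line distance in kilometres
```

A road distance between the same two postcodes would normally be longer than this figure, which is the reason for querying Google Maps rather than relying on the coordinates alone.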

4 Calculate distances

Create separate vectors of postcodes for the two local authorities, then pass these to the Google Maps API with the gmapsdistance() function. The basic arguments to this function are the origin and destination (each can be an address, postcode or latitude-longitude coordinates), the mode of travel (car, public transit, walking) and the return format (shape) of the data (one origin-destination pair per row, or a matrix). You can find more about the arguments by executing ?gmapsdistance.

# Vectors of postcodes for each local authority

cam_pcd <- gias %>% 
  dplyr::filter(la_name == "Cambridgeshire") %>%   # only schools in this LA
  dplyr::select(postcode) %>%  # we just want postcode data
  dplyr::pull()  # pull vector from dataframe

cum_pcd <- gias %>% 
  dplyr::filter(la_name == "Cumbria") %>% 
  dplyr::select(postcode) %>% 
  dplyr::pull()

# Call the API

sch_distances <- gmapsdistance::gmapsdistance(
  origin = cam_pcd,  # start point of journey
  destination = cum_pcd,  # end point of journey
  mode = "driving",  # driving time
  shape = "long"  # format of output data (origin and destination as cols)
)

The output is a list of three elements, in this order:

  1. Time (seconds)
  2. Distance (metres)
  3. Status (i.e. could the calculation be made?)

Each list element has three columns:

  • or – the origin point
  • de – the destination point
  • one of Time, Distance and Status
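
Because all three elements share the or and de columns, they can be merged into a single table if that is more convenient. The sketch below uses a small mock list built by hand to mirror the structure just described (it does not call the API); `mock_distances` and `combined` are names invented for the example, and the values are the first two rows of the real output shown further down.

```r
# Mock list mirroring the structure described above; no API call is made
or <- c("cb62ja", "sg86ef")
de <- c("ca13rq", "ca13rq")

mock_distances <- list(
  Time     = data.frame(or, de, Time = c(16439, 16752), stringsAsFactors = FALSE),
  Distance = data.frame(or, de, Distance = c(405547, 423238), stringsAsFactors = FALSE),
  Status   = data.frame(or, de, status = c("OK", "OK"), stringsAsFactors = FALSE)
)

names(mock_distances)  # element names, in list order

# Merge the three elements on the shared origin/destination columns
combined <- Reduce(
  function(x, y) merge(x, y, by = c("or", "de")),
  mock_distances
)

combined  # one row per origin-destination pair: Time, Distance and status
```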

Let’s start with Status. Were all the requests actioned?

sch_distances$Status  # isolate status element of returned list
##         or      de status
## 1   cb62ja  ca13rq     OK
## 2   sg86ef  ca13rq     OK
## 3  cb244rs  ca13rq     OK
## 4  pe276rr  ca13rq     OK
## 5  pe191lq  ca13rq     OK
## 6   cb62ja ca144eb     OK
## 7   sg86ef ca144eb     OK
## 8  cb244rs ca144eb     OK
## 9  pe276rr ca144eb     OK
## 10 pe191lq ca144eb     OK
## 11  cb62ja  ca79px     OK
## 12  sg86ef  ca79px     OK
## 13 cb244rs  ca79px     OK
## 14 pe276rr  ca79px     OK
## 15 pe191lq  ca79px     OK
## 16  cb62ja la185ab     OK
## 17  sg86ef la185ab     OK
## 18 cb244rs la185ab     OK
## 19 pe276rr la185ab     OK
## 20 pe191lq la185ab     OK
## 21  cb62ja ca156nt     OK
## 22  sg86ef ca156nt     OK
## 23 cb244rs ca156nt     OK
## 24 pe276rr ca156nt     OK
## 25 pe191lq ca156nt     OK

We want the status OK, which indicates that there were no problems and the distances were collected with no errors. The PLACE_NOT_FOUND error is returned in the Status column when Google Maps can’t locate your origin or destination.
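
With more than a handful of pairs it is easier to test the status column than to read it by eye. The following is a minimal sketch on a mock status table in the same shape as the output above; the `zz999zz` destination is a made-up postcode used only to simulate a failed lookup.

```r
# Mock status table in the same shape as sch_distances$Status
# (the third row simulates a lookup that Google Maps could not resolve)
status <- data.frame(
  or = c("cb62ja", "sg86ef", "cb62ja"),
  de = c("ca13rq", "ca13rq", "zz999zz"),
  status = c("OK", "OK", "PLACE_NOT_FOUND"),
  stringsAsFactors = FALSE
)

# TRUE only if every request succeeded
all_ok <- all(status$status == "OK")
all_ok

# Isolate the problem pairs so they can be checked and re-run
failed <- status[status$status != "OK", ]
failed
```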

So what were the distances between the locations in metres?

sch_distances$Distance  # isolate distance (metres) element of returned list
##         or      de Distance
## 1   cb62ja  ca13rq   405547
## 2   sg86ef  ca13rq   423238
## 3  cb244rs  ca13rq   399736
## 4  pe276rr  ca13rq   388260
## 5  pe191lq  ca13rq   393899
## 6   cb62ja ca144eb   435256
## 7   sg86ef ca144eb   452947
## 8  cb244rs ca144eb   429445
## 9  pe276rr ca144eb   417969
## 10 pe191lq ca144eb   423608
## 11  cb62ja  ca79px   411224
## 12  sg86ef  ca79px   428916
## 13 cb244rs  ca79px   405414
## 14 pe276rr  ca79px   393937
## 15 pe191lq  ca79px   399576
## 16  cb62ja la185ab   442553
## 17  sg86ef la185ab   449368
## 18 cb244rs la185ab   425866
## 19 pe276rr la185ab   417476
## 20 pe191lq la185ab   416797
## 21  cb62ja ca156nt   436768
## 22  sg86ef ca156nt   454460
## 23 cb244rs ca156nt   430958
## 24 pe276rr ca156nt   419482
## 25 pe191lq ca156nt   425121

And how much time does this translate to, in seconds, when driving between the locations?

sch_distances$Time  # isolate time (seconds) element of returned list
##         or      de  Time
## 1   cb62ja  ca13rq 16439
## 2   sg86ef  ca13rq 16752
## 3  cb244rs  ca13rq 15492
## 4  pe276rr  ca13rq 15352
## 5  pe191lq  ca13rq 15471
## 6   cb62ja ca144eb 17868
## 7   sg86ef ca144eb 18180
## 8  cb244rs ca144eb 16920
## 9  pe276rr ca144eb 16780
## 10 pe191lq ca144eb 16899
## 11  cb62ja  ca79px 16967
## 12  sg86ef  ca79px 17280
## 13 cb244rs  ca79px 16020
## 14 pe276rr  ca79px 15880
## 15 pe191lq  ca79px 15999
## 16  cb62ja la185ab 17998
## 17  sg86ef la185ab 18121
## 18 cb244rs la185ab 16861
## 19 pe276rr la185ab 16701
## 20 pe191lq la185ab 16645
## 21  cb62ja ca156nt 18170
## 22  sg86ef ca156nt 18483
## 23 cb244rs ca156nt 17223
## 24 pe276rr ca156nt 17083
## 25 pe191lq ca156nt 17202

That’s great, but raw metres and seconds are still not super-friendly to read, especially over long distances.
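
For the times, one option is to convert seconds into hours and minutes with integer division. This is only a sketch using three values copied from the output above; the variable names are illustrative.

```r
# A few driving times (seconds) copied from the Time output above
secs <- c(16439, 16752, 18483)

hours   <- secs %/% 3600          # whole hours
minutes <- (secs %% 3600) %/% 60  # remaining whole minutes (seconds dropped)

readable <- sprintf("%dh %02dm", hours, minutes)
readable
```

The first value, 16439 seconds, comes out as "4h 33m".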

5 Manipulate the data

Let’s create a more meaningful table for our purposes. Let’s say we only care about the distances for now, so we’ll focus on that element of the list and join in the school names for both the origin and the destination from the Get Information About Schools data.

distance_info <- sch_distances$Distance %>%  # to the distance data...
  dplyr::left_join(
    y = dplyr::select(
      gias, 
      establishmentname, postcode  # join these columns from the GIAS data
      ),   
    by = c("or" = "postcode")  # match on postcode values (origin)
  ) %>% 
  dplyr::left_join(   # now join...
    y = dplyr::select(
      gias,  # from GIAS...
      establishmentname, postcode  # these columns
      ),
    by = c("de" = "postcode"),  # match on postcode values (destination)
    suffix = c("_or", "_de")  # add col name suffixes for origin/destination
  )

dplyr::glimpse(distance_info)  # inspect data
## Observations: 25
## Variables: 5
## $ or                   <chr> "cb62ja", "sg86ef", "cb244rs", "pe276rr",...
## $ de                   <chr> "ca13rq", "ca13rq", "ca13rq", "ca13rq", "...
## $ Distance             <dbl> 405547, 423238, 399736, 388260, 393899, 4...
## $ establishmentname_or <chr> "Witchford Village College", "Melbourn Vi...
## $ establishmentname_de <chr> "Newman Catholic School", "Newman Catholi...

While we’re at it, we can look arbitrarily at the longest distances.

distance_info %>%
  dplyr::mutate(  # create new columns
    Kilometres = round(Distance/1000, 1),  # calculate km from m
    Miles = round(Kilometres * 0.621371, 1)  # convert to miles
  ) %>% 
  dplyr::select(  # select columns to rename and retain
    Origin = establishmentname_or,
    Destination = establishmentname_de,
    `Kilometres`,
    `Miles`
    ) %>% 
  dplyr::arrange(desc(Kilometres)) # arrange by longest distance first
##                       Origin                  Destination Kilometres Miles
## 1   Melbourn Village College            Netherhall School      454.5 282.4
## 2   Melbourn Village College           Workington Academy      452.9 281.4
## 3   Melbourn Village College                Millom School      449.4 279.2
## 4  Witchford Village College                Millom School      442.6 275.0
## 5  Witchford Village College            Netherhall School      436.8 271.4
## 6  Witchford Village College           Workington Academy      435.3 270.5
## 7   Swavesey Village College            Netherhall School      431.0 267.8
## 8   Swavesey Village College           Workington Academy      429.4 266.8
## 9   Melbourn Village College The Nelson Thomlinson School      428.9 266.5
## 10  Swavesey Village College                Millom School      425.9 264.6
## 11         Longsands Academy            Netherhall School      425.1 264.1
## 12         Longsands Academy           Workington Academy      423.6 263.2
## 13  Melbourn Village College       Newman Catholic School      423.2 263.0
## 14             St Ivo School            Netherhall School      419.5 260.7
## 15             St Ivo School           Workington Academy      418.0 259.7
## 16             St Ivo School                Millom School      417.5 259.4
## 17         Longsands Academy                Millom School      416.8 259.0
## 18 Witchford Village College The Nelson Thomlinson School      411.2 255.5
## 19 Witchford Village College       Newman Catholic School      405.5 252.0
## 20  Swavesey Village College The Nelson Thomlinson School      405.4 251.9
## 21  Swavesey Village College       Newman Catholic School      399.7 248.4
## 22         Longsands Academy The Nelson Thomlinson School      399.6 248.3
## 23         Longsands Academy       Newman Catholic School      393.9 244.8
## 24             St Ivo School The Nelson Thomlinson School      393.9 244.8
## 25             St Ivo School       Newman Catholic School      388.3 241.3
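
The conversion in the first row of the table can be checked by hand. This short sketch repeats the same arithmetic as the mutate() call for the Melbourn Village College to Netherhall School pair, using the 454460 metres reported in the Distance output.

```r
# Distance in metres for sg86ef -> ca156nt, from the Distance output
metres <- 454460

km    <- round(metres / 1000, 1)  # metres to kilometres: 454.5
miles <- round(km * 0.621371, 1)  # kilometres to miles: 282.4
```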

Manually inputting Melbourn Village College, Cambridgeshire, and Netherhall School, Cumbria, into Google Maps suggests 283 miles, in line with the 282.4 miles from the gmapsdistance() output. Success!

Google Maps directions