vignettes/introduction.Rmd
You can install {altcheckr} from GitHub using the {remotes} package.
install.packages("remotes")
remotes::install_github("matt-dray/altcheckr")
library(altcheckr)
Use the alt_get() function to scrape the attributes of each <img> element on a web page that you name in the url argument.
The function uses {xml2} and {rvest} to scrape a given web page and extract image attributes, with a little bit of {purrr} to get it into a data frame.
get_img <- alt_get("https://www.bbc.co.uk/news")
The function returns a tibble where each row is an image element from that page and the columns are the image source (src), the alt text (alt) and, if it exists, a link to a file with a longer description (longdesc), which is sometimes used for complex images. The alt column will be created and filled with NA if it isn't present.
Setting the argument all_attributes to TRUE will return all the attributes provided in the <img> element, not just src, alt and longdesc.
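As a rough illustration of the kind of scraping described above, the sketch below parses a small inline HTML snippet using {xml2} on its own. It is not the code inside alt_get(); the HTML, the file names and the object names (page, imgs, img_df, all_attrs) are invented for this example.

```r
library(xml2)

# Hypothetical page content, inlined so the example needs no network access
html <- '<html><body>
  <img src="bus.png" alt="A red bus">
  <img src="logo.png" width="40">
</body></html>'

page <- read_html(html)
imgs <- xml_find_all(page, ".//img")  # every <img> element on the page

# One row per image; xml_attr() gives NA when an attribute is absent
img_df <- data.frame(
  src = xml_attr(imgs, "src"),
  alt = xml_attr(imgs, "alt"),
  stringsAsFactors = FALSE
)

# Every attribute of each <img>, as a list of named character vectors
all_attrs <- xml_attrs(imgs)

print(img_df)
```

The second image has no alt attribute, so its alt value is NA, which mirrors the NA-filling behaviour described for alt_get().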
Here is a preview of the tibble that is output from alt_get():
print(get_img)
#> # A tibble: 50 x 2
#>    src                                    alt
#>    <chr>                                  <chr>
#>  1 https://a1.api.bbc.co.uk/hit.xiti?&co… ""
#>  2 https://ichef.bbci.co.uk/news/320/cps… "Alexei Navalny (centre) is escorted …
#>  3 …                                      "Guatemalan soldiers and police beat …
#>  4 …                                      "Militia groups gather to protect pro…
#>  5 …                                      "Capitol Police officer wearing a MAG…
#>  6 …                                      "Police in DC"
#>  7 …                                      "A traveller passes through O'Hare In…
#>  8 …                                      "Members of a rescue team work at the…
#>  9 …                                      "Australian player Bernard Tomic pict…
#> 10 …                                      "Britain's Health Secretary Matt Hanc…
#> # … with 40 more rows
You can then pass the output of alt_get() to alt_check() to perform a series of basic assessments of each image's alt text.
(You can also pass any data frame that contains a src and an alt column, where alt contains the text to be assessed by alt_check(). For example, {altcheckr} has a built-in dataset: example_get.)
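As a sketch of the note above about passing your own data frame, the example below builds a two-row data frame by hand. The URLs, the alt text and the object names (my_imgs, my_checks) are made up, and the call to alt_check() is guarded so the snippet still runs if {altcheckr} is not installed.

```r
# Hand-made input: only the src and alt columns are assumed to be needed
my_imgs <- data.frame(
  src = c("https://example.com/bus.png", "https://example.com/logo.png"),
  alt = c("A red double-decker bus on a wet street", ""),
  stringsAsFactors = FALSE
)

# Run the checks only if the package is available
if (requireNamespace("altcheckr", quietly = TRUE)) {
  my_checks <- altcheckr::alt_check(my_imgs)
  print(my_checks)
}
```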
check_img <- alt_check(get_img)
This will return the same tibble as alt_get(), with new columns appended.
Each new column is the outcome of a check for a possible accessibility issue with the alt text: for example, whether the alt text actually exists and whether it is long.
print(check_img)
#> # A tibble: 50 x 10
#>    src   alt   alt_exists nchar_count nchar_assess file_ext self_evident
#>    <chr> <chr> <chr>            <int> <chr>        <lgl>    <lgl>
#>  1 http… ""    Empty               NA <NA>         NA       NA
#>  2 http… "Ale… Exists             103 OK           FALSE    TRUE
#>  3 data… "Gua… Exists             105 OK           FALSE    FALSE
#>  4 data… "Mil… Exists             185 Long         FALSE    FALSE
#>  5 data… "Cap… Exists              41 OK           FALSE    FALSE
#>  6 data… "Pol… Exists              12 Short        FALSE    FALSE
#>  7 data… "A t… Exists              55 OK           FALSE    FALSE
#>  8 data… "Mem… Exists             174 Long         FALSE    FALSE
#>  9 data… "Aus… Exists              72 OK           FALSE    TRUE
#> 10 data… "Bri… Exists             171 Long         FALSE    FALSE
#> # … with 40 more rows, and 3 more variables: terminal_punct <lgl>,
#> #   spellcheck <list>, not_basic <list>
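To give a feel for what the alt_exists, nchar_count and nchar_assess columns in the output above represent, here is a minimal base R sketch of similar checks. It is not the package's implementation. The thresholds (min_char = 20, max_char = 125) and the "Missing" label for NA values are assumptions chosen to be consistent with the output shown, and the alt strings other than "Police in DC" are invented.

```r
alt <- c("", "Police in DC",
         "A traveller passes through a busy airport terminal", NA)

# Hypothetical thresholds for "Short" and "Long" alt text
min_char <- 20L
max_char <- 125L

# Does the alt text exist, is it empty, or is the attribute absent?
alt_exists <- ifelse(is.na(alt), "Missing",
                     ifelse(alt == "", "Empty", "Exists"))

# Count characters only where there is text to count
nchar_count <- ifelse(alt_exists == "Exists", nchar(alt), NA_integer_)

# Classify the length against the assumed thresholds
nchar_assess <- ifelse(
  is.na(nchar_count), NA_character_,
  ifelse(nchar_count < min_char, "Short",
         ifelse(nchar_count > max_char, "Long", "OK"))
)

checks <- data.frame(alt_exists, nchar_count, nchar_assess,
                     stringsAsFactors = FALSE)
print(checks)
```

As in row 6 of the output above, the 12-character text "Police in DC" is classed as Short under these assumed thresholds.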
And here is the structure now:
dplyr::glimpse(check_img)
#> Rows: 50
#> Columns: 10
#> $ src            <chr> "https://a1.api.bbc.co.uk/hit.xiti?&col=1&from=p&ptag=…
#> $ alt            <chr> "", "Alexei Navalny (centre) is escorted by police in …
#> $ alt_exists     <chr> "Empty", "Exists", "Exists", "Exists", "Exists", "Exis…
#> $ nchar_count    <int> NA, 103, 105, 185, 41, 12, 55, 174, 72, 171, 44, 104, …
#> $ nchar_assess   <chr> NA, "OK", "OK", "Long", "OK", "Short", "OK", "Long", "…
#> $ file_ext       <lgl> NA, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
#> $ self_evident   <lgl> NA, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TR…
#> $ terminal_punct <lgl> NA, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FA…
#> $ spellcheck     <list> [<>, <"Navalny", "centre", "Khimki">, "Chiquimula", <…
#> $ not_basic      <list> [<>, <"alexei", "navalny", "centre", "escorted", "pol…
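Once the checks are in a data frame, one way to use them is to pull out the rows that were flagged. The sketch below does this in base R on a small hand-made data frame that copies the alt_exists and nchar_assess values of four rows from the output above. The src values are placeholders and the object names (checked, flagged) are made up for this example.

```r
# Stand-in for a few rows of alt_check() output; src values are placeholders
checked <- data.frame(
  src = c("img-1", "img-2", "img-3", "img-4"),
  alt_exists = c("Empty", "Exists", "Exists", "Exists"),
  nchar_assess = c(NA, "OK", "Long", "Short"),
  stringsAsFactors = FALSE
)

# Keep images whose alt text is absent or empty, or whose length was flagged
is_flagged <- checked$alt_exists != "Exists" |
  checked$nchar_assess %in% c("Short", "Long")
flagged <- checked[is_flagged, ]

print(flagged)
```

Using %in% rather than != for nchar_assess means rows with an NA assessment do not produce NA in the filter.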