rvest: how to select a specific CSS node by id

Vegebird · Aug 20, 2015

I'm trying to use the rvest package to scrape data from a web page. In simplified form, the HTML looks like this:

<div class="style">
   <input id="a" value="123">
   <input id="b">
</div>

I want to get the value 123 from the first input. I tried the following R code:

page <- read_html(url)
output <- html_nodes(page, ".style input")

This returns a node set containing both input tags:

<input id="a" value="123">
<input id="b">

Next I tried using html_node to reference the first input tag by id:

html_node(output, "#a")

This returned a list of NULLs instead of the input tag I wanted.


My question is: how can I select the input tag by its id?


Rentrop · Aug 20, 2015

You can use an XPath selector:

library(rvest)

text <- '<div class="style">
   <input id="a" value="123">
   <input id="b">
</div>'

h <- read_html(text)

h %>%
  html_nodes(xpath = '//*[@id="a"]') %>%
  html_attr("value")
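A CSS id selector should also work, as long as it is run against the parsed document rather than against the `<input>` nodes that were already selected. The sketch below assumes rvest is installed and reuses the sample HTML from the question; `value` is just an illustrative variable name. In rvest 1.0 and later, `html_element()` is the preferred name for `html_node()`, but the older name is still available.

```r
library(rvest)

text <- '<div class="style">
   <input id="a" value="123">
   <input id="b">
</div>'

h <- read_html(text)

# "#a" matches the element whose id attribute is "a", searching from the document root
value <- h %>%
  html_node("#a") %>%
  html_attr("value")

print(value)
```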

The easiest way to get CSS and XPath selectors is to use http://selectorgadget.com/. For a specific attribute like yours, you can also use Chrome's developer tools: right-click the element in the Elements panel and choose Copy > Copy XPath.
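If you already have the node set from `html_nodes(..., ".style input")`, `html_node(output, "#a")` most likely comes back empty because rvest looks for matches among the descendants of each node in `output`, and `<input>` elements have no children. A minimal sketch of an alternative: read the attributes of the selected nodes directly and filter by id. It assumes the sample HTML from the question; `inputs`, `ids`, `values` and `a_value` are illustrative names.

```r
library(rvest)

text <- '<div class="style">
   <input id="a" value="123">
   <input id="b">
</div>'

inputs <- read_html(text) %>% html_nodes(".style input")

# id and value of every selected <input>; a missing attribute comes back as NA
ids <- html_attr(inputs, "id")
values <- html_attr(inputs, "value")

# keep only the value of the input whose id is "a"
a_value <- values[ids == "a"]
print(a_value)
```

This avoids a second selector entirely, which is convenient when the node set is already in hand.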