I'm trying to use the rvest package to scrape data from a web page. In simplified form, the HTML looks like this:
<div class="style">
<input id="a" value="123">
<input id="b">
</div>
I want to get the value 123 from the first input. I tried the following R code:
library(rvest)
url <- "xxx"
page <- read_html(url)  # download and parse the page before selecting nodes
output <- html_nodes(page, ".style input")
This will return a list of input tags:
[[1]]
<input id="a" value="123">
[[2]]
<input id="b">
Next I tried using html_node to reference the first input tag by id:
html_node(output, "#a")
It returned a list of NULLs instead of the input tag I want:
[[1]]
NULL
[[2]]
NULL
My question is, how can I reference the input tag using its id?
You can use an XPath selector instead:
library(rvest)
text <- '<div class="style">
<input id="a" value="123">
<input id="b">
</div>'
h <- read_html(text)
h %>%
  html_nodes(xpath = '//*[@id="a"]') %>%
  html_attr("value")
#> [1] "123"
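A CSS selector works too, as long as it is applied to the right starting point. The NULLs in the question most likely come from html_node() searching only *inside* each <input> already held in output. An <input> has no child elements, so `#a` matches nothing. The following is a minimal sketch, assuming the same simplified markup. It applies the id selector to the whole document, and alternatively filters the existing node set by its id attribute:

```r
library(rvest)

# Same simplified markup as in the question
text <- '<div class="style">
<input id="a" value="123">
<input id="b">
</div>'
h <- read_html(text)

# Apply the id selector to the whole document rather than to the <input> nodes:
# html_node() looks at descendants of what it receives, and <input> has no children.
value_css <- h %>%
  html_node("#a") %>%
  html_attr("value")

# Alternative: keep the node set from the question and filter it by its id attribute
output <- html_nodes(h, ".style input")
value_filtered <- html_attr(output[html_attr(output, "id") == "a"], "value")

value_css
value_filtered
```

Both approaches return the string "123" for this markup.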
The easiest way to get CSS and XPath selectors is to use http://selectorgadget.com/. For a specific attribute like yours, use Chrome's developer tools to get the XPath: right-click the element in the Elements panel and choose Copy > Copy XPath.
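When the element has an id, Chrome's copied XPath is typically the id-based form used above (`//*[@id="a"]`). The "Copy full XPath" entry instead gives an absolute path. The sketch below shows how such an absolute path might be plugged into rvest. It assumes the same snippet as above. The path `/html/body/div/input[1]` is an illustrative assumption: it holds here only because read_html() wraps the fragment in <html><body>, and on a real page the path will differ.

```r
library(rvest)

text <- '<div class="style">
<input id="a" value="123">
<input id="b">
</div>'
h <- read_html(text)

# Hypothetical absolute path of the kind "Copy full XPath" produces;
# read_html() adds the implied <html> and <body> around the fragment,
# so the div sits directly under body in this example.
value_abs <- h %>%
  html_node(xpath = "/html/body/div/input[1]") %>%
  html_attr("value")

value_abs
```

Absolute paths break as soon as the page layout changes, so the id-based XPath (or the `#a` CSS selector) is usually the more robust choice.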