I use the XML package to get the links from this URL.
library(XML)
# Parse HTML URL
v1WebParse <- htmlParse(v1URL)
# Read links and get the quotes of the companies from the href
t1Links <- data.frame(xpathSApply(v1WebParse, '//a', xmlGetAttr, 'href'))
While this method is very efficient, I've used rvest and it seems faster at parsing a web page than XML. I tried html_nodes and html_attrs but I can't get it to work.
Despite my comment, here's how you can do it with rvest. Note that we need to read in the page with htmlParse first, since the site has the content-type set to text/plain for that file and that tosses rvest into a tizzy.
library(rvest)
library(XML)
pg <- htmlParse("http://www.bvl.com.pe/includes/empresas_todas.dat")
pg %>% html_nodes("a") %>% html_attr("href")
## [1] "/inf_corporativa71050_JAIME1CP1A.html" "/inf_corporativa10400_INTEGRC1.html"
## [3] "/inf_corporativa66100_ACESEGC1.html" "/inf_corporativa71300_ADCOMEC1.html"
## ...
## [273] "/inf_corporativa64801_VOLCAAC1.html" "/inf_corporativa58501_YURABC11.html"
## [275] "/inf_corporativa98959_ZNC.html"
That further illustrates rvest's XML package underpinnings.
UPDATE
rvest::read_html() can handle this directly now:
pg <- read_html("http://www.bvl.com.pe/includes/empresas_todas.dat")
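For anyone who wants to try the pipeline without hitting the site, here is a minimal offline sketch. The inline HTML and its two hrefs are made up for illustration and are not the real page content. It also shows the difference between html_attr() (one named attribute per node) and html_attrs() (all attributes per node), which may be what tripped up the attempt in the question, though that is only a guess.

```r
library(rvest)

# Hypothetical inline HTML standing in for the real page, so this runs offline
html <- '<html><body>
  <a href="/example_AAA.html" class="emp">AAA</a>
  <a href="/example_BBB.html">BBB</a>
</body></html>'

pg <- read_html(html)
anchors <- html_nodes(pg, "a")

# html_attr() pulls one named attribute from every node: a character vector
hrefs <- html_attr(anchors, "href")

# html_attrs() returns every attribute of every node: a list of named vectors
all_attrs <- html_attrs(anchors)

# Same shape as the one-column data.frame built in the question
t1Links <- data.frame(href = hrefs, stringsAsFactors = FALSE)
```

Here hrefs is a plain character vector that can go straight into a data.frame, while all_attrs is a list that would need further unpacking before it could be used the same way.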