I have an HTML document with links, for example:
<html>
<body>
<ul>
<li><a href="http://someurl.com/etc/etc">teste1</a></li>
<li><a href="http://someurl.com/etc/etc">teste2</a></li>
<li><a href="http://someurl.com/etc/etc">teste3</a></li>
</ul>
</body>
</html>
Using Ruby on Rails, with Nokogiri or some other method, I want to end up with a document like this:
<html>
<body>
<ul>
<li><a href="http://myproxy.com/?url=http://someurl.com/etc/etc">teste1</a></li>
<li><a href="http://myproxy.com/?url=http://someurl.com/etc/etc">teste2</a></li>
<li><a href="http://myproxy.com/?url=http://someurl.com/etc/etc">teste3</a></li>
</ul>
</body>
</html>
What's the best strategy to achieve this?
If you choose to use Nokogiri, I think this should work:
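require 'cgi'
require 'nokogiri'

file_path = "your_page.html"
doc = Nokogiri::HTML(File.read(file_path))

# Rewrite every link that has an href so it goes through the proxy
doc.css("a[href]").each do |link|
  # CGI.escape percent-encodes the original URL so it survives as a query parameter
  link["href"] = "http://myproxy.com/?url=#{CGI.escape(link["href"])}"
end

# Write the modified document back to the same file
File.write(file_path, doc.to_html)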
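To see what CGI.escape does to the original URL, here is a small stdlib-only sketch. The helper name `proxied_url` and the `PROXY_PREFIX` constant are my own, not part of any library, and the check that skips already-proxied links is an assumption about what you'd want if the script runs twice on the same file.

```ruby
require 'cgi'

# Hypothetical constant: the proxy endpoint from the question.
PROXY_PREFIX = "http://myproxy.com/?url="

# Hypothetical helper: returns the proxied form of a URL.
# URLs that already point at the proxy are returned unchanged, so
# running the rewrite twice does not wrap a link twice.
def proxied_url(url)
  return url if url.start_with?(PROXY_PREFIX)
  "#{PROXY_PREFIX}#{CGI.escape(url)}"
end

puts proxied_url("http://someurl.com/etc/etc")
# prints http://myproxy.com/?url=http%3A%2F%2Fsomeurl.com%2Fetc%2Fetc
```

Inside the Nokogiri loop you could then write `link["href"] = proxied_url(link["href"])`.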
If I'm not mistaken, Rails loads REXML by default, so depending on what you're trying to do, you could use that as well.
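A rough sketch of the REXML route, assuming the page is well-formed XML (REXML is a strict XML parser, so unclosed tags that HTML tolerates will raise a parse error). The `proxify_links` name, its `proxy` parameter, and the sample markup are made up for illustration, and the explicit `require` is there in case REXML isn't already loaded.

```ruby
require 'cgi'
require 'rexml/document'

# Hypothetical helper: rewrites every <a href="..."> in a well-formed
# XML/XHTML string so that it points at the proxy.
def proxify_links(markup, proxy = "http://myproxy.com/?url=")
  doc = REXML::Document.new(markup)
  REXML::XPath.each(doc, "//a[@href]") do |link|
    # Percent-encode the original URL before appending it to the proxy URL
    link.attributes["href"] = proxy + CGI.escape(link.attributes["href"])
  end
  doc.to_s
end

sample = '<ul><li><a href="http://someurl.com/etc/etc">teste1</a></li></ul>'
puts proxify_links(sample)
```

For real-world HTML that isn't strictly well-formed, Nokogiri is probably the safer choice.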