Wiki quotes API?

sparkle · Dec 7, 2012 · Viewed 8.6k times

I want to get a structured version of a Wikiquote page as JSON (basically, I need all the phrases).

Example: http://en.wikiquote.org/wiki/Fight_Club_(film)

I tried with: http://en.wikiquote.org/w/api.php?format=xml&action=parse&page=Fight_Club_(film)&prop=text

but I get the page's entire HTML source. I need each phrase as an element of an array.

How could I achieve that with DBpedia?

http://f.cl.ly/items/2v3w1U2c0J0z1M0V0k0b/Schermata%2012-2456269%20alle%2013.06.24.png

Answer

djd · Dec 7, 2012

For one thing, I am not sure whether you can query Wikiquote using DBpedia. Secondly, DBpedia only gives you infobox data in a structured way; it does not expose the article content in a structured way. Instead, with a little bit of trouble, you can use the MediaWiki API to get the data.
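As a rough sketch of what calling the MediaWiki API involves, the request URL can be built from a page title. The helper name parse_url and the constant API_ENDPOINT are made up for this illustration; the query parameters are the ones from the URL in the question, with format switched to json.

```ruby
require 'uri'

# Endpoint taken from the question; the constant name is just for this sketch.
API_ENDPOINT = "http://en.wikiquote.org/w/api.php"

# Hypothetical helper: builds an action=parse request URL for a page title.
# URI.encode_www_form percent-encodes the parentheses in the title.
def parse_url(page_title)
  params = {
    "format" => "json",
    "action" => "parse",
    "page"   => page_title,
    "prop"   => "text"
  }
  "#{API_ENDPOINT}?#{URI.encode_www_form(params)}"
end

puts parse_url("Fight_Club_(film)")
```

Building the query string this way avoids escaping titles by hand, which matters for pages like this one whose title contains parentheses.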

EDIT:

The URI you are trying returns the page text as HTML, which makes things easier, though it does not solve the problem completely. Try this piece of code in your console.

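require 'json'
require 'open-uri'
require 'nokogiri'

# Fetch the parsed page as JSON; the rendered HTML is under parse -> text -> *
content = JSON.parse(URI.open("http://en.wikiquote.org/w/api.php?format=json&action=parse&page=Fight_Club_%28film%29&prop=text").read)
data = content['parse']['text']['*']

# Parse the HTML and collect the text of every list item
xpath_data = Nokogiri::HTML(data)
xpath_data.xpath("//ul/li").map { |data_node| data_node.text }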

This is the closest I have come to an answer. Of course, it is not completely right, because you will get a lot of unnecessary data. But if you dig into Nokogiri and XPath and work out how to pinpoint the nodes you need, you can get a solution that gives you correct quotes at least 90% of the time.