Wikipedia Mediawiki API get Pageid from URL

Shreyas Chavan · Jul 28, 2015 · Viewed 9.5k times

I have a set of full urls like

http://en.wikipedia.org/wiki/Episkopi_Bay
http://en.wikipedia.org/wiki/Monte_Lauro
http://en.wikipedia.org/wiki/Lampedusa
http://en.wikipedia.org/wiki/Himera
http://en.wikipedia.org/wiki/Lago_Cecita
http://en.wikipedia.org/wiki/Aspromonte

I want to find the Wikipedia pageids for these URLs. I have used the MediaWiki API before, but I can't figure out how to do this.

I have tried extracting the page title from each URL by taking the substring that follows the last "/" and then querying the API with that title to get the pageid.

http://en.wikipedia.org/wiki/Episkopi_Bay --> Episkopi_Bay
http://en.wikipedia.org/wiki/Monte_Lauro --> Monte_Lauro
http://en.wikipedia.org/wiki/Lampedusa --> Lampedusa
http://en.wikipedia.org/wiki/Himera --> Himera
http://en.wikipedia.org/wiki/Lago_Cecita --> Lago_Cecita
http://en.wikipedia.org/wiki/Aspromonte --> Aspromonte
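
The extraction step above could be sketched in Python roughly as follows. This is only an illustration: the helper name `title_from_url` is made up, and it assumes every URL uses the standard `/wiki/` path prefix. It strips that prefix instead of cutting at the last "/", because some titles (e.g. AC/DC) contain a slash themselves, and it decodes percent-escapes.

```python
from urllib.parse import urlsplit, unquote

WIKI_PREFIX = "/wiki/"  # assumption: article URLs look like /wiki/<Title>


def title_from_url(url: str) -> str:
    """Return the raw page title embedded in a Wikipedia article URL."""
    # urlsplit() separates the path from any query string or #fragment.
    path = urlsplit(url).path
    if not path.startswith(WIKI_PREFIX):
        raise ValueError(f"not a /wiki/ article URL: {url}")
    # Keep everything after the prefix, so titles containing "/" survive,
    # then turn percent-escapes such as %C3%A9 back into characters.
    return unquote(path[len(WIKI_PREFIX):])


if __name__ == "__main__":
    print(title_from_url("http://en.wikipedia.org/wiki/Episkopi_Bay"))
```

The result is still only the title as written in the link (underscores included); it says nothing yet about whether that title is a redirect.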

But the problem is that some of my links might be redirects and hence the substring might not always be the title of the page.

TL;DR: How can I find the pageid of a Wikipedia page from a URL?

Answer

Seb35 · Jul 28, 2015

I’m not sure whether what you call "page id" is the identification number of the page (e.g. 15580374 for English Wikipedia’s Main Page, found under "Page information" in the toolbox in the left column) or the normalised title of a page with redirects resolved. The answer below covers both.

You can use the API with action=query, e.g. https://en.wikipedia.org/w/api.php?action=query&titles=Main%20Page. The response contains minimal information about the page, including the page id (number).
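
As a rough sketch of that request in Python (standard library only): the function names and the User-Agent string are placeholders I made up, and I add `format=json` so the result can be parsed. Several titles can be sent in one request by joining them with "|". The parsing assumes the default JSON layout, where `query.pages` is an object keyed by page id and missing pages carry no `pageid` field.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(titles):
    """Build an action=query URL for one or more page titles."""
    params = {"action": "query", "format": "json", "titles": "|".join(titles)}
    return API_URL + "?" + urlencode(params)


def page_ids(response):
    """Map each returned page title to its numeric page id."""
    pages = response.get("query", {}).get("pages", {})
    # Pages that do not exist have no "pageid" key, so they are skipped.
    return {p["title"]: p["pageid"] for p in pages.values() if "pageid" in p}


def fetch(url):
    """Perform the HTTP request and decode the JSON body."""
    # Placeholder User-Agent; replace it with something identifying your tool.
    request = Request(url, headers={"User-Agent": "pageid-example/0.1"})
    with urlopen(request) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(page_ids(fetch(build_query_url(["Main Page"]))))
```

Note that the keys of the returned dictionary are the titles as the API reports them, which may differ from what you sent once normalisation is applied.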

You may also want to handle more complex cases: title normalisation and/or redirects. Title normalisation (initial capital, underscores changed to spaces, various Unicode normalisations iirc, etc.) is included out of the box. For redirects, you have to ask explicitly by adding "&redirects" to the URL (note that double redirects, i.e. a redirect to a redirect, won’t be resolved, but there should not be any out there). Example: https://en.wikipedia.org/w/api.php?action=query&titles=main_page&redirects
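
To tie a title you sent back to the page the API finally returns, you can follow the `normalized` and `redirects` lists in the response. The sketch below is one way to do it, not the only one: `resolve_page_id` is a made-up helper, and `SAMPLE` is a hand-written response in the shape I would expect for the example URL above (with `format=json` added), not captured output.

```python
# Hand-written illustration of a response to
# action=query&titles=main_page&redirects&format=json
SAMPLE = {
    "query": {
        "normalized": [{"from": "main_page", "to": "Main page"}],
        "redirects": [{"from": "Main page", "to": "Main Page"}],
        "pages": {
            "15580374": {"pageid": 15580374, "ns": 0, "title": "Main Page"}
        },
    }
}


def resolve_page_id(response, requested_title):
    """Return the page id the requested title ends up at, or None."""
    query = response.get("query", {})
    title = requested_title
    # Normalisation is applied first, then the redirect, so follow the
    # two "from" -> "to" lists in that order.
    for key in ("normalized", "redirects"):
        mapping = {entry["from"]: entry["to"] for entry in query.get(key, [])}
        title = mapping.get(title, title)
    for page in query.get("pages", {}).values():
        if page.get("title") == title:
            # Missing pages have no "pageid", so .get() yields None for them.
            return page.get("pageid")
    return None


if __name__ == "__main__":
    print(resolve_page_id(SAMPLE, "main_page"))
```

This matters mostly when you batch many titles into one request, since the `pages` object alone does not tell you which input each page came from.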

If you need more information, you can look at https://en.wikipedia.org/w/api.php?action=help&modules=query%2Binfo.