I am trying to use Python 3 to return the BibTeX citation generated by http://www.doi2bib.org/. The URLs are predictable, so the script can work out the URL without having to interact with the web page. I have tried using selenium, bs4, etc., but can't get the text inside the box.
import urllib.request
from bs4 import BeautifulSoup
url = "http://www.doi2bib.org/#/doi/10.1007/s00425-007-0544-9"
# Name the parser explicitly so bs4 does not have to guess one
text = BeautifulSoup(urllib.request.urlopen(url).read(), "html.parser")
print(text)
Can anyone suggest a way of returning the bibtex citation as a string (or whatever) in python?
You don't need BeautifulSoup here. The page sends an additional XHR request to the server to fill in the BibTeX citation. You can simulate it, for example, with requests:
import requests
bibtex_id = '10.1007/s00425-007-0544-9'
url = "http://www.doi2bib.org/#/doi/{id}".format(id=bibtex_id)
xhr_url = 'http://www.doi2bib.org/doi2bib'
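with requests.Session() as session:
    # Visit the page first, then repeat the XHR call the page itself makes
    session.get(url)
    response = session.get(xhr_url, params={'id': bibtex_id})
    # .text gives a decoded str; .content would print a bytes literal in Python 3
    print(response.text)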
Prints:
@article{Burgert_2007,
    doi = {10.1007/s00425-007-0544-9},
    url = {http://dx.doi.org/10.1007/s00425-007-0544-9},
    year = 2007,
    month = {jun},
    publisher = {Springer Science $\mathplus$ Business Media},
    volume = {226},
    number = {4},
    pages = {981--987},
    author = {Ingo Burgert and Michaela Eder and Notburga Gierlinger and Peter Fratzl},
    title = {Tensile and compressive stresses in tracheids are induced by swelling based on geometrical constraints of the wood cell},
    journal = {Planta}
}
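As a side note on why the urllib attempt in the question only returned the page shell: the DOI sits after the `#` in the URL, which makes it a fragment. Fragments are handled client-side (here, by the page's JavaScript) and are not sent to the server. A small sketch using only the standard library shows how the URL splits; the variable names are just for illustration:

```python
from urllib.parse import urlsplit

page_url = "http://www.doi2bib.org/#/doi/10.1007/s00425-007-0544-9"
parts = urlsplit(page_url)

# Everything after '#' is the fragment. An HTTP client keeps it local
# and requests only scheme + host + path (plus query, if any).
requested = "{0}://{1}{2}".format(parts.scheme, parts.netloc, parts.path)
fragment = parts.fragment

print(requested)  # what the server is actually asked for
print(fragment)   # what only the browser-side script gets to see
```

So the server only ever sees a request for the front page, and the citation box is filled in afterwards by the separate XHR call.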
You can also solve it with selenium. The key trick here is to use an Explicit Wait to wait for the citation to become visible:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get('http://www.doi2bib.org/#/doi/10.1007/s00425-007-0544-9')
element = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, '//pre[@ng-show="bib"]')))
print(element.text)
driver.close()
Prints the same as the above solution.
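If you would rather have the citation returned as a string from a function, the XHR approach could be wrapped roughly like this. It is a minimal sketch using only the standard library; `build_xhr_url` and `fetch_bibtex` are hypothetical helper names, and it assumes the `doi2bib` endpoint and `id` parameter shown above still work, that the response is UTF-8 text, and that the preliminary page visit from the requests example is not required:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the requests example above; the site may have changed it.
XHR_URL = "http://www.doi2bib.org/doi2bib"


def build_xhr_url(doi):
    """Return the XHR URL with the DOI passed as the 'id' query parameter."""
    return XHR_URL + "?" + urlencode({"id": doi})


def fetch_bibtex(doi, timeout=10):
    """Fetch the BibTeX entry for a DOI and return it as a str.

    Hypothetical helper: assumes the endpoint answers with UTF-8 text.
    """
    with urlopen(build_xhr_url(doi), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_bibtex('10.1007/s00425-007-0544-9')` should then give back the citation text, provided the assumptions above hold.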