Iterating over objects in pyquery

AP257 · Jul 13, 2010

I'm scraping a page with Python's pyquery, and I'm kinda confused by the types it returns, and in particular how to iterate over a list of results.

If my HTML looks a bit like this:

<div class="formwrap">blah blah <h3>Something interesting</h3></div>
<div class="formwrap">more rubbish <h3>Something else interesting</h3></div>

How do I get the contents of the <h3> tags one by one, so I can process them? I'm trying:

results_page = pq(response.read())
formwraps = results_page(".formwrap") 
print type(formwraps)
print type([formwraps])
for my_div in [formwraps]:
    print type(my_div)
    print my_div("h3").text() 

This produces:

<class 'pyquery.pyquery.PyQuery'>
<type 'list'>
<class 'pyquery.pyquery.PyQuery'>
Something interesting something else interesting

It looks like there's no actual iteration going on. How can I pull out each element individually?

Extra question from a newbie: what are the square brackets around [a] doing? It looks like they convert a special PyQuery object to a list. Is [] a standard Python operator?

------UPDATE--------

I've found an 'each' function in the pyquery docs. However, I don't understand how to use it for what I want. Say I just want to print out the content of the <h3>. This produces a syntax error: why?

formwraps.each(lambda e: print e("h3").text())

Answer

livibetter · Jul 3, 2013

Since pyquery 1.2.3, you can call items() on a PyQuery object to go through the matched elements one at a time, each wrapped as its own PyQuery object:

print(type(formwraps.items()))
for my_div in formwraps.items():
    print(my_div("h3").text())

The items() method returns a generator, and the loop above works on both Python 2 and 3.
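
On the extra question about the square brackets: [x] is standard Python list-literal syntax. It builds a new list whose only element is x; it does not convert x into a list, which is why the loop in the question ran exactly once. Here is a minimal sketch of the difference, using a plain tuple of strings as a hypothetical stand-in for the result set (this is not a real PyQuery object, just something that holds several items):

```python
# Hypothetical stand-in for an object holding several results.
matches = ("Something interesting", "Something else interesting")

# List literal: creates a one-element list that contains `matches` itself.
wrapped = [matches]
print(len(wrapped))

# Looping over the wrapper runs once, and `item` is the whole tuple.
for item in wrapped:
    print(type(item).__name__)

# Looping over the object itself visits each element in turn.
for item in matches:
    print(item)

# list() is the built-in that actually converts an iterable into a list.
converted = list(matches)
print(len(converted))
```

In the question, `for my_div in [formwraps]` is the first kind of loop, so my_div was the whole result set and .text() joined the text of both headings. Iterating over formwraps.items() is the second kind.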