How to get children of elements with Puppeteer

Googlebot · Apr 12, 2019 · Viewed 9.5k times

I understand that Puppeteer returns its own element handles rather than standard DOM elements, but I don't understand why I can't continue querying from the elements I found, like this:

const els = await page.$$('div.parent');

for (let i = 0; i < els.length; i++) {
    const img = await els[i].$('img').getAttribute('src');
    console.log(img);
    const link = await els[i].$('a').getAttribute('href');
    console.log(link);
}

Answer

Thomas Dondorf · Apr 12, 2019

Problem

Element handles are necessary as an abstraction layer between the Node.js runtime and the browser runtime. The actual DOM elements are never sent to the Node.js environment.

That means that when you want to get an attribute from an element, data has to be transferred to the browser (which DOM element to use) and back (the result).
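To make that round trip concrete, here is a minimal sketch that splits it into two explicit steps. The helper name readChildAttribute and its parameters are hypothetical and not part of Puppeteer; parentHandle is assumed to be an ElementHandle such as one of the items returned by page.$$.

```javascript
// Hypothetical helper: read one attribute from the first child matching `selector`.
// `parentHandle` is assumed to be a Puppeteer ElementHandle.
async function readChildAttribute(parentHandle, selector, name) {
  // Step 1: the query runs in the browser; Node.js receives only a handle (or null).
  const childHandle = await parentHandle.$(selector);
  if (childHandle === null) {
    return null; // no matching child element
  }

  // Step 2: the callback runs in the browser with the real DOM element,
  // and only the serializable result (a string or null) comes back to Node.js.
  const value = await childHandle.evaluate(
    (el, attr) => el.getAttribute(attr),
    name
  );

  // Release the handle once it is no longer needed.
  await childHandle.dispose();
  return value;
}
```

Inside the loop from the question, this could be called as await readChildAttribute(els[i], 'img', 'src').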

Solution

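Therefore, the result of await els[i].$('img') is not the DOM element itself, but only a wrapper (an ElementHandle) that refers to the element in the browser environment, and that wrapper has no getAttribute method. Note also that in the question's code, .getAttribute is chained onto the promise returned by $ before it is awaited, which fails for the same reason. To get the attribute, you have to use a function like elementHandle.$eval: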

const imgSrc = await els[i].$eval('img', el => el.getAttribute('src'));

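This runs querySelector('img') on the given element inside the browser, passes the matched element to the given function, and returns that function's result (here, the attribute value) to Node.js. If no element matches the selector, $eval throws an error.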
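Applied to the loop from the question, this could look like the following sketch. The function name collectChildAttributes is hypothetical, page is assumed to be a Puppeteer Page that has already loaded the document, and treating a missing child as null is an assumption made here for illustration.

```javascript
// Hypothetical helper: collect the img src and link href for every div.parent.
// `page` is assumed to be a Puppeteer Page that has already navigated.
async function collectChildAttributes(page) {
  const parents = await page.$$('div.parent');
  const results = [];

  for (const parent of parents) {
    // $eval runs the callback in the browser and returns only the string.
    // It throws when nothing matches, so this sketch maps any failure to null
    // (an assumption; real code may prefer to let the error propagate).
    const src = await parent
      .$eval('img', (el) => el.getAttribute('src'))
      .catch(() => null);
    const href = await parent
      .$eval('a', (el) => el.getAttribute('href'))
      .catch(() => null);

    results.push({ src, href });
  }

  return results;
}
```

The function returns an array of plain objects, so the values can be logged or processed in Node.js without further calls into the browser.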