I'm having trouble with the selectors in Cheerio.js, which I use on my Node server. Its core is supposedly based on jQuery, but I can't get it to work with the same selectors I would use in native jQuery.
I have a DOM that roughly looks like this:
<div class="test">
    <table class="listing">
        <thead><tr>few cells here</tr></thead>
        <tfoot></tfoot>
        <tbody><tr>These are the rows I want</tr></tbody>
    </table>
</div>
Since there are two tables on the page with the class "listing", I can't select the table directly, so I need to go through the div with the "test" class. With jQuery, the selection would be something like:
$('div.test tbody tr')
But this doesn't work with Cheerio. If I run $('div[class="test"] tr') I get every row in the table, including the thead rows, which is not what I want.
Any guesses?
Update: This is the actual code I'm executing (does not work):
// Load the html
var $ = cheerio.load(html, {
    normalizeWhitespace: true
});
$('div.tillgodo tbody tr').each(function(){
    console.log("Found credited course...");
    var children = $(this).children();
    var credits = parseFloat($(children[3]).text().replace(',', '.')); // Replace the comma with a dot, since parseFloat only supports dots by design
    var row = {
        "course" : $(children[1]).text().trim(),
        "grade" : null,
        "credits" : credits,
        "date" : $(children[4]).text()
    };
    // Push course to JSON object
    console.log("Push course to object...");
    console.log("------------------------------------------\n");
    data.credited_courses.push(row);
    data.credited_courses_credits += credits; // credits is already a number
});
The following code works for the first table:
$('tr.incomplete.course').each(function(i, tr){
    console.log("This is course nr: " + parseInt(course_count+1));
    console.log("Found incompleted course...");
    var children = $(this).children();
    var credits = parseFloat($(children[2]).text().replace(',', '.').match(/(\+|-)?((\d+(\.\d+)?)|(\.\d+))/)[0]); // Filter out any parentheses and odd characters
    var row = {
        "course" : $(children[1]).text(),
        "grade" : $(children[3]).text(),
        "credits" : credits,
        "date" : $(children[5]).text()
    };
    // Sum the total amount of credits for all courses
    console.log("Add credits to incompleted_credits...");
    data.incompleted_credits += credits;
    console.log("Push course to object...");
    data.incompleted_courses.push(row);
    course_count++;
});
By "doesn't work" I mean that the JSON object I'm returning does not contain the expected rows from the second table.
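Both loops above turn a cell's text into a number by swapping the decimal comma for a dot and, in the second loop, pulling the first number out with a regex. A minimal sketch of that step as a single helper follows; the name parseCredits and the choice to return NaN when no number is found are assumptions for illustration, not part of the original code.

```javascript
// Hypothetical helper: parse a credit value such as "10,5" or "(7,5)" into a number.
function parseCredits(text) {
  // Swap the decimal comma for a dot, since parseFloat only understands dots.
  const normalized = String(text).replace(',', '.');
  // Pick out the first signed decimal number, ignoring parentheses and other characters.
  const match = normalized.match(/[+-]?(\d+(\.\d+)?|\.\d+)/);
  // Return NaN instead of throwing when the cell contains no number.
  return match ? parseFloat(match[0]) : NaN;
}

console.log(parseCredits('10,5'));
console.log(parseCredits('(7,5)'));
console.log(parseCredits('n/a'));
```

Guarding the match result avoids the TypeError that indexing [0] on a null match would raise for a cell without digits.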
Update 2: The table I want to scrape:
<div class="tillgodo">
<h2>Tillgodoräknanden</h2>
<table class="listing">
<thead>
<tr class="listingHeader">
<th>Kurskod</th>
<th>Kursnamn</th>
<th>Beslutsfattare</th>
<th class="credits">Poäng</th>
<th>Datum</th>
</tr>
</thead>
<tfoot>
<tr class="listingTrailer">
<td>
</td><td colspan="2">Summa tillgodoräknade poäng:
</td><td class="credits">10,5
</td><td>
</td></tr>
</tfoot>
<tbody><tr>
<td>
</td><td>Valfria kurser
</td><td>xxx
</td><td class="credits">10,5
</td><td class="nobreak">2013-06-03
</td></tr>
</tbody>
</table>
</div>
Final update (problem solved): The selector I had been using all along was fine. The source HTML was malformed and had no tbody tag at all. The browser (Chrome in my case) silently repaired the markup, which hid the real issue.
You can try $('div.test table.listing tr').text()
This returns the combined text of all the tr elements in that table, including any header and footer rows.