Cheerio: Extract Text from HTML with separators

Crisboot · Jul 21, 2015

Let's say I have the following:

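var cheerio = require('cheerio');
var $ = cheerio.load('<html><body><ul><li>One</li><li>Two</li></ul></body></html>');

// Keep only the text nodes under <html> and concatenate them
var t = $('html').find('*').contents().filter(function() {
  return this.type === 'text';
}).text();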

I get:

OneTwo

Instead of:

One Two

That is the same result I get from $('html').text(). So what I need is a way to insert a separator, such as a space or \n, between the text nodes.

Note: this is not a jQuery front-end question; it is a Node.js back-end issue with Cheerio and HTML parsing.

Answer

Crisboot · Jul 21, 2015

This seems to do the trick:

var t = $('html *').contents().map(function() {
    return (this.type === 'text') ? $(this).text() : '';
}).get().join(' ');

console.log(t);

Result:

One Two
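
One thing to watch with join(' '): the callback returns '' for every non-text node, and as far as I know .map() only drops null/undefined results, so those empty strings would still contribute separators. The sketch below uses a plain array to show the effect. The `pieces` array is a hypothetical stand-in for what .get() returns for the sample markup, not captured Cheerio output.

```javascript
// Hypothetical per-node results of the map() callback:
// '' for each element node, the text for each text node.
var pieces = ['', '', '', 'One', 'Two'];

// Joining as-is keeps one separator per empty entry.
var joined = pieces.join(' ');

// Dropping the empty entries first leaves separators only between texts.
var cleaned = pieces.filter(Boolean).join(' ');

console.log(JSON.stringify(joined));   // "   One Two"
console.log(JSON.stringify(cleaned));  // "One Two"
```

If that matches what you see, returning null instead of '' from the callback, or calling .trim() on the result, should be enough to tidy the output.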

A slightly improved version, which appends the separator to each text node instead of joining on it:

var t = $('html *').contents().map(function() {
    return (this.type === 'text') ? $(this).text()+' ' : '';
}).get().join('');
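
For a reusable variant, the same idea can be written as a small recursive walk over the node tree. This is only a sketch: `collectText` and `textWithSeparator` are hypothetical helper names, and the code assumes htmlparser2-style nodes like the ones Cheerio exposes (text nodes with type === 'text' and a `data` string, other nodes with an optional `children` array). The tree here is built by hand so the example runs without any dependency.

```javascript
// Collect the trimmed text of every text node under `node`, in document order.
// Assumption: text nodes look like { type: 'text', data: '...' } and
// container nodes carry their child nodes in a `children` array.
function collectText(node, out) {
  if (node.type === 'text') {
    var s = node.data.trim();
    if (s) out.push(s); // skip whitespace-only text nodes
  } else if (node.children) {
    node.children.forEach(function (child) {
      collectText(child, out);
    });
  }
  return out;
}

// Join the collected pieces with the caller's separator.
function textWithSeparator(node, sep) {
  return collectText(node, []).join(sep);
}

// Hand-built stand-in for the parsed sample markup (not produced by Cheerio).
function li(text) {
  return { type: 'tag', name: 'li', children: [{ type: 'text', data: text }] };
}
var tree = {
  type: 'tag', name: 'html', children: [
    { type: 'tag', name: 'body', children: [
      { type: 'tag', name: 'ul', children: [li('One'), li('Two')] }
    ] }
  ]
};

var spaced = textWithSeparator(tree, ' ');
var lines = textWithSeparator(tree, '\n');
console.log(spaced); // One Two
```

With Cheerio, the node to pass in would presumably be $('html')[0]. Because the separator is only placed between non-empty pieces, there is no leading or trailing whitespace to trim afterwards.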