Cheerio: Extract Text from HTML with separators

Crisboot · Jul 21, 2015

Let's say I have the following:

var cheerio = require('cheerio');
var $ = cheerio.load('<html><body><ul><li>One</li><li>Two</li></ul></body></html>');

var t = $('html').find('*').contents().filter(function() {
  return this.type === 'text';
}).text();

I get:

OneTwo

Instead of:

One Two

It's the same result I get with $('html').text(). So basically what I need is to inject a separator, such as a space or \n, between the pieces of text.

Note: This is not a front-end jQuery question; it's a Node.js back-end issue with Cheerio and HTML parsing.


Crisboot · Jul 21, 2015

This seems to do the trick:

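var t = $('html *').contents().map(function() {
    // Returning null (not '') makes Cheerio's map() drop non-text nodes,
    // so join(' ') doesn't add extra separators for them.
    return (this.type === 'text') ? $(this).text() : null;
}).get().join(' ');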

Output:

One Two
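A caveat worth checking with the $('html *').contents() approach: as far as I can tell, Cheerio's contents() simply concatenates the children of each matched element in turn, so for mixed markup like <p>a<b>b</b>c</p> the text would come out as "a c b" rather than "a b c". Pretty-printed HTML also produces whitespace-only text nodes, which turn into extra separators. Below is a minimal sketch of an alternative, assuming Cheerio's parsed nodes have the domhandler shape (type, data, children): walk the tree depth-first, keep only non-blank text, and join it with whatever separator you want. The helper name textWithSeparator is made up for illustration and is not part of Cheerio.

```javascript
// Hypothetical helper (not part of Cheerio's API): collect text in
// document order and join it with a separator.
// Assumes domhandler-style nodes: text nodes have type === 'text' and a
// string `data`; elements and the document root may have a `children` array.
function textWithSeparator(node, separator = ' ') {
  const parts = [];

  function walk(n) {
    if (n.type === 'text') {
      const text = n.data.trim();
      if (text) parts.push(text); // drop whitespace-only text nodes
      return;
    }
    // <script> and <style> contents are not visible text, so skip them.
    if (n.type === 'script' || n.type === 'style') return;
    // Depth-first, left to right: this visits text nodes in document order.
    for (const child of n.children || []) walk(child);
  }

  walk(node);
  return parts.join(separator);
}
```

With Cheerio you would call it on the document root, e.g. textWithSeparator($.root()[0], '\n'), which should return "One\nTwo" for the markup in the question. Walking the node tree directly avoids relying on the order in which contents() returns nodes from several parents.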

Just improved my solution a little bit:

var t = $('html *').contents().map(function() {
    return (this.type === 'text') ? $(this).text()+' ' : '';