How to get all text from all tags in one array?

smotru · Jul 18, 2013 · Viewed 12k times

I need to create an array that contains all the text from a page, without using jQuery. This is my HTML:

<html>
<head>
    <title>Hello world!</title>
</head>
<body>
    <h1>Hello!</h1>
    <p>
        <div>What are you doing?</div>
        <div>Fine, and you?</div>
    </p>
    <a href="http://google.com">Thank you!</a>
</body>
</html>

Here is what I want to get:

text[1] = "Hello world!";
text[2] = "Hello!";
text[3] = "What are you doing?";
text[4] = "Fine, and you?";
text[5] = "Thank you!";

Here is what I have tried, but it does not seem to work correctly in my browser:

var elements = document.getElementsByTagName('*');
console.log(elements);

P.S. I need to use document.getElementsByTagName('*') and exclude "script" and "style" elements.

Answer

iConnor · Jul 18, 2013
You could do something like this:

var array = [];
var elements = document.body.getElementsByTagName("*");

for (var i = 0; i < elements.length; i++) {
    var current = elements[i];
    // Keep only leaf elements (no child elements) whose text is not just whitespace
    if (current.children.length === 0 && current.textContent.trim() !== '') {
        array.push(current.textContent);
    }
}

Demo result:

result = ["What are you doing?", "Fine, and you?"]

Or you could use document.documentElement.getElementsByTagName('*'); which also covers the elements in the head, such as the title.
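Walking the whole document also picks up any script and style elements, which the question wants excluded. The following is a minimal sketch of one way to skip them by tag name. The names collectText and SKIPPED_TAGS are made up for this illustration, and the function takes the element list as a parameter so it can be fed any collection.

```javascript
// Tags whose text is code or CSS rather than page content.
var SKIPPED_TAGS = ['SCRIPT', 'STYLE'];

// Hypothetical helper: `elements` is any array-like of element objects,
// for example document.documentElement.getElementsByTagName('*').
function collectText(elements) {
    var result = [];
    for (var i = 0; i < elements.length; i++) {
        var current = elements[i];
        // Normalise the case so the comparison does not depend on the document type.
        if (SKIPPED_TAGS.indexOf(current.tagName.toUpperCase()) !== -1) {
            continue;
        }
        var text = current.textContent.trim();
        // Keep only leaf elements (no child elements) with non-blank text.
        if (current.children.length === 0 && text !== '') {
            result.push(text);
        }
    }
    return result;
}
```

It could be called as var text = collectText(document.documentElement.getElementsByTagName('*')); Note that the resulting array is zero-indexed, so the title ends up in text[0] rather than text[1].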

Also make sure your code runs inside a DOMContentLoaded handler, so that the DOM has been fully parsed before you query it:

document.addEventListener('DOMContentLoaded', function () {
    // Code...
});
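One caveat with the listener: if the script runs after the document has already been parsed, DOMContentLoaded has already fired and the handler never runs. A small sketch of a guard for that case follows. The helper name onReady is hypothetical, and the document is passed in as a parameter purely for illustration.

```javascript
// Hypothetical helper: runs `callback` once the DOM of `doc` has been parsed.
function onReady(doc, callback) {
    if (doc.readyState === 'loading') {
        // Still parsing: wait for the event.
        doc.addEventListener('DOMContentLoaded', callback);
    } else {
        // 'interactive' or 'complete': parsing has finished, so run right away.
        callback();
    }
}
```

Usage would look like onReady(document, function () { /* collect the text here */ });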

If the title is the only thing you need from the head, you may as well do this:

array.push(document.title);

This saves looping through the scripts and styles in the head.