How to access the DOM using Node.js?

ameni · Dec 16, 2015

I have an editor.html file that contains a generatePNG function:

<!DOCTYPE html>
<html> 
<head> 
    <meta charset="UTF-8"> 
    <title>Diagram</title> 

    <script type="text/javascript" src="lib/jquery-1.8.1.js"></script> 
    <!-- I use many other resources -->

    <script> 

        function generatePNG(oViewer) {
            var oImageOptions = {
                includeDecoratorLayers: false,
                replaceImageURL: true
            };

            var d = new Date();
            var h = d.getHours();
            var m = d.getMinutes();
            var s = d.getSeconds();

            var sFileName = "diagram" + h.toString() + m.toString() + s.toString() + ".png";

            var sResultBlob = oViewer.generateImageBlob(function (sBlob) {
                var reader = new window.FileReader();
                // Register the handler before starting the asynchronous read
                reader.onloadend = function () {
                    // reader.result is a data URL: data:<type>;base64,<data>
                    var base64data = reader.result;
                    var image = document.createElement('img');
                    image.setAttribute("id", "GraphImage");
                    image.src = base64data;
                    document.body.appendChild(image);
                };
                reader.readAsDataURL(sBlob);
            }, "image/png", oImageOptions);
            return sResultBlob;
        }

    </script> 


</head> 

<body>
    <div id="diagramContainer"></div> 
</body> 
</html>

I want to access the DOM and get image.src using Node.js. I found that I can work with cheerio or jsdom.

I started with this:

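var fs = require('fs'),
    cheerio = require('cheerio'),
    // cheerio.load() expects an HTML string, not a file name
    $ = cheerio.load(fs.readFileSync('editor.html', 'utf8'));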

But I can't find out how to access the image and get its src.

Answer

Rogier Spieker · Dec 16, 2015

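The problem is that loading an HTML file into cheerio (or a similar Node module, at least by default) does not process the HTML the way a browser does. Assets such as stylesheets, images and scripts are not loaded or executed as they would be in a browser. In particular, the script in editor.html never runs, so the GraphImage element it would create simply does not exist in the document cheerio parses.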

Node.js and modern web browsers use the same (or similar) JavaScript engines, but a browser adds a lot on top of the engine, such as window and the DOM (document). Node.js has neither of these, so there is no window.FileReader or document.createElement.
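To see this for yourself, you can check for these browser globals from a plain Node.js script. The function name `browserGlobals` is hypothetical; it relies only on `typeof`, which is safe to use on identifiers that were never declared.

```javascript
// Minimal sketch: report which browser-only globals exist in this runtime.
// In plain Node.js (no jsdom or similar library loaded) all three are false.
function browserGlobals() {
  return {
    window: typeof window !== 'undefined',
    document: typeof document !== 'undefined',
    // Guarded so the property access never runs when window is missing
    windowFileReader:
      typeof window !== 'undefined' && typeof window.FileReader === 'function',
  };
}

console.log(browserGlobals());
```

Running the same function in a browser console would report true for all three, which is exactly the environment the code in editor.html expects.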

If the image is created entirely without user interaction (your code sample 'magically' receives the sBlob argument, a Blob that FileReader turns into a data URL of the form data:<type>;<encoding>,<data>), you could use a so-called headless browser on the server; PhantomJS seems to be the most popular one these days. Then again, if no user interaction is required to create the sBlob, you are probably better off using a pure Node.js solution, e.g. the one in the question "How do I parse a data URL in Node?".
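For the pure Node.js route, the core step is decoding the data URL back into bytes. This is a minimal sketch, assuming the payload is base64-encoded the way FileReader.readAsDataURL produces it; `parseDataUrl` is a hypothetical helper, not the code from the linked question.

```javascript
// Minimal sketch: decode a base64 data URL of the form
// data:<type>;base64,<data> into its MIME type and raw bytes.
function parseDataUrl(dataUrl) {
  // <type> is everything before ";base64,"; the payload is the rest
  const match = /^data:([^;,]+);base64,(.*)$/s.exec(dataUrl);
  if (!match) {
    throw new Error('Expected a data URL of the form data:<type>;base64,<data>');
  }
  return {
    mimeType: match[1],
    buffer: Buffer.from(match[2], 'base64'),
  };
}

// Usage with a tiny base64 payload ("hello")
const decoded = parseDataUrl('data:text/plain;base64,aGVsbG8=');
console.log(decoded.mimeType, decoded.buffer.toString('utf8')); // text/plain hello
```

The resulting Buffer can then be written to disk with, for example, `fs.writeFileSync('diagram.png', decoded.buffer)`. Data URLs that are not base64-encoded (percent-encoded text, for instance) are deliberately rejected by this sketch rather than guessed at.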

If some kind of user interaction is required to create the sBlob and you need to store the image on a server, you can use pretty much the same solution: send the sBlob (or the data URL made from it) to the server using Ajax or a WebSocket, turn it into an image file there and (optionally) return the URL where the image can be found.
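A minimal sketch of the server side, using only Node's built-in http, fs, path and crypto modules. Everything specific here is a hypothetical choice: the browser POSTs the data URL as the raw request body to /upload, only PNG images are accepted, the size limit is arbitrary, and the returned /uploads/... path is assumed to be served by some other part of your application.

```javascript
const http = require('http');
const fs = require('fs');
const path = require('path');
const crypto = require('crypto');

// Hypothetical limit so a single request cannot exhaust memory
const MAX_BODY_CHARS = 10 * 1024 * 1024;

// Accept a PNG data URL as the raw POST body, save it under uploadDir and
// reply with the (hypothetical) URL path of the stored image.
function createUploadServer(uploadDir) {
  return http.createServer((req, res) => {
    if (req.method !== 'POST' || req.url !== '/upload') {
      res.writeHead(404);
      res.end();
      return;
    }

    let body = '';
    let tooLarge = false;
    req.setEncoding('utf8');
    req.on('data', (chunk) => {
      if (tooLarge) return; // keep draining the request, but stop buffering
      body += chunk;
      if (body.length > MAX_BODY_CHARS) {
        tooLarge = true;
        body = '';
      }
    });

    req.on('end', () => {
      if (tooLarge) {
        res.writeHead(413, { 'Content-Type': 'text/plain' });
        res.end('Payload too large');
        return;
      }
      const match = /^data:image\/png;base64,(.+)$/s.exec(body.trim());
      if (!match) {
        res.writeHead(400, { 'Content-Type': 'text/plain' });
        res.end('Expected a PNG data URL');
        return;
      }
      // A random name avoids collisions between concurrent uploads
      const fileName = `diagram-${crypto.randomUUID()}.png`;
      const bytes = Buffer.from(match[1], 'base64');
      fs.writeFile(path.join(uploadDir, fileName), bytes, (err) => {
        if (err) {
          res.writeHead(500, { 'Content-Type': 'text/plain' });
          res.end('Could not save the image');
          return;
        }
        res.writeHead(201, { 'Content-Type': 'text/plain' });
        res.end(`/uploads/${fileName}`);
      });
    });
  });
}
```

Start it with `createUploadServer('./uploads').listen(3000)` (the directory must already exist). In the browser, the onloadend handler could then send the image with `fetch('/upload', { method: 'POST', body: reader.result })`, assuming the page is served from the same origin as this server.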