Rendering HTML+JavaScript server-side

Dimitar Velitchkov · Dec 5, 2011

I need to render an HTML page server-side and extract the raw bytes of a canvas element so I can save it as a PNG. The problem is that the canvas element is created from JavaScript (I'm using Flot, the jQuery plotting library, to generate a chart). So I guess I need a way to host the DOM and JavaScript functionality of a browser without actually using a browser. I settled on mshtml (but I'm open to any and all suggestions), as it seems it should be able to do exactly that. This is an ASP.NET MVC project.

I've searched far and wide and haven't seen anything conclusive.
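
For reference, once some host has actually run the script, the usual way to get pixels out of a canvas is the standard `canvas.toDataURL("image/png")` call, which returns a base64 data URL. The sketch below covers only the last step, turning such a string into raw PNG bytes. It assumes a Node.js runtime for `Buffer`; `dataUrlToPngBytes` is a hypothetical helper name, and the sample string is just the 8-byte PNG signature rather than real chart output.

```javascript
// Hypothetical helper: convert a "data:image/png;base64,..." URL into raw bytes.
// Assumes Node.js (Buffer); a browser would decode with atob() instead.
function dataUrlToPngBytes(dataUrl) {
    const prefix = "data:image/png;base64,";
    if (!dataUrl.startsWith(prefix)) {
        throw new Error("Expected a base64-encoded PNG data URL");
    }
    // Everything after the prefix is the base64 payload.
    return Buffer.from(dataUrl.slice(prefix.length), "base64");
}

// The base64-encoded PNG signature stands in for real canvas output here.
const sample = "data:image/png;base64,iVBORw0KGgo=";
const bytes = dataUrlToPngBytes(sample);
console.log(bytes.length, bytes.toString("latin1", 1, 4)); // prints "8 PNG"
```

Whether a given host exposes a working canvas at all is a separate question that this sketch does not address.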

So I have this simple HTML, kept as minimal as possible to demonstrate the problem:

<!DOCTYPE html>
<html>
<head>
    <title>Wow</title>
    <script src="http://ajax.aspnetcdn.com/ajax/jQuery/jquery-1.7.1.min.js" type="text/javascript"></script>
</head>
<body>
    <div id="hello">
    </div>
    <script type="text/javascript">
        function simple() 
        {
            $("#hello").append("<p>Hello</p>");
        }                    
    </script>
</body>
</html>

which produces the expected output when run from a browser.

I want to be able to load the original HTML into memory, execute the JavaScript function, then manipulate the final DOM tree. I cannot use any System.Windows.WebBrowser-like class, as my code needs to run in a service environment.

So here's my code:

IHTMLDocument2 domRoot = (IHTMLDocument2)new HTMLDocument();

using (WebClient wc = new WebClient())
{
    using (var stream = new StreamReader(wc.OpenRead((string)url)))
    {
        string html = stream.ReadToEnd();
        domRoot.write(html);
        domRoot.close();
    }
}

while (domRoot.readyState != "complete")
    Thread.Sleep(SleepTime);

string beforeScript = domRoot.body.outerHTML;

IHTMLWindow2 parentWin = domRoot.parentWindow;
parentWin.execScript("simple");

while (domRoot.readyState != "complete")
    Thread.Sleep(SleepTime);

string afterScript = domRoot.body.outerHTML;

System.Runtime.InteropServices.Marshal.FinalReleaseComObject(domRoot);
domRoot = null;

The problem is, "beforeScript" and "afterScript" are exactly the same. The IHTMLDocument2 instance goes through the normal "uninitialized", "loading", "complete" cycle, no errors are thrown, nothing.

Anybody have any ideas on what I'm doing wrong? Completely lost here.

Answer

Efe Kaptan · Dec 6, 2011

You could consider using WatiN. Generate your page, then use the WatiN API to capture the rendered page to a file.

http://fwdnug.com/blogs/ddodgen/archive/2008/06/19/watin-api-capturewebpagetofile.aspx
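
Separately from the WatiN suggestion, one detail in the question's code may be worth ruling out: `execScript("simple")` passes the bare function name rather than a call expression. Assuming `execScript` evaluates its first argument as script source, much as `eval` does, a bare identifier is looked up and discarded without the function ever running. The sketch below illustrates that difference in plain JavaScript. `runScript` and the call counter are hypothetical stand-ins, not mshtml itself, and this is not confirmed as the cause of the behavior described above.

```javascript
// Stand-in for the page's function: it counts calls instead of touching a DOM.
let calls = 0;
globalThis.simple = function () { calls += 1; };

// Hypothetical stand-in for execScript: evaluate a source string in global scope.
function runScript(source) {
    (0, eval)(source); // indirect eval runs the string as global code
}

runScript("simple");   // bare identifier: the function is referenced, never called
const afterName = calls;

runScript("simple()"); // call expression: the function actually runs
const afterCall = calls;

console.log(afterName, afterCall); // prints "0 1"
```

If the same holds for mshtml, passing `"simple()"` instead of `"simple"` would be the first thing to try before looking at other hosts.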