Simple HTML Pretty Print

Question 1

Simple HTML Pretty Print

javascript jquery html pretty-print pre

James Kyle · Dec 1, 2011 · Viewed 20.3k times · Source

Answer

Answer

Don't be so sure you have gotten all there is to pretty-printing HTML in so few lines. It took me a little more than a year and 2000 lines to really nail this topic. You can just use my code directly or refactor it to fit your needs:

https://github.com/prettydiff/prettydiff/blob/master/lib/markuppretty.js (and Github project)

You can demo it at http://prettydiff.com/?m=beautify&html

The reason why it takes so much code is that people really don't seem to understand or value the importance of text nodes. If you are adding new and empty text nodes during beautification then you are doing it wrong and are likely corrupting your content. Additionally, it is also really ease to screw it up the other way and remove white space from inside your content. You have to be careful about these or you will completely destroy the integrity of your document.

Also, what if your document contains CSS or JavaScript. Those should be pretty printed as well, but have very different requirements from HTML. Even HTML and XML have different requirements. Please take my word for it that this is not a simple thing to figure out. HTML Tidy has been at this for more than a decade and still screws up a lot of edge cases.

As far as I know my markup_beauty.js application is the most complete pretty-printer ever written for HTML/XML. I know that is a very bold statement, and perhaps arrogant, but so far its never been challenged. Look my code and if there is something you need that it is not doing please let me know and I will get around to adding it in.

Question 2

http://jsfiddle.net/JamesKyle/L4b8b/

This may be a futile effort, but I personally think its possible.

I'm not the best at Javascript or jQuery, however I think I have found a simple way of making a simple prettyprint for html.

There are four types of code in this prettyprint:

Plain Text
Elements
Attributes
Values

In order to stylize this I want to wrap elements, attibutes and values with spans with their own classes.

The first way I have of doing this is to store every single kind of element and attribute (shown below) and then wrapping them with the corresponding spans

$(document).ready(function() {

    $('pre.prettyprint.html').each(function() {

        $(this).css('white-space','pre-line');

        var code = $(this).html();

        var html-element = $(code).find('a, abbr, acronym, address, area, article, aside, audio, b, base, bdo, bdi, big, blockquote, body, br, button, canvas, caption, cite, code, col, colgroup, command, datalist, dd, del, details, dfn, div, dl, dt, em, embed, fieldset, figcaption, figure, footer, form, h1, h2, h3, h4, h5, h6, head, header, hgroup, hr, html, i, img, input, ins, kbd, keygen, label, legend, li, link, map, mark, meta, meter, nav, noscript, object, ol, optgroup, option, output, p, param, pre, progress, q, rp, rt, ruby, samp, script, section, select, small, source, span, strong, summary, style, sub, sup, table, tbody, td, textarea, tfoot, th, thead, title, time, tr, track, tt, ul, var, video, wbr');

        var html-attribute = $(code).find('abbr, accept-charset, accept, accesskey, actionm, align, alink, alt, archive, axis, background, bgcolor, border, cellpadding, cellspacing, char, charoff, charset, checked, cite, class, classid, clear, code, codebase, codetype, color, cols, colspan, compact, content, coords, data, datetime, declare, defer, dir, disabled, enctype, face, for, frame, frameborder, headers, height, href, hreflang, hspace, http-equiv, id, ismap, label, lang, language, link, longdesc, marginheight, marginwidth, maxlength, media, method, multiple, name, nohref, noresize, noshade, nowrap, object, onblur, onchange,onclick ondblclick onfocus onkeydown, onkeypress, onkeyup, onload, onmousedown, onmousemove, onmouseout, onmouseover, onmouseup, onreset, onselect, onsubmit, onunload, profile, prompt, readonly, rel, rev, rows, rowspan, rules, scheme, scope, scrolling, selected, shape, size, span, src, standby, start, style, summary, tabindex, target, text, title, type, usemap, valign, value, valuetype, version, vlink, vspace, width');

        var html-value = $(code).find(/* Any instance of text inbetween two parenthesis */);

        $(element).wrap('<span class="element" />');
        $(attribute).wrap('<span class="attribute" />');
        $(value).wrap('<span class="value" />');

        $(code).find('<').replaceWith('&lt');
        $(code).find('>').replaceWith('&gt');
    });
});

The second way I thought of was to detect elements as any amount of text surrounded by two < >'s, then detect attributes as text inside of an element that is either surrounded by two spaces or has an = immediately after it.

$(document).ready(function() {

    $('pre.prettyprint.html').each(function() {

        $(this).css('white-space','pre-line');

        var code = $(this).html();

        var html-element = $(code).find(/* Any instance of text inbeween two < > */);

        var html-attribute = $(code).find(/* Any instance of text inside an element that has a = immeadiatly afterwards or has spaces on either side */);

        var html-value = $(code).find(/* Any instance of text inbetween two parenthesis */);

        $(element).wrap('<span class="element" />');
        $(attribute).wrap('<span class="attribute" />');
        $(value).wrap('<span class="value" />');

        $(code).find('<').replaceWith('&lt');
        $(code).find('>').replaceWith('&gt');
    });
});

How would either of these be coded, if at all possible

Again you can see this as a jsfiddle here: http://jsfiddle.net/JamesKyle/L4b8b/

Simple HTML Pretty Print

Answer

Related questions