PHP SimpleXML doesn't preserve line breaks in XML attributes

Joshua · Sep 22, 2009 · Viewed 16.1k times

I have to parse externally provided XML whose attributes contain line breaks. With SimpleXML, the line breaks seem to be lost. According to another Stack Overflow question, line breaks are valid in XML attribute values (even if far from ideal).

Why are they lost, and how can I preserve them?

Here is a demo script (note that line breaks outside an attribute are preserved).

PHP File with embedded XML

<?php
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<Rows>
    <data Title='Data Title' Remarks='First line of the row.
Followed by the second line.
Even a third!' />
    <data Title='Full Title' Remarks='None really'>First line of the row.
Followed by the second line.
Even a third!</data>
</Rows>
XML;

$xml = new SimpleXMLElement( $xml );
print '<pre>'; print_r($xml); print '</pre>';

Output from print_r

SimpleXMLElement Object
(
    [data] => Array
        (
            [0] => SimpleXMLElement Object
                (
                    [@attributes] => Array
                        (
                            [Title] => Data Title
                            [Remarks] => First line of the row. Followed by the second line. Even a third!
                        )

                )

            [1] => First line of the row.
Followed by the second line.
Even a third!
        )

)

Answer

bobince · Sep 22, 2009

"Using SimpleXML, the line breaks seem to be lost."

Yes, that is expected. In fact, any conformant XML parser is required to turn a literal newline in an attribute value into a single space. See "Attribute-Value Normalization" (section 3.3.3) in the XML specification.
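
To make the normalisation concrete, here is a minimal sketch that parses a single element with a raw line break in an attribute. The element and attribute names are borrowed from the question; the shortened sample text is only an illustration.

```php
<?php
// A raw line break ("\n") inside an attribute value.
$xml = "<data Remarks='First line.\nSecond line.' />";

$element = new SimpleXMLElement($xml);
$remarks = (string) $element['Remarks'];

// The parser has replaced the line break with one space.
var_dump($remarks === "First line. Second line."); // bool(true)
```

The same replacement applies to tabs and carriage returns in attribute values, so this is not specific to SimpleXML or to "\n".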

If the attribute value was meant to contain a real newline character, the XML should have used a &#10; character reference instead of a raw newline.
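
A short sketch of both points, assuming PHP 7 or later. The first part shows that a &#10; reference reaches the application as a real newline. The second part is a possible workaround when the XML cannot be fixed at the source: `encodeAttributeNewlines` is a hypothetical helper (it is not part of SimpleXML) that rewrites raw line breaks inside quoted values before parsing. It relies on a regular expression rather than a real tokenizer, so it assumes the input has no comments or CDATA sections containing `="..."`-like text.

```php
<?php
// 1. A character reference is not normalised away.
$xml = "<data Remarks='First line.&#10;Second line.' />";
$element = new SimpleXMLElement($xml);
var_dump((string) $element['Remarks'] === "First line.\nSecond line."); // bool(true)

// 2. Hypothetical pre-processing step for externally provided XML.
//    Finds ="..." or ='...' runs (attribute values cannot contain "<")
//    and replaces raw line breaks inside the quotes with &#10;.
function encodeAttributeNewlines(string $xml): string
{
    return preg_replace_callback(
        '/(=\s*)("[^"<]*"|\'[^\'<]*\')/',
        function (array $m): string {
            // $m[1] is "=" plus optional whitespace, left untouched;
            // $m[2] is the quoted value, where line breaks are encoded.
            return $m[1] . str_replace(["\r\n", "\r", "\n"], '&#10;', $m[2]);
        },
        $xml
    );
}

$raw = "<Rows><data Title='Data Title' Remarks='First line.\nSecond line.' /></Rows>";
$rows = new SimpleXMLElement(encodeAttributeNewlines($raw));
var_dump((string) $rows->data[0]['Remarks'] === "First line.\nSecond line."); // bool(true)
```

If the pattern happens to match text in element content, the substitution is harmless there, because &#10; in element content also parses back to a newline. Asking the provider to emit &#10; themselves remains the more robust fix.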