I am revising some XHTML files authored by another party. As part of this effort, I am doing some bulk editing via Linq to XML.
I've just noticed that some of the original source XHTML files contain the &quot; HTML entity in text nodes within those files. For instance:
<p>Greeting: &quot;Hello, World!&quot;</p>
I've also noticed that when recovering the XHTML text via XElement.ToString(), the &quot; entities are replaced by plain double-quotes:
<p>Greeting: "Hello, World!"</p>
Question: Can anyone tell me what the motivation might have been for the original author to use the &quot; entities instead of plain double-quotes? Did those entities serve a purpose which I don't fully appreciate? Or were they truly unnecessary, as I suspect?
I do understand that &quot; would be necessary in certain contexts, such as when there is a need to place a double-quote within an HTML attribute. For instance:
<a href="/images/hello_world.jpg" alt="Greeting: &quot;Hello, World!&quot;">Greeting</a>
It is impossible, and unnecessary, to know the motivation for using &quot; in element content, but possible motives include: misunderstanding of HTML rules; use of software that generates such code (probably because its author thought it was “safer”); and misunderstanding of the meaning of &quot;: many people seem to think it produces “smart quotes” (they apparently never looked at the actual results).
Anyway, there is never any need to use &quot; in element content in HTML (XHTML or any other HTML version). There is nothing in any HTML specification that would assign any special meaning to the plain character " there.
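To see that the two spellings are equivalent in element content, here is a minimal sketch. It uses Python's standard xml.etree.ElementTree as a stand-in for Linq to XML (an assumption made purely for illustration; the variable names are hypothetical). It shows the same round trip the question describes: both forms parse to the same text, and the serializer writes plain quotes back out.

```python
import xml.etree.ElementTree as ET

# The same paragraph, once with the entity and once with plain quotes.
with_entity = ET.fromstring("<p>Greeting: &quot;Hello, World!&quot;</p>")
with_plain = ET.fromstring('<p>Greeting: "Hello, World!"</p>')

# After parsing, the text nodes are identical: the entity is only a spelling.
same_text = with_entity.text == with_plain.text
print(same_text)  # True

# Serializing element content leaves the double quotes unescaped.
round_tripped = ET.tostring(with_entity, encoding="unicode")
print(round_tripped)  # <p>Greeting: "Hello, World!"</p>
```

So nothing is lost when the entities disappear on output; the parsed document is the same either way.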
As the question says, it has its role in attribute values, but even in them, it is mostly simpler to just use single quotes as delimiters if the value contains a double quote, e.g. alt='Greeting: "Hello, World!"'
or, if you are allowed to correct errors in natural language texts, to use proper quotation marks, e.g. alt="Greeting: “Hello, World!”"
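The attribute case can be sketched the same way, again with Python's xml.etree.ElementTree standing in for Linq to XML (an assumption, not the asker's tooling; variable names are hypothetical). This particular serializer always delimits attribute values with double quotes, so it escapes an embedded double quote as &quot;, while a parser accepts the single-quoted form and yields the same value.

```python
import xml.etree.ElementTree as ET

# Build an element whose attribute value contains double quotes.
link = ET.Element("a", {"alt": 'Greeting: "Hello, World!"'})
link.text = "Greeting"

# The serializer delimits attributes with double quotes, so it must escape.
serialized = ET.tostring(link, encoding="unicode")
print(serialized)  # <a alt="Greeting: &quot;Hello, World!&quot;">Greeting</a>

# Single-quote delimiters let the value carry plain double quotes instead.
single_quoted = ET.fromstring("<a alt='Greeting: \"Hello, World!\"'>Greeting</a>")
alt_value = single_quoted.get("alt")
print(alt_value == link.get("alt"))  # True
```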