Microdata, RDFa or JSON-LD Appropriate or best usage?

Grzegorz picture Grzegorz · Nov 13, 2014 · Viewed 28k times · Source

I have been wondering which of those formats is "best"? Schema.org, Microdata, and RDFa are bit of a pain to implement. They can break validation and require quite an effort to put into documents.

JSON-LD is, at last for me, a way better to implement structured data. But does it work? What level of support is there for it (at least by Google)?

Answer

unor picture unor · Nov 13, 2014

Schema.org is a vocabulary that can, like any other vocabulary, be used in many forms. The website http://schema.org/ has examples using Microdata and the RDF syntaxes RDFa and JSON-LD, but these are not the only syntaxes it can be used with. You could, for example, use it with any other RDF syntax like Turtle or RDF/XML.

There is no best syntax. They all have advantages and disadvantages. See for example my answer about differences between Microdata and RDFa. Note that you can use different syntaxes (and vocabularies) in the same document.

Now, if you have a specific consumer in mind, you should consult their documentation. However, support of syntaxes comes and goes, and not everything they might support is necessarily documented, and not everything that is documented necessarily works.

In case of Google, you are probably interested in their Rich Snippets. Their documentation about Rich Snippets mentions Microdata, Microformats and RDFa. However, note that not all linked examples use the Schema.org vocabulary, but the older Data-vocabulary.org or Microformats (as you can’t use vocabularies like Schema.org nor Data-vocabulary.org with Microformats). And there are also some Rich Snippets that aren’t listed on that page, like the Sitelinks Search Box, for which they even recommend the JSON-LD syntax.

As general advice: Search engines typically favor visible content over hidden metadata. For example, having keywords as hidden metadata easily allows authors to claim that their documents are about something different than they really are (either because of trying to trick the search engine, or because authors forget to update content in both places). Therefore, uncoupling the metadata from the content, like it’s the case with JSON-LD, could (possibly!) lead to the same issues current search engines have with hidden metadata. (If or which search engines actually handle it like that is a question which is off-topic on Stack Overflow.)

Another possible advantage for coupling the metadata with the content (for example, with RDFa), is that you could easily and automatically generate the same information in JSON-LD, Turtle etc. because everything’s just RDF. Just parse the RDFa, convert to formats of your preference, and embed (in script) or link (with rel-alternate) it if it makes sense.

But yes, adding RDFa is often more complex than adding a JSON-LD blob, because you have to adapt it to the existing markup. (However, it should not "break validation" unless you’re making mistakes.)