When to use JCR (content repository) over other options?

Ither picture Ither · Oct 11, 2010 · Viewed 23.5k times · Source

I'm trying to evaluate content repositories (JSR283) like Jackrabbit and ModeShape but I must confess that I don't understand what problem resolves in first place and even if it is a good choice for the project. Which cases do you think is the best solution to apply? Is not the same thing as relational databases, except for the size? Why? Extra points for pointing real world examples.

Thanks in advance.

Answer

Randall Hauch picture Randall Hauch · Oct 11, 2010

JCR repositories are different than RDBMSes, because a JCR repository:

  • is hierarchical, allowing your to organize your content in a structure that closely matches your needs and where related information is often stored close together and thus easily navigated
  • is flexible, allowing the content to adapt and evolve, using a node type system that can be completely "schemaless" to full-on restrictive (e.g., like a relational database)
  • uses a standard Java API (e.g., javax.jcr)
  • abstracts where the information is really stored: many JCR implementations can store content in a variety of relational databases and other stores, some can expose non-JCR stores through the JCR API, and some can federate multiple stores into a single, virtual repository.
  • supports queries and full-text search out of the box
  • supports events, locking, versioning, and other features

You certainly can build all or some of these features in your own application, but that likely gets further away from what your main purpose of your app.

What kind of applications can benefit from these features? Content management systems have used repositories for a long time, and JCR (and Jackrabbit) really grew out of the need for a common, standard API to access different content repositories (see JSR-170 and JSR-283).

Another example are document management systems, which manage electronic files (that are often images of paper documents) and provide search and query. DMSes have used repositories for some time.

Artifact management systems can use repositories to manage digital artifacts (often files) along with additional information (metadata). JCR works great here, because you can store the metadata in the same location as the files: those that understand these extra properties can see them, those that don't care don't have to see them. I know Artifactory is a Maven repository implementation that uses JCR. There are also repositories for managing web service artifacts, data service artifacts, and test artifacts.

But JCR repositories are not for managing files. JCR uses a simple notion of a hierarchy of nodes, where the nodes can contain named properties (with one or multiple values) and children. The properties and child node that are allowed are dictated entirely by node types, which can be changed and mixed in as needed on a node-by-node basis. JCR predefines some built-in node types that are commonly needed, like those used to represent files and folders in the repository. You can reuse these built-in types, extend them, or write your own. Many people advocate using mixins almost as facets or aspects, so that if a node needs to take on a facet you can simply add a mixin to the node.

JCR was designed to easily support importing XML content into the repository, where each element is mapped to a node and each attribute is mapped to an attribute. And lots of stuff is represented using XML (or YAML or JSON), and all of this can easily be represented and stored in a JCR repository. As an example, consider a JCR repository that stores configuration information (that might normally be stored in multiple XML files). JCR can version that information, allow access to it from multiple processes, enable querying and search, and notify the application(s) when content changes.

There are several good overviews of JCR with more detail and examples. A few of these are: