Murali Kaundinya


Related Topics: CMS Journal, Java Developer Magazine, Open Web Magazine


Enterprise Content Management on the Java Platform

A peek into Java standard APIs for accessing a content repository

Java Web applications have long needed a standards-based API for Enterprise Content Management (ECM). ECM is an essential requirement for Web applications on the Internet, intranets, and extranets. ECM vendors offer proprietary APIs in various languages, which has kept ECM architectures from being interoperable. JSR-170 defines a new set of APIs to standardize the interface to ECM products. It aims to make the ECM product pluggable, much as the JDBC API lets application code stay independent of database products. JSR-170 has been actively supported by several ECM vendors and approved for public review. Its adoption is predicated on enterprises demanding it from ECM vendors, and it remains to be seen whether these vendors will forgo their unfair advantage.

In this article, we explain the lifecycle management services associated with "content" to Java developers building enterprise applications, focusing on the new and emerging JSR-170. Enterprise Content Management (ECM) is about managing the lifecycle of content in an enterprise, and doing so requires a robust architecture. The lifecycle, as shown in Figure 1, begins when content is authored with some metadata. It's formally represented in digital form and uploaded to a server using Web protocols. It's then processed, which typically consists of sorting, classifying, and storing it in a form that's easy to query and search later. Content is served to an authenticated and authorized user either in isolation or merged and aggregated with other content. Not all users are interested in the same kind of content, so content has to be customized to suit individual user preferences, display device characteristics, and localization and internationalization requirements. In Figure 1, users from various domains want to access content through various devices, so the same content has to be rendered on various client devices. Compounding that, given the innumerable types of content and associated standards, a single general-purpose Content Management System (CMS) is seldom sufficient for an enterprise. Enterprise architectures deploy more than one ECM product, each with its own APIs for lifecycle management services, which increases complexity for developers. A unified and standardized API such as JSR-170 can simplify the task of managing content across various vendor products and frameworks.

Enterprise Content Management
The ECM lifecycle comprises a set of independent tasks, all of which are part of a larger workflow. The workflow begins by identifying the roles required for ECM lifecycle tasks and assigning users to those roles. Typical roles are content creators, reviewers, translators, classifiers, approvers, deployers, managers, etc. Once users, roles, and groups are identified, the data must be provisioned in the enterprise identity management repository. The next step is content creation, which is done through tools that vary by content type and which has localization and internationalization requirements. In an automated system, content can also be aggregated from multiple sources. Both the manual creation tasks and the automated tasks have to consider the requirements of the transport protocol. Once the content is submitted to a server, it has to be versioned. All content has to have metadata that describes its characteristics. The metadata can be defined by the content author at the time of creation, extracted at the time of classification, or both.

Once submitted, content often has to be translated for an international audience. It may also have to be transformed based on visual formatting requirements specified in templates and other style sheets.

Once the content is transformed, it has to be assigned to the appropriate placeholders for dynamic rendering. The scope of ECM can also extend to content delivery. Delivery involves the assembly of dynamic content. It also requires the construction of an index and the ability to search the site for all of its content. There may be personalization requirements and consumers may have a preference about how they want the content to be structured. Consumers may have various authorization privileges based on their roles. Many of these content delivery requirements are also applicable to portal architectures. Portals use ECM solutions as a back-end service. The scope of this article is restricted to the Java interfaces dealing with content repositories.

Java Content Repository Model
The purpose of this API is to provide a standard, implementation-independent way to access content bi-directionally at a granular level in a content repository. The challenge is to allow enough flexibility in the API that it can be used for both hierarchical (path-based addressing) and non-hierarchical (UUID-based addressing) repository models. The API should be easy to use from the programmer's point of view, and at the same time its core focus should be interfacing with a repository rather than venturing into areas that might be regarded as "content applications." ECM products share some common base features and distinguish themselves with some unique ones. One objective of the API is to make it easy to implement on top of existing content repositories. The other is to standardize some of the more complex functionality needed by advanced content-related applications. To accelerate adoption of this standard interface by ECM vendors, JSR-170 has taken a multi-step approach to implementing the API. Level 1 defines a set of basic repository operations such as Read, Update, and Delete functions, the assigning of types to content items, serialization, and search. Level 2 defines more advanced content management functions such as transactions, versioning, access permissions, locking, and hard links between content items.
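
The difference between the two addressing styles can be illustrated with a minimal in-memory sketch. This is not the JSR-170 API: the class AddressableStore and all of its methods are hypothetical names invented for this example. It only shows why an identifier that survives a move is useful alongside a path.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical illustration only; none of these names come from JSR-170.
class AddressableStore {
    // One content item, reachable both by its path and by its UUID.
    static final class Item {
        final String uuid;
        final String path;
        final String content;

        Item(String uuid, String path, String content) {
            this.uuid = uuid;
            this.path = path;
            this.content = content;
        }
    }

    private final Map<String, Item> byPath = new HashMap<>();
    private final Map<String, Item> byUuid = new HashMap<>();

    // Stores content under an absolute path and returns the generated UUID.
    String put(String path, String content) {
        Item item = new Item(UUID.randomUUID().toString(), path, content);
        Item previous = byPath.put(path, item);
        if (previous != null) {
            byUuid.remove(previous.uuid); // the replaced item is no longer addressable
        }
        byUuid.put(item.uuid, item);
        return item.uuid;
    }

    Item getByPath(String path) { return byPath.get(path); }

    Item getByUuid(String uuid) { return byUuid.get(uuid); }

    // Moving an item changes its path but keeps its UUID.
    boolean move(String from, String to) {
        Item old = byPath.remove(from);
        if (old == null) {
            return false;
        }
        Item moved = new Item(old.uuid, to, old.content);
        Item displaced = byPath.put(to, moved);
        if (displaced != null) {
            byUuid.remove(displaced.uuid);
        }
        byUuid.put(moved.uuid, moved);
        return true;
    }

    public static void main(String[] args) {
        AddressableStore store = new AddressableStore();
        String id = store.put("/news/2005/launch", "Product launch");
        store.move("/news/2005/launch", "/archive/launch");
        // The old path no longer resolves, but the UUID still does.
        System.out.println(store.getByPath("/news/2005/launch"));
        System.out.println(store.getByUuid(id).path);
    }
}
```

In this sketch a reference held as a UUID keeps working after the item is reorganized, while a reference held as a path does not. A real repository would have to make the same trade-off in whatever form the specification prescribes.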

The repository as exposed through Level 1 of the JCR is a tree structure very much like the Unix file system. It comprises nodes, each of which can have zero or more child nodes and zero or more child properties. It should support CRUD (Create/Read/Update/Delete) operations on the nodes, the assignment of node types, and a means of searching the repository. It should also be possible to retrieve and traverse nodes and properties, and a path syntax has been defined to navigate the tree. The repository has three layers of isolation. javax.jcr.Repository is an interface; an object implementing it represents a persistent data store. javax.jcr.Workspace is an interface; objects implementing it serve as a private view whose activities are visible only to users in that workspace. Changes made to this view have to be committed with an explicit checkin operation. The third layer of isolation is between the workspace and the nodes (objects) in memory. A repository is similar to the well-known Concurrent Versions System (CVS), but there are some subtle differences. JCR doesn't distinguish, and rightfully so, between content and its metadata; it's up to the application to define its preferred conventions. JCR can be implemented on top of a file system, WebDAV, an RDBMS, etc. Figure 2 shows a high-level JCR architecture.
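
To make the tree model concrete, here is a small self-contained sketch of nodes, properties, and path navigation. SketchNode is a hypothetical stand-in written for this article, not the javax.jcr.Node interface, and its slash-separated path handling is a simplification assumed for illustration rather than the path syntax the specification defines.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a repository node; not the javax.jcr API.
class SketchNode {
    private final String name;
    private final Map<String, SketchNode> children = new LinkedHashMap<>();
    private final Map<String, String> properties = new LinkedHashMap<>();

    SketchNode(String name) {
        this.name = name;
    }

    String getName() { return name; }

    // Create: returns the existing child with this name, or adds a new one.
    SketchNode addNode(String childName) {
        return children.computeIfAbsent(childName, SketchNode::new);
    }

    // Update and read properties. Content and metadata are stored the same way.
    void setProperty(String key, String value) { properties.put(key, value); }

    String getProperty(String key) { return properties.get(key); }

    // Delete: removes a direct child (and with it the whole subtree).
    boolean removeNode(String childName) {
        return children.remove(childName) != null;
    }

    // Resolves a slash-separated path such as "content/articles/jsr170".
    // Returns null if any step along the path is missing.
    SketchNode getNode(String path) {
        SketchNode current = this;
        for (String step : path.split("/")) {
            if (step.isEmpty()) {
                continue; // tolerate leading or doubled slashes
            }
            current = current.children.get(step);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    // Traversal: counts this node plus all of its descendants, depth first.
    int countNodes() {
        int total = 1;
        for (SketchNode child : children.values()) {
            total += child.countNodes();
        }
        return total;
    }

    public static void main(String[] args) {
        SketchNode root = new SketchNode("");
        SketchNode article = root.addNode("content").addNode("articles").addNode("jsr170");
        article.setProperty("title", "ECM on the Java Platform"); // content
        article.setProperty("author", "Murali Kaundinya");        // metadata, same mechanism
        System.out.println(root.getNode("/content/articles/jsr170").getProperty("title"));
        System.out.println(root.countNodes());
    }
}
```

Note that the title and the author are both ordinary properties here. That mirrors the point above that the repository does not separate content from metadata and leaves the convention to the application.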

An ECM application that's protected through JAAS retrieves a handle to a JCR Repository object using the Java Naming and Directory Interface (JNDI). It populates a Credential object by pulling attributes from JAAS and invokes the Repository object's login method. In return, it receives a ticket, which is like a session. Using the ticket, it retrieves one or more workspaces. The workspace provides APIs to navigate the node tree and modify the nodes and their properties. JCR provides APIs to copy and move nodes around. It also provides APIs to import and export nodes to and from external systems. A node can be serialized as an XML document. Likewise, an XML document compliant with some schema can be imported and attached to an existing tree. In a nutshell, JCR is similar to a Java DOM (Document Object Model) API with an ECM-friendly syntax.
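
The idea of serializing a node to XML and reading it back can be sketched with the standard Java XML APIs alone. The mapping used here (node name becomes the element name, properties become attributes) is an assumption chosen for simplicity, and NodeXmlSketch with its two methods is a hypothetical helper, not part of JCR.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Collections;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper; the element/attribute mapping is an assumption of this sketch.
class NodeXmlSketch {
    // Export: one node becomes one element, each property becomes an attribute.
    static String toXml(String nodeName, Map<String, String> properties) {
        try {
            Document doc = newBuilder().newDocument();
            Element element = doc.createElement(nodeName);
            for (Map.Entry<String, String> property : properties.entrySet()) {
                element.setAttribute(property.getKey(), property.getValue());
            }
            doc.appendChild(element);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XML export failed", e);
        }
    }

    // Import: parses the XML and returns the root element for inspection.
    static Element fromXml(String xml) {
        try {
            Document doc = newBuilder().parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("XML import failed", e);
        }
    }

    private static DocumentBuilder newBuilder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    public static void main(String[] args) {
        String xml = toXml("article", Collections.singletonMap("title", "ECM & Java"));
        System.out.println(xml);
        System.out.println(fromXml(xml).getAttribute("title"));
    }
}
```

The round trip through toXml and fromXml preserves the property value, including characters such as the ampersand that need escaping in XML. A repository's own export would also have to handle child nodes, namespaces, and binary values, which this sketch leaves out.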

As we said before, the motivation for having two levels of API in this JSR is so that this complex set of APIs can be adopted by the industry in phases. A JCR repository is viewed as a collection of workspaces, each of which organizes its information in a graph (or tree) structure, as shown in Figure 3. Level 1 of the API defines a standard way to acquire a handle to a workspace in a repository, to authenticate to the workspace, and to access or manipulate data in a workspace at the content-element level.
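
The sequence of logging in, obtaining a ticket, opening a workspace, and committing changes can be sketched as follows. SketchRepository, SketchTicket, and SketchWorkspace are hypothetical classes that borrow the article's vocabulary; they are not the javax.jcr types, and the plain user name and password stand in for the JAAS-derived credentials. The way pending changes are scoped here is an assumption made to illustrate isolation, not a statement of what the specification requires.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins that echo the article's terms; not the javax.jcr API.
class SketchRepository {
    private final Map<String, String> passwords = new HashMap<>();
    // Persistent state: workspace name -> (path -> value).
    private final Map<String, Map<String, String>> persisted = new HashMap<>();

    SketchRepository addUser(String user, String password) {
        passwords.put(user, password);
        return this;
    }

    // Authenticates the caller and hands back a ticket, or throws on failure.
    SketchTicket login(String user, String password) {
        if (password == null || !password.equals(passwords.get(user))) {
            throw new SecurityException("login failed for " + user);
        }
        return new SketchTicket(this, user);
    }

    Map<String, String> persistedState(String workspaceName) {
        return persisted.computeIfAbsent(workspaceName, key -> new HashMap<>());
    }
}

class SketchTicket {
    private final SketchRepository repository;
    final String user;

    SketchTicket(SketchRepository repository, String user) {
        this.repository = repository;
        this.user = user;
    }

    // Opens a private view onto the named workspace.
    SketchWorkspace getWorkspace(String name) {
        return new SketchWorkspace(repository.persistedState(name));
    }
}

class SketchWorkspace {
    private final Map<String, String> persisted;
    private final Map<String, String> pending = new HashMap<>();

    SketchWorkspace(Map<String, String> persisted) {
        this.persisted = persisted;
    }

    // Changes are held in this view until checkin() is called.
    void setValue(String path, String value) { pending.put(path, value); }

    // Reads see this view's pending changes first, then the persisted state.
    String getValue(String path) {
        return pending.containsKey(path) ? pending.get(path) : persisted.get(path);
    }

    // Explicit commit: pending changes become visible to other views.
    void checkin() {
        persisted.putAll(pending);
        pending.clear();
    }
}
```

Two tickets that open the same workspace each get their own view in this sketch. A value set through one view is invisible to the other until checkin() is called, which is the isolation behavior the explicit commit step is meant to provide.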

More Stories By Murali Kaundinya

Murali Kaundinya is a Senior Enterprise Architect with Sun Software Services. At Sun, he helps customers with strategy for enterprise software architecture and delivers standards-based solutions.

More Stories By Sunil Mathew

Sunil Mathew currently manages the Enterprise Web Services Practice for Sun Microsystems. Previously, Sunil was a senior Java architect with Sun's Java Center group, and he led the Technology Solutions Architecture group for Sun in the northeast. He has over 15 years of experience in Information Technology and is a coauthor of Java Web Services Architecture, Morgan Kaufmann Publishers.


Most Recent Comments
Boris Kraft 04/08/05 05:10:43 AM EDT

Great introduction - only missing piece: there is already an open source implementation of a Content Management System based on JSR-170: Magnolia (more: )

It's been in the works for 2 years now and with more than 60'000 downloads since its release has seen a rapid industry adoption rate. Hope you like it, too!

- Boris