In a post entitled "Column Stores", Bruno Dumon considers switching database platforms for the
cocoon-based Daisy CMS repository (which we have used for several years now at scrutable).
http://brunodumon.wordpress.com/2007/10/07/column-stores/
He mentions some different XQuery offerings such as MonetDB (which looks cool), and there are
several comments from Bruno's readers suggesting other XQuery implementations.
But, regardless of how great MonetDB or Sedna or other open source XQuery implementations
are, I'm pretty sure they're not integrated with cocoon out of the box.
As someone who uses Daisy, cocoon, and eXist, it seems to me that the best forward migration
path for the Daisy repository is onto the (somewhat unfortunately named, google-wise)
eXist DB.
Why? Because:
1) Both Daisy and eXist use Apache Cocoon (which of course is pure java).
2) One of eXist's strengths is its GUI remote-admin client, which allows us to easily manage and query
documents and collections in what I think is a highly daisy-compatible way. (It allows me to manage
and query documents in my home-grown apps better than in my CMS - daisy - and I want to fix that!).
3) eXist offers a multitude of other interfaces (java API, REST, XML-RPC, SOAP, WebDAV).
So, I see a golden opportunity for a clean, lightweight integration of the two apps. I recognize that
Daisy currently uses a lot of other infrastructure besides cocoon, such as messaging and BPM stuff in
ActiveMQ, but from practical experience I think that at least the option of a lighter Daisy deployment
would be a good thing. Personally, I see the BPM/workflow as an almost entirely separate app from
the CMS, which should only need to be turned on for "enterprise daisy", but others may see things
differently.
Also, there is a whole discussion here to be had around the relationship between metadata and
content, XML and RDF and SQL, etc., but I don't want to get bogged down in that just now.
My point is that I would love to be able to run daisy-lite and eXist in a single JVM, configured using
little more than the two cocoon sitemaps, probably deployed inside the same cocoon instance
(eventually, I suppose, as cocoon "blocks", which the eXist project already delivers).
Any required enterprise messaging/BPM infrastructure could be bolted on to the same JVM, or run in
a separate process (ActiveMQ, Mule, whatever). Application components shouldn't care what container
they are running in - SOA basics, right?
(Details of how I would approach this migration task are behind the cut).
If achieved, then in the simple let's-get-started use-case for Daisy+eXist, there's
only one OS process to (re)start to run both projects (the servlet container under which
the cocoon apps run). This single process yields a pure-java, XQuery-compliant content
management system that provides interfaces for remote editing (through Daisy Wiki),
remote management (through eXist java admin client), and remote query (through XQuery
on eXist + any chosen bonus metadata storage/query infrastructure). And hey, eXist also
supports WebDAV, which would be a nice way to move Daisy documents around, too.
Just as important as all that out-of-the-box functionality, that hypothetical CMS+repository
is straightforward to extend and integrate in the web tier, using cocoon.
Finally, the uniting of the Daisy and eXist apps as closer cousins in the cocoon family would, I think,
strengthen the cocoon platform and help to justify increased investment and sponsorship for all three
projects. Thoughts?
Besides document XML content (and binary resources, which eXist handles), Daisy still needs a place to
store & query metadata, for which the options are boundless.
Of course, I favor RDF-enabled approaches. RDF metadata can be managed in SQL,
in RDF/XML (which could be stored in eXist, of course), or using any other RDF serialization
or repository (maybe put your BIG metadata sets in a triplestore like Mulgara, and your
tiny idiosyncratic metadata sets in N3 text files edited by hand).
So, initially I would leave all the metadata in SQL and focus on
migrating XML content into eXist. Basic collection and document C.R.U.D. integration can be
implemented in any combo of REST, XQuery and the eXist/XML:DB java client API (which
talks XML-RPC when the DB is remote). The same java API can be used in both local and remote
DB configurations, ain't that nice?
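To make the REST flavor of that C.R.U.D. integration concrete, here is a minimal sketch in Python that maps each operation onto an HTTP request against eXist's REST interface. The base URL, the /db/daisy collection and the document names are my assumptions for illustration, not anything Daisy or eXist prescribes, and the requests are only built, never sent.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed REST endpoint of a local eXist instance; adjust for your deployment.
EXIST_REST = "http://localhost:8080/exist/rest"

# Create and update both store a document with PUT; read is GET; delete is DELETE.
HTTP_METHOD = {"create": "PUT", "read": "GET", "update": "PUT", "delete": "DELETE"}


def document_url(collection, name, base=EXIST_REST):
    """Build the REST URL of one document inside a collection."""
    parts = (collection.strip("/") + "/" + name).split("/")
    return base + "/" + "/".join(quote(part) for part in parts)


def crud_request(op, collection, name, xml=None, base=EXIST_REST):
    """Return an unsent Request object for one C.R.U.D. operation."""
    method = HTTP_METHOD[op]
    body = None
    headers = {}
    if method == "PUT":
        if xml is None:
            raise ValueError("create/update need an XML body")
        body = xml.encode("utf-8")
        headers["Content-Type"] = "application/xml"
    return Request(document_url(collection, name, base),
                   data=body, headers=headers, method=method)


# Hypothetical usage: store one Daisy document part in an assumed collection.
req = crud_request("create", "/db/daisy", "doc 1.xml", "<doc/>")
print(req.get_method(), req.full_url)
```

Passing the result to urllib.request.urlopen would actually perform the call; authentication is left out of the sketch on purpose.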
Ad hoc querying, including free-text queries, can be done using any of the multitude of query interfaces supported by eXist.
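As one example of such an interface, eXist's REST servlet accepts an XQuery in the _query parameter of a GET request, with _start and _howmany paging the results. The sketch below only builds that URL; the endpoint, the /db/daisy collection and the sample query are assumptions of mine.

```python
from urllib.parse import urlencode

# Assumed REST endpoint of a local eXist instance.
EXIST_REST = "http://localhost:8080/exist/rest"


def query_url(collection, xquery, start=1, howmany=10, base=EXIST_REST):
    """Build a GET URL that runs an ad hoc XQuery against a collection."""
    params = urlencode({"_query": xquery, "_start": start, "_howmany": howmany})
    return f"{base}/{collection.strip('/')}?{params}"


# Hypothetical query: paragraphs mentioning "cocoon", using plain XPath contains().
print(query_url("/db/daisy", "//p[contains(., 'cocoon')]"))
```

I used the standard contains() function here rather than any eXist-specific full-text extension, so the query stays portable across XQuery engines.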
Deployment + Testing Tactics: Daisy and eXist can at first be run in separate, communicating JVMs
(leveraging the fact that Daisy is implemented so rigorously as a multi-tiered enterprise app - kudos for
that! Let's use it!) and the initial communication can, as I mentioned, be done using a combo of REST
(very cocoon-sitemap friendly!) and the eXist java API (which again, runs in both "remote" and "local" modes).
So, we could finish the integration work while running
Daisy and eXist separately, then opportunistically unite the VMs later when it's convenient.
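The nice part of the "remote" versus "local" distinction is that, from the client's point of view, it mostly comes down to which XML:DB URI you hand to the driver. A small sketch of that switch follows; the host, port and /exist context path are assumed defaults, and xmldb_uri is a hypothetical helper of mine, not part of any eXist API.

```python
# URI patterns for the eXist XML:DB driver; host, port and context are assumed.
REMOTE_URI = "xmldb:exist://{host}:{port}/exist/xmlrpc{collection}"
LOCAL_URI = "xmldb:exist://{collection}"


def xmldb_uri(collection="/db", mode="remote", host="localhost", port=8080):
    """Return the XML:DB URI for a collection in remote or local (embedded) mode."""
    if mode == "local":
        return LOCAL_URI.format(collection=collection)
    if mode == "remote":
        return REMOTE_URI.format(host=host, port=port, collection=collection)
    raise ValueError(f"unknown mode: {mode}")


# Separate JVMs first, a single JVM later: only the configured URI changes.
print(xmldb_uri("/db/daisy", mode="remote"))
print(xmldb_uri("/db/daisy", mode="local"))
```

So uniting the VMs later should, in principle, be a configuration change rather than a rewrite of the integration code.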
Ideally, Daisy would become even more componentized in the process of this migration,
and it would become possible to quickly run a "Daisy Lite" with no dependencies on external
processes running outside the servlet container.
Pow!