CouchDB could be a viable alternative to relational databases for storing patient data

A while back I wrote the Data Models in Healthcare series of articles. Beyond relational databases, which are of course my primary storage platform, one of my favorite techniques for managing structured and semi-structured data is to use XML. XML is a great persistence model for storing schema-free data (and sometimes schema-fixed data).

However, XML is not a natural model for aggregation, consolidation, and analytics across various data sources (which is sometimes a pretty big requirement in modern health IT apps), so it’s something to be avoided in certain circumstances. There are also some modern native XML databases (both open source and commercial) that support querying and aggregation, but they are neither popular nor ubiquitous, which means tooling support is limited. Native XML databases have not really taken off, but many existing databases (like Oracle, DB2, and others) now support schema-free XML artifacts. Until now, you either went with MUMPS for a flexible database or you stored XML in a relational schema.

Until now.

I’ve been playing with Apache’s CouchDB for some time now; while I’m not drinking the kool-aid just yet, I think it’s worth researching whether it can play a major role in modern and flexible healthcare data platforms. This is how the CouchDB principals describe their database:

Apache CouchDB is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API. Among other features, it provides robust, incremental replication with bi-directional conflict detection and resolution, and is queryable and indexable using a table-oriented view engine with JavaScript acting as the default view definition language.

CouchDB is written in Erlang, but can be easily accessed from any environment that provides means to make HTTP requests. There are a multitude of third-party client libraries that make this even easier for a variety of programming languages and environments.

I love the fact that it’s written in Erlang (a language created for concurrent programming), but what I like even more is that its native application programming interface (API) is HTTP and REST. This is a modern and refreshing design, modeled after something that is hugely successful: the web.
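To make the HTTP point concrete, here is a minimal sketch in Python using only the standard library. It builds, but does not send, the requests that would store and fetch a document. The host and port (a local server on CouchDB's default port 5984), the database name `patients`, the document ID, and the field names are all assumptions made up for illustration; they are not taken from my actual system.

```python
import json
import urllib.request

# Assumed values for illustration: a local CouchDB on its default port,
# and a hypothetical database called "patients".
BASE_URL = "http://localhost:5984"
DB_NAME = "patients"


def put_document_request(doc_id, doc):
    """Build (but do not send) the HTTP request that stores a JSON document."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{DB_NAME}/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_document_request(doc_id):
    """Build (but do not send) the HTTP request that reads a document back."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{DB_NAME}/{doc_id}", method="GET"
    )


# Hypothetical patient record; the field names are made up for this sketch.
request = put_document_request(
    "patient-0001", {"name": "Jane Doe", "dob": "1970-01-01"}
)
print(request.get_method(), request.full_url)

# To actually send it against a running server:
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response))
```

Nothing here is specific to Python: any environment that can issue a PUT or GET with a JSON body could do the same, which is the appeal of the design. A real client would also need to URL-encode document IDs and handle authentication.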

I have a basic patient management system up and running in CouchDB and I plan on open sourcing it to let others play with it as well. I really like the schema-free nature of CouchDB because almost all systems start by wanting to store certain parts of patient data, only to see other data collection requirements grow like weeds. CouchDB doesn’t need constant adjustment. Also, I like that it’s document-centric because most patient record systems require documents to be stored natively (and later edited directly). CouchDB gets around some of the limitations I’ve seen with XML databases while still storing XML as documents.
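As a rough illustration of what schema-free means in practice, the sketch below (Python again) shows two patient documents of different shapes living side by side, plus a design document defining one view. The document IDs, field names, and the view name `by_allergy` are hypothetical. The map function is JavaScript, CouchDB's default view language; the Python function that follows it is only a local stand-in to show what rows such a view would produce, not how CouchDB evaluates views.

```python
import json

# Two hypothetical patient documents in the same database. The second one
# carries fields the first never had; no schema migration is involved.
patients = [
    {"_id": "patient-0001", "name": "Jane Doe", "dob": "1970-01-01"},
    {
        "_id": "patient-0002",
        "name": "John Roe",
        "dob": "1985-06-15",
        "allergies": ["penicillin", "latex"],
    },
]

# A design document holding one view. The map function is JavaScript;
# it skips any document that has no "allergies" field.
design_doc = {
    "_id": "_design/patients",
    "views": {
        "by_allergy": {
            "map": "function (doc) {"
                   "  if (doc.allergies) {"
                   "    doc.allergies.forEach(function (a) { emit(a, doc.name); });"
                   "  }"
                   "}"
        }
    },
}


def emulate_by_allergy(docs):
    """Local Python stand-in for the JavaScript map above, for illustration only."""
    rows = []
    for doc in docs:
        for allergy in doc.get("allergies", []):
            rows.append({"key": allergy, "id": doc["_id"], "value": doc["name"]})
    # View rows come back ordered by key, so sort the same way here.
    return sorted(rows, key=lambda row: row["key"])


rows = emulate_by_allergy(patients)
print(json.dumps(rows, indent=2))
```

The point of the sketch is that the older document needs no change when a new field shows up: the view simply ignores documents that lack it.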

So, is CouchDB ready for prime-time in an important field like healthcare? Not quite, but within a year I can see it taking over many of the responsibilities we have for flexible data storage in our applications. So, it’s a good time to look at it and maybe even participate in its evolution, so that those of us who need some additional functionality for fault-tolerant applications can get it baked into the design.

If you’re interested in other modern mega-scale databases check out Dynamo, BigTable, Cassandra, BerkeleyDB (BDB), Voldemort, and dynomite.

Check CouchDB out and let me know what you guys think about using it, or any other of these modern non-relational databases, to handle healthcare data.



14 thoughts on “CouchDB could be a viable alternative to relational databases for storing patient data”

  1. Pingback: Topics about Health, Food and Well being » Archive » CouchDB could be a viable alternative to relational databases for …

  2. I want you to get the business users to understand the XML concepts enough to be able to generate usable business requirements.

    From my perspective (employed by a large health insurance company), dumping that sort of thing off on the developer just means that you’re going to write the application 3 or 4 times, and you’re never going to meet your project schedule.

  3. Pingback: CouchDB could be a viable alternative to relational databases for storing patient data | No Brainer Profits

  4. As cloud computing becomes more mainstream, there is a growing discussion that compares traditional relational databases to key/value based systems. Check out the great Read Write Web article referenced in ‘Is the Relational Database Doomed?’ (http://developers.slashdot.org/article.pl?sid=09/02/13/2026227).

    “The conclusion suggests that relational databases and key value stores aren’t really mutually exclusive and instead are different tools for different requirements.”

    The same can be said for any other format, including XML — each is just a different tool that will continue to have its place based on need. The requirements for healthcare data are complex, but just as relational databases and XML are currently meeting many of our needs, we will also be able to take advantage of emerging technologies as they mature.

    In the same vein, I’ve started to look more generally at cloud computing development (http://rdn-consulting.com/blog/2009/02/07/exploring-cloud-computing-development). The possibilities are exciting, but as you’ve noted with CouchDB, it’s not clear when they’ll be ready for prime-time.

  5. shahid,

    this is very informative and timely. we have used a key-value based schema for our application database which has made it very easy to extend the same. plus we have used REST to support querying over the web. we are now toying with the idea of creating and releasing the entire data model and querying framework as an open-source toolkit called Moksha (meaning liberation in sanskrit – aka freeing data from the confines of rigid models and increasing its liquidity).
    eager to see your work with couchDB. will keep you posted.

  6. Very interesting post, thanks!

    CouchDB may be a little young for mission-critical deployment for key-value storage, but BerkeleyDB (BDB) is ready for prime time. It’s been available and open source for many years, and has appropriate APIs for all kinds of application frameworks. Oracle Corp. snapped it up recently, but that didn’t affect its open source availability.

    A massively scalable key-value framework based on BDB has been released as open source by LinkedIn.com; it’s called Project-Voldemort.com. It probably can handle large scale health-care records; not sure about the security aspects.

    This whole area really needs a solid open-source community!

  7. Great point, Ollie. I’ve used BDB so many times and it’s such a part of my daily routine that I didn’t even consider others might not know about it.

    I’ve looked at Voldemort and other projects and they are quite intriguing as well.

    While you’re right that CouchDB is not quite ready for prime-time, what I like specifically about CouchDB is that it’s got built-in distribution and replication with standard database-type queries and ACID facilities. That would make data integration in RHIOs or collaborative clinical groupware quite easy without resorting to HL7 and more complicated tasks.

  8. I do find this rather ironic. On the one hand you rightly mention the flexibility afforded by the Mumps technology that is incumbent in today’s most successful healthcare IT system, and then you suggest it’s replaced by….well, something very similar (a hierarchical schemaless database) that has no track record, no benefits over what’s already easily achievable with Mumps, and… no security worth a light!

    The fact is that the neo-hierarchical databases such as CouchDB are reinventing a wheel that Mumps established many years ago.

    See http://www.outoftheslipstream.com/node/125 and http://www.slideshare.net/george.james/mumps-the-internet-scale-database-presentation

    Here’s a better idea: stick with what has demonstrably worked. If people understood this, perhaps we wouldn’t be seeing stories such as this, where “experts” have encouraged the spending of vast amounts of taxpayers’ money to replace that old Mumps stuff with something “more modern”: http://www.computerweekly.com/Articles/2009/01/27/234448/public-accounts-committee-criticises-npfit.htm

    Oh and check out M/DB (http://www.mgateway.com/mdb.html) which demonstrates that a secure, high performance and highly scalable REST-based schemaless, hierarchical database (in this case a SimpleDB clone) can be implemented very easily using Mumps.

  9. I’ve also started design work on an open source patient management system (I’ve built a closed system before using php/xml/tomcat/hibernate/postgres). I would be interested to know of any progress. From the little I know of Mumps, it looks too heavy for the lightweight application that I’m designing. I was going to use CouchDB and either PHP/CI or Ruby/Merb. I’m not comfortable programming in Erlang (yet).

  10. I suspect CouchDB is ready but we are less so. The proliferation of schemas in health information systems and constant change should convince us that RDBs are a hammer and everything looks like a nail.

    On Ubuntu
    apt-get install couchdb
    change the bind address to the machine’s local IP and you are up and running.
    You will need a wrapper for security.
    Do it again and replication will provide backup.

    Isomorphic with S3 so you have a place in the cloud if you want one.

  11. Seems to me that a lot of systems could use your expertise. When it comes to the healthcare system, I find that many technologies used for databases and uniformity are simply outdated. I will admit I like your thoughts on this subject; seems to me like you may be on to something.
