Wednesday, 4 March 2015

Database choice - to SQL or NoSQL?

An interesting thought came to me just now. For most of my professional development career, I have been an SQL adherent. I have worked on some large and small SQL projects on a variety of platforms. I understand and am comfortable with SQL databases and it has become second nature to turn to a SQL database when I need to store any reasonable amount of data. Performance tuning and optimisation are two of my strong suits and over the years I have been called in to help speed up and clean up SQL databases. So it was natural that when I started developing my genealogy tools I turned to an SQL database for my data store. For the genealogy log I am using MySQL for local development and PostgreSQL on the server. But are these the right choices?

I was recently approached by the University of Illinois at Urbana-Champaign (UIUC) to be a Community Teaching Assistant (CTA) for two of their online courses on Coursera, one on Data Mining and the other Cloud Computing Concepts. The cloud computing course has been quite interesting and has given me a better understanding of the theory and internals of NoSQL databases. In some respects, some NoSQL approaches are not too dissimilar to systems that I have worked on in the past - key-value stores, column-based database, and graph databases. It would be an interesting exercise to put some of the theory I have been learning into practice and try using a NoSQL database (or two?) for my genealogy tools. The UIUC cloud computing course has been quite theoretical and is just an introduction to the concepts of cloud computing, but it has whet my appetite.

Would a NoSQL database make sense for a genealogy log? What about a source/evidence management app? What about a full-blown genealogy tool? If I decide to switch to NoSQL, which flavour? Maybe a mix of databases would be best. A key-value store might work well with the log. A column-based database might fit well with storing claims extracted from sources. Linking claims together to provide evidence for events and people could be modelled with a graph based database. A family tree or pedeigree seems like a natural fit for a graph database. So many possibilities!

I know one system I would love to use for these tools - it was a cross between a column-based database and a graph database, but it was big and clunky and sat uneasily on top of a SQL database. We effectively created what we called a "universal data model" in SQL. It was very flexible and powerful, but wasn't exactly the easiest of systems to understand and could be very cumbersome to use. I might have to do a little research into the current crop of NoSQL offerings to see if there is something that might be a close match.

If anyone has any experience with NoSQL systems, feel free to chime in. ;^)

No comments:

Post a Comment