Wednesday, 4 March 2015

Database choice - to SQL or NoSQL?

An interesting thought came to me just now. For most of my professional development career, I have been an SQL adherent. I have worked on some large and small SQL projects on a variety of platforms. I understand and am comfortable with SQL databases and it has become second nature to turn to a SQL database when I need to store any reasonable amount of data. Performance tuning and optimisation are two of my strong suits and over the years I have been called in to help speed up and clean up SQL databases. So it was natural that when I started developing my genealogy tools I turned to an SQL database for my data store. For the genealogy log I am using MySQL for local development and PostgreSQL on the server. But are these the right choices?

I was recently approached by the University of Illinois at Urbana-Champaign (UIUC) to be a Community Teaching Assistant (CTA) for two of their online courses on Coursera, one on Data Mining and the other Cloud Computing Concepts. The cloud computing course has been quite interesting and has given me a better understanding of the theory and internals of NoSQL databases. In some respects, some NoSQL approaches are not too dissimilar to systems that I have worked on in the past - key-value stores, column-based database, and graph databases. It would be an interesting exercise to put some of the theory I have been learning into practice and try using a NoSQL database (or two?) for my genealogy tools. The UIUC cloud computing course has been quite theoretical and is just an introduction to the concepts of cloud computing, but it has whet my appetite.

Would a NoSQL database make sense for a genealogy log? What about a source/evidence management app? What about a full-blown genealogy tool? If I decide to switch to NoSQL, which flavour? Maybe a mix of databases would be best. A key-value store might work well with the log. A column-based database might fit well with storing claims extracted from sources. Linking claims together to provide evidence for events and people could be modelled with a graph based database. A family tree or pedeigree seems like a natural fit for a graph database. So many possibilities!

I know one system I would love to use for these tools - it was a cross between a column-based database and a graph database, but it was big and clunky and sat uneasily on top of a SQL database. We effectively created what we called a "universal data model" in SQL. It was very flexible and powerful, but wasn't exactly the easiest of systems to understand and could be very cumbersome to use. I might have to do a little research into the current crop of NoSQL offerings to see if there is something that might be a close match.

If anyone has any experience with NoSQL systems, feel free to chime in. ;^)

Monday, 2 March 2015

Update and plans

My transcription tasks have taken longer than anticipated, but I have finally completed the two wills I recently received. I now have an idea for a transcription tool that would  (semi-)automatically scroll the original document in a frame/window while you type into a lower frame, with a highlight bar showing the line you are working on. I doubt this would be suitable for a web app, so it would more likely be a desktop app should I decide to move forward with this idea.

I have also been jotting down ideas for the real meat of my genealogy app. I am not yet ready to share details of what I have in mind, but the picture of what I want is solidifying. I find that when I picture a new project in my mind as a completed product it means I am on the right track - sometimes it comes to me in a dream where I am using the app and this is what happened to me a short while ago. I saw my app and how it worked. I want to mock something up tomorrow to make sure it feels right and also so I don't forget how I am currently picturing it.

Once I finish my mock app (which should only take a day or so) I can get back into my research log. I really want to get this finished and ready for testing before I move on to a research plan tool. The research plan and research log tools will work closely together as they are really both sides of the same coin. The sooner I finish these two tools, the sooner I can get started on the larger plan. 8^)