Saturday, 9 May 2015

On GEDCOM, its uses and abuses...

Louis Kessler recently posted about an interesting exchange on the FHISO mailing list concerning GEDCOM's handling of sources. Louis raises some interesting points and shows that GEDCOM is quite a detailed and complex standard which can cope with a wide range of situations, if only the developers of genealogy apps would study the standard. Sure, it is not an easy standard to comprehend and it does not directly cater for all possible scenarios, but it can definitely cope with a lot more than developers give it credit for.

Personally, I am not a huge fan of GEDCOM - not for what it can/can't do, but for how it has been used and abused by some sloppy developers. There have been many blog posts and articles about the incompatibilities between different implementations of GEDCOM imports and exports in different software packages. I have personally used genealogy software which couldn't even correctly import GEDCOM created by itself! Surely if a developer has gone to the trouble of creating GEDCOM export functionality, then it is not too much to expect that they have fully tested that the software can import its own GEDCOM files.
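
A round-trip check of this kind can be sketched in a few lines. The Python below is only an illustration, not code from any real genealogy package: the function names (parse_gedcom, export_gedcom, round_trips) and the sample records are hypothetical. It assumes the basic GEDCOM line layout of a level number, an optional @xref@ identifier, a tag and an optional value, and it ignores details such as CONT/CONC continuation lines and character encodings.

```python
import re

# One GEDCOM line: level, optional @xref@, tag, optional value.
LINE_RE = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\w+)(?: (.*))?$")


def parse_gedcom(text):
    """Parse GEDCOM text into a list of (level, xref, tag, value) tuples."""
    records = []
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        m = LINE_RE.match(raw)
        if m is None:
            raise ValueError(f"malformed GEDCOM line: {raw!r}")
        level, xref, tag, value = m.groups()
        records.append((int(level), xref, tag, value or ""))
    return records


def export_gedcom(records):
    """Serialise (level, xref, tag, value) tuples back to GEDCOM text."""
    lines = []
    for level, xref, tag, value in records:
        parts = [str(level)]
        if xref:
            parts.append(xref)
        parts.append(tag)
        if value:
            parts.append(value)
        lines.append(" ".join(parts))
    return "\n".join(lines) + "\n"


def round_trips(records):
    """True if exporting and then re-importing gives back the same records."""
    return parse_gedcom(export_gedcom(records)) == records


# Hypothetical sample data, for illustration only.
SAMPLE = [
    (0, None, "HEAD", ""),
    (1, None, "CHAR", "UTF-8"),
    (0, "@I1@", "INDI", ""),
    (1, None, "NAME", "John /Smith/"),
    (1, None, "BIRT", ""),
    (2, None, "DATE", "9 MAY 1815"),
    (0, None, "TRLR", ""),
]

if __name__ == "__main__":
    print("round trip ok:", round_trips(SAMPLE))
```

A real application would run a check like this against its full data model and a large collection of files, but even a test this small would catch an exporter that writes lines its own importer rejects.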

So why is there this problem? There could be many reasons - sloppiness, laziness, rushed implementations, not referring back to the specification documents - but one of the main reasons is that we all let the developers get away with these half-hearted implementations. If users (and other developers) keep accepting the poor GEDCOM support, then we will keep getting poor GEDCOM support. I have voiced my opinion to the developers of the two main genealogy programs I use on my Mac and I have noticed some (minor) improvements over the years.

There is something about this that reminds me of the early days of web development when we had the infamous "browser wars" between Microsoft's Internet Explorer and Netscape's Navigator. If you are old enough to remember those days, it was not uncommon to see an "Optimised for Internet Explorer" or "Optimised for Netscape Navigator" badge on websites. Both companies were in a mad rush to stuff so many new features into their browsers that they didn't bother spending the time to ensure their browsers supported the relevant internet standards (HTML, CSS, etc.) and the result was that the two browsers supported different parts of the standards and the implementations were incompatible. Some of the incompatibilities were caused by cutting-edge technology not covered by any existing standards, some were caused by misinterpreting the standards and some were deliberately introduced to break the other browser.

The browser wars were a very trying time for web developers. If you could control the platform (for example, a company intranet) it was easy: you chose a browser (IE or Netscape) and you could code to that browser's quirks. If you didn't have control over the platform, your code was a mess of conditionals, browser sniffing, and CSS hacks as you tried to serve up the "correct" version of your page depending on the user's browser. Even when you did have control it wasn't always easy. As browsers got upgraded you had to make sure your code still worked in case the browser vendor fixed some of their incompatibilities.

Coding to the published standards wasn't really a solution. Both vendors picked and chose which parts of the standards they would implement, so developers had to refer to tables and charts detailing which parts of the standards were supported by which browser and where they interpreted the standards differently. The CSS box model was a never-ending source of frustration for me - you could get your code working nicely in IE, but it would look terrible in Netscape. Or vice versa.

Over time web developers started to fight back. I attended a number of developer conferences and at every one there was a vocal crowd of web developers demanding to know why the standards were being ignored. Microsoft's response was that they weren't creating browsers for developers, they were creating browsers for end users and they (Microsoft) knew better than we developers which parts of the standards were important enough to implement. More and more developers started to push back, and various test suites were created to test the implementation of the standards and push the limits. Showcases of innovative web sites highlighted the standards support issues and some smaller browser developers started building "standards-compliant" browsers. Slowly but surely the focus of the browser wars changed from unique features to standards support. It eventually became "cool" for browser implementors to tout their standards compliance and finally standards support just became an expected feature of a browser.

So how does this relate to GEDCOM? It probably doesn't, except insofar as the level of complaints about poor support will determine what, if anything, is done to rectify the situation. GEDCOM is an old standard. A very old standard. It probably needs a revamp to bring it more in line with "modern" genealogical practices, especially when it comes to handling of sources and evidence-based genealogy. There have been several attempts to improve GEDCOM, or to develop a complete replacement, yet none of these efforts has amounted to much of note. To be honest though, it doesn't matter if one group or another comes up with a better GEDCOM unless there is buy-in from the major genealogy developers. Small developers can support all the alternative standards they want, but if the big guys don't support them, what's the point?

Inertia is a huge force in software development. Everybody supports GEDCOM because everyone else supports GEDCOM. No one bothers to implement GEDCOM properly, because no one else implements it properly. No one bothers to change this status quo because the users (and other developers) aren't demanding it.

For my software I have made the decision to not implement a GEDCOM import. I can get away without a GEDCOM import because I am not writing a "family tree" tool, I am writing a source management and analysis tool. There are enough bad and inconsistent implementations that it would really be too much work to try to import badly formed GEDCOM files. I will (eventually) provide a GEDCOM export and I will make darned sure I get it right. (Or at least as right as I can make it.) I will also be exporting my data in a variety of other formats, including exposing a web service or API for other developers to work with.


  1. Came here via Louis Kessler's blog and I agree with your assessment.
    So I thought I'd go find the site you refer to at the top and check out the tools, but I cannot find them, only a very similar place called genea-tools.
    Why not provide a link on your blog?

    1. There is no link at present because there is no site to link to. I am still building my genealogy tools and this blog is just the first step. When there is something to show I will make an announcement on this blog. ;^)

  2. Your work on a source-analysis tool still sounds very intriguing. I'll have to check back once in a while.
    All the best wishes for success.