Monday, 18 January 2016

On Sources...

In genealogy we deal with a large number of documents, normally referred to as "sources". This is where we find the data that drives our research, and good genealogists will study their sources carefully to determine the veracity and accuracy of the data they contain. Most genealogy software treats an entry in an online database (say, Ancestry, FamilySearch, The Genealogist, etc.) as a single source, and as a result you can end up with a large number of "sources" that really refer back to the same original document. If you use multiple sites, you can end up with the same data, from the same original document, being treated as five or six distinct sources, which can skew the results when you are trying to decide what the "correct" data should be for a particular fact. Then you have the added wrinkle of copies of original documents, such as the Bishops Transcripts of original parish registers.

Take for example the Stretford, St Matthew parish registers. These parish registers are (mostly) digitised and images are available on many of the major online databases. The registers have also been transcribed, several times, and indexes are also widely available. In some cases, there are two or three separate transcriptions on FamilySearch (without images) and one or two transcriptions of the corresponding Bishops Transcripts on FamilySearch as well. This means that when searching for the baptism of a particular person, you can get back up to five different records on FamilySearch. The major databases, such as Ancestry, have (with permission) copied some of FamilySearch's indexes of these registers as well as conducting their own digitisation and transcription programs. The net result is that there are now potentially 10-12 records for the one event on just two sites, all originating from two actual documents, one being a copy of the other. If you had 10-12 pieces of evidence supporting a fact, you might think that the fact was sound and your work was done, but is it really?

All these "sources" derive from one original document, the parish register. The Bishops Transcript is simply the first copy of this document, and the information has then been transcribed multiple times from photos or scans of these two physical documents. At most you have two source documents, but really there is only one true source.
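
One way to picture this chain of copies is as records that each point back at whatever they were copied from. What follows is only a rough sketch under my own assumptions: the `Record` class, its `derived_from` field and the "site A" / "site B" labels are invented for illustration and are not taken from any existing genealogy package.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """A document or transcription (hypothetical model, for illustration only)."""
    label: str
    derived_from: Optional["Record"] = None  # None means an original document


def root_source(record: Record) -> Record:
    """Follow the derived_from links back to the original document."""
    while record.derived_from is not None:
        record = record.derived_from
    return record


# The original register, its first copy, and two made-up online indexes.
register = Record("Stretford, St Matthew parish register")
bishops = Record("Bishops Transcript", derived_from=register)
index_a = Record("Index of the parish register (site A)", derived_from=register)
index_b = Record("Index of the Bishops Transcript (site B)", derived_from=bishops)

records = [register, bishops, index_a, index_b]
distinct_roots = {root_source(r).label for r in records}
print(len(records), "records,", len(distinct_roots), "true source(s)")
```

However many records you collect, following the links back always lands on the same register, which is the point: the count of records tells you nothing about the count of independent sources.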

What if software treated all these transcriptions and indexes as versions of the one original? When you look at how many supporting sources you have for a fact, the software would tell you that you have one (or two) sources and 10-12 transcriptions. If the transcriptions vary (and they often do), the software could present all the variations and allow you to choose a preferred version - all the different transcriptions are doing is revising the consensus on how the one source document should be read. When determining the correct details for a fact (say date of birth or place of residence), truly distinct sources can then be presented without the "fog" of multiple transcriptions, allowing the user to have more confidence that they have correctly interpreted the data.
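
As a rough illustration of that idea, the sketch below groups transcriptions by the document they were made from and tallies the variant readings for each. Everything in it is an assumption made for the example: the `Transcription` class, the `summarise` function, the source identifiers and the dates are all made up, and real software would need something far richer.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcription:
    """One provider's reading of a source document (hypothetical model)."""
    source_id: str  # identifies the original document
    provider: str   # website or archive that published the reading
    value: str      # the transcribed value for the fact in question


def summarise(transcriptions):
    """Group readings by original document and tally the variant values."""
    by_source = defaultdict(Counter)
    for t in transcriptions:
        by_source[t.source_id][t.value] += 1
    return dict(by_source)


# Made-up readings of a baptism date, purely for illustration.
readings = [
    Transcription("parish-register", "site A", "12 Mar 1843"),
    Transcription("parish-register", "site B", "12 Mar 1843"),
    Transcription("parish-register", "site C", "12 Mar 1848"),
    Transcription("census-1851", "site A", "abt 1843"),
]

summary = summarise(readings)
print(f"{len(summary)} sources, {len(readings)} transcriptions")
for source_id, variants in summary.items():
    # List each variant reading, most common first, so the user can pick one.
    for value, count in variants.most_common():
        print(f"  {source_id}: {value!r} x{count}")
```

The tally only shows where the transcribers agree and disagree; choosing the preferred reading is still left to the researcher, ideally by looking at the image of the original.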

That's one of the things I am trying to do with my own software. I want to move to a more source-centric process but take it a step or two further than other source-centric software. The data structures are slowly coalescing, but it hasn't been easy. I need to be able to link multiple transcriptions to original documents and to multiple data repositories (database providers, archives, libraries, etc.) and then try to distill the facts from the cloud of transcribed versions. I think I am close to understanding where I want to get to with all this, if only I had the time to implement it all. ;^)