Measuring Genealogical Similarity using the Jaccard Index

For some posts on this blog I’ll use a statistic called the Jaccard Index, or the Jaccard Similarity Coefficient, to measure the similarity of two sample sets of data. This post is a technical explanation of the calculation itself. The sets of data are the unique ancestral surnames of …
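As a rough illustration of the calculation, the Jaccard Index of two sets A and B is the size of their intersection divided by the size of their union, |A ∩ B| / |A ∪ B|. The sketch below is not taken from the post itself: the function name `jaccard_index`, the surnames, and the choice to return 0.0 when both sets are empty are all assumptions made for the example.

```python
def jaccard_index(a, b):
    """Return |A ∩ B| / |A ∪ B| for two collections of surnames."""
    set_a, set_b = set(a), set(b)  # de-duplicate so each surname counts once
    union = set_a | set_b
    if not union:
        # Assumption: two empty sets are treated as having no similarity.
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical surname sets, invented purely for illustration.
tree_one = {"Smith", "Jones", "Taylor", "Brown"}
tree_two = {"Jones", "Brown", "Wilson"}

score = jaccard_index(tree_one, tree_two)
print(score)  # 2 shared surnames out of 5 distinct surnames -> 0.4
```

A score of 0 means the two sets share no surnames and a score of 1 means they are identical, so higher values indicate more overlap.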

A brief technical overview of data mining DNA matches on Ancestry

The purpose of this blog is to present some analysis of the data available from several DNA testing companies for one or more DNA kits. This post is a high-level description of how the data is retrieved from AncestryDNA and analysed. AncestryDNA presents “Match List Pages” with each match as a row displaying the user …