August 2018 – Data Mining DNA

Measuring Genealogical Similarity using the Jaccard Index

October 6, 2020 | August 15, 2018 by Margaret O'Brien

For some of the posts on this blog I’ll be using the Jaccard Index, also called the Jaccard Similarity Coefficient, a statistic that measures the similarity of two sample sets of data. This post is a technical explanation of the calculation itself. The sets of data are the unique ancestral surnames of …
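As a rough illustration of the statistic mentioned in this excerpt: the Jaccard Index of two sets is the size of their intersection divided by the size of their union. The sketch below is a minimal Python version, not necessarily the calculation used in the full post. The function name `jaccard_index` and the surnames are made up for the example, and returning 0.0 for two empty sets is a convention chosen here.

```python
def jaccard_index(a, b):
    """Return |A ∩ B| / |A ∪ B| for two collections of surnames."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Both sets are empty; treating that as 0.0 is an assumption of this sketch.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical surname sets, for illustration only.
mine = {"O'Brien", "Murphy", "Kelly", "Walsh"}
match = {"Murphy", "Kelly", "Byrne", "Ryan", "Doyle"}

# 2 shared surnames out of 7 distinct surnames overall.
score = jaccard_index(mine, match)
print(round(score, 3))
```

A score of 0 means the two surname sets share nothing, and 1 means they are identical.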

The Top 10 Ancestral Surnames across my Ancestry DNA Matches Surprised Me

December 1, 2021 | August 14, 2018 by Margaret O'Brien

In May 2018 I downloaded the direct-line ancestral surnames of all my DNA matches at Ancestry. I discussed some statistical analysis of the numbers in a previous post. I then conducted some exploratory data mining of the distribution of surnames across my matches. My paternal line is African (not African-American), and I have very …
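One simple way to look at the distribution of surnames across matches is to count how many matches list each surname and keep the most frequent ones. The sketch below assumes the downloaded data has already been loaded into a dictionary from a match identifier to that match's surnames. The layout, the name `top_surnames`, and the sample data are all hypothetical, and this is not necessarily how the analysis in the full post was done.

```python
from collections import Counter


def top_surnames(matches, n=10):
    """Count how many matches list each surname and return the n most common."""
    counts = Counter()
    for surnames in matches.values():
        # Use a set so a surname is counted once per match, even if repeated.
        counts.update(set(surnames))
    return counts.most_common(n)


# Hypothetical data: match identifier -> ancestral surnames listed for that match.
matches = {
    "match_a": ["Murphy", "Kelly", "Murphy"],
    "match_b": ["Kelly", "Byrne"],
    "match_c": ["Kelly", "Murphy", "Ryan"],
}

top = top_surnames(matches, n=2)
print(top)
```

Counting each surname once per match keeps a single tree with many repeated entries from dominating the ranking.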