Another new Ancestry feature – CMs on the Match List Page

As of late October 2018, Ancestry have released another new display feature on the Match List page: a row below each match displays the number of shared centimorgans and segments.

This should be a real time saver when reviewing matches by scanning the page. In combination with the other new feature I posted about, the “no tree” display, these are just great user-friendly features that minimize clicking and waiting for each individual match page to load.
I would expect it will also reduce load on the Ancestry web servers, as users are less likely to click into their latest 6.0 CM matches. That would be good news for the customers, as it should mean we see less of the “name unavailable” type of errors.

How many of my Ancestry Matches have added or removed Trees in the Last 3 Months?

I conducted a review in May 2018 of all my DNA matches on Ancestry and recorded whether the match had a public or private tree, and whether a tree was linked to their DNA. I ran the same review three months later in August 2018, having seen an additional 1,500 or so matches added during that time.

The percentage breakdown of public linked/public unlinked/private/none has not changed in rounded numbers. It remains that 40% of my matches have a public linked tree, 27% have at least one public unlinked tree, 7% have only a private tree, and 26% have no tree at all.

I’m relieved that the 26% No Tree has not increased. See my May blog post for a comparison with other Ancestry users who have blogged on their numbers. One comment from Blaine Bettinger mentioned he was interested in tracking how many new matches add a tree after a period of time such as a year.

Well, only three months have passed for me but as I have the data to hand, I’ll take a look at the breakdown: of both new matches and also of older matches who have since added a tree within the last three months.

Some good news for me is:
102 matches had no tree in May 2018 and have since added a public or unlinked tree.
28 matches had a private tree and have since added a public or unlinked tree.
7 matches had no tree at all and have since added a private tree.

Ignoring the private trees, in total about 1.2% of my matches added an available tree after some delay in time.

But I must hold off on breaking out the bubbly! Some of my matches went in the reverse direction.

42 matches had a public linked or unlinked tree and have since gone private, leaving no public tree available.
 4 matches had a public linked or unlinked tree and have gone nuclear i.e. they’ve removed any tree from Ancestry.
 0 matches were private and had gone nuclear – just adding that one in for completeness.

It’s still a positive balance.

But it leaves me with a net of 0.8% of older matches who went from no public tree to delivering a new available tree within the last three months.
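For anyone who wants to check the arithmetic, here is a minimal sketch of how the gross and net percentages fall out of the counts above. The total number of matches is an assumption on my part: it is not stated exactly in this post, so I infer roughly 10,700 from the fact that about 1,500 new matches make up about 14% of the current total.

```python
# Counts quoted in the post (May 2018 versus August 2018).
gained = 102 + 28   # no tree or private tree -> public linked/unlinked tree
lost = 42 + 4       # public tree -> private only, or removed entirely

# ASSUMPTION: the exact total is not stated; about 1,500 new matches are
# described as roughly 14% of the current total, giving roughly 10,700.
total_matches = round(1500 / 0.14)

gross_pct = 100 * gained / total_matches
net_pct = 100 * (gained - lost) / total_matches

print(f"gross: {gross_pct:.1f}%  net: {net_pct:.1f}%")
```

With that assumed total, the gross figure rounds to 1.2% and the net figure to 0.8%, in line with the numbers quoted above.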


I stated that the overall proportions of tree availability hadn’t changed since my prior evaluation of May 2018. That isn’t the case when I evaluate only the matches who were added within the last three months. Below is a side-by-side.

So that 3 month breakdown is of about 1500 matches, or about 14% of my current overall total. I don’t like to see that the “No Tree” category is proportionally larger at 37% versus the overall 26% figure. It made me triple-check the figures, but that’s the current picture.

The Top 10 Ancestral Surnames across my Ancestry DNA Matches Surprised Me

In May 2018 I downloaded the direct line ancestral surnames of all my DNA matches at Ancestry. I discussed some statistical analysis on the numbers in a previous post.

I then conducted some exploratory data mining of the distribution of surnames across my matches. My paternal line is African (not African-American), and I have very few paternal matches on Ancestry. My maternal line is Irish, and I expected the usual suspects of popular Irish surnames to appear in my own top 10 list. I’m talking Murphy, Ryan, Kelly and the like. I could imagine a pattern where my emigrant ancestors landed in the traditional “Irish” enclaves of New York or Boston, and married exclusively within the cohort of people they met from the old country. If their American descendants followed suit, then the distribution of surnames across my matches should skew Irish.

Data mining is about asking questions of the data, so here is the Q and A.

Question: What are the Top 10 Surnames across my Matches?

Before I crunched the numbers, my educated guess for Number 1 was Smith, which can be of Anglo-Saxon origin or of Irish origin. My maternal great-grandfather was a Smith. At least one of his sisters married a Smith. And it seems that I see Smiths everywhere I look among my match trees. Well, I wasn’t wrong on Number One, but the next 9 surprised me. Here is the distribution of the top 10 surnames across my Ancestry matches.

I do not recognize any of the next nine surnames from my known direct lineage.
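A tally like this can be sketched in a few lines. This is only an illustration and not my actual utility: the `surnames_by_match` data and the `top_surnames` helper are hypothetical, and I am assuming each surname should count once per match no matter how many times it appears in that match's direct line.

```python
from collections import Counter

# HYPOTHETICAL sample data: match id -> direct-line ancestral surnames.
surnames_by_match = {
    "match_a": ["Smith", "Jones", "Smith"],
    "match_b": ["Smith", "Murphy"],
    "match_c": ["Jones", "Brown"],
}

def top_surnames(surnames_by_match, n=10):
    """Count each surname once per match and return the n most common."""
    counts = Counter()
    for names in surnames_by_match.values():
        # A set removes repeats within one match's list.
        counts.update({name.strip() for name in names})
    return counts.most_common(n)

top = top_surnames(surnames_by_match, n=2)
print(top)
```

Counting once per match stops a single large tree full of Smiths from dominating the list.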

Question: Is my top 10 Surname Distribution typical of Ireland (north and south)?

So I compared my distribution to a paper published by Sean J Murphy titled “A Survey of Irish Surnames 1992-97” . Murphy presents the top surnames on the entire island of Ireland (i.e. including Northern Ireland) based on data he gathered from 1992-1997. It’s a great read for anyone interested in Irish lineage. Because my Irish ancestors are predominantly Ulster, I’m working with Murphy’s numbers instead of recent data from the Irish Central Statistics Office, as theirs does not include Northern Ireland. Murphy provides raw numbers in tabular format, but I’ve plotted the distribution to allow a broad side-by-side comparison to my own:

Clearly the answer is no, the distribution across my matches doesn’t look similar to the distribution of surnames on the island of Ireland.

Question: Is my top 10 Surname Distribution typical of British surnames in Ireland?

I wouldn’t have thought of this without reading Murphy’s paper which has a section on “British Surnames in Ireland”. I have not established a genealogical link of my ancestors to British origin. But my distribution sure does look more similar to this breakdown than to the Irish one.

Before I draw any inferences, I’ll ask another question.

Question: Is my top 10 Surname Distribution typical of the United States?

I then compared my distribution to the USA census figures. I chose the 2000 census because it is a similar time frame to the Irish numbers. It might be better to use earlier census figures: because living people are private, the available names in trees are likely to represent an earlier time frame. But it’s all very approximate, and I’m only interested in the broad distribution, which is certainly closer to mine than the Irish distribution.

The names “Garcia” and “Rodriguez” jump out at me because I’m not familiar with them. Checking my data, I have a sum total of 17 matches with a direct ancestry to Garcia or Rodriguez. My understanding of American demographics is that the last few decades will have pushed Hispanic surnames upward in frequency. So I narrowed the census numbers to filter on respondents who identified as white European lineage.

So after all that, I can see that the distribution of surnames across my Ancestry matches is closest to the USA white population.

Of course the vast majority of Ancestry customers are American. However if there was a very high tendency among my own emigrant ancestral relatives to keep themselves to themselves and only to marry within the typical Irish communities, then my top 10 distribution would surely be closer to the filter of the top 10 Irish surnames within the USA census. (I’m using the Ancestry blog as my source for these next numbers – they are using the 2000 census but I’m not sure how they got this particular filter).

So I don’t have any of the Irish American top 10 names amongst my top list. To be fair, I only have to go to #13 in my own distribution to hit “Murphy”, and I have “Kelly” at #18. But those two names are the only “Irish-American” names in my top 30. (Smith is the awkward outlier. It’s my personal #1 and I’m sure it’s being excluded, but it can also be of Irish origin).

To Summarize:

Before I did this analysis I assumed that I’d have a higher proportion of “Irish” names across my matches. My Irish emigrant ancestors appear not to have married only other Irish descendants; instead they tended to marry within the local population of European heritage.

Measuring the Usefulness of Trees of my Ancestry DNA Matches

Previously I discussed an analysis of the availability of trees across all my DNA matches. The headline figures from May 2018 were

  • 40% of my matches had a public linked tree
  • 27% had no linked tree but had at least one public unlinked tree
  • 7% had a private tree
  • 26% had no tree

A total of (linked + unlinked) 67% of matches with at least one public tree at first glance would seem to be a very positive outcome.

To analyse usefulness I proceeded to drill down further into the actual content of those trees. Specifically, my May analysis included a download of the Direct Ancestor Surname lists provided by Ancestry on the “Pedigree and Surname” page for each match where a tree is either displayed by default (the linked tree) or has been selected from a list of unlinked trees. I use my own utility but the DNAGedcom Client will also download surname lists to a spreadsheet to allow analysis. I don’t know if it grabs unlinked trees; mine does to a limited degree (just one per match).

It was immediately clear that many matches had a tree, but had no available surname list. The reason was simple: the tree was public but all entries had been set to a status of living, so their details were hidden:

Everybody Lives!

This was a significant 11% of my matches with a public linked or unlinked tree, or a rounded 8% of my total matches, as reflected in the revised usefulness proportion:

Single Surname Trees

Aside from the I-see-no-dead-people trees, the other happiness-killers are the loner trees with but a single visible entry:

I also include in this category those trees with multiple generations of a single surname and no visible spouses. These single surname trees were a very similar (slightly lower) percentage to the “Everybody Lives” category. The usefulness proportion is again reduced:

Variety is the Spice of Life

My grandmother was a Smith, and so it seems were all the neighbours. But research-wise I could have had it worse, like this guy:

Aside from Smith, the two other names I’ve blanked for privacy are common Irish surnames. If you’re thinking Murphy and Reilly, you’re half right. With my current research I find that the most useful trees are those with a good variety of ancestral surnames, certainly more than the three in this example. A high variety will indicate a higher number of ancestral generations represented in the tree.

I titled this post as “The Usefulness of Trees”, but usefulness is in the eye of the beholder. If I was searching for living relatives (e.g. as an adoptee) and this was a close match, this might be gold if the match is prepared to reply to messages.

For now, I will define the potential usefulness of a tree by the number of distinct surnames in the ancestral list. That won’t be the case for everyone, and it may change for me if the focus of my research changes. That being said, I now want to get a measure of how many matches with potentially useful trees that I have.

The match with the greatest variety has 272 distinct direct surnames. The tree goes back to the early 1600s and has over 12K people. The rest of my matches are distributed over a range of numbers.

For illustrative purposes, the next figure shows where my matches fall into bands of “number of distinct ancestral surnames”. I’ve already noted the lowest two bands: zero and the single surname crowd.

Above zero or one surnames is the band of low variety: I’ve fairly arbitrarily put this as 2, 3 or 4 distinct surnames. This is where I also think little immediate value is to be had. I’d probably need to build out a research tree, which is why I say there is little *immediate* value to me. It’s about 30% of my public linked/unlinked trees. That eats into my usefulness as so:

That leaves me with about 32% of matches with “useful” trees for my current purposes. For me personally, that’s currently at least 2,870 trees.
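The banding itself is simple to reproduce. Here is a rough sketch, assuming you already have a count of distinct direct-line surnames for each match with a public tree. The `band` function, its labels and the sample counts are all hypothetical, and the 2-to-4 cut-off is the same fairly arbitrary one I used above.

```python
from collections import Counter

def band(distinct_surnames):
    """Map a count of distinct ancestral surnames to a usefulness band."""
    if distinct_surnames == 0:
        return "none visible"        # the "Everybody Lives" trees
    if distinct_surnames == 1:
        return "single surname"
    if distinct_surnames <= 4:
        return "low variety (2-4)"
    return "useful (5+)"

# HYPOTHETICAL counts, one per match with a public tree.
distinct_counts = [0, 1, 3, 12, 272, 4, 0, 7]

bands = Counter(band(n) for n in distinct_counts)
for label, count in bands.items():
    print(f"{label}: {100 * count / len(distinct_counts):.0f}%")
```

Changing the thresholds in `band` is all it takes to try a different definition of “useful”.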

How many Irish cousins: the impact of endogamy – Part 4 in Series

In previous posts I described the calculations used by several sources to predict the number of cousins we have at different degrees of cousinship (first, second etc). The calculations assume that our ancestors are not related to each other i.e. no first-cousin marriages or other types of inbreeding. If we have a significant number of ancestor couples who were in fact related, this has a two-fold impact on our own genealogy research.

First, it reduces the total number of cousins that we have. If our grandparents weren’t related then we gain different cousins from both sides of the relationship.

Second, it increases the genetic similarity between ourselves and our cousins. That skews the predictions of cousinship from DNA testing companies i.e. we would share similar amounts of DNA with our second cousins as other people share with their first cousins.

For Irish people trying to use DNA matching to build out their family tree and solve certain mysteries in their heritage, it’s important to know the likely incidence of genetic relationship amongst our recent and distant ancestors.

It’s worth defining a few academic terms used in research in this area.
“Endogamy” is the practice of marrying within a specific social group. It may be due to geographic isolation, or for religious reasons, or a sociological wish amongst a group to preserve traditions. The smaller the population of the group, the more likely that a proportion of couples will be related.

Consanguinity means being descended from the same ancestor as another person. So “consanguineous marriages” are marriages in which the couple are related.

Panmictic means random mating within a population i.e. free from influence of social, geographic or genetic preference. It is another stated assumption in the 23andMe research discussed in my previous post.

Okay, with those terms out of the way, let’s look at academic research which will determine whether the Irish can use with confidence the calculations and predicted figures of numbers of cousins.

Marriage between close family members is strictly illegal in Ireland. Marriage between first cousins is legal, but to be married in a Catholic church the couple must obtain a dispensation from the clerical authorities. If you see mention of a dispensation in ancestral marriage records, be sure to take a closer look.

J.G. Masterson studied the rate of first-cousin dispensations amongst Catholics in Ireland from 1959-1968. He found a rate of 1 in 720 for the entire island, and 1 in 625 for the Republic. That is less than 0.2%. Masterson also quotes a study from 1883 that simply asked people if they were the children of first cousins. A little less than 0.6% said that they were. I imagine that the falling trend over a hundred years was partly due to decreasing isolation of local populations (i.e. easier to travel).

So we can say in general that first cousin marriages are historically low amongst our Irish ancestors within the last hundred years.

The major exception is amongst a particular section of our population, the Travelling Community. Irish Government figures from 2003 reported that over 20% of marriages within the Travelling Community were between first cousins. This does mean that Travellers will have particular challenges in researching genetic ancestry. There are other endogamous communities taking great interest in similar research who may have useful methods for people from a Travelling background.

In 2017 Irish Travellers accounted for only about 0.6% of the Irish population. Therefore most Irish people researching their family history can assume a low level of first-cousin ancestral couples for well over a century. More distant consanguinity, such as third cousins, is more likely to be present due to limitations of travel. This will lower the numbers generated by the calculated model, but not with such significance as experienced by populations with higher rates of endogamy.

I conclude that most Irish people can take as ballpark figures the calculated numbers of cousins by degree, as long as a realistic birth rate is used.

How many Irish cousins: according to AncestryDNA – Part 3 in Series

My previous blog post was on research by scientists at 23andMe on predicting our number of cousins. I applaud 23andMe for the publication of the research in detail.

AncestryDNA have also researched the topic but as far as I can find, they release the information through the marketing department with big headline numbers and not a lot of detail.

Their most recent release of information was to mark World DNA day in April 2018. Unfortunately some news outlets reported that the Irish have 14,000 cousins while others reported Ancestry as saying that we have 14,000 LIVING cousins up to a distance of 8th cousin. The living distinction is important, but I was more disillusioned on realizing this number is up to 8th degree. I’d like to see the breakdown estimates at each degree, as done by 23andMe, but cannot find the details for Ireland.

AncestryDNA did conduct a detailed study using British birth rates and census data to produce statistics for the average British person; the numbers are shown here. The numbers are lower than in the 23andMe study by Henn et al. The formula used by Henn and Tim Urban to generate the predicted number of cousins at degree i, for a birth rate z, is 2^i × (z − 1) × z^i.

If AncestryDNA used the same formula as 23andMe (Henn) and Tim Urban (as discussed in my blog post here), then figuring out the birth rate they utilized is a matter of solving a quadratic equation. I took a shortcut by using symbolab. Plugging in a result of 5 for first cousins and 28 for second cousins, I calculated that if they’d used the formula then their birth rate would have been 2.3. However, solving the same equation for their figures for 3rd and 4th cousins produced different birth rates.
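If you’d rather not use an online solver, the back-calculation can be sketched numerically. This assumes the formula has the form 2^i × (z − 1) × z^i (i is the degree of cousinship, z the birth rate), as described in the earlier posts in this series. The function names are my own, and this is not necessarily how Ancestry arrived at their figures. Only the first-cousin case is a true quadratic; higher degrees give higher-order polynomials, so a simple bisection search is a handy way to cover every degree.

```python
def expected_cousins(z, i):
    """Expected cousins at degree i for birth rate z (assumed formula)."""
    return (2 ** i) * (z - 1) * (z ** i)

def solve_birth_rate(target, i, lo=1.0, hi=20.0, tol=1e-9):
    """Find z such that expected_cousins(z, i) is approximately target.

    Bisection works because the formula increases steadily with z
    once z is at least 1.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_cousins(mid, i) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Birth rate implied by 28 second cousins under the assumed formula.
print(round(solve_birth_rate(28, 2), 2))
```

Running the same function against each degree's published figure is a quick way to see whether a single birth rate could have produced all of them.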

I’m reluctant to repeat numbers for which I can’t explain the provenance, but for the sake of completeness – here are Ancestry’s estimates for British users:

Whatever birth rate and formula they used, the Irish birth rate was significantly higher than Britain’s prior to 1990 so I’m not particularly interested in the British predictions for the purpose of this post. The problem of course is that Ancestry state that they used census data and other statistics going back 200 years for their calculations, and these records are not generally available for Ireland (no fault of Ancestry here, the sad fact is that Irish records pre-20th century are very patchy).

So it looks like all we have on Ireland from AncestryDNA is their reported calculation of a total of 14,000 living 1st to 8th cousins. How does that compare to the predicted totals from my previous posts – which don’t take into account whether these cousins are living or dead? Hard to say, as I can’t find any details as to how AncestryDNA calculated probability of living. There’s a pattern forming here. Even the ISOGG Wiki, which has a page on cousin statistics, has to cite the Daily Mirror as its source of AncestryDNA information. With all due respect to the Mirror, it’s a tabloid newspaper as opposed to the peer-reviewed journal that published the 23andMe research.

I do hope that AncestryDNA will produce the level of detail as done by 23andMe, but until then, their estimates are a bit of a bust for me. Knowing the assumed birth rate and other assumptions is important to assess whether the figures are realistic for Irish users. In my next post in this series I’ll discuss some of the assumptions and caveats to be considered.

How many Irish cousins: according to 23andMe – Part 2 in Series

Yesterday I wrote a blog post based on an entertaining article from Tim Urban to calculate our number of cousins at various distances using birth rate statistics. Today’s post is based on an academic study by researchers from 23andMe (Henn et al) which covers a wealth of complex analysis that includes a table of “expected number of cousins” at degree of cousinship. This table is reproduced in several websites without the detail of the formula that arrived at the very specific numbers.

It took me a few reads to realize that the formula used by Henn et al is exactly the same as what Tim Urban devised. The formula is kinda buried near the end of the article under a section titled “Calculation of expected number of individuals sharing DNA IBD”. Yep, that translates to “how many cousins do we have.”

Critically, this section specifies that a birth rate of 2.5 is used to produce the cousin numbers in the table. I’ll come to that later, as it may be more appropriate to the United States than to Irish people over a certain age.

I actually found the academic explanation easier to follow than Tim’s post. So now I’ll try to explain the formula as opposed to using it blindly to produce numbers. The formula, expressed using i as the degree of cousinship and z as the birth rate, is 2^i × (z − 1) × z^i.

Why 2 to the power of i? For our first cousins, we share two sets of grandparents, therefore two couples. For our second cousins we share four sets of great grandparents, therefore four couples. And so on up the generations, where the number of couples is equal to 2 to the power of the degrees of cousinship.

Why (z-1)? If one set of grandparents have four children, one of those children is our parent and that person’s children are ourselves and our average of three siblings so we need to remove 1 from the birth rate.

z to the power of i is the total number of non-ancestral cousins that stem from a particular ancestral generation. Thus the number of cousins from one set of ancestors is the product of (z-1) and z^i.  The full total is then achieved by multiplying by the number of ancestral couples for the degree of cousinship.
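The pieces described above can be put together in a short sketch. I’m reading the formula as 2^i × (z − 1) × z^i, and the function and variable names here are my own rather than anything from the paper.

```python
def expected_cousins(i, z):
    """Expected number of cousins at degree i for birth rate z."""
    couples = 2 ** i                  # ancestral couples shared at degree i
    per_couple = (z - 1) * (z ** i)   # cousins descending from one couple
    return couples * per_couple

# Birth rate of 2.5, as specified in the study.
for degree in range(1, 5):
    print(degree, expected_cousins(degree, 2.5))
```

For a birth rate of 2.5 this gives 7.5 first cousins and 37.5 second cousins, growing by a factor of 2 × z = 5 with each further degree.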

The table below has the totals for three different fertility rates in recent Irish history. The fourth line uses the fertility rate chosen by Henn et al for their numbers.

As befits a scholarly study, Henn et al also specify the assumptions that they made to simplify calculations. “Perfect survivorship” jumps out for me i.e. that every offspring at every generation lives to produce the average number of children. Given that the user base of 23andMe is predominantly American and with disposable income for personal DNA testing, this might be a reasonable assumption for their purposes. I do think we Irish need to pay some attention to the impact of our mortality rates up to the 1970s.

They also list other assumptions which I’d like to consider. But I’ll save that for another post in this “how many Irish cousins” series. In my next post I want to look at the quoted numbers from AncestryDNA.

How many Irish cousins: according to Tim Urban – Part 1 in Series

Tim Urban wrote an entertaining blog post in 2014 on calculating a ballpark number of cousins based on your country’s average birth statistics. His formula breaks down the totals by degree of cousin i.e. 1st/2nd/3rd/4th and outward.

Tim calculates numbers for USA, UK, Canada and a few other countries, but not Ireland – so I figured I’d crunch the figures for green.

Tim’s formula is (n − 1) × 2^d × n^d, where “n” is the fertility rate and “d” is the degree of cousin.

NationMaster reports 2013 fertility rates of 1.9 for the U.K., 2.01 for Ireland, and 2.06 for the U.S. Here are the totals for the three countries, with Ireland in the middle line.

Figures for 2013

For now, ignore the eye-watering totals at fifth cousin and beyond. I took one look at the predicted total of 4.1 for First Cousins and rechecked my calculation for a mistake. It just looked suspiciously low to me. My reaction wasn’t based solely on my own family. From the life-cycle of weddings, christenings, and funerals, you tend to have a passing familiarity of the family structures of your friends and neighbours.

A moment’s thought reminded me that Ireland’s birth rate has dropped in recent decades from one of the highest in Europe to closer to the average. The nearest published rate I could find for my birth year was 1970’s rate of 3.87 which predicts 22 first cousins using Tim’s formula. Quite the difference! The same publication reported fertility rates of 2.08 for 1989 and about 3.9 for 1950.
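If you want to rerun the first-cousin numbers yourself, here is a small sketch. It assumes Tim’s formula has the form (n − 1) × 2^d × n^d, which reproduces the 4.1 and 22 figures quoted above. The `cousins` function name is mine, and the 1950 rate is entered as 3.9 because the publication figure is only approximate.

```python
def cousins(n, d):
    """Predicted cousins at degree d for fertility rate n (assumed formula)."""
    return (n - 1) * (2 ** d) * (n ** d)

# Irish fertility rates quoted in this post.
irish_rates = {
    "2013": 2.01,
    "1989": 2.08,
    "1970": 3.87,
    "1950": 3.9,   # quoted as "about 3.9", so treat as approximate
}

for year, rate in irish_rates.items():
    print(year, round(cousins(rate, 1), 1))
```

Passing a larger `d` gives the more distant cousin totals for the same rates.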

Here are the figures for Ireland only for those years:

Figures for Ireland, 1970 and other years

Now you can shift your eyes right, and look at the totals of more distant cousins. Again, I thought I’d got the calculations wrong, this time because those numbers were so big. Thankfully, John Reid of Anglo-Celtic Canada Connections has taken Tim’s formula and applied it across a wide range of fertility rates, so a spot check stopped me from fretting.

I’ve seen other reported projections of numbers of cousins, including one from Ancestry and one from 23andMe. I’ll address them in other blog posts and compare them with Tim’s.