Ancestry display issues – is your ad-blocker leading to a problem?

Here’s a tip on avoiding a particular display issue with Ancestry when using Chrome. The problem of the spinning wheel started occurring for me in October. The little wheel kept spinning, and the page of match lists would not scroll and display properly.

I followed the usual Ancestry instructions to clear the cache, to no avail. I also noted that the problem was not occurring in Firefox, Internet Explorer or Opera.

I recently discovered what the cause was: the ad-blocker that I’ve had installed within Chrome for several years. Something changed in Ancestry’s October feature release that doesn’t “play well” with uBlock Origin, my ad-blocker of choice.

I solved this particular problem by disabling the ad-blocker for Ancestry only. That’s a simple matter of a single click. Once that was done and I refreshed the page, the spinning wheel of eternal doom was gone.

Another new Ancestry feature – CMs on the Match List Page

As of late October 2018, Ancestry have released another new display feature on the Match List page: a row below each match displays the number of shared centimorgans and segments.


This should be a real time saver when reviewing matches by scanning the page. In combination with the other new feature I posted about, the “no tree” display, these are just great user-friendly features that minimize clicking and waiting for each individual match page to load.
I would expect it will also reduce load on the Ancestry web servers, as users are less likely to click into their latest 6.0 CM matches. That would be good news for the customers, as it should mean we see less of the “name unavailable” type of errors.

Ancestry’s new feature on the Match List Page – More Tree Info

Ancestry added a very useful new feature to the main list of matches at some time in September/October 2018. I’m being vague on the timeline because not all Ancestry users get the same new features at the same time.
We now see a new tree description toward the right hand side of the page: “Unlinked Tree”.


Prior to this change we’ve only seen categories “No Tree”, “Private Tree” and a number in the case of a linked tree. The tiresome consequence was that in order to see whether a match had an unlinked tree, we had to click on the “No Tree” to check if they did in fact have no tree or if they actually had three enormous unlinked trees full of buried treasure.

This really is a welcome feature when scanning pages of matches to quickly determine which match to investigate.

Ancestry doesn’t display the “number of people” beside the unlinked tree. That is understandable, because if the match has several unlinked trees, which one should be picked? It would be possible to calculate the highest number within several trees but that’s an extra layer of processing that could be a drag on performance (how quickly the page displays).

One cautionary note that applies to new software features: don’t bet the house on it just yet. One user reported in early October 2018 on the Ancestry boards that they clicked on a match displaying “No Trees” and the match in fact had five unlinked trees. Screenshots were added to prove the case. A big thank you to that user, because it’s important that Ancestry gets feedback of false negatives.

Gender Breakdown of my Ancestry Matches

When I first took a snapshot of my Ancestry match data, I didn’t bother taking every piece of information available. For example, I didn’t think that the kit administrator info was useful (the “managed by XXX”), but I’ve changed my mind. So before I did a re-run to capture the extra detail I had a look to see what else I’d skipped over.

One piece of information available is the gender of our matches.  Visually, the match list pages usually make it very clear by the Pink and Blue headshot graphics.

 

 

But it’s not so easy to tell, when the match has uploaded a photo of what looks like a tasty cocktail.

Thankfully, the gender is readily available “behind” the graphics. Every match has a gender tag of “male” or “female” within the web HTML.

It occurred to me that if this setting is user-selected, then it might not always be accurate. I was trying to remember if I ticked a gender box on sign-up, but then I realized that DNA testing companies have the definitive answer – that whole Y chromosome stuff.

So I revisited my August 2018 snapshot of matches and assigned the gender information to each of my matches. I did the same for the kit of a friend of mine. She has twice my matches – as shown in these numbers:

But the gender breakdown is remarkably similar, in that we both have the same proportion of male and female matches.

Unless we’re both outliers, more women than men are testing with Ancestry.

Once I had these numbers, I wondered if there was a difference by gender in the proportion of testers interested in genealogy versus those interested solely in ethnic heritage. I have no way of knowing why people test, but I do have metrics on which of my matches have trees. There turned out to be very little gender difference: 28% of men have no tree versus 26% of women.
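For anyone wanting to run the same comparison on their own snapshot, here is a minimal sketch. The record layout (a gender tag plus a tree status per match) and the sample values are assumptions for illustration, not my actual data.

```python
from collections import Counter

# Hypothetical records: (gender tag, tree status) per match. Illustrative values only.
matches = [
    ("female", "public linked"),
    ("female", "none"),
    ("male", "none"),
    ("male", "public unlinked"),
    ("female", "private"),
]


def no_tree_share(records, gender):
    """Return the fraction of matches of the given gender that have no tree."""
    statuses = [status for g, status in records if g == gender]
    if not statuses:
        return 0.0
    return Counter(statuses)["none"] / len(statuses)


for gender in ("male", "female"):
    print(gender, f"{no_tree_share(matches, gender):.0%}")
```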

How many of my Ancestry Matches have added or removed Trees in the Last 3 Months?

I conducted a review in May 2018 of all my DNA matches on Ancestry and recorded whether the match had a public or private tree, and whether a tree was linked to their DNA. I ran the same review three months later in August 2018, having seen an additional 1,500 matches added during that time.

The percentage breakdown of public linked/public unlinked/private/none has not changed in rounded numbers. It remains that 40% of my matches have a public linked tree, 27% have at least one public unlinked tree, 7% have only a private tree, and 26% have no tree at all.

I’m relieved that the 26% No Tree has not increased. See my May blog post for a comparison with other Ancestry users who have blogged on their numbers. One comment from Blaine Bettinger mentioned he was interested in tracking how many new matches add a tree after a period of time such as a year.

Well, only three months have passed for me, but as I have the data to hand I’ll take a look at the breakdown, both of new matches and of older matches who have added a tree within the last three months.

Some good news for me is:
102 matches had no tree in May 2018 and have since added a public or unlinked tree.
28 matches had a private tree and have since added a public or unlinked tree.
7 matches had no tree at all and have since added a private tree.

Ignoring the private trees, in total about 1.2% of my matches added an available tree after some delay in time.

But I must hold off on breaking out the bubbly! Some of my matches went in the reverse direction.

42 matches had a public linked or unlinked tree and have since gone private, leaving no public tree available.
4 matches had a public linked or unlinked tree and have gone nuclear i.e. they’ve removed any tree from Ancestry.
0 matches were private and have gone nuclear – just adding that one in for completeness.

It’s still a positive balance.

But it leaves me with a net of 0.8% of older matches who went from no public tree to delivering a new available tree within the last three months.
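The arithmetic behind those two percentages can be sketched as follows. The helper that counts status changes between two snapshots is a hypothetical illustration of the approach, and using the August total of 10,566 as the denominator is an assumption; the counts themselves are the ones quoted above.

```python
from collections import Counter


def transitions(before, after):
    """Count (old status, new status) pairs for matches present in both snapshots."""
    return Counter(
        (before[m], after[m])
        for m in before.keys() & after.keys()
        if before[m] != after[m]
    )


# Counts quoted in this post.
gained = 102 + 28        # older matches that now show a public tree
lost = 42 + 4            # matches whose public tree went private or was removed
total_matches = 10_566   # assumption: the August 2018 total is the denominator

gained_pct = 100 * gained / total_matches
net_pct = 100 * (gained - lost) / total_matches
print(f"gained {gained_pct:.1f}%, net {net_pct:.1f}%")
```

The 7 matches who moved from no tree to a private tree are left out, as they are in the text.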

 

I stated that the overall proportions of tree availability hadn’t changed since my prior evaluation of May 2018. That isn’t the case when I evaluate only the matches who were added within the last three months. Below is a side-by-side.

So that 3 month breakdown is of about 1500 matches, or about 14% of my current overall total. I don’t like to see that the “No Tree” category is proportionally larger at 37% versus the overall 26% figure. It made me triple-check the figures, but that’s the current picture.

August 2018 Analysis of Growth of Matches on AncestryDNA

This post follows on from my May 2018 analysis of the rate of growth of my DNA matches on Ancestry between July 2017 and May 2018. I took another snapshot of all my DNA matches in August 2018. I’m interested in capturing:

  1. THE RATE OF GROWTH OF MATCHES
  2. THE DISTRIBUTION OF MATCHES BY SHARED CM
  3. TRENDS IN THE NUMBER OF OPT-OUTS/REMOVALS

See the earlier post for background on the technical aspects. Here I’ll just dive into the numbers.

(1) THE RATE OF GROWTH OF MATCHES

These are the total numbers at each snapshot in time: 5,100 in July 2017, 7,494 in Feb 2018, 9,026 in May 2018, and the latest number is 10,566 total matches.

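The time intervals between Feb->May and May->August aren’t exactly equal, but they are close enough to say that the rate of growth is fairly even. The new matches make up 17% and 15% of the later snapshot’s total respectively, which works out to growth of about 20% and 17% over the earlier totals.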
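As a quick check on those figures, the sketch below computes the change between snapshots two ways: relative to the earlier total, and as a share of the later total. The totals are the ones listed above; the function name is just for illustration.

```python
# Snapshot totals quoted in this post.
snapshots = [
    ("Jul 2017", 5100),
    ("Feb 2018", 7494),
    ("May 2018", 9026),
    ("Aug 2018", 10566),
]


def growth(prev_total, new_total):
    """Return (growth over the earlier total, new matches as a share of the later total)."""
    added = new_total - prev_total
    return added / prev_total, added / new_total


for (prev_label, prev), (new_label, new) in zip(snapshots, snapshots[1:]):
    over_prev, share_of_new = growth(prev, new)
    print(f"{prev_label} -> {new_label}: {over_prev:.0%} growth, {share_of_new:.0%} of later total")
```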

(2) THE DISTRIBUTION OF MATCHES BY SHARED CM

I am also interested in the number of matches broken down by CM, which Ancestry does not provide via the website. I’m going to roll up the matches into four ranges (see the same section in my July post for why I’m doing this):
6 CM to 6.9 CM
7 CM to 9.9 CM
10 CM to 19.9 CM
20 CM and over
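Rolling matches up into those four ranges can be sketched like this. It assumes each match has a shared CM value of 6 or more, and the sample values are made up.

```python
from collections import Counter


def cm_band(shared_cm):
    """Assign a shared-CM value to one of the four ranges used in this post."""
    if shared_cm < 7:
        return "6 CM to 6.9 CM"
    if shared_cm < 10:
        return "7 CM to 9.9 CM"
    if shared_cm < 20:
        return "10 CM to 19.9 CM"
    return "20 CM and over"


# Hypothetical shared-CM values for a handful of matches.
sample_cm = [6.0, 6.4, 7.2, 9.9, 12.5, 34.0]
band_counts = Counter(cm_band(cm) for cm in sample_cm)
print(band_counts)
```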

Here are the raw numbers:

Below is a chart of the distribution. There is no change across periods, which is no great surprise. I discussed the distribution in the prior post.

The percentages for “20 CM and above” are a little hard to read in the above graphic, because they are proportionally small. They read: 1.1%, 1.2%, 1.0%, 1.0%.

(3) TRENDS IN THE NUMBER OF OPT-OUTS/REMOVALS

In late April of this year, a wave of publicity arose from the use by the FBI of Gedmatch (a DNA matching site) to help catch the “Golden State Killer”. There’s a great round-up of links on the Cruwys blog.

There has been speculation in the DNA genealogical world that the publicity of outside agencies using our DNA might lead to people using the opt-out function in Ancestry, or removing their DNA entirely. I have no accurate way of measuring opt-out rates using snapshots in time, as I can’t capture new testers who immediately opt out. But I’ll take a look at the figures I have to hand.

My May snapshot was taken mid-month, and in a prior post on Ancestry opt-outs I noted that two matches disappeared between my February and May snapshots. Before that, five matches had disappeared some time between July 2017 and February 2018.

Between mid-May and mid-August, a total of 9 matches have disappeared from my list. Just to be clear, these are matches that were present in mid-May and are no longer available to me. I have no way of measuring matches who signed up in June and opted out in July or on the day their results came through.
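Finding the matches that vanished between two snapshots is a simple set difference. This is a sketch with made-up match identifiers, assuming each snapshot is stored as a collection of stable match IDs.

```python
def disappeared(earlier_ids, later_ids):
    """Return the IDs present in the earlier snapshot but missing from the later one."""
    return set(earlier_ids) - set(later_ids)


# Hypothetical match IDs for two snapshots.
may_ids = {"m1", "m2", "m3", "m4"}
august_ids = {"m1", "m3", "m5", "m6"}

lost = disappeared(may_ids, august_ids)
print(sorted(lost))
```

A match who arrived and left between the two snapshots never appears in either set, which is the blind spot described above.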

Of these nine people, three had linked their DNA to family trees so it’s a pity to lose them. You’d think this trio weren’t people who were only interested in ethnic heritage and hadn’t realized that matching existed.

So 9 losses in the past three months is of course more than 2 in the prior three months. Not enough to worry me, but I’ll be keeping an eye on the numbers.

Measuring Genealogical Similarity using the Jaccard Index

For some of the posts on this blog I’ll be using one way to measure the similarity of two sample sets of data. The statistic is called the Jaccard Index, or the Jaccard Similarity Coefficient. This post is a technical explanation of the calculation itself.

The sets of data are the unique ancestral surnames of my DNA matches. The question I’m asking for any two of my matches is: how similar are their lists of direct ancestral surnames?

If two lists of unique surnames are identical, they will have the exact same surnames. They will also have the same number of surnames in their lists, as each surname is only represented once regardless of how many times it appears in the direct tree. They will be 100% similar.

However, I’m also interested in trees that are “nearly” the same. Suppose two siblings create separate trees, and both get as far as all their great-great-grandparents. Tom’s research leads him to one pair of 3rd greats, and Joe finds a different pair. Neither is aware yet of the other’s research, but both have one extra maiden name each in their trees. Those lists will be very similar, and I’d like to highlight their similarity in some way.

So I need a way of defining the “similarity” of two lists of surnames. The Jaccard Similarity Index compares two sets (or lists) to see which members (surnames) are shared and which are different. It calculates the percentage of similarity from 0 to 100%. The math is pretty simple, and is described here in understandable terms.

In the simplest terms, we count the intersection of the lists i.e. the number of surnames common to both trees. We count the differences for each side, and we count the total number of surnames in all. The Jaccard index expresses this mathematically as:

J(X,Y) = |X∩Y| / |X∪Y| = |X∩Y| / (|X| + |Y| – |X∩Y|)

Taking our two brothers, Tom and Joe:
|X∩Y| is the number of shared surnames: 8 for the brothers.
|X| is the size of the first set, the number of surnames in Tom’s tree: 9.
|Y| is the size of the second set, the number of surnames in Joe’s tree: also 9.

So our equation is: 8 / (9 + 9 – 8) * 100 = 80% similarity for our brothers.
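The same calculation can be sketched in a few lines using sets. The placeholder surnames are invented, and treating two empty lists as 0% similar is my own convention for the sketch rather than part of the definition.

```python
def jaccard(x, y):
    """Jaccard index of two collections of surnames, as a value from 0.0 to 1.0."""
    x, y = set(x), set(y)
    union = x | y
    if not union:
        # Assumption: two empty lists are treated as having no similarity.
        return 0.0
    return len(x & y) / len(union)


# Eight shared surnames plus one extra maiden name each (placeholder names).
shared = {f"Surname{i}" for i in range(1, 9)}
tom = shared | {"MaidenA"}
joe = shared | {"MaidenB"}

print(f"{jaccard(tom, joe):.0%}")
```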

If the brothers had exactly the same trees, they’d be 100% similar. If the postman’s tree had no overlapping surnames with the brothers, his index compared to both would be 0%.

So the ultimate task is to compare every surname list within my matches with every other surname list. As the Jaccard index only works on two sets at a time, calculating the similarity across N sets requires N(N-1)/2 pairwise calculations, which grows roughly as N squared.
This becomes unfeasible for large numbers of sets, although there are other methods that can be brought into play to reduce processing time. I had about 4.4 million pairs of sets to compare, which took a matter of hours to complete.
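A brute-force version of the all-pairs comparison might look like the sketch below. It is not the utility I actually ran; the match IDs and surnames are invented, and it simply visits each unordered pair once.

```python
from itertools import combinations
from math import comb


def jaccard(x, y):
    """Jaccard index of two sets; empty-versus-empty is treated as 0.0 (an assumption)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def all_pairs(surname_sets):
    """Yield (match_a, match_b, similarity) once for every unordered pair of matches."""
    for (a, set_a), (b, set_b) in combinations(surname_sets.items(), 2):
        yield a, b, jaccard(set_a, set_b)


# Hypothetical surname sets keyed by match ID.
surname_sets = {
    "m1": {"Smith", "Jones"},
    "m2": {"Smith", "Jones"},
    "m3": {"Murphy"},
}

results = list(all_pairs(surname_sets))
print(len(results), "pairs; expected", comb(len(surname_sets), 2))
```

Because each pair is visited only once, the number of comparisons is N(N-1)/2 rather than the full N squared.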

Note that for my current purposes, I am using unique surnames. If one match has entered father, grandfather and great-grandfather John Smith, his list has Smith represented once. This is to simplify data collection and computation.

Note also that for my current purposes, the direction of surnames is unimportant. Match #1 may have a two-person tree with Mary Smith as the mother of Bob Jones, while Match #2 has Anne Jones as the mother of Bob Smith. That is “Smith->Jones” and “Jones->Smith”. If I include direction, these lists are different. I am treating the lists as a “bag of words”, where direction is not important, so these two lists “Jones, Smith” and “Smith, Jones” are the same. This is to simplify data collection and computation.
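Both notes amount to reducing each tree to a set of surnames. Here is a minimal sketch; the trimming and case-folding are extra normalisation I am assuming for the example.

```python
def surname_set(surnames):
    """Reduce a list of ancestral surnames to a set: duplicates and order are discarded."""
    # Assumption: surnames are compared case-insensitively after trimming whitespace.
    return {name.strip().casefold() for name in surnames if name.strip()}


# Three generations of John Smith collapse to a single entry.
print(surname_set(["Smith", "Smith", "Smith", "Jones"]))

# "Smith->Jones" and "Jones->Smith" become the same set.
print(surname_set(["Smith", "Jones"]) == surname_set(["Jones", "Smith"]))
```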

Two caveats must be considered with the Jaccard Index. One is that it can be erroneous for small sample sizes, so I intend to exclude small trees.
The other problem for the index is when there are missing observations in the data sets. It’s safe to say that most of my lists have missing observations, as I’m not drawing from a sample of relatives with perfect trees to four generations. The trees tend to be ragged i.e. people know more about one branch than another.

How Many of My Ancestry Matches have Identical Trees copied between Accounts?

Something I find mildly disappointing with a new DNA match on Ancestry is when I realize that I’ve seen their tree before. The exact same identical tree. I’ll recognize this is the case if I’ve spent some considerable time studying the tree in a previous encounter from an earlier DNA match.

What’s happening is that someone is managing multiple accounts, and Ancestry provides the facility to copy the same tree across accounts. Suppose you match to two kits that have been assigned the same tree. When you click on “View Full Tree” from the pedigree page of both matches, the same tree URL is opened.

The reason I get disappointed is that I’m not going to glean much new insight from the DNA match other than an assumption that they are closely related to whoever they share a tree with. But how often am I likely to stumble across this phenomenon?

In May 2018 I took a snapshot of information across all my DNA matches, and this included the Tree URLs. I used techniques described here, but I think that the DnaGedcom utility also retrieves the Tree URL to spreadsheets. The data allows me to do a little data mining on my snapshot of Tree URLs.

Question: How Many of my Ancestry Matches have Identical Trees copied between Accounts?

Approximately six thousand of my nine thousand DNA matches have an available tree with at least one visible entry. (The 6K excludes those pesky trees with everybody marked as private). 428 of those 6K share an identical tree with another account. Those 428 matches “should” have 428 trees, but they account for a total of 191 trees instead.
That is about 7% of my matches with available trees.
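The check itself is a grouping of matches by Tree URL. The sketch below uses invented match IDs and URLs, and assumes one Tree URL has been recorded per match.

```python
from collections import defaultdict


def shared_trees(tree_url_by_match):
    """Return {tree URL: [match IDs]} for every tree attached to more than one match."""
    groups = defaultdict(list)
    for match_id, url in tree_url_by_match.items():
        groups[url].append(match_id)
    return {url: ids for url, ids in groups.items() if len(ids) > 1}


# Hypothetical snapshot of one Tree URL per match.
tree_url_by_match = {"m1": "tree/101", "m2": "tree/101", "m3": "tree/202"}
print(shared_trees(tree_url_by_match))

# Figures from this post: 428 matches share a tree, out of roughly 6,000 with an available tree.
share_pct = 100 * 428 / 6000
print(f"about {share_pct:.0f}%")
```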

It’s not a lot, but it does have some impact. I wrote before on the number of my matches with available trees. The charts I presented in that post are still correct, but the “usefulness” of a subset of those matches is reduced.

One caveat to these numbers: when a match has not linked their tree but has multiple unlinked trees, my snapshot examined the first tree in their list and ignored any others.

Question: Just How Big does this Tree-Sharing Get?

Within the subset of kits sharing a tree, the vast majority are in pairs i.e. two matches sharing a tree.
The highest number of matches sharing the same tree is 8. Octuplets? Well, they all have the same surname.
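Counting how many trees are shared by two matches, by three, and so on can be sketched as a count of counts. The sample data is invented.

```python
from collections import Counter


def group_size_distribution(tree_url_by_match):
    """Map 'number of matches sharing a tree' to 'number of trees with that many matches'."""
    matches_per_tree = Counter(tree_url_by_match.values())
    return Counter(size for size in matches_per_tree.values() if size > 1)


# Hypothetical snapshot: one pair, one trio, and one unshared tree.
sample = {
    "m1": "tree/1", "m2": "tree/1",
    "m3": "tree/2",
    "m4": "tree/3", "m5": "tree/3", "m6": "tree/3",
}
print(group_size_distribution(sample))
```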

Here’s the distribution:

The Top 10 Ancestral Surnames across my Ancestry DNA Matches Surprised Me

In May 2018 I downloaded the direct line ancestral surnames of all my DNA matches at Ancestry. I discussed some statistical analysis on the numbers in a previous post.

I then conducted some exploratory data mining of the distribution of surnames across my matches. My paternal line is African (not African-American), and I have very few paternal matches on Ancestry. My maternal line is Irish, and I expected the usual suspects of popular Irish surnames to appear in my own top 10 list. I’m talking Murphy, Ryan, Kelly and the like. I could imagine a pattern where my emigrant ancestors landed in the traditional “Irish” enclaves of New York or Boston, and married exclusively within the cohort of people they met from the old country. If their American descendants followed suit, then the distribution of surnames across my matches should skew Irish.

Data mining is about asking questions of the data, so here is the Q and A.

Question: What are the Top 10 Surnames across my Matches?

Before I crunched the numbers, my educated guess for Number 1 was Smith, which can be of Anglo-Saxon origin or of Irish origin. My maternal great-grandfather was a Smith. At least one of his sisters married a Smith. And it seems that I see Smiths everywhere I look among my match trees. Well, I wasn’t wrong on Number One, but the next 9 surprised me. Here is the distribution of the top 10 surnames across my Ancestry matches.

I do not recognize any of the next nine surnames from my known direct lineage.
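A top 10 of this kind can be produced by counting, for each surname, the number of matches whose direct-line list contains it. In this sketch a surname counts once per match however often it appears in that tree; the sample lists are invented.

```python
from collections import Counter


def top_surnames(surnames_by_match, n=10):
    """Return the n surnames that appear in the most matches' lists, with their counts."""
    counts = Counter()
    for surnames in surnames_by_match.values():
        counts.update(set(surnames))  # count each surname once per match
    return counts.most_common(n)


# Hypothetical direct-line surname lists keyed by match ID.
surnames_by_match = {
    "m1": ["Smith", "Smith", "Jones"],
    "m2": ["Smith", "Brown"],
    "m3": ["Smith", "Jones"],
}
print(top_surnames(surnames_by_match, n=3))
```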

Question: Is my top 10 Surname Distribution typical of Ireland (north and south)?

So I compared my distribution to a paper published by Sean J Murphy titled “A Survey of Irish Surnames 1992-97” . Murphy presents the top surnames on the entire island of Ireland (i.e. including Northern Ireland) based on data he gathered from 1992-1997. It’s a great read for anyone interested in Irish lineage. Because my Irish ancestors are predominantly Ulster, I’m working with Murphy’s numbers instead of recent data from the Irish Central Statistics Office, as theirs does not include Northern Ireland. Murphy provides raw numbers in tabular format, but I’ve plotted the distribution to allow a broad side-by-side comparison to my own:

Clearly the answer is no, the distribution across my matches doesn’t look similar to the distribution of surnames on the island of Ireland.

Question: Is my top 10 Surname Distribution typical of British surnames in Ireland?

I wouldn’t have thought of this without reading Murphy’s paper which has a section on “British Surnames in Ireland”. I have not established a genealogical link of my ancestors to British origin. But my distribution sure does look more similar to this breakdown than to the Irish one.

Before I draw any inferences, I’ll ask another question.

Question: Is my top 10 Surname Distribution typical of the United States?

I then compared my distribution to the USA census figures. I chose the 2000 census as it is in a similar time frame to the Irish numbers. It might be better to use earlier census figures: because living people are private, the available names in trees are likely to represent an earlier time frame. But it’s all very approximate, and I’m only interested in the broad distribution, which is certainly closer to mine than the Irish distribution.

The names “Garcia” and “Rodriguez” jump out at me because I’m not familiar with them. Checking my data, I have a sum total of 17 matches with a direct ancestry to Garcia or Rodriguez. My understanding of American demographics is that the last few decades will have pushed Hispanic surnames upward in frequency. So I narrowed the census numbers to filter on respondents who identified as white European lineage.

So after all that, I can see that the distribution of surnames across my Ancestry matches is closest to the USA white population.

Of course the vast majority of Ancestry customers are American. However if there was a very high tendency among my own emigrant ancestral relatives to keep themselves to themselves and only to marry within the typical Irish communities, then my top 10 distribution would surely be closer to the filter of the top 10 Irish surnames within the USA census. (I’m using the Ancestry blog as my source for these next numbers – they are using the 2000 census but I’m not sure how they got this particular filter).

So I don’t have any of the Irish American top 10 names amongst my top list. To be fair, I only have to go to #13 in my own distribution to hit “Murphy”, and I have “Kelly” at #18. But those two names are the only “Irish-American” names in my top 30. (Smith is the awkward outlier. It’s my personal #1 and I’m sure it’s being excluded, but it can also be of Irish origin).

To Summarize:

Before I did this analysis I assumed that I’d have a higher proportion of “Irish” names across my matches. My Irish emigrant ancestors appear not to have married only other Irish descendants, but tended instead to marry within the local population of European heritage.

Measuring the Usefulness of Trees of my Ancestry DNA Matches

Previously I discussed an analysis of the availability of trees across all my DNA matches. The headline figures from May 2018 were

  • 40% of my matches had a public linked tree
  • 27% had no linked tree but had at least one public unlinked tree
  • 7% had a private tree
  • 26% had no tree

At first glance, a combined total (linked + unlinked) of 67% of matches with at least one public tree would seem to be a very positive outcome.

To analyse usefulness I proceeded to drill down further into the actual content of those trees. Specifically, my May analysis included a download of the Direct Ancestor Surname lists provided by Ancestry on the “Pedigree and Surname” page for each match where a tree is either displayed by default (the linked tree) or has been selected from a list of unlinked trees. I use my own utility, but the DNAGedcom Client will also download surname lists to a spreadsheet to allow analysis. I don’t know if it grabs unlinked trees; mine does to a limited degree (just one per match).

It was immediately clear that many matches had a tree, but had no available surname list. The reason was simple: the tree was public but all entries had been set to a status of living, so their details were hidden:

Everybody Lives!

This was a significant 11% of my matches with a public linked or unlinked tree, or a rounded 8% of my total matches, as reflected in the revised usefulness proportion:

Single Surname Trees

Aside from the I-see-no-dead-people trees, the other happiness-killers are the loner trees with but a single visible entry:

I also include in this category those trees with multiple generations of a single surname and no visible spouses. These single surname trees were a very similar (slightly lower) percentage to the “Everybody Lives” category. The usefulness proportion is again reduced:

Variety is the Spice of Life

My grandmother was a Smith, and so it seems were all the neighbours. But research-wise I could have had it worse, like this guy:

Aside from Smith, the two other names I’ve blanked for privacy are common Irish surnames. If you’re thinking Murphy and Reilly, you’re half right. With my current research I find that the most useful trees are those with a good variety of ancestral surnames, certainly more than the three in this example. A high variety will indicate a higher number of ancestral generations represented in the tree.

I titled this post as “The Usefulness of Trees”, but usefulness is in the eye of the beholder. If I was searching for living relatives (e.g. as an adoptee) and this was a close match, this might be gold if the match is prepared to reply to messages.

For now, I will define the potential usefulness of a tree by the number of distinct surnames in the ancestral list. That won’t be the case for everyone, and it may change for me if the focus of my research changes. That being said, I now want to get a measure of how many of my matches have potentially useful trees.

The match with the highest variety has 272 distinct direct surnames. The tree goes back to the early 1600s and has over 12K people. The rest of my matches are distributed over a range of numbers.

For illustrative purposes, the next figure shows where my matches fall into bands of “number of distinct ancestral surnames”. I’ve already noted the lowest two bands: zero and the single surname crowd.

Above zero or one surnames is the band of low variety: I’ve fairly arbitrarily put this as 2, 3 or 4 distinct surnames. This is where I also think little immediate value is to be had. I’d probably need to build out a research tree, which is why I say there is little *immediate* value to me. It’s about 30% of my public linked/unlinked trees. That eats into my usefulness as so:

That leaves me with about 32% of matches with “useful” trees for my current purposes. For me personally, that’s currently at least 2,870 trees.
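The banding described in this post can be sketched as below. The band labels and sample counts are mine for illustration, and the final check assumes the May 2018 total of 9,026 matches is the denominator behind the 32% figure.

```python
from collections import Counter


def surname_band(distinct_surnames):
    """Assign a tree to a usefulness band by its number of distinct ancestral surnames."""
    if distinct_surnames == 0:
        return "0 (nothing visible)"
    if distinct_surnames == 1:
        return "1 (single surname)"
    if distinct_surnames <= 4:
        return "2 to 4 (low variety)"
    return "5 or more (useful)"


# Hypothetical distinct-surname counts for a handful of trees.
sample_counts = [0, 1, 3, 4, 5, 27, 272]
bands = Counter(surname_band(n) for n in sample_counts)
print(bands)

# Assumption: 2,870 useful trees out of the May 2018 total of 9,026 matches.
useful_pct = 100 * 2870 / 9026
print(f"about {useful_pct:.0f}%")
```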