Believe it or not, as of writing this, the largest family tree in the world has over 80 million people, and it’s getting bigger every minute. And pretty soon, it might include everyone. You can see how big the tree is now and watch it grow as we speak in the counter below.
Beneath the counter is a part of the tree as a mapstory (note: you must watch all these videos full screen, otherwise you can’t see the necessary details). Each moving dot starts where a parent was born and ends where their child was born. This is MapStory Geneology, the goal of which is to help map the relationships of everyone who has ever lived for whom there is a record. Soon enough, you’ll be able to see how you’re connected to everyone who has ever lived, going back to our earliest ancestors in East Africa.

So how can someone make a family tree that big? The answer, of course, is crowdsourcing on the internet. You may have heard of websites like ancestry.com – there are actually many such sites now, and you can transfer your tree and records from one site to another. One site, Geni.com, takes those family trees and automatically combines them into larger ones. As of now, the 7 million people on Geni have added over 140 million profiles of their relatives, generating a single family tree for most of them. A year ago, researchers at the Erlich Lab at MIT took 43 million of those profiles, anonymized the names for privacy, and put them into a nicely organized database for people like me to use. Many of the records include a person’s gender, who their parents were, and their dates and places of birth and death.


The full specs of the largest family tree of 43 million people.

I got into all this several months ago when I heard a podcast interview with A.J. Jacobs, who said that he was contacted by his 12th cousin in Israel, who told him that he had compiled a tree of 80,000 of his other cousins. At first Jacobs was apprehensive about having such a large family, but he soon became fascinated by it, and wrote a popular article in the New York Times. Genealogy is connecting us all in deep ways, showing us relationships with anyone and everyone that we would not otherwise have known about. Jacobs is now hosting the largest family reunion in the world, with over 5,000 people attending, in Madison Square Garden in New York City.

In my excitement, I mapped what I could find of A.J. Jacobs’s tree, which wasn’t much. I left it for a while, and recently I finally did some Google searching and found the Erlich Lab’s database. While I was considering how to map it, I coincidentally found the following mapstory made by Maximilian Schich and colleagues, which shows the migration of 100,000 of the most influential people in the Western world, drawn from databases such as Freebase. If I were to rate mapstories, this would certainly be in the top 5 most amazing I have seen. It shows incredibly interesting trends that explain the geographical expansion and character of different places. It gave me confidence that this would work – I played around, and created a dot for every month between the time and place of a parent’s birth and the time and place of their child’s birth; this created a nice organic-looking animation. I also made my code available in case anyone wants to generate the animations themselves. I did my animation in ArcMap, and they did their much cooler animation in Gephi, which I have only started to use.
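For anyone curious what "a dot for every month" could look like in code, here is a minimal Python sketch. It is not the ArcMap workflow I actually used: the function name `monthly_dots`, the tuple layout, and the straight-line interpolation in longitude and latitude are all assumptions made for illustration.

```python
from datetime import date


def monthly_dots(parent_birth, child_birth):
    """Return one (year, month, lon, lat) dot per month between two births.

    Each argument is a hypothetical (date, lon, lat) tuple. The position is
    interpolated linearly in lon/lat, which is an assumption: it ignores
    great-circle paths and the wrap-around at the 180th meridian.
    """
    (d0, lon0, lat0), (d1, lon1, lat1) = parent_birth, child_birth
    months = (d1.year - d0.year) * 12 + (d1.month - d0.month)
    if months <= 0:
        # Same month (or bad data): a single dot at the parent's birthplace.
        return [(d0.year, d0.month, lon0, lat0)]

    dots = []
    for i in range(months + 1):
        t = i / months  # 0.0 at the parent's birth, 1.0 at the child's
        total = d0.year * 12 + (d0.month - 1) + i
        year, month0 = divmod(total, 12)
        dots.append((year, month0 + 1,
                     lon0 + t * (lon1 - lon0),
                     lat0 + t * (lat1 - lat0)))
    return dots


parent = (date(1850, 11, 1), 0.0, 50.0)
child = (date(1851, 2, 1), 30.0, 53.0)
dots = monthly_dots(parent, child)
```

Each returned tuple can then be written out as one point with a time stamp, and an animation tool can step through the points month by month.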

Below is a complete mapstory of where each location was first listed in the tree. The Erlich Lab made a similar video earlier. I also added this one to MapStory – the other ones were far too large to put up. As you can see, it’s pretty Western-centric; in fact, the further back it goes in time, the more it centers on what is now the UK. It does a pretty good job of showing the expansion of the Western world outside of Europe. But it also tells the story of information and the internet itself – these are the places where the most people are online, and are also connected through lineage. Though I grew up with the internet, it excludes people like me, since I have no known ancestors connected to this tree during this time period. So this shows a genealogical network effect, where the largest and deepest family tree emerges out of how many people can connect to one another. In a way this is the story of the internet itself: the largest family, connected by blood, that came to be online.

It also shows the depth of genealogical records – though there are tens of millions of people in the Americas who have some ancestry in both West Africa and the Western world, very few can be seen emigrating from Africa, because people were cut off from their lineage by the forced migration of the slave trade, and perhaps also because of a lack of records in West Africa and in the places they were taken. And though many of the Latino migrants arriving in the US also have some European ancestry, the places they came from are less connected online, and are perhaps also less connected to the Anglo-centered tree.

It is not that interesting to see a mapstory where nothing is moving, so I only wanted to animate intergenerational migration – people who were born in a different place than their parents, and whose locations were precise. The 43 million were whittled down to about 195,000. I tried hard to map everyone, but my program ran into bugs. The first video at the top is 10,000 people randomly selected. Here is a small portion I was somehow able to map that shows 1886-1901 – it plays for only 2 seconds, and I looped it for 30 seconds.
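The filter itself is simple to describe in code. The following is a rough Python sketch, not the program I ran: the record layout (a dict with `birth_place` and `parents` keys) is hypothetical, and I am assuming that an imprecise or missing location is stored as `None`.

```python
def intergenerational_migrants(people):
    """Return (parent_id, child_id) pairs where the birthplaces differ.

    `people` maps a person id to a record shaped like
    {"birth_place": (lon, lat) or None, "parents": [parent ids]}.
    This layout is an assumption for illustration only.
    """
    pairs = []
    for child_id, child in people.items():
        child_place = child.get("birth_place")
        if child_place is None:
            continue  # no precise location for the child
        for parent_id in child.get("parents", []):
            parent = people.get(parent_id)
            if parent is None:
                continue  # parent is not in the data set
            parent_place = parent.get("birth_place")
            if parent_place is not None and parent_place != child_place:
                pairs.append((parent_id, child_id))
    return pairs


people = {
    "p1": {"birth_place": (12.5, 41.9), "parents": []},
    "p2": {"birth_place": None, "parents": []},
    "c1": {"birth_place": (-74.0, 40.7), "parents": ["p1", "p2"]},
    "c2": {"birth_place": (12.5, 41.9), "parents": ["p1"]},
}
pairs = intergenerational_migrants(people)
```

In this toy example only the link from `p1` to `c1` survives: `c2` was born in the same place as the parent, and `p2` has no precise location.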

Of course, you can focus on trends within the overall currents. I also extracted all the people who emigrated from what is now Italy (1600-2010). You can see the patterns of emigration, especially the large scale migrations to the Americas in the late 1800s and early 1900s. I am also going to generate one for immigration into one country – I’m thinking Israel would be particularly fascinating, seeing where people emigrated from after the end of World War II.

Within the 43 million people, the largest single pedigree has 13 million people. But within the subset of 195,000 intergenerational migrants, the largest single pedigree had 456 people. What you’ll see is several people emerging from a single parent – a mother in Massachusetts, shown by the yellow dot. The woman had children with more than one father, and one of the fathers was born in England. The video shows the area at different scales, and you can see how they ricochet in and out of towns, across the landscape, from Massachusetts to Alberta, across 8 generations. You can also see it in one shot in a separate video.
As this is the largest group of relatives with precise locations, I’m guessing that it comes from a person or group of people who either were very good record-keepers or had ancestors who were. Below the video are two images of pedigrees – a nice one generated by the Erlich Lab, and a cruder one generated by me of the tree you see in the video. I have yet to make one that shows the child with the most progenitors (parents and parents of parents and so on) – it would be cool to see a bunch of dots converging into one.
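One way to find a "largest pedigree" like this is to treat every parent-child link as an edge in a graph and look for the biggest connected group. The Python sketch below assumes that definition of a pedigree, which may not match exactly how the Erlich Lab defines it; the function name `largest_pedigree` and the input format are made up for illustration.

```python
from collections import defaultdict, deque


def largest_pedigree(links):
    """Return the largest set of people joined by parent-child links.

    `links` is an iterable of (parent_id, child_id) pairs. A "pedigree" is
    assumed here to be a connected component of that graph.
    """
    neighbours = defaultdict(set)
    for parent, child in links:
        neighbours[parent].add(child)
        neighbours[child].add(parent)

    seen = set()
    best = set()
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search collects everyone reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            person = queue.popleft()
            for other in neighbours[person]:
                if other not in component:
                    component.add(other)
                    queue.append(other)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


links = [("a", "b"), ("b", "c"), ("x", "y")]
biggest = largest_pedigree(links)
```

Here the result is the three people `a`, `b` and `c`, since `x` and `y` form a smaller, separate group.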

The largest pedigree of intergenerational migrants: 456 people over 8 generations.

A 6,000-person pedigree, over 7 generations, from a larger pedigree of 43 million people.

 

In the coming years, this will expand far beyond what you see here. One nation, Iceland, has the entire genealogy of everyone with a long-term family history on the island, going back to the first Viking inhabitants over a thousand years ago, accessible in a database. In the coming decades this will be true of whole continents or civilizations – as you can see, European descendants are well on their way. The Mormons have 2.4 million rolls of microfilm with 2 billion names behind 14-ton doors in the heart of a mountain that can withstand nuclear attacks. People will be connecting the dots, and making things more and more detailed, accurate, and precise. I am currently working to see if I can map the location of everyone in all the city directories in my town, which I would like to finish next. If everyone did this, most of the people in the history of places that were connected by telephone would have their precise locations mapped for every year (excluding current locations, for privacy).

And that is only the beginning – this is still only the Western world, which accounts for roughly 10% of the world’s population, in both the present and the past. As more of the world gets online, and language barriers are crossed, people will be connecting themselves to their more tightly wound histories, creating network effects in different genealogical spheres.
And each sphere has its own rich genealogical traditions that have yet to be tapped. My own tree on my father’s side goes back 21 generations. It was compiled by one of his cousins, starting from the time the first Gadia created his surname on arriving in the town of Jhunjhunu in Rajasthan, India. When my Dad went to Haridwar, at the point where the sacred Ganga meets the plains, to spread his mother’s ashes, he found the Bansal family priest (Gadias are a part of the Bansal gotra), and asked him who had visited him last. It turned out to be the cousin who did the genealogical research; the priest had records of the Gadia family going back to its beginning, from which the cousin may have compiled our tree. Tens of thousands of people spread ashes in Haridwar every year – imagine the records that have yet to be tapped in that city alone.

My family tree, going back 21 generations of males. My Dad is marked by the bottom-most red box. It focuses on only two lineages going 9 generations back, and then reduces to only direct ancestors.

 

Now, what about all the people in the world who don’t have records? For most of the 108 billion people who have ever lived, the memory of who they were related to died with them, and that largely continues to be the case. Of course, as more information becomes available, and with every nation on Earth conducting a census, that is going to be less the case going forward. From roughly this point on humanity’s time scale onward, our genealogical records will cover perhaps 98% of humanity – but the further back you go, the fuzzier it gets. While we can’t know every person who has ever lived, we do in fact all have genealogical information stored in our bodies in the form of DNA. We can see how we are all related to each other, and more and more people are putting their DNA online, which is matching us with one another. Genetic studies have traced our ancestry all the way back to East Africa, and you will be matched with this deep tree. Of course this also gets fuzzier the further back it goes in humanity’s 150,000-year-long history, but you can easily see who you’re related to in the more recent past – and the past does not go back that far in the grand scheme of things. The most recent common ancestor of everyone alive today was probably alive only 3000 years ago.
Today anyone can spit into a container and send it to people who, for less than $100, will show how you are related to the rest of humanity. The cost of sequencing an entire genome is crossing $100 now, and it has been falling tenfold every 1-2 years. Pretty soon sequencing a whole genome will be a routine thing; perhaps people will even be able to do it themselves with a kit and a small device connected to a USB stick. Imagine spitting into something and seeing all your own genetic problems, and a mapstory generated of how you are connected to the rest of humanity, from a private or even personal computer.

The exponential drop in the cost of sequencing a human genome, and the growth in the number of genomes sequenced so far.

Of course, since genealogical information is shared by definition, there are serious privacy considerations to take into account. You can see how you’re related to others with your DNA – but you can do the same thing with other people’s DNA, or by doing a little research through genealogy sites. This can pose a great security risk for people who want their families protected. Add humanity’s long history of racism and genocide and you get even more potential problems. And then there are relatives you might not want others to know you’re related to. People might not want others to know who their relatives are; likewise, people might not want to be identified themselves. The Erlich Lab has done quite a bit of research on this issue – they demonstrated that someone can even trace an anonymous sperm donor simply by taking the DNA of a child and matching it to people online, and even if the father’s DNA is not already there, they can home in on the person with a bit of research. Many people make their trees private or share them only among a small group, but even this might not help, since others can still add you to their tree without you knowing. There are remedies for that, like asking to have your name removed, but if we’re not careful, it is still becoming easier and easier to violate people’s privacy.

Finding the identity of an anonymous genome with genealogical data online.

So what’s next for MapStory Geneology? Frankly, sites like Geni do the easy job of crowdsourcing, and we do the even easier job of mapping it. But there is much to be developed. Once I understand the ethical implications, and if it is still permissible, I’d like to see if it’s possible to show how someone (perhaps A.J. Jacobs) is connected to everyone who has ever lived, automatically generating a mapstory for them to see. My guess is that someone can plug in their tree, private or public, which, if large and varied enough, will match them to their place in the anonymized tree, and then they can see how they’re related to everyone. And going back from there, perhaps I can work out how to connect the genetic family history as well, so people can see all the way back to the beginning. Perhaps in the long run we can go even further, showing how we are related to all our chimp and ape cousins, and mapping the full tree of life geographically with genetics and the fossil record. Even the first step in all this would require technical knowledge I’m not sure I have – but if I were to get help, perhaps it can be done before the family reunion and shown off there. In any case, whenever it is developed, it should be a spiritual experience for anyone to see for the first time.
