Here’s Mathematician KP Hart’s Math Question and Answer for Friday, October 23rd!

#### Is the number of combinations in DNA profiles finite?

Is there a possibility – no matter how small – that at some time in the long history of humans on Earth someone walked around whose DNA was the same as mine? Or is it possible that this may happen in the (hopefully) long future that we have as humanity?

The question confuses two notions: DNA and a DNA profile. The first is simply the molecule that can be found in every cell of the individual. The second is a (relatively) short sequence of numbers, derived from that molecule, that is assumed to contain enough information to identify the person to whom it belongs (it is definitely not enough to reconstruct the whole molecule). The answer to the principal question is, in both interpretations: yes.

The double-helix model of a DNA molecule is well known. Mathematically it suffices to know that it can be coded using a sequence of about 3.2 billion pairs of letters. Roughly 99.9% of those pairs are the same for all humans; the differences occur among the remaining pairs, still about 3.2 million. At every position we can find one of four pairs: A-T, T-A, C-G, or G-C. This gives the following estimate for the number of possible human DNA molecules: $4^{3{,}200{,}000}$. As $2^{10}$ is about 1000 we can simplify this to, roughly, $1000^{640{,}000}$, a number with 1,920,000 zeros. But still finite. The real number is much smaller because there are lots of regularities in the sequences.

Those regularities lead to DNA profiles. At various places in our DNA certain short sequences of pairs (‘words’) repeat a few times. A DNA profile is a sequence of numbers that counts, for some well-determined sites, how often the local word is repeated. The number of repetitions varies in the interval [5,50]; individual sites have their own averages and spreads. In the US one looks at 13 sites, other countries use 11; in actual fact the number is double that because one looks both left and right at sequences of letters. If we round down to 10 sites and look both left and right, then we find a lower estimate of $5^{20}$ for the number of profiles, which is more than 95,000,000,000,000. Rounding up to 15 sites gives an upper bound of $10^{51}$ for the number of profiles.
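The arithmetic behind these estimates can be checked with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and it assumes one particular reading of the profile estimates, namely that the lower estimate allows 5 values for each of 20 counts and the upper one allows 50 values for each of 30 counts.

```python
import math

VARIABLE_PAIRS = 3_200_000  # positions that differ between people (from the text)
LETTER_PAIRS = 4            # A-T, T-A, C-G, G-C

# Decimal digits of 4**3_200_000, found via logarithms so that the
# huge integer itself never has to be built.
log10_dna = VARIABLE_PAIRS * math.log10(LETTER_PAIRS)
dna_digits = math.floor(log10_dna) + 1

# Assumed reading of the profile estimates:
# lower: 10 sites, read twice, 5 possible values per count
# upper: 15 sites, read twice, 50 possible values per count
lower_profiles = 5 ** 20
upper_profiles = 50 ** 30

print("digits in the number of DNA molecules:", dna_digits)
print("lower estimate of profiles:", lower_profiles)
print("digits in the upper estimate:", len(str(upper_profiles)))
```

The exact digit count comes out slightly above 1,920,000 because $2^{10}$ is 1024 rather than 1000, and the upper estimate has 51 digits, which is why it can be rounded up to $10^{51}$.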

There is a rough estimate of the number of humans born till now: about 108 billion. We saw that there are more than 95,000 billion profiles, so for now we have enough profiles. Newer techniques enable us to improve the profiles so that we will get many more.
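As a rough check of this comparison, the sketch below divides the lower estimate of the number of profiles by the estimated number of humans born so far. The names are hypothetical, and the figures are the ones quoted in the text; the result says only that profiles outnumber people, not that every person gets a profile of their own.

```python
# Figures quoted in the text
humans_born = 108 * 10**9   # about 108 billion humans born so far
lower_profiles = 5 ** 20    # lower estimate of the number of profiles

# Whole number of available profiles per person ever born
profiles_per_person = lower_profiles // humans_born

print("profiles available per person:", profiles_per_person)
```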

A word of warning: just like when we discussed the ideal dating site, we cannot assign profiles to people, so for many of the profiles it is uncertain whether they will ever belong to a person. The answer to the secondary question is: we do not know. A profile is a simplification of the DNA itself, and in that simplification differences between individuals may disappear. The sites for profiling have been chosen so as to make the probability of two people sharing the same profile as small as possible. Achieving zero probability is very difficult. Strictly speaking: distinct DNA profiles mean distinct individuals, but identical profiles do not mean identical individuals.

***********

Read all of KP Hart’s math questions here!

About Dutch Mathematician KP Hart: At the beginning of this year the Dutch government opened a website, The Dutch Science Agenda, where everyone could post questions that they thought were of scientific interest. This was an attempt to involve the whole country in determining what the Dutch science agenda should be in the coming years.

I looked through the questions and searched for terms like ‘mathematics’, ‘infinity’ … to see what mathematical questions there were and I noticed various questions that already have answers (and have had for a long time). On a whim I decided to post answers to those questions, in Dutch. For your edification I will translate these posts into English.