Surely, there must be something wrong here...?

Where agnostics and atheists can freely discuss

Moderator: Moderators

Undertow
Scholar
Posts: 486
Joined: Wed Jun 27, 2007 6:01 am
Location: Australia

Post #1

Post by Undertow »

Hey, I need help with a little probability test, more specifically to see if my maths is at all fallacious (maths always escaped me in school, so forgive me if this seems overly simple to you). Please critically analyse the following, if you're so inclined (the green part below is the meat you need to rip to shreds if need be):

First, a little background: a retrotransposon is basically a little chunk of DNA which can 'copy-paste' itself throughout a genome. The insertion of the 'paste' is generally regarded as homoplasy free for some retrotransposons, which basically means the insertion site is random (although there are some insertional preferences at times, which is why I added the lenient probability below).

Basically, then, the probability that any retrotransposon inserts in a specific location along a genome is 1 in however many places it can insert - many, many millions in terms of human and chimp genomes.

The reasoning comes about because when looking at the two genomes, we see many shared retrotransposons in very specific shared locations between the human and chimp genomes. This is what prompted the test below:
A Simple Mathematical Probability Test –

Let’s try a test. What would it take for all of these shared genetic retrotransposons of a 3 billion base pair genome to come about independently? Let’s give the proposition that they all came about independently a ludicrously lenient probability for two of the same retrotransposons inserting in the same locus – 1 in 10, or 0.1. Realistically, for those elements that are virtually homoplasy free, the probability for even one shared retrotransposon to come about independently of common ancestry is much, much smaller. Now, let’s take the following retroelements as examples –

(100 common retroelements, presented later [trust me, there are more than 100 :lol: ])

Taking the simple rule that the probabilities of independent events multiply, let’s take the 0.1 probability of each insertion occurring independently of common ancestry and multiply it by itself once for each of the 100 retroelements chosen for analysis. This gives 0.1^100, which is a probability of 10^-100, i.e. a decimal point followed by 99 zeros and then a 1. This is virtually a probability of zero.
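The multiplication step can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming the 100 insertions really are independent events that each have the same 0.1 probability; the function name `joint_probability` is made up for illustration.

```python
from fractions import Fraction


def joint_probability(p_single: Fraction, n_events: int) -> Fraction:
    """Probability that n independent events, each with probability p_single, all occur."""
    return p_single ** n_events


# Assumed inputs from the post: 0.1 per shared insertion, 100 insertions.
p = joint_probability(Fraction(1, 10), 100)

print(p == Fraction(1, 10**100))  # True, i.e. exactly 10^-100
print(f"{float(p):.1e}")          # 1.0e-100
```

Exact fractions are used so the result is not blurred by floating-point rounding. Whether the independence assumption holds for real insertions is a separate biological question.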
By the way I'm doing this here because I really don't have the time to wade through a sea of apologetics excuses. Please, I ask you to be very tough on my maths if it is at all bad.

Thanks.
Furrowed Brow
Site Supporter
Posts: 3720
Joined: Mon Nov 20, 2006 9:29 am
Location: Here
Been thanked: 1 time

Post #2

Post by Furrowed Brow »

Hi Undertow, did you know micatala is a maths prof? But being a Christian he can't post here. Not saying he would though. He might see that as helping you with your homework. :D

Added point: Ah...I just remembered he's a moderator. So maybe he can post here. I dunno now.

Furrowed Brow
Site Supporter
Posts: 3720
Joined: Mon Nov 20, 2006 9:29 am
Location: Here
Been thanked: 1 time

Post #3

Post by Furrowed Brow »

I was involved in another debate on another forum when a guy who was a professor of physics put the birthday problem forward. I thought it might be applicable.
Physics Prof wrote:If I gather people randomly off the street, how many people do I need to gather to get two people who share a birthday? 365/2.... NO, the point is you have not specified which day, the answer is actually 23. It is when 364/365 times 363/365 times 362/365... becomes smaller than ½
Here's a math link to the birthday Problem.
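The birthday calculation is easy to run. Here is a rough sketch, assuming 365 equally likely birthdays and no leap years; the function name `people_needed` is invented for the example. It keeps multiplying the "no match yet" probability until it drops below a half.

```python
def people_needed(days: int = 365) -> int:
    """Smallest group size for which a shared birthday is more likely than not."""
    p_all_distinct = 1.0  # one person alone cannot share a birthday
    people = 1
    while p_all_distinct >= 0.5:
        # The next person must avoid the `people` birthdays already taken.
        p_all_distinct *= (days - people) / days
        people += 1
    return people


print(people_needed())  # 23
```

Under these assumptions the product 364/365 * 363/365 * ... first falls below ½ when the group reaches 23 people.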

So take one strand of DNA. This is like a birthday. We need to know how many possible strands of DNA there can be, which is like asking how many possible birthdays there are. For birthdays that is 365. I’m not sure of the answer for strands of DNA. So let's call this number N.

The location is analogous to a person in the birthday problem. The odds of finding two chunks of DNA at the same location are then 1 - (1 * (N-1)/N * (N-2)/N * (N-3)/N * ...), with the series completing at the nth term, where n is the number of locations. Which is to say, if there were only 4 locations then the series would stop at the (N-3)/N term shown here.

So to get the right probability we need to know the number of possible different DNA chunks, and the number of possible locations. Then we take the value given by the above series and multiply it by itself once for each duplication. If that value is S, then the probability of the observed duplications in the human and chimp genome becomes S^n, where n is now the number of duplications. If as you say there are hundreds then here n will be in the hundreds. If S = 0.9 and there are 100 duplications then the probability would be 1/37649 approx. If S = 0.99 and there were 200 duplications the odds are between 1/7 and 1/8. If S = 0.9988454 and there are 600 duplications then the odds are approximately half, but if S were 0.8 and there were 600 duplications then the odds would be astronomically low. You have guesstimated a figure of S = 0.1.
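The S^n figures quoted above can be reproduced in a few lines. This is a sketch only, assuming every duplication is independent and has the same per-duplication probability S; the function name `all_duplications` is hypothetical.

```python
def all_duplications(s: float, n: int) -> float:
    """Probability that n independent duplications, each with probability s, all occur."""
    return s ** n


# The (S, n) pairs mentioned in the post.
for s, n in [(0.9, 100), (0.99, 200), (0.9988454, 600), (0.8, 600)]:
    p = all_duplications(s, n)
    print(f"S={s}, n={n}: probability {p:.3e}, roughly 1 in {1 / p:.3g}")
```

The first three cases come out at about 1 in 37,600, about 1 in 7.5, and about one half, while the S = 0.8 case is vanishingly small.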


Good luck!! I think you'll need a computer.

However, without knowing the specific numbers we are to start with, we can still make some broad brush observations regarding the value of S. If the number of locations is equal to or greater than the number of permutations of DNA chunks then it becomes more probable that one location is duplicated in the human and chimp genome. If the permutations of DNA chunks exceed the number of locations then the probability heads in the other direction and approaches zero.


Warning: I failed my Math and Stats A level many years ago. So it is probably best to seek out sager advice than mine.

Furrowed Brow
Site Supporter
Posts: 3720
Joined: Mon Nov 20, 2006 9:29 am
Location: Here
Been thanked: 1 time

Post #4

Post by Furrowed Brow »

Some further thoughts. The human genome is about 3*10^9 base pairs. And the length of a retrotransposon is approximately 10^2 base pairs. Given there are only 4 possibilities at each position (AT, TA, GC, CG), that makes 4^100 possible sequences, or around 1.6*10^60.

If we are dealing with 100 base pair chunks, that makes 3,000,000,000 locations or 3*10^9. The number of possible 100 base pair chunks (the number of birthdays) then vastly outnumbers the available locations, by a factor of around 10^50. This makes Undertow’s guesstimate of S = 0.1 appear massively optimistic, the final probability being even smaller than Undertow’s projection.

At this point I am probably attempting stuff beyond my abilities, so I’m going to shut up. Where’s micatala? I’m sure he can put this right.
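These counts can be sanity-checked with Python's big integers. This is a sketch under the post's own assumptions: chunks of exactly 100 base pairs, 4 possibilities per position, and roughly one candidate location per base pair of a 3 billion base pair genome. The variable names are made up.

```python
# Assumed inputs taken from the post, not measured values.
chunk_length = 100        # base pairs per chunk
choices_per_position = 4  # AT, TA, GC, CG
locations = 3 * 10**9     # candidate insertion sites in the genome

sequences = choices_per_position ** chunk_length

print(f"{sequences:.2e}")              # 1.61e+60
print(f"{sequences / locations:.1e}")  # 5.4e+50
```

So the possible sequences outnumber the locations by a factor of a few times 10^50, in line with the rough figure given above.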

Undertow
Scholar
Posts: 486
Joined: Wed Jun 27, 2007 6:01 am
Location: Australia

Post #5

Post by Undertow »

I think I'm in over my head to be honest. :?

I'll progressively try to come to terms with some of these numbers and concepts.

By the way, my suggested probability of 0.1 is VERY lenient in favour of the position that they very well could have come about independently in both human and chimp. In reality, many of these elements are regarded as homoplasy free, meaning it's very, very unlikely they came about independently in humans and chimps (I just don't know the numbers, which is what I'm trying to figure out). This is evidenced by the fact that there seem to be no shared retroelements that indicate relationships outside those which we already know, i.e. it seems very few, if any, retroelements are shared where they shouldn't be (especially those 'Alu' SINEs) according to evolutionary theory.

EDIT: Just to close for now, let me lay down some key known stats:

*** I'll deal with Alu retrotransposons only, which are no more than 300 base pairs long.

*** Both the human and chimp genomes are roughly 3,000,000,000 (3 billion) base pairs long.

*** I'll assume a VERY lenient homoplasy probability (i.e. the probability that they did insert independently of common ancestry) of 1 in 10 or 0.1. (If you think about this point, it would mean that 1 in every 10 insertion events, by sheer coincidence, lands on a locus that holds an Alu in the other species.)

*** I'll use exactly 100 common Alu insertions, for a nice round number, so I can more easily get my head around the sums.

So yeah, I might see if I can take this up with Micatala if at all possible later.