Tomkins’ 70%

25/06/2014 by eyeonicr

Because mammalian eggs are produced early in life, while sperm are created continuously, fathers are responsible for a greater share of new mutations passed down to their offspring than mothers. This slightly complicates genetics-based time-since-last-common-ancestor estimations, and recent results suggest that the human-chimp split happened about twice as far back as previously thought. Adam Benton has more information, if you’re interested.

This new paper has prompted Jeffrey Tomkins, the ICR’s go-to geneticist, to publish “Chimp DNA Mutation Study–Selective Yet Surprising.” Tomkins is known for contesting the typically-cited genetic similarity figures of 94-99% and for having calculated, using his own method, a “conservative” (i.e. maximum) figure of nearer 70%.

There are various ways in which mutations have modified the genome inherited from the last common ancestor of humans and chimps in each lineage since the split. On the one hand (the left one in this case) they modified, added, or deleted individual nucleotides, which leaves a signature in these comparisons in the form of single nucleotide differences. A percentage difference can be calculated, and knowing the rate of these changes (and accounting for things like the difference between the contribution of males and females) we can determine the amount of time it took.
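As a rough illustration of that last step, here is a minimal Python sketch. It assumes a constant mutation rate and that differences build up independently along both lineages, so that divergence equals twice the rate times the time. The function name is my own, and the inputs are the rate and divergence figures from the paper quoted further down this post.

```python
def time_to_common_ancestor(divergence, rate_per_bp_per_year):
    """Years back to the common ancestor, assuming a constant rate.

    Differences accumulate on both lineages, so divergence = 2 * rate * time.
    """
    return divergence / (2 * rate_per_bp_per_year)


# Figures from the quoted paper: 1.2% divergence, 0.45e-9 mutations per bp per year.
years = time_to_common_ancestor(0.012, 0.45e-9)
print(f"{years / 1e6:.1f} million years")
```

With those inputs the result is a little over 13 million years, in line with the figure the paper reports.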

On the other hand mutations can also modify large sections of DNA, physically moving, reversing, adding, and deleting them. When DNA has been removed or added in one lineage but not the other, like in the above, we get a section that does not “align,” or aligns poorly. These sections are important when considering the overall picture of differences between two species’ genomes, but are rightly discarded when percentages and splitting times are calculated.

To Jeffrey Tomkins and his supporters this is apparently a conspiracy – the selective omission of contrary evidence. His own research derives the 70% figure in part by leaving some of these sections in. The trouble is that sections that don’t align aren’t comparable: the deletion of a continuous segment of 1000 bases is not equivalent to 1000 individual nucleotide deletions, because the former can be removed in a single stroke. It amounts to a single mutation, and happens by a different mechanism and at a different rate to the latter variety. Tomkins is entitled to think that his figure is a more honest reflection of the true differences between humans and chimps at the genetic level if he wants to (more on that some other time, maybe), but it is not useful when we want to calculate how long ago humans and chimps split from each other.
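To see how much the choice of counting method matters, consider this small Python sketch. The mutation list and the region length are invented purely for illustration (they are not real human-chimp data), and the variable names are my own.

```python
# Each mutation is (kind, length_in_bases). Invented toy data, not real sequence data.
mutations = [("snp", 1)] * 12 + [("deletion", 1000), ("insertion", 300)]

region_length = 100_000  # hypothetical length of the compared region, in bases

# Counting events: a 1000-base deletion is one mutation, just like a single SNP.
event_count = len(mutations)

# Counting bases: the same deletion is treated as 1000 separate differences.
base_count = sum(length for _, length in mutations)

print(f"events: {event_count}, bases affected: {base_count}")
print(f"per-base difference: {100 * base_count / region_length:.2f}%")
```

Two large indels swamp the dozen point mutations as soon as every affected base is counted separately, even though only fourteen mutational events took place.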

Indeed, the 2013 Answers Research Journal paper of his that Tomkins cites here actually looks chromosome-by-chromosome at the percentage of DNA sequences that align well (not including those that don’t align at all). It does not even look at individual nucleotide differences. Because of that, this data, while not uninteresting, is entirely irrelevant to the question of how long ago the most recent common ancestor lived.

Which brings us to what he says today:

The researchers then compared selected DNA segments between chimpanzee and human that were highly similar, omitting the many non-similar regions. They state, “In the intersection of the autosomal genome accessible in this study and regions where human and chimpanzee genomes can be aligned with high confidence, the rate is slightly lower (0.45 × 10⁻⁹ bp⁻¹ year⁻¹) and the level of divergence is 1.2%…implying an average time to the most common ancestor of 13 million years [page 1274, emphasis added].” There are basically two notable points from this summary statement that I will address.

The first important point is that the comparative data was clearly cherry-picked—the scientists only used the regions that were about 98% similar and essentially threw out everything else. These are the regions that the researchers stated “can be aligned with high confidence.” It appears that all the dissimilar DNA regions got tossed out because they didn’t fit the evolutionary paradigm and would have made the whole idea of chimps evolving into humans completely impossible.

He is, as I said, insinuating conspiracy. And quite the conspiracy it must be, if the facts are in the open yet “seldom discussed”:

It was initially noted by another group of evolutionary scientists that when comparing random chimp genomic sequence only “about two thirds could be unambiguously aligned to DNA sequences in humans.” In confirmation of this widely known, but seldom discussed, inconvenient fact among those evolutionists working in the field was a comprehensive study published in 2013 by this author.³ In that research, I compared each individual chimpanzee chromosome to human (piece-by-piece) and it was shown that the chimpanzee genome was only 70% similar on average to human, with only short regions being highly similar.

As you can see, he thinks his 70% paper is relevant here. It isn’t.

26 thoughts on “Tomkins’ 70%”

Aceofspades25 says:

26/06/2014 at 3:18 am

I have a lot to say about Jeffrey Tomkins’ estimates.

A few months ago I wrote a critique of one of his “papers”. Here is part 1 of the critique:

My first reply to Jeffrey Tomkins, by u/Aceofspades25 in NaturalTheology

He made plenty of other errors, but in this first part I focus on his calculations regarding the differences between humans and chimpanzees.

His paper was focused on the GULO pseudogene (28,800bp long) and he made the claim that for this sequence, humans are only 84% identical to chimpanzees and more similar (87%) to gorillas.

Anybody can download the sequences and count the differences for themselves so I did this and I have provided instructions for others to do this as well.

It turns out that these sequences are very easy to align. There are actually 519 SNPs and 41 indels giving a total of 580 differences between humans and chimps. Some simple math later and we find that they are 98% identical (not 84%).

Confused about how Jeffrey came up with such a low similarity, I decided to multiply each indel by its length and add that to the mutation count (much like you suggest he does with your figure on the right). Even when I did this, I still only got a total mutation count of 903, making them 96.83% similar (still nowhere near his 84%).
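The “simple math” here is just one minus the number of differences divided by the aligned length. A minimal Python sketch, assuming the 28,800 bp length quoted above is the denominator (the exact alignment length used is not stated, so the decimals may differ slightly from the figures in this comment):

```python
def percent_identity(differences, aligned_length):
    """Percentage of aligned positions that are identical."""
    return 100 * (1 - differences / aligned_length)


SEQUENCE_LENGTH = 28_800  # bp, the GULO region length quoted above (assumed denominator)

# Counting each SNP and each indel once (580 differences in total).
print(round(percent_identity(580, SEQUENCE_LENGTH), 2))

# Counting every base of every indel separately (903 differences in total).
print(round(percent_identity(903, SEQUENCE_LENGTH), 2))
```

Either way of counting gives a result close to the figures above, and nowhere near 84%.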

At this point I was left wondering if he had just made up his figure of 84% and so had effectively lied about it. I am still wondering whether that was the case.

His defense was that his algorithm, “optimised sequence slices”, had produced the figure. He insisted that it was right and told me that I was doing “an amateur armchair analysis”.

He wrote a quick dismissal of my critique here (in the comments): http://www.uncommondescent.com/human-evolution/evolutionary-convergence-saves-creationist-hypothesis-over-gulo/

The thing is… any idiot can download the sequences and just count the differences for themselves. If these sequences were only 84% identical then this would imply that his algorithm had found an astounding 4490 mutations, over 7x the actual mutation count!

Even when assuming that he has counted each indel multiple times according to how long they are, we still don’t get anywhere close to the figures he claims to have come up with.

So I wouldn’t trust his algorithm or anything he says about the similarity between humans and chimps. Either his algorithm is seriously flawed or he is fudging his data.

- eyeonicr says:
  
  26/06/2014 at 10:55 pm
  
  “Amateur armchair analysis” is basically a description of every YEC paper ever, so I don’t know what he’s complaining about. 🙂
  
  I think what he did was cut up the chimp version of the gene into his slices, and then determine that 84% of them aligned with greater than a certain confidence (probably 95% or 98%, or similar), without directly examining SNPs at all. But it sounds like the remaining 16% would align at 90% or so. Point is, his methodology probably isn’t flawed so to speak, but his result isn’t exactly useful or relevant either.
  
  That’s what I think, anyway. After the exam on Saturday I’ll see if I can go over his papers with a fine-toothed comb to see what he really did and come back to you.
- roohif says:
  
  16/01/2015 at 2:39 pm
  
  I’ve figured out how he gets 70% in his Comprehensive Analysis paper, and I assume the same problem affects the GULO one.
  
  In summary, he is using a version of BLAST+ with a bug in it that reduces the number of hits that are returned. Say, for example, you submit 10,000 queries one at a time: you will generally get 10,000 results. What happened in Tomkins’ paper was that he submitted the 10,000 queries in one go, and only 7,000 or so results were returned. The bug exists in v2.2.27, and was fixed in 2.2.29.
  
  I’ve written a paper on this and submitted it to Answers in Genesis’ journal. Obviously I’m coming up against a lot of resistance, but I’m grinding him down slowly, on every point that he has made.
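eyeonicr’s guess about the slicing method, a few replies up, can be illustrated with a short Python sketch. The slice identities and the 95% cutoff below are made up to mirror the numbers in that reply (84% of slices aligning well, the rest at about 90%); nothing here is taken from Tomkins’ actual data or code.

```python
# Hypothetical per-slice identities (%): 84 slices align at 99%, 16 at 90%.
slice_identities = [99.0] * 84 + [90.0] * 16

THRESHOLD = 95.0  # assumed cutoff for counting a slice as "aligned"

# What the guess says was reported: the share of slices that clear the cutoff.
share_passing = 100 * sum(s >= THRESHOLD for s in slice_identities) / len(slice_identities)

# What a similarity figure would normally mean: the average identity over all slices.
mean_identity = sum(slice_identities) / len(slice_identities)

print(f"share of slices passing: {share_passing:.1f}%")
print(f"mean identity:           {mean_identity:.2f}%")
```

Under these assumed inputs the share of slices passing is 84% while the mean identity is 97.56%, so the two numbers answer different questions.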
Aceofspades25 says:

27/06/2014 at 2:11 am

Okay, that might explain it (although it is difficult to imagine when these sequences are 98% identical).

I would call his methodology pretty badly flawed if it produced a result of 84% when the actual result was 98%. It’s a huge difference when comparing the difference in implied mutation counts.

It would be interesting to see your results, thanks for the offer. Be aware though that there are two fairly significant portions of the chimp genome that are missing from this 28,800bp sequence. Tomkins claims that his algorithm accounts for these – I don’t know how that would be possible if it works as you describe.

- eyeonicr says:
  
  27/06/2014 at 7:13 pm
  
  If he’s doing what I think he is, it’s analogous to him being asked to calculate the average percentage score of a class in a test, but instead reporting the percentage who passed. What I mean is that he hasn’t done the calculation wrong so much as it’s the wrong calculation to do. We’ll see if I’m right about that.
- Steve says:
  
  13/10/2015 at 1:09 am
  
  Hey roohif…
  
  So it looks like Tomkins has finally acknowledged the bug in BLAST and has reworked all of his results. Unfortunately, while a little better (88%), they are still far from reflecting the reality of the situation (98%).
  
  I provide a review of his most recent paper here: https://www.reddit.com/r/junkscience/comments/3ofwf8/human_chimp_similarity_take_2/
- roohif says:
  
  13/10/2015 at 8:45 am
  
  Hey!
  
  Yup, he has acknowledged the bug, and has quietly admitted his error. I say quietly because who on earth is going to read a paper with the title: “Documented Anomaly in Recent Versions of the BLASTN Algorithm and a Complete Reanalysis of Chimpanzee and Human Genome-Wide DNA Similarity Using Nucmer and LASTZ”?
  
  I submitted my paper way back in September 2014, and it was only _AFTER_ Tomkins’ response was published that the editor, Andrew Snelling, told me that my paper did not pass peer review. I quote: “I was courteous enough to send you his paper. I am under NO obligation to answer any or all of your questions”.
  
  As for his new paper, he is still using the “ungapped” parameter and if you read his paper you’ll see he uses it for completely pragmatic reasons, with utter disregard for what the parameter actually does. And he certainly knows what it does – because I’ve explained it to him both in my paper and over email. In short, if you have two sequences 100 base pairs long, and there is a single indel right in the middle of one of the sequences, then BLAST returns a match that is 100% identical, but only 50 base pairs long (because ungapped cannot continue the alignment through that indel/gap). Tomkins counts that as only a 50% match, even though the other half of the sequence is identical. Without ungapped, BLAST returns a match that is 101 base pairs long, but a 99% match.
  
  And that’s how he gets his 88% figure – ungapped produces shorter alignments, but he includes them in his calculation over the full length of the query sequence. Like I said, I’ve explained this to him at least twice, so continuing to calculate it that way demonstrates a complete lack of integrity. And I guess that’s what I set out to do originally – show that these people don’t actually care about the true result.
  
  They are just “Lying for Jesus”.
Daren H says:

28/06/2014 at 4:16 am

The problem is the paper he cited does not even detail the methodology he used to derive the 70% value, only that he mined the sequences from NCBI. Is the algorithm he used reliable? Why didn’t he use conventional algorithms? Did he count only non-coding DNA? As you mentioned above, did he count the deletion of 1000bp as one thousand deletions?

It’s the ultimate hypocrisy: accuse other evolutionary biologists of deceit while deceitfully omitting information from your own research paper.

Fun idea: ask him to use the exact same procedural comparison between chimpanzees and gorillas. I suspect that he will arrive at a value even lower than 70%.

And they are both supposed to be part of the same baramin.

- Aceofspades25 says:
  
  28/06/2014 at 6:55 am
  
  Either way, he is horribly wrong and it is fairly easy to verify this using papers like this:
  
  human_GULO_pseudogene.pdf
  
  Where he makes claims about shorter sequences using his flawed methodology.
- Aceofspades25 says:
  
  13/10/2015 at 9:17 pm
  
  > As for his new paper, he is still using the “ungapped” parameter and if you read his paper you’ll see he uses it for completely pragmatic reasons, with utter disregard for what the parameter actually does. And he certainly knows what it does – because I’ve explained it to him both in my paper and over email. in short, if you have two sequences 100 base pairs long, and there is a single indel right in the middle of one of the sequences, then BLAST returns a match that is 100% identical, but only 50 base pairs long (because ungapped cannot continue the alignment through that indel/gap). Tomkins counts that as only a 50% match, even though the other half of the sequence is identical. Without ungapped, BLAST returns a match that is 101 base pairs long, but a 99% match.
  
  Thanks, that would seem to explain the remaining difference perfectly!
  
  I have a quick question though about how BLAST works. If I understand this correctly, it starts off by picking a “word” from the sequence you are looking for in order “to nucleate regions of similarity”. This word length is 11bp by default.
  
  Does it pick this word from the centre of your query and then once it has found potential matches, it expands out those matches in either direction until a series of misalignments is encountered?
  
  Or does it pick this word from the start of your query and so then only expand out the potential matches in 1 direction until a series of misalignments is encountered?
  
  I’d like to try and emulate what the -ungapped parameter would be doing with some of my sample sequences.
  
  Thanks
- roohif says:
  
  13/10/2015 at 9:48 pm
  
  Hey Ace,
  
  BLAST picks multiple words, and if configured to do so, it should search for them in multiple threads. Here is an actual demonstration of gapped vs ungapped behaviour:
  
  >cat qry.fasta
  >query
  CGCTACGCATCACACTGGAAGATGCTCGCATCCGATGCGTCAGTATGATCGGCATGAGC
  AGCATCGATCGATCGATCAGCTAGCTAGCTACGATAGCTAGCAGTGGTATCGATCGACGT
  >cat sbj.fasta
  >subject
  CGCTACGCATCACACTGGAAGATGCTCGCATCCGATGCGTCAGTATGATCGGCATGAGCT
  AGCATCGATCGATCGATCAGCTAGCTAGCTACGATAGCTAGCAGTGGTATCGATCGACGT
  >blastn -query qry.fasta -subject sbj.fasta -outfmt 10 -ungapped
  query,subject,100.00,60,0,0,60,119,61,120,1e-31, 116
  query,subject,100.00,59,0,0,1,59,1,59,6e-31, 114
  >blastn -query qry.fasta -subject sbj.fasta -outfmt 10
  query,subject,99.17,120,0,1,1,119,1,120,2e-61, 215
  
  You’ll see that I’ve just deleted the 60th character from the query, and tried to find a match in the subject sequence. In the first example – using ungapped – you get two results. Both are 100% matches, but they are only 60bp and 59bp long respectively. In both of Tomkins’ papers, and in my paper, we only take the first (and best) hit, which is the 60bp sequence. Tomkins will take this and say that because the query was 119 base pairs long, and only 60 of them were a match, that is 50.4% similarity.
  
  If you use gapped behaviour, you get one result, which matches 119 out of 120 nucleotides for a 99.17% identity.
  
  I wrote in my response paper that if you are using ungapped, you cannot then calculate the identity over the full length of the query, you can only calculate it over the length of the match. Anything else is disingenuous.
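The gapped versus ungapped difference can also be reproduced without BLAST. The Python sketch below is a toy stand-in, not BLAST’s actual algorithm: it only looks for the longest run of exact matches along any diagonal, whereas real ungapped extension scores mismatches and can continue through them. The function name is my own; the sequences are the ones from the demonstration above.

```python
def longest_ungapped_match(query, subject):
    """Length of the longest exact match with no gaps (toy model of -ungapped)."""
    best = 0
    # An offset fixes a diagonal: query position qi pairs with subject position qi + offset.
    for offset in range(-(len(query) - 1), len(subject)):
        run = 0
        for qi, base in enumerate(query):
            si = qi + offset
            if 0 <= si < len(subject) and base == subject[si]:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


subject = (
    "CGCTACGCATCACACTGGAAGATGCTCGCATCCGATGCGTCAGTATGATCGGCATGAGCT"
    "AGCATCGATCGATCGATCAGCTAGCTAGCTACGATAGCTAGCAGTGGTATCGATCGACGT"
)
query = subject[:59] + subject[60:]  # delete the 60th character, as in the demo

match_length = longest_ungapped_match(query, subject)

# Dividing by the full query length penalises the whole sequence for one indel.
over_query = 100 * match_length / len(query)
# A gapped alignment spans 120 columns with 119 matching bases and one gap.
gapped = 100 * len(query) / len(subject)

print(match_length, round(over_query, 1), round(gapped, 2))
```

With these sequences the toy model finds a 60-base match, which is 50.4% of the 119-base query, against 99.17% for the gapped alignment. Those are the same figures as in the BLAST output above.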
Borny says:

07/07/2014 at 10:51 pm

Really informative analyses that may escape the eye of the non-specialist. Expecting more such dissections of creationist arguments.

- Aceofspades25 says:
  
  13/10/2015 at 10:09 pm
  
  That makes perfect sense, thanks!
- roohif says:
  
  13/10/2015 at 10:13 pm
  
  Ha! You hit the wrong “Reply” button 🙂
- Aceofspades25 says:
  
  14/10/2015 at 2:57 am
  
  😛 Meh.. I think this comment system is broken
Eric says:

07/08/2014 at 5:26 am

In my opinion, your understanding is flawed. As a 2nd year college student, why should your opinion outweigh that of the research performed by a man who holds a PhD in genetics and has 56 publications in peer reviewed scientific journals and seven book chapters in scientific books? I would place much more validity with his conclusions than on your own opinion. May I encourage you to study science with more of an open mind?

Eric Lebs says:

08/08/2014 at 4:47 am

But you’re speaking as only a second year college student. That is compared to Dr. Tomkins who has a PhD in genetics & 56 publications in peer reviewed scientific journals and seven book chapters in scientific books. I think I’d rather give more credit to the conclusions of a scientist over a student.
Continue learning & studying. Later you may find that Dr. Tomkins was right.

- roohif says:
  
  17/09/2014 at 4:15 pm
  
  Hi Eric, I am a layperson with no PhD, no qualifications in the field of genetics and zero publications to my name. Nonetheless, I can demonstrate to you why Tomkins is wrong, and how he comes to the figure of 70%. I plan to submit my response to ARJ within a week. If you want a sneak peek at the draft, just send me a message. glenn at Delta, Uniform, Bravo, Zulu dot com dot au.
Joe Erwin says:

18/08/2014 at 12:39 am

The great thing about science as a process of obtaining and evaluating information is that it is a self-correcting process. As a scientist with a PhD and more than 100 publications, I am prepared to follow the objective evidence wherever it leads and revise my opinions on the basis of new and credible information. But opinions are opinions, and evidence is evidence–whether it comes from a scientist or a student or a layperson. Reliance on “authority” does not trump reliance on objective evidence. Let’s all keep our minds open to all the possibilities. Science should, in my opinion, be used as a method of finding out what is so, rather than as a method of generating propaganda for an ideology.

Sam Harris says:

16/09/2014 at 3:41 am

Hi Eric,
You are engaging in a logical fallacy, known as the appeal to authority. Tomkins may have 56 publications and a PhD, but his obvious goal is to prop up his belief in a literally true Bible. That is more important to him than being honest. Like Tomkins, I have a PhD and peer-reviewed publications, but unlike Tomkins, my publications deal specifically with phylogenetic analyses, which in turn rely, in part, on analyzing DNA sequence similarity. I know from some of his other ICR essays that he likes to employ search criteria that all but guarantee a lower-than-reasonable return. Are YOU familiar with the use of BLAST? If so, you will see that he uses search parameters that only return 100% matches for his ‘word length’ criterion. For example, if his ‘word length’ (DNA sequence length) is 10, and two 10-base DNA sequences from chimps and humans are identical for 9 of them, the search gives a 0% identity for them. In other words, Tomkins rigged his analysis to produce lower numbers.
I will take the conclusions of an honest and competent student over the religious ranting of a PhD any day.

- roohif says:
  
  17/09/2014 at 4:20 pm
  
  Hi Sam,
  
  I’m quite familiar with BLAST. The word length parameter really only finds 100% _INITIAL_ matches. So, if you use a word length of 7, you are likely to get a lot of “possible matches”. The BLAST algorithm then tries to extend the alignment in either direction. A word length of 10 is perfectly fine – I think the default is 11.
  
  Tomkins unknowingly encountered a bug in the BLAST software that does not return a proportion of hits when a large number of queries are submitted at once. So, for example, if Tomkins submits 300,000 queries, he might only get hits for 200,000 of those. The bug is that if you re-submit the remaining 100,000 queries, you will get more hits. Repeat and resubmit until you don’t have any queries left.
  
  I discovered this bug while I was writing my own response to Tomkins. I have been communicating with him over the last few months, and he is now aware of the problem; he just hasn’t spoken the magical words yet: “I was wrong” 😀
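The resubmission workaround described above can be sketched in Python as follows. Here `run_batch` is a hypothetical stand-in for whatever actually invokes BLAST on a batch of queries and returns the hits it found; it is not part of BLAST+ or any real library. The stopping rule (give up once a pass returns nothing new) is also my own assumption.

```python
def collect_hits(queries, run_batch):
    """Resubmit queries that returned no hit until a pass finds nothing new.

    run_batch is a hypothetical callable: it takes a list of query IDs and
    returns a dict mapping some of them to their best hit.
    """
    hits = {}
    remaining = list(queries)
    while remaining:
        new_hits = run_batch(remaining)
        if not new_hits:
            break  # nothing new this pass: treat the rest as genuinely unmatched
        hits.update(new_hits)
        remaining = [q for q in remaining if q not in hits]
    return hits, remaining


def flaky_batch(batch):
    """Toy stand-in that mimics the bug: only about 70% of a batch gets hits."""
    cutoff = max(1, int(len(batch) * 0.7))
    return {q: f"hit-for-{q}" for q in batch[:cutoff]}


all_queries = [f"q{i}" for i in range(10_000)]
hits, unmatched = collect_hits(all_queries, flaky_batch)
print(len(hits), len(unmatched))
```

In this toy setup a single pass would report only 7,000 hits, while repeated resubmission eventually recovers all 10,000.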
roohif says:

16/09/2014 at 8:25 pm

I’ve actually written a response to Tomkins’ 70% paper, and I’ll be submitting it to ARJ (Answers Research Journal) in the next week or two.

Quick summary: he has fallen victim to a bug in the BLAST+ software used to compare the sequences.

- Jacob says:
  
  03/11/2014 at 9:45 am
  
  Could you send us a link to that response? Also I think I just fell victim to that bug myself.
- roohif says:
  
  03/11/2014 at 10:40 am
  
  It will hopefully be published in ARJ in the next month or so. Keep an eye out for it 🙂
- roohif says:
  
  03/11/2014 at 10:57 am
  
  You can email me – glenn at dubz dot com dot au