[Originally published as Are all animal species really the same age?]
Everybody’s talking about news reports that all animal species are basically the same age. I’ve seen it on my social media feed. I’ve gotten questions about it by email. Creationists really want to know what this means. So let’s dive in. (I should warn you to brace yourself. This is going to be long and pretty technical. Scroll down to the end for what I would tell the average person about these headlines.)
The research paper comes to us from the journal Human Evolution by authors Stoeckle and Thaler. It describes the results of something called DNA barcoding, which started out as a means of classifying species using a small bit of DNA. For animals, this DNA is a 600 bp region of the mitochondrial COI gene. As you might imagine, barcoding wasn’t really popular among taxonomists when it was proposed. Barcoding was basically the young startup that thought it could replace years of expert study with a simple DNA sequence. At least that was how it was perceived in some corners. There was also concern that COI sequences are not representative of species as a whole, since they’re just a tiny, tiny part of the whole genome.
Stoeckle and Thaler present the results of their study of millions of these DNA barcodes from thousands of species. The difficulty I had with their paper is that they don’t present much of their data.
Nevertheless, based on their knowledge of the field, they say that the vast majority of DNA barcodes are extremely similar within species and more different when comparing different species. For example, if you sample 100 different rattlesnakes, the Eastern Diamondbacks would be very similar to each other, but distinctly different from Western Diamondbacks. These species that can be recognized with DNA barcodes are overwhelmingly exactly the same as the species recognized by a consensus of experts.
That’s true for birds, mammals, reptiles, insects, and common animals, but it’s a bit more difficult with unpopular animals like worms. No, I’m not joking. That was a real result. The popular groups that people like to study have a roughly 94% match between the DNA barcode clusters and the expert-recognized species. With unpopular groups that few people study, there’s only about a 70% match between the DNA barcode clusters and the recognized species. That’s probably just neglect though. Unpopular species are unpopular.
The next major pattern that Stoeckle and Thaler point out is the surprising result that most species have roughly the same amount of variability between different members of the species. The average is 0.2% difference, meaning that when you compare any two members of the same species, they will differ only by about one nucleotide out of the 600 in their DNA barcode.
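To make that arithmetic concrete, here is a tiny sketch of the conversion from percent difference to nucleotide differences. It assumes the round 600 bp barcode length used above; the function name is just something I made up for illustration.

```python
# Convert a percent difference between two barcodes into the expected
# number of differing nucleotides. The 600 bp length is the round figure
# quoted in the text, not an exact measurement.
BARCODE_LENGTH_BP = 600


def expected_differences(percent_difference, length_bp=BARCODE_LENGTH_BP):
    """Expected number of nucleotide differences between two barcodes."""
    return percent_difference / 100 * length_bp


# The 0.2% average within-species difference works out to roughly one
# nucleotide out of the 600.
print(round(expected_differences(0.2), 2))
```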
From there, the authors make an analogy with results in humans
They claim that the mitochondrial DNA diversity of human beings, which in this region of the mitochondrial DNA is about 0.1%, is the result of a recent demographic event. In other words, they say that all humans got their mitochondrial DNA from a small population (that they claim could be as few as a founding pair) that lived only 100,000 to 200,000 years ago.
Therefore, by analogy, since most other species have similarly low diversity in their DNA barcodes, they too must have originated from a small population (possibly even a founding pair) that lived less than a quarter million years ago. In other words, all species are about the same age.
I think you can see why this conclusion is attractive to creationists.
It sounds like confirmation of what creationists believe about the origin of created kinds: a single population diverged in the recent past and all at the same time. So if you traced the ancestry of elephants, rattlesnakes, dung beetles, or people, they would all originate at the same time. So this research sounds good, right?
Well… I’m not entirely sure what to think here, but I have some concerns
First of all, let’s recognize that this research, if correct on its face, would make a mess of created kinds. This article claims that all species are roughly the same age, and creationist work over the past twenty years has generally agreed that the created kind (from which species descended) contains more than one species. A few created kinds contain many species. So this research would imply that the created kind is actually the same as the species, if we take it at face value. I think that’s possible, but it would require a big adjustment to creationist thinking (for example, most of the Ark Encounter exhibits, with their imaginative created kind sculptures, would have to be ripped out and redone).
Why might this not mean what it seems to mean? The first thing that crossed my mind here is that the authors are essentially comparing apples and oranges. Differences in DNA barcodes within a species are not the same thing as differences in DNA barcodes between species. How could that be?
Let’s start by thinking about how differences arise in the first place: There’s a mutation. Mutation is just the beginning though. Remember that mitochondria are inherited only through the female line. That means that any mutations that happen in a male animal are lost. Those mutations don’t count.
Let’s pretend though that our hypothetical mutation has happened in a female. Now that female has one mitochondrion that has a different DNA barcode than all of its other mitochondria. Oh yes, each animal has many more mitochondria than cells. What you need is a mutation that happens in an egg cell, so that it can be passed on to the next generation. But still that’s not good enough. In human beings, egg cells can contain 100,000 to 600,000 mitochondria, each with its own copy of the DNA barcode. So that one mutant barcode has to increase in number. Ideally, it has to happen sometime early in the germ line, so that its numbers increase, which therefore increases the probability that it will be passed on to offspring.
OK, so let’s just assume that the mutant barcode happened in the germ line early enough that an egg cell ended up with maybe 50% of these mutant mitochondria. Let’s also assume that that particular egg cell was fertilized and produced a female offspring. Once it’s in the egg cell, it will be passed along to every single cell in the female offspring’s body. So that’s how the mutation gets into the population. And getting into the population is the important part: mutations aren’t really detected by barcode sequencers unless they are frequent enough in an organism to be sequenced.
They also have to be frequent enough in the population to be sampled.
What increases the frequency of a mutation in a population? Two possibilities: selection and drift. Natural selection begins operating if the new sequence confers some kind of advantage on the animal that other members of the same population don’t have. If my mutant butterflies are better at leaving offspring than the other butterflies, their mutant barcode will increase in the population over time.
In reality, though, Stoeckle and Thaler report that the vast majority of the differences within species are synonymous, which they claim are not selectable. In other words, most DNA barcode differences observed between members of the same species don’t make the owners any better or worse at reproducing than any other sequence. Natural selection can’t work, they say.
And I’m willing to accept that, even though there are reasons to assume that natural selection can operate to a limited extent on synonymous mutations. Stoeckle and Thaler make an argument in their paper that those circumstances don’t really apply to mitochondria, so mitochondrial synonymous mutations are basically invisible to natural selection. I’m sure there’s probably more nuance to the argument than what they give, but I’m willing to go along with the idea of non-selectable mutations for the sake of argument.
Where does that leave us?
The only way that mutations within a species could increase in frequency enough to be detected is by drift – random fluctuations in the population. For example, let’s say our female butterfly with the mutant barcode just happens to lay all her eggs in a place where a forest fire destroys them before they hatch. That mutant barcode just “drifted” out of the population. Once the mother butterfly dies, the new mutant barcode dies with her. None of her offspring survive.
Alternatively, maybe there’s some kind of air pollution that interferes with butterfly development, reducing the number of butterfly babies. But our mutant butterfly happens to flap her way up a mountain and lay her 1,000 eggs in a place where there is very little pollution. So all her offspring hatch and grow up to be adult butterflies, when all the other butterflies in the same species only get about half of their eggs to hatch.
On the other hand, maybe there’s a terrible drought in one region, and half of the butterflies in the species die. Our mutant butterfly happens to be in a different region where the butterflies survive. That drought effectively doubles the frequency of the mutant barcode all by itself.
This sort of thing happens every generation, so that the population frequencies of mutations in DNA barcodes should fluctuate up and down. In some generations 20% of the individuals in the population might have the mutant barcode, but a few generations later, 30% might have it. A few generations after that, the frequency might drop to 27%. Generally speaking, though, the rule of thumb is that a mutant with a higher frequency is probably older than a mutant with a lower frequency. That makes sense.
If you start out with a mutation in a single individual, it will take many generations before the offspring of that one individual become numerous enough to account for a large fraction of the population.
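If you want to watch drift do its thing, here is a minimal sketch of a neutral mutant barcode bouncing around in a population. It is my own toy model, not anything from Stoeckle and Thaler: it assumes a constant number of breeding females, non-overlapping generations, and no selection at all (what population geneticists call a Wright-Fisher model). The function name and all the numbers are made up for illustration.

```python
import random


def simulate_drift(pop_size, generations, start_copies=1, seed=None):
    """Track the frequency of a neutral mutant barcode among females.

    Assumptions (mine, for illustration): constant population size,
    non-overlapping generations, and no selection.
    """
    rng = random.Random(seed)
    copies = start_copies
    history = [copies / pop_size]
    for _ in range(generations):
        freq = copies / pop_size
        # Each female in the next generation inherits her barcode from a
        # randomly chosen mother, so she carries the mutant with
        # probability equal to its current frequency.
        copies = sum(1 for _ in range(pop_size) if rng.random() < freq)
        history.append(copies / pop_size)
        if copies == 0 or copies == pop_size:
            break  # the mutant has been lost or has become fixed
    return history


# Start 200 independent mutants, each in a single female out of 100,
# and count how many drift out of the population entirely.
replicates = 200
lost = 0
for i in range(replicates):
    history = simulate_drift(pop_size=100, generations=100, seed=i)
    if history[-1] == 0.0:
        lost += 1
print(f"{lost} of {replicates} new mutants were lost within 100 generations")
```

The point of running many replicates is that any single run is just one roll of the dice. Real populations change size and have overlapping generations, so treat this as a cartoon of the process and nothing more.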
Why is this important?
Because if you find a species where half of the members have one barcode sequence, 30% have a different barcode, and 20% have still another barcode, you can conclude that that species must be fairly old. That doesn’t always work (because of something called bottlenecking), but it should average out over the thousands of species in the barcode database.
On the other hand, if you find a population where 95% have just one barcode sequence and the other 5% have a different sequence, that’s a very different scenario. That really does seem more like the species is quite young because almost all of the individuals have the same barcode sequence.
So how frequent are these DNA barcodes within each species? Can we estimate how old they are?
That’s the trick, though. Stoeckle and Thaler didn’t report that, and I don’t think the DNA barcode database is set up to answer that question yet. The barcode database currently just samples specimens. According to Stoeckle and Thaler, they’ve sampled 5,000,000 individual specimens from 100,000 animal species, which averages out to about 50 specimens per species. Is that enough to estimate the population frequencies of barcode mutants? Consider the case of the white-tailed deer in the United States. There are an estimated 30 million white-tailed deer. There’s no way that 50 individuals represent the entire population of white-tailed deer. Other species aren’t so numerous, but 50 is still a pretty small sample of a species. By contrast, thousands of human sequences have been studied to estimate population frequencies.
Now it’s possible that some species have been sampled with this very thought in mind – trying to estimate the frequencies of DNA barcodes. Given the averages, though, it seems more likely that the average species in the barcode database has not been systematically examined for population frequencies of individual barcodes. And that’s not great news because when you have a small sample of a population, you are much more likely to sample high frequency barcodes. If there’s a population of 30 million deer, and you sample 50 of them, how likely is it that you will sample a mutant barcode with a frequency of only 10%? Much less likely than discovering barcodes with higher population frequencies, which imply that they are older.
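Here is a rough sketch of the sampling arithmetic. It assumes the 50 specimens are a simple random sample from a very large population, which is an idealization on my part (real collections are usually gathered for other reasons and from a few locations). The function names are made up for illustration.

```python
import math


def average_specimens_per_species(total_specimens, total_species):
    """Average sample size per species in the database."""
    return total_specimens / total_species


def detection_probability(frequency, sample_size):
    """Chance that at least one sampled individual carries a variant.

    Assumes independent random draws from a population large enough
    that sampling does not change the frequency.
    """
    return 1 - (1 - frequency) ** sample_size


def frequency_standard_error(frequency, sample_size):
    """Standard error of a frequency estimated from a random sample."""
    return math.sqrt(frequency * (1 - frequency) / sample_size)


n = int(average_specimens_per_species(5_000_000, 100_000))
print(f"average specimens per species: {n}")

for freq in (0.5, 0.1, 0.01, 0.001):
    p_detect = detection_probability(freq, n)
    se = frequency_standard_error(freq, n)
    print(f"frequency {freq}: detection chance {p_detect:.3f}, "
          f"standard error {se:.3f}")
```

Two separate questions are being asked here: whether a variant shows up in the sample at all, and how precisely its frequency can be pinned down once it does. The standard error column speaks to the second question, which is the one that matters for estimating ages.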
So how can Stoeckle and Thaler claim that all of these species originated at the same time?
How can they say that the different barcodes within a species are young? Only by analogy. They look at what has been done with human mitochondrial DNA, and they conclude by analogy that other animals must be just like humans, since all animal species have roughly the same number of barcode differences within the species. At least in this study, they haven’t done anything to deliberately estimate the age of each species. They just say that the human situation probably applies to everything else.
That’s not a great argument by itself.
They could be right, of course, but this paper doesn’t really show that. It basically presents what Stoeckle and Thaler consider to be the most likely hypothesis, but as far as I can tell, there’s no actual test of the hypothesis. How could they be right if there are lots of high-frequency barcode mutants? There could have been a bottleneck: a reduction of the species population to a small number (say 1,000 or 10,000 individuals) that preserved mutant frequencies from before the bottleneck. That way, the species population really could be young, but still have high frequency barcode mutants. Remember, though, they haven’t shown that the bottleneck is the correct explanation. They just think it is (and reasonably so).
Now what about the differences between species? I said that those aren’t the same as looking at differences within species. Why not? Differences between species tend to be fixed. In other words, each member of each species has the same barcode nucleotide, and they happen to be different. Let’s say, for the sake of argument, that all white-tailed deer have an adenine at the tenth position in the barcode sequence, but all mule deer have a guanine at the same position.
The species are different, but within each species the population frequency is 100%. These are fixed differences. The cumulative effect of fixed differences is what makes species different from each other. Why do white-tailed deer raise their tails to run but mule deer don’t? Fixed differences. Why do tigers have stripes and lions have manes? Fixed differences. Why do zebras have stripes and horses don’t? You get the idea.
Fixing a difference between species, spreading a mutant barcode to 100% of the population, takes time. Or natural selection. It could happen with a bottleneck as well, but generally speaking age is the go-to hypothesis for fixed differences that are not selected. Other evidence of bottlenecks would have to be demonstrated before concluding that bottlenecks were to blame.
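For the programmers in the audience, here is a small sketch of what "fixed difference" means in practice. The sequences are invented ten-letter toys, not real deer barcodes, and the function name is my own. A position counts as fixed only if every sampled member of one species has the same base, every member of the other species has the same base, and the two bases differ.

```python
def fixed_differences(species_a, species_b):
    """Return 0-based positions that are fixed differences.

    Each argument is a list of aligned sequences of equal length.
    """
    length = len(species_a[0])
    fixed = []
    for pos in range(length):
        bases_a = {seq[pos] for seq in species_a}
        bases_b = {seq[pos] for seq in species_b}
        # Fixed: one base within each species, but not the same base.
        if len(bases_a) == 1 and len(bases_b) == 1 and bases_a != bases_b:
            fixed.append(pos)
    return fixed


# Hypothetical toy data: the third white-tailed sequence has a
# within-species variant at index 2, and the species differ at the
# tenth position (index 9), A versus G.
white_tailed = ["ACGTACGTAA", "ACGTACGTAA", "ACTTACGTAA"]
mule_deer = ["ACGTACGTAG", "ACGTACGTAG", "ACGTACGTAG"]

print(fixed_differences(white_tailed, mule_deer))  # prints [9]
```

Notice that the variant at index 2 is not reported. It is a difference within a species, which is exactly the kind of difference that should not be lumped together with the fixed ones.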
So a mutant barcode within a species is probably pretty young, on average, but a fixed difference between species is older, possibly much older. How much older? Depends on the number of fixed differences. The more fixed differences, the older the ancestor of those two species. What Stoeckle and Thaler want to explain is what they see as a disconnect: Why are there so many fixed differences between very similar species (implying that the species originated a long time ago) but so few differences within species (implying that the species are much younger)?
They propose that modern animal species have undergone a common population bottleneck, possibly related to the Ice Age.
But there’s another explanation that they even mentioned in their paper: extinction. Speaking from an evolutionary standpoint, there are other lineages with other DNA barcodes that connect the species together with their common ancestor, but those barcodes are now extinct and mostly inaccessible. For example, consider horses. There has been ancient DNA extracted from two groups of extinct horses: The genus Hippidion is a group of extinct South American horses, and the New World Stilt-Legged (NWSL) horses are also known from the fossil record of North America (see also this paper). As you might imagine, the NWSL have long legs. Based on their mitochondrial DNA (from which we get barcodes), we find the NWSL horses fill in the gap between domesticated horses and all other horses. The Hippidion sequences bridge the gap between all modern equids and the rhinos.
Stoeckle and Thaler claim that the species have discrete barcodes and that there are no intermediates between the species, but that isn’t quite right. There are intermediates, and they’re known from the fossil record.
This is my biggest concern with their study: Mitochondrial DNA (barcodes) can be rapidly lost from a population due to drift. Lineages randomly go extinct and they do so at a high rate. Every time a female with a unique barcode fails to produce offspring or produces only sons, that mitochondrial line is gone. Is it really all that surprising then that within a species most barcodes are very similar? I don’t know. Maybe. Maybe not. Stoeckle and Thaler don’t really do calculations or simulations to convince the reader that the result that they observe is extraordinary and worthy of an extraordinary explanation (like a common bottleneck for all animal species on earth).
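To give a feel for how fast maternal lineages can disappear by chance alone, here is a minimal simulation sketch. It is my own toy, not a calculation from the paper, and it assumes a constant number of females, non-overlapping generations, and daughters who pick their mother at random. A female who leaves no daughters (including one who has only sons) simply never gets picked.

```python
import random


def surviving_lineages(num_females, generations, seed=None):
    """Count the distinct founding maternal lineages left each generation.

    Assumptions (mine, for illustration): constant number of females,
    non-overlapping generations, no selection.
    """
    rng = random.Random(seed)
    # Every founding female starts out as her own mitochondrial lineage.
    lineages = list(range(num_females))
    counts = [len(set(lineages))]
    for _ in range(generations):
        # Each daughter inherits the lineage of a randomly chosen mother.
        lineages = [rng.choice(lineages) for _ in range(num_females)]
        counts.append(len(set(lineages)))
    return counts


counts = surviving_lineages(num_females=500, generations=500, seed=42)
for generation in (0, 10, 100, 500):
    print(f"generation {generation}: {counts[generation]} lineages remain")
```

A simulation like this, scaled up and fed realistic population sizes and mutation rates, is the sort of thing I would have liked to see in the paper. Without it, it is hard to judge whether low within-species diversity needs a special explanation.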
This actually matters quite a lot more than you might expect
Stoeckle and Thaler want to argue based on analogy with humans, but that may be deceptive. The common mitochondrial ancestor for modern humans is thought to be between 100,000 and 200,000 years old, but when you examine the rare variations found in the nuclear genome (which we inherit half from our mothers and half from our fathers), the average age is one to five million years old. So if you only look at the mitochondrial barcode, our species looks extremely young, but if you count the nuclear genome, we could be ten to fifty times older (measured against the 100,000-year figure)! Why do we all have such similar mitochondrial DNA? There’s been a high turnover of mitochondrial DNA due to extinction. I’m not convinced there’s any other special explanation required.
(Incidentally, this is also why I find the whole population size of “Adam” debate to be moot, since they’re looking for Adam at the origin of the mitochondrial ancestor, which probably has nothing to do with the origin of humanity.)
So, what can we say about Stoeckle and Thaler? Here are a few talking points.
- They have some interesting data and a lot of it, but their paper argues from analogy and not analysis. They do not show by calculation or simulation that the barcodes are of recent origin at the same time.
- Nevertheless, their data are consistent with their hypothesis, which would be very curious if it holds up. I hope someone follows up on this research.
- There are other explanations for what they’ve found, though, and I’m not sure they have enough data of the quality they need to show that their hypothesis is correct.
- There could be nothing more to this than the high rate of mitochondrial DNA extinction. We need to look more closely at the diversity of nuclear genomes before we can really say for sure.
Read all about it. It’s open access.
Stoeckle and Thaler. 2018. Why should mitochondria define species? Human Evolution 33(1-2):1-30.