
Assumptions and Expectations versus Reality: non-coding RNA

[Originally published in 2014 as Geneticists’ Bias Causes a Big Mistake]

You’ve heard it many times before: the vast majority of DNA is junk. Of course, the ENCODE project showed how wrong that notion is. Now that we know the vast majority of DNA is functional, you might wonder how the idea of “junk DNA” became so popular among scientists. I suspect there are many reasons, but some recent research has revealed one of them — a bias regarding what it means for DNA to be functional. The research was done on molecules called long non-coding RNAs, which are commonly referred to as lncRNAs.

What are lncRNAs? Well, let’s start with what RNA is. The genes that your body uses are in your DNA, most of which is found in the control center of the cell, called the nucleus. In order for your cells to use those genes, the genes must first be copied into another molecule. This process is called transcription: an enzyme called RNA polymerase reads the DNA of a gene and builds a matching molecule of RNA. Once the RNA copy has been made (and, in cells like ours, processed), it leaves the nucleus. This kind of RNA is called messenger RNA (mRNA) because it carries a message from the gene to the rest of the cell.
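For readers who like to see the rules spelled out, here is a minimal Python sketch of the base-pairing logic behind transcription. It assumes the template strand of the gene is given as a string written 5'→3', and it ignores everything a real cell does beyond pairing letters (promoters, RNA processing, and so on). The function name `transcribe` is just for illustration.

```python
# Base-pairing rules for transcription: RNA uses U where DNA would pair with A.
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_strand: str) -> str:
    """Return the mRNA (5'->3') built from a DNA template strand given 5'->3'.

    RNA polymerase reads the template 3'->5', so we walk the string
    backwards and pair each base.
    """
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(template_strand.upper()))

# The mRNA matches the gene's other (coding) strand, with U in place of T.
print(transcribe("TTAAGCCAT"))  # AUGGCUUAA
```

The output happens to read as a start codon (AUG), one codon of recipe (GCU), and a stop codon (UAA).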


What’s the message? It is a recipe for building a protein.

That recipe is written in informational units called codons, each of which is three nucleotides (the “letters” of RNA) long, and the mRNA carries it to a ribosome, which is a protein-making factory in the cell. The ribosome reads the codons one at a time, translating each one into an amino acid, the building block of a protein. Not surprisingly, this process is called translation.

[Illustration: one way to visualize a coding section of RNA. A start codon tells the cell to start making a protein, followed by the recipe for that protein (blue bar). Then a stop codon tells the cell that it is done making the protein.]
  • How does the ribosome know when to start building the protein? There is a start codon that tells it to start.
  • How does it know when to stop building the protein? There is a stop codon that tells it to stop.

In short, you can think of messenger RNA in terms of the illustration above: it contains a start codon, a recipe for a protein (the blue bar in the illustration), and a stop codon.
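To make the start, recipe, stop picture concrete, here is a minimal Python sketch of what the ribosome does. It assumes the standard genetic code and a simplified ribosome that begins at the first AUG and reads three letters at a time until it reaches a stop codon. Real translation involves much more (for example, how the ribosome selects which AUG to use), and the names here are purely illustrative.

```python
from itertools import product
from typing import Optional

# Standard genetic code, listed in UCAG order for each of the three codon positions.
# "*" marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(mrna: str) -> Optional[str]:
    """Return the protein (one-letter amino acid codes) encoded after the first AUG.

    Returns None if there is no start codon, or if no in-frame stop codon follows it.
    """
    start = mrna.find("AUG")
    if start == -1:
        return None  # no start codon: nothing for the ribosome to begin on
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            return "".join(protein)  # stop codon: the protein is finished
        protein.append(amino_acid)
    return None  # ran off the end without finding a stop codon

print(translate("GGAUGGCUUAAGG"))  # MA (AUG = M, GCU = A, then stop UAA)
```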

So, how does this relate to lncRNAs? Well, messenger RNA is referred to as “coding RNA” because it codes for the production of proteins. LncRNAs are called “non-coding RNAs” because they were thought not to code for proteins. There are lots of RNAs that are thought to be non-coding, but lncRNAs are relatively long (by convention, longer than 200 nucleotides). That’s how they get their name. It turns out, however, that for at least some lncRNAs, every part of their name (except RNA) is wrong.

In February 2014, Dr. Alexander Schier and his colleagues were looking at the development of zebrafish embryos. They analyzed the proteins that were present during the process, and they found several that had not been previously identified. One of them was a small protein that had exactly the sequence one would expect if it had been coded for by a lncRNA that has been called Toddler.

In their study, they produced six lines of evidence that the cell makes this small protein from Toddler.[1] In other words, even though Toddler is classified as a long non-coding RNA, it actually codes for a short protein, which the cell translates during the development of the zebrafish embryo. (As a point of terminology, a short protein is often called a peptide.)

How common is this?


Another study has identified hundreds of short proteins (peptides) in both zebrafish and humans that are produced from what were thought to be long non-coding RNAs. How did they identify them? They used a technique called ribosome footprinting, which detects the stretches of RNA that ribosomes are sitting on while they translate. In other words, it’s not just that there are proteins with the sequences you would expect from lncRNAs. The research team actually showed that those lncRNAs were bound by ribosomes and translated![2]
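Why do ribosome footprints count as evidence of translation? A ribosome moves along an RNA three letters (one codon) at a time, so the positions of its footprints tend to line up with the codons of the recipe being read. The Python sketch below shows a simplified version of that idea; it is not the scoring method used in the study. It assumes we already know, for each footprint, the position on the RNA of the codon the ribosome was reading, and the 60% cutoff is an arbitrary, hypothetical choice for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple

def frame_fractions(footprint_sites: Iterable[int], orf_start: int,
                    orf_end: int) -> Tuple[float, float, float]:
    """Fraction of footprints inside an ORF that fall in frames 0, 1 and 2.

    footprint_sites: 0-based positions on the RNA where ribosomes were reading.
    orf_start, orf_end: 0-based, end-exclusive coordinates of the ORF.
    """
    counts = Counter((site - orf_start) % 3
                     for site in footprint_sites if orf_start <= site < orf_end)
    total = sum(counts.values())
    if total == 0:
        return (0.0, 0.0, 0.0)
    return (counts[0] / total, counts[1] / total, counts[2] / total)

def looks_translated(footprint_sites: Iterable[int], orf_start: int,
                     orf_end: int, min_in_frame: float = 0.6) -> bool:
    # Hypothetical rule: most footprints should line up with the ORF's codons (frame 0).
    return frame_fractions(footprint_sites, orf_start, orf_end)[0] >= min_in_frame

# Footprints spaced three letters apart, in step with the ORF's codons:
print(looks_translated([10, 13, 16, 19, 22, 11], orf_start=10, orf_end=25))  # True
# Footprints scattered evenly across all three frames:
print(looks_translated([10, 11, 12, 13, 14, 15], orf_start=10, orf_end=25))  # False
```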

So… if there really are hundreds of lncRNAs that actually do code for proteins, why did geneticists call them non-coding?

When I first started reading about them, I assumed that geneticists thought they were non-coding because they didn’t have the structure of messenger RNA, as shown in the illustration above. Maybe they didn’t have a start codon. Maybe they didn’t have a stop codon. I just assumed that something must be missing from them. Why else would they be called non-coding?

However, with help from a professor who has forgotten more molecular genetics than I will ever learn, I found out that these lncRNAs have exactly the structure you find in messenger RNA. They have a start codon, a recipe for a short protein, and a stop codon. Why, then, were they called non-coding? Because the recipe is really short.

Geneticists simply assumed that recipes for short proteins don’t get translated, so RNAs containing only short recipes were classified as non-coding. (A commonly used rule of thumb treated recipes shorter than about 100 codons as non-coding.) Of course, there’s more to it than that. Geneticists also tend to compare the RNAs they find in cells to known proteins, and relatively few short proteins are known in living organisms.
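Here is a minimal Python sketch of how a length cutoff sorts RNAs into “coding” and “non-coding” piles. It assumes a 100-codon threshold and looks only at the three reading frames on the RNA itself; real annotation pipelines weigh other evidence too, such as comparison to known proteins. The function names are illustrative.

```python
from typing import Iterator, Tuple

STOP_CODONS = {"UAA", "UAG", "UGA"}

def find_orfs(mrna: str) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) for each AUG...stop ORF in the three forward frames.

    end is exclusive and includes the stop codon. Only the first AUG after
    each stop is used, since it gives the longest ORF in that stretch.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(mrna) - 2, 3):
            codon = mrna[i:i + 3]
            if start is None and codon == "AUG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                yield (start, i + 3)
                start = None

def longest_orf_codons(mrna: str) -> int:
    """Codons in the longest ORF, counting the start codon but not the stop."""
    return max(((end - start) // 3 - 1 for start, end in find_orfs(mrna)), default=0)

def naive_label(mrna: str, min_codons: int = 100) -> str:
    # Assumed rule of thumb: recipes shorter than min_codons are treated as non-coding.
    return "coding" if longest_orf_codons(mrna) >= min_codons else "non-coding"

short_recipe = "AUG" + "GCU" * 20 + "UAA"   # 21 codons before the stop
long_recipe = "AUG" + "GCU" * 120 + "UAA"   # 121 codons before the stop
print(naive_label(short_recipe))  # non-coding
print(naive_label(long_recipe))   # coding
```

Under a rule like this, a short recipe with a perfectly good start codon and stop codon is still labeled non-coding, purely because of its length.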

So, in the end, we now know that geneticists’ bias towards large proteins led to a big mistake. At least hundreds of long non-coding RNAs are, in fact, short coding RNAs! It will be interesting to see how this line of research progresses. If you look at all the RNAs that are produced in animals and people, you find a bewildering number that have start codons, stop codons, and a recipe for a short protein. I suspect that as time goes on, we will find an equally bewildering number of short proteins that are produced by those RNAs, at least at specific stages in the life of most organisms.


  1. Pauli A, Norris ML, Valen E, Chew GL, Gagnon JA, Zimmerman S, Mitchell A, Ma J, Dubrulle J, Reyon D, Tsai SQ, Joung JK, Saghatelian A, and Schier AF, “Toddler: an embryonic signal that promotes cell movement via Apelin receptors,” Science, 2014, doi:10.1126/science.1248636
  2. Bazzini AA, Johnstone TG, Christiano R, Mackowiak SD, Obermayer B, Fleming ES, Vejnar CE, Lee MT, Rajewsky N, Walther TC, and Giraldez AJ, “Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation,” EMBO Journal 33(9):981-93, 2014

Dr. Jay Wile

Written by Jay Wile

As a scientist, it is hard for me to fathom anyone who has scientific training and does not believe in God. Indeed, it was science that brought me not only to a belief in God, but also to faith in Christianity. I have an earned Ph.D. from the University of Rochester in nuclear chemistry and a B.S. in chemistry from the same institution.


