The doorstop fallacy

Here is a short parable for you:

Once upon a time a researcher walked into the office of one of his colleagues. He was amazed to see that his colleague’s door was propped open by a stone. He then exclaimed, “I finally figured out the function of all these darn stones. They are all doorstops!”

This parable mirrors the logic that can be found in a great number of reviews on lncRNAs.

So why do so many researchers fall into this trap?

I suspect that it is a mixture of different issues. Here are a few:

1) Researchers want to magnify the importance of their work. As scientists, we make our dime in part by questioning widely held assumptions with the hope of providing new insight. If you happen to be studying lncRNAs, and you provide evidence for a new functional lncRNA, then spouting a generalization (“all non-coding DNA was once thought to be useless; now our work shows that this is bunk!”) is good for highlighting how your work questions “dogma” and “antiquated ways of thinking”. Your new shiny lncRNA is an example of how wrongheaded the old ways of thinking are. Your work, apparently, is blazing a new way forward. This line of reasoning is a great way to show those holding the funds that more money needs to come your way. It is also a great way to boost the importance of your study (a must, if you want to publish your work in all the highly indexed glam journals).

2) Researchers have a tendency to classify items into “types”, with each type having a set number of characteristics. tRNAs are synthesized by RNA Pol III, processed by RNase P, and have a role in protein synthesis. So what about lncRNAs? It is tempting to look for a similar list of shared characteristics. This is a big mistake. The problem lies in the fact that we unfortunately gave a single name to a whole group of molecules that have a diverse set of features. Many researchers then go on to make the assumption that knowledge gained from one lncRNA will automatically be applicable to many (or even all) others. However, there is no reason to think that lncRNAs should all share any general characteristics. In fact, the name “lncRNA” is itself part of the problem. What is a lncRNA? It is a sort of leftover class of molecules that does not belong to any other group. Unlike other classes of RNAs, functionality does not appear to be a prerequisite to be part of this group. We don’t even know how many lncRNAs exist. We are not even close to having a coherent list of them.

Every once in a while I will read some comment made by colleagues who fall into this trap: “your reporter RNA codes for protein and thus is not a lncRNA, so how could you learn anything about how lncRNAs are exported from the nucleus to the cytoplasm using this reporter?” This makes the assumption that the cell, just like the poor scientist, categorizes RNAs into mRNAs and lncRNAs. It does not. The only thing special about an mRNA is that it is translated into a protein, but if you take away ribosomes (a situation that occurs in the nucleus), how would the cell know which is an mRNA and which is a lncRNA? By consulting LNCipedia? Nonsense. The list of lncRNAs is ever-changing (depending on whom you consult and when you consult them) and counts among its members a mixture of oddball functional non-coding RNAs and many junk transcripts.

3) Researchers charting new territory mistake something for nothing. Our genomes are junk piles. But what does that mean? Imagine yourself as a treasure hunter visiting your local dump yard. You would find that it is filled with the detritus of everyday life: broken tricycles, plastic bags, twisted wires, a fridge door, etc. And within this heap you would find a few gems. For example, a pristine set of binoculars that was thrown away in its store-wrapped package. Being a professional scavenger, you are elated to find such a precious item. And as the saying goes, one person’s junk is another person’s treasure. You may even venture to think that deep within the mounds of trash, there are other hidden gems. But would you be confused if I told you that although there may be a large number of useful items hidden out there, most of what you are looking at is junk? Likely not. But in the genome this distinction is not so clear. It’s all As, Ts, Gs, and Cs, all the way down. The fine line between junk and function is hard to see. Biological function is a slippery idea, and if you are not careful you might see function everywhere you look. This is sometimes known as apophenia. But there are guidelines that have been developed over many years of debate over the nature of biological function. Equally important are the developments in molecular evolution, many of them coming out of neutral theory. If researchers are not acquainted with neutral theory (i.e., they think that evolution = natural selection), then why would they think that any stretch of As, Ts, Gs, and Cs could be junk? Being ignorant of these concepts and ideas, they are treasure hunters in a junk yard with no idea about how the world works. No idea of consumerism. No idea why we would “tolerate” all this useless stuff that is just taking up space. Forget about junk yards; such a person would be confused if they were presented with a field of stones, each one a potential doorstop.

4) Researchers have tools that they can use mindlessly. One of the pleasures of being a molecular biologist at this point in history is having awesome sequencing tools at our disposal. I remember a time when getting nucleic acid sequencing data was painful. In fact, my first lab job as an undergrad was to run Sanger DNA sequencing gels (with 32P!!!*). But today it is easy and painless to amass mountains of sequencing data. With this awesome power we can easily identify a transcript that is present at a level of one copy per thousand cells. And what do we do with all this data? It is easy to just publish and not think too hard about what it all means.

Making this worse are monstrous Big Science endeavours such as ENCODE, where 442 researchers can consume 288 million dollars, and then justify their own existence by over-interpreting their data and making a big splash. But was it worth it? Today, how many of us are using this ENCyclopedia Of DNA Elements to gather new insights into human biology? Is ENCODE simply another Library of Babel?

5) Researchers assume that existence implies function. This is probably the most deeply held assumption that (to my knowledge) has never been addressed head on. Most researchers have some intuitive sense that in the absence of purging selection, things decay. But what do we mean by that? Intuitively we have the thought that over long stretches of time junkyards will eventually disappear. All of the broken tricycles and plastic bags will be crushed, broken down and scattered to the four corners of the earth by rivers, windstorms, hurricanes and other agents of entropy. We may even think that the junk is using up precious space, and that sooner or later, greedy developers will bulldoze these junk heaps and start building prefab houses to sell to desperate wannabe homeowners. I hear this often when I speak to the uninitiated: “why would the genome tolerate all this wasted space?” This all boils down to the assumption that if something exists, it must have a function. So why would a lncRNA be made, yet essentially be junk? And if it isn’t functional, why is it there (i.e., why hasn’t it “decayed”)? Let’s break that down into two issues. First, how are non-functional lncRNAs being generated through evolution? Second, why aren’t they being eliminated through purging selection?

Non-functional “lnc” RNAs are a product of our messy gene expression machinery. RNA polymerase, like any other enzyme, will bind to suboptimal substrates and initiate transcription. Cryptic promoters will be created and destroyed by neutral evolution. It is surprising how many stretches of essentially random DNA have promoter activity. So we should expect that at any given time, there will be a large number of transcriptionally active sites producing junk.
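To get a feel for why chance promoter-like sequences should be common, here is a minimal back-of-the-envelope sketch. It is not a model of real promoters: it assumes uniformly random DNA (each base at 25%), and it uses the hexamer `TATAAA` purely as an illustrative stand-in for a short sequence motif. The function names and the sequence length are hypothetical choices made for this example.

```python
import random


def expected_motif_hits(seq_len: int, motif_len: int) -> float:
    """Expected count of one specific motif in uniformly random DNA.

    Each of the (seq_len - motif_len + 1) start positions matches
    with probability 1 / 4**motif_len.
    """
    positions = max(seq_len - motif_len + 1, 0)
    return positions / 4 ** motif_len


def count_motif_hits(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps."""
    last_start = len(seq) - len(motif)
    return sum(1 for i in range(last_start + 1) if seq.startswith(motif, i))


def random_dna(length: int, rng: random.Random) -> str:
    """Generate uniformly random DNA (an assumption; real genomes are biased)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


# Illustrative run: compare the expectation with one simulated sequence.
rng = random.Random(0)          # fixed seed so the run is repeatable
motif = "TATAAA"                # stand-in motif, not a full promoter
seq = random_dna(200_000, rng)  # hypothetical stretch of random DNA
print("expected hits:", round(expected_motif_hits(len(seq), len(motif)), 1))
print("observed hits:", count_motif_hits(seq, motif))
```

The point of the sketch is only the arithmetic: a specific six-letter word is expected about once every 4,096 positions by chance, so a genome billions of letters long will contain a very large number of short motif matches that no selection ever put there.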

As for the substrate DNA that is transcribed into these non-functional lncRNAs, this would be the massive amount of junk DNA that is found within our genome. This extra detritus is created by the activity of transposable elements - selfish bits of DNA whose only goal is to reproduce themselves inside your genome. And these bits of DNA do decay, but here’s the catch: junk DNA accumulates base substitutions rather than eroding away like our imaginary junkyard pile. So the letters tend to scramble rather than disappear. This is an example where our language misleads us.

But this junk is taking up space, you might protest. Well, here is the thing. These changes in genome size are likely not big enough to be acted on by selection. Again, the issue of “space” is likely dictated by neutral evolution.
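For readers who want to see what “not big enough to be acted on by selection” looks like in numbers, here is a rough sketch. It swaps in a standard textbook result that the argument above does not spell out: Kimura’s diffusion approximation for the fixation probability of a new mutation in a diploid population. The effective population size and the selection coefficients below are hypothetical values picked for illustration, not measurements of the cost of junk DNA.

```python
import math


def fixation_probability(s: float, n_e: float) -> float:
    """Kimura's diffusion approximation for a new mutation.

    Assumes a diploid population whose census size equals its effective
    size n_e, so the mutation starts at frequency 1 / (2 * n_e).
    s is the selection coefficient (negative = deleterious).
    """
    if s == 0:
        return 1.0 / (2.0 * n_e)  # neutral limit of the formula
    exponent = -4.0 * n_e * s
    if exponent > 700.0:
        return 0.0  # strongly deleterious: avoid float overflow
    # expm1(x) = exp(x) - 1, accurate for tiny x; the signs cancel in the ratio
    return math.expm1(-2.0 * s) / math.expm1(exponent)


n_e = 10_000  # hypothetical effective population size
neutral = fixation_probability(0.0, n_e)
for s in (-1e-7, -1e-5, -1e-3):  # hypothetical costs of an insertion
    ratio = fixation_probability(s, n_e) / neutral
    print(f"s = {s:g}: fixation probability is {ratio:.3g} x the neutral value")
```

Under these assumptions, a cost that is small relative to 1/n_e fixes almost as often as a perfectly neutral change, while a cost that is large relative to 1/n_e essentially never does. That is the sense in which a tiny penalty for carrying a little extra DNA can be invisible to selection.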

In this case the analogy between lncRNAs and stones is more illuminating than the comparison to junk in a dump heap. Stones are constantly made and eroded away by known processes. Would you so confidently assert that a stone’s existence necessitates that it has a “function”? Does the presence of a stone necessitate that some lonely door be present out in the world waiting for its long lost doorstop?

Summary

Researchers tend to over-interpret their results. We now have the power to measure extremely rare events. When we do detect a real signal we extrapolate widely. Every lncRNA apparently teaches us about all that mysterious dark matter. All this extra DNA is supposedly there for a reason. Every discovery, we are told, is a paradigm shift. We are apparently in constant need of rewriting those darn textbooks.

Baloney!

The sad thing is that how evolution is operating and how functional units are popping in and out of existence is so much more interesting than all this uninformed, mindless, over-interpreted drivel that is ubiquitously presented in countless lncRNA papers and reviews. Will we ever transcend all this simple mindlessness? Eventually … I hope.

P.S. I highly recommend this upcoming paper (now available in an advance online publication) on lncRNAs and functionality from Chris Ponting and Wilfried Haerty.

* As an aside, I chuckle to think that as a lonely undergrad, I was working daily with 32P, while today most of my colleagues are afraid to use any radioactive compounds in the lab.