How Assembly Theory can help us understand the Metabolism-First Theory of the origin of life

So a few weeks ago, I wrote a semi-critical post of Assembly Theory. There have been other criticisms leveled at Assembly Theory, some valid and some a little overblown. Despite all this, I do believe that there may be some use for Assembly Theory in further understanding and developing the Metabolism-First Theory of the origin of life. This post is dedicated to exploring these ideas.

The origin of life, a paradox

One of the most enduring mysteries in all of science is how life began in an abiotic world. Since all known life consists of cells, we can think about what makes cells different from non-living objects. Many thinkers have observed that there are at least two* important aspects of cellular life: metabolism and the ability to transmit biological information from one generation to the next.

Metabolism is the way our cells harvest and siphon energy and redox potential from the environment to build up new macromolecules. This allows the cell not only to replace its own macromolecules, which break down over time, but also to grow, divide and even secrete compounds and alter the cell’s neighbourhood. In multicellular organisms, secreted compounds help to coordinate cell-cell communication, build tissues and maintain organismal homeostasis.

Metabolism depends on the production and maintenance of a plethora of enzymes and cellular compartments, especially the membranes that surround them. We tend to overlook this second aspect of metabolism, but as Peter Mitchell showed us, life uses membranes to create energy potentials (sometimes referred to as chemiosmotic gradients) that fuel most metabolic processes. In turn, life uses other metabolic processes to maintain these gradients, typically, by pumping protons across a membrane. Thus, these membranes effectively serve as the cell’s battery supply, and are central to most of the core metabolic pathways found in our cells.

Beyond membranes, we also have enzymes. These convert small molecules into macromolecules and into long polymers, some of which even “store” genetic information on how to make the enzymes. In this way, a metabolic system fuels the spreading of information on how to generate enzymes, which perpetuates the copying of metabolic systems. This is one way to look at what life is.

Speaking of genetic information, it is the second core feature of cellular life that distinguishes it from abiotic objects. This information is stored in the sequence of nucleotides in DNA and RNA, which in turn can be used to make these enzymes. Most enzymes are proteins consisting of a linear sequence of amino acids. Enzymes direct how DNA information is copied, transcribed from DNA to RNA, and translated from RNA to proteins. In this view, information generates enzymes that perpetuate the creation and spreading of more information.

These two views of life (metabolism and information) set up a paradox – you need the information molecules to specify how to make the protein enzymes, which run all of our metabolic pathways, and at the same time you need the protein enzymes to copy and use the information stored in the information molecules. So if one came first, which would it be? This is the proverbial “chicken and egg” scenario**.

The RNA World Hypothesis

A potential resolution to this paradox came with the realization that cells contain not only protein-based enzymes but also RNA-based enzymes (often referred to as ribozymes). Moreover, many of these ribozymes are among the most ancient and conserved parts of our cellular machinery. Indeed, if you made a list of universally conserved genes, that is, genes found in every genome ever sequenced, you would come up with a surprisingly short list of fewer than a hundred genes. Of these, a large fraction are transcribed into functional non-coding RNAs. These include ribosomal RNAs and tRNAs, two classes of molecules that play pivotal roles in converting nucleic acid sequences, stored in DNA and RNA, into the amino acid sequences found in proteins. Besides tRNAs and rRNAs, many of the other universal genes encode proteins that are part of the ribosome or function in protein synthesis. Thus it appears that the most ancient machinery in our cells is the machinery used to convert nucleic acid information into proteins. So at their core, our cells seem like the product of a bootstrapping exercise, where much of the conserved machinery looks like an RNA-beast making proteins.

With the realization that RNA can act as a molecule that has both enzymatic activity and information-storing potential, a new idea began to take shape: that RNA was both chicken and egg. This is where the idea of the “RNA World” hypothesis comes from. It raises the question of what the original molecule was that kick-started the RNA World, a molecule that could carry information and at the same time act as an enzyme to replicate this information. In other words, an RNA polymerase ribozyme. In a snarky way, some have referred to this hypothetical entity as the “God Molecule”, I guess because if a God existed, all they would have to do is create this molecule and it would launch Darwinian evolution all by itself.

Is RNA the answer?  

Of course the idea of a God Molecule raises new problems. Can we identify a ribozyme with RNA polymerase activity? If so, how simple can it be? How did the RNA polymerase ribozyme come into existence? Where did the precursors (i.e. the nucleic acid monomers) come from?

I don’t want to delve too deeply into these problems. All I want to tell you is that the answers to many of them do not look too promising. Groups like the Szostak lab have tried to evolve RNA polymerase ribozymes from scratch with some success. However, these ribozymes look very complicated, and not like something that could have come together by chance. On top of that, these RNA polymerase ribozymes cannot copy long stretches of RNA and tend to make many mistakes. As for where the building blocks (i.e. nucleic acid monomers) come from, this is also problematic. Just to take an example, ribose, a component of the nucleotide monomer, is unstable and thus unlikely to persist for long periods of time or build up to any appreciable level.

To solve all these issues, many in the field started to abandon the idea of a “God Molecule” and instead investigate other systems. Some groups started to look for non-biotic catalysts (like clay) which could promote the copying of RNA. Other groups looked at potential precursors to RNA. These would consist of RNA-like polymers where the ribose and phosphate groups were replaced with other moieties. Some are more reactive, so that they could easily polymerize with a greater number of catalysts, or in a catalyst-free manner. Of course, it becomes unclear how you can build up large numbers of these building blocks if they are so reactive.

This discussion brings up one of the most serious problems with the RNA World hypothesis (and related ideas). These worlds could only take place in a chemical environment filled with large molecules containing so-called “high-energy” bonds. I don’t want to complicate this further, but for those of you who have taken some chemistry, the formation of these bonds requires a redox environment that is reduced. And the more you create reduced bonds, seen in compounds rich in carbon-carbon bonds, the more pro-oxidative molecules you make (like O2). This raises the question of how these large, reduced molecules came to be and how they could be present at high enough concentrations to allow for very reproducible reactions to take place. On top of that, the building blocks must be somewhat uniform. When you build a new nucleic acid polymer, you want nice clean reactions with a set number of building blocks, without any other block-like entities that could substitute for your blocks. Reactions are messy, and if the right blocks are not incorporated then your polymer replicates will have block-like entities incorporated in them. At best these will only slightly compromise the polymer’s ability to be reproduced; at worst they will kill any replication reaction. On top of all of this is the problem that you are constantly using up the good building blocks. If both good and bad building blocks are being made by some natural process, the bad block-like entities would build up over time, as they would not be used up as fast as the good blocks.
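To make that last point concrete, here is a toy bookkeeping sketch. It is not a chemical model: the function name and all the rates are made up for illustration. Both kinds of block are supplied at the same rate, but each step a large fraction of the good blocks gets used up (by replication) while only a small fraction of the bad ones disappears.

```python
def simulate_pool(steps=200, supply=1.0, use_good=0.5, use_bad=0.05):
    """Toy model of a building-block pool (all rates are illustrative assumptions).

    Each step, `supply` units of good and of bad blocks are added, then a
    fraction `use_good` of the good pool and `use_bad` of the bad pool is removed.
    Returns a list of (good, bad) pool sizes, one entry per step.
    """
    good = 0.0
    bad = 0.0
    history = []
    for _ in range(steps):
        # Some natural process makes both kinds of block at the same rate.
        good += supply
        bad += supply
        # Good blocks are consumed quickly, bad blocks only slowly.
        good -= use_good * good
        bad -= use_bad * bad
        history.append((good, bad))
    return history


if __name__ == "__main__":
    good, bad = simulate_pool()[-1]
    print(f"good blocks: {good:.2f}, bad blocks: {bad:.2f}")
```

With these made-up rates the bad pool settles at roughly 19 times the size of the good pool, even though both are produced equally. The exact ratio means nothing; the point is only that whatever is consumed more slowly ends up dominating the soup.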

Needless to say, for the RNA World (or other RNA-like Worlds) to come into existence, some sort of magic prebiotic soup had to be there in the first place. But was there such a thing as a prebiotic soup?

Metabolism-First, an alternative origin of life theory

As we discussed earlier, life can be viewed through the lens of biological information or through metabolism. It is no surprise that if we try to generate biological information molecules before metabolism, we quickly run into problems. In a way, the idea that RNA was both chicken and egg was a bit naïve. If information molecules are the chickens, the eggs were really the metabolic system that sustains the ability to make RNA & DNA precursors and other high-energy molecules at high local concentrations.

So could metabolism have come first? This may explain where all the high-energy molecules come from. But how could such a system arise without biological information to help it along? Life scientists have a bias in favour of biological information (as I have argued elsewhere) and this idea is at the heart of the discipline of Molecular Biology. This bias is so deep that most in the field cannot even imagine how life could exist without it.

But let’s take a step back and think about metabolism in the context of cellular life. Life is a self-sustaining metabolic system. Some would even call it auto-catalytic. It takes in small molecules and energy from the environment, and uses these to create macromolecules. These macromolecules take the form not only of information-storing devices (i.e. DNA and RNA) but also of entities that can extract small molecules and energy from the environment. In this way, certain macromolecular assemblies can act in concert to grow as a system. The end product is a siphoning of energy to convert a mixture of small molecules with high entropy into a collection of macromolecules that have low entropy. In this way life appears to violate the second law of thermodynamics, in that it propagates small pockets of order. However, this violation is a mirage. Life siphons some negative entropy from the environment (creating local order) while hastening the increase in entropy of the entire system. This is because the creation of macromolecules has side products that are exported back to the environment. These can consist of waste molecules, heat and other entropy-increasing processes. Of course some of these “waste” molecules can be used by other life forms. At steady state, the number of atoms given off by an ecosystem to the environment is balanced by the number of atoms taken from the environment. So the only real input to the whole ecological system is a source of energy that is “harvestable” and the only real output is energy that is not “harvestable”, mostly thermal energy. But to think that life is strictly responsible for the conversion is wrong. In the absence of life, light from the sun would heat the Earth, which would then radiate this energy away (as heat and infrared radiation). The real analogy is life as a tiny water mill harvesting an infinitesimally small fraction of the energy from Niagara Falls.
Think about this view every time someone claims that life violates the second law of thermodynamics! 

But what is the energy fuelling this dynamic steady state system? For life on Earth there are only two sources: radiation from the sun, and thermal heat from the Earth’s core.

Seeing that life, in a metabolic sense, requires energy to sustain the production of macromolecules, how do we reconcile this with the origins of life as explained by the RNA World hypothesis and the pre-biotic soup idea? Light and heat cannot cause the appearance of a pre-biotic soup, as they do not inject enough energy to create high-energy bonds. Early experiments looked at other energy sources to solve this, such as lightning. Now, I don’t want to delve too deeply into the Miller-Urey experiment, which is a standard part of every textbook. Instead, I want to talk about comparative analyses between the chemical dynamics present in deep sea vents and how present-day cells utilize energy. In other words, how do our cells use and manage energy? And how does this resemble energy sources that are naturally available?

These ideas began with researchers such as Michael Russell and Günter Wächtershäuser, and later Bill Martin and Nick Lane. Russell pointed out that life uses proton gradients across membranes (aka chemiosmosis) as a source of energy and that this resembles processes found in deep sea vents. In fact, components of the main membrane-bound protein complex that couples proton gradients with ATP metabolism (ATP synthase) are also in the list of universal genes. In some cases ATP synthase couples the dissipation of the proton gradient across the membrane to ATP production; in other cases ATP synthase works in reverse, and hydrolyses ATP into ADP and phosphate to drive protons across the membrane against their concentration gradient. It turns out that there are natural geological processes that form these proton gradients. These occur at alkaline white smoker hydrothermal vents at the boundaries of tectonic plates in the deep sea. In Nick Lane’s latest book, Transformer, he summarizes these ideas that originate from the work of Russell and Wächtershäuser. These vents drive proton gradients that start with high-pH, or alkaline, water, which emanates from superheated underground sources and then filters through porous rocks, and end in a lower-pH, or more acidic, ocean. The rocks that the gradient passes through have small channels that are filled with iron-sulfur cluster compounds. These compounds have the ability to extract electrons from molecular hydrogen and convert the hydrogen to acidic protons. These electrons are then passed on to carbon dioxide to form carbohydrate [(CHO)x] compounds. In this way carbon could be “fixed” and used to assemble macromolecules. Lane makes the point that carbon fixation can easily occur through the reverse Krebs cycle and that the intermediates of this metabolic pathway all have plausible prebiotic origins. As you can see below, carbon is added one atom at a time to the intermediates in this cycle.
It is likely that this was preceded by a forked pathway***, but many of the major points are the same.

Again, this is likely not the original pathway, but is related to it. Despite this, there are some key things in there. First is the idea of a cyclical reaction. What is great about a cyclical, as opposed to a linear, reaction is that it regenerates its own substrates. In this way the cycle is itself a catalyst. With every turn of the cycle, carbon dioxide is converted to fixed carbon, which eventually leaves the cycle to become the basis of all the complex carbon structures in our cells. In fact, it is even better than a catalyst, as some of the excess carbon structures can themselves be turned into intermediates of the reverse Krebs cycle. Thus the reverse Krebs cycle can increase its components and “grow”, and in a sense replicate itself. This is replication without any molecule that can “store information” in the way that molecular biologists define it (i.e. nucleic acids).
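The difference between a cycle that is merely a catalyst and one that “grows” can be shown with some toy arithmetic. This is only a sketch: the function name, the one-unit-of-carbon-per-intermediate-per-turn rule and the feedback fraction are all assumptions made for illustration, not measured chemistry.

```python
def cycle_growth(turns, start=1.0, feedback=0.1):
    """Toy model of a self-regenerating cycle (illustrative numbers only).

    `start` is the initial amount of cycle intermediates. On every turn each
    unit of intermediate fixes one unit of carbon and is itself regenerated.
    A fraction `feedback` of the fixed carbon is converted into new
    intermediates; the rest leaves the cycle as exported product.
    Returns (amount of intermediates, total exported carbon).
    """
    pool = start
    exported = 0.0
    for _ in range(turns):
        fixed = pool                   # carbon fixed during this turn
        recycled = feedback * fixed    # portion that becomes new intermediates
        pool += recycled               # the cycle itself gets bigger
        exported += fixed - recycled   # the rest leaves as fixed carbon
    return pool, exported


if __name__ == "__main__":
    print(cycle_growth(10, feedback=0.0))  # plain catalyst: pool stays the same
    print(cycle_growth(10, feedback=0.1))  # self-amplifying: pool keeps growing
```

With no feedback the cycle behaves like an ordinary catalyst: it keeps exporting fixed carbon but never gets any bigger. With any positive feedback the amount of intermediates compounds from turn to turn, which is the sense in which the cycle can be said to replicate itself.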

This is all fuelled by proton gradients and redox reactions. Note that in the reverse Krebs cycle, carbon is fixed with the help of iron-sulfur clusters (ferredoxin or Fd), and other agents that donate electrons such as FADH2, NADH and NADPH. In the process these co-factors are re-oxidized (to FAD, NAD+ and NADP+) and use up a proton. In cells these cofactors are re-reduced at the membrane and this is coupled to proton pumping across the membrane. In Metabolism-First Theory the idea is that these regeneration reactions likely occurred at proto-membranes that formed across the canaliculi found in the rocks of the white smokers. Ultimately, this is all fuelled by the hydroxide (creating the pH gradient) and the reducing molecules (e.g. iron-sulfur clusters) that are spewing out from the core of the Earth. Because this all occurs at an interface between the Earth’s core and the oceans, the source of the gradient is inexhaustible (the Earth’s core) and the waste products have an infinite sink (the ocean), and so these reactions can go on indefinitely. Thus the self-sustaining reaction that is the reverse Krebs cycle is the proverbial little watermill siphoning off energy from Niagara Falls, in the form of the Earth core/ocean interface. As Nick Lane points out, the Earth is a giant battery, with the white smokers being one of its reaction surfaces, and our cells mimic this battery across all of their membranes. The geological battery interface at these white smokers can likely create reverse Krebs cycle intermediates, and this is exactly how the first life forms fixed carbon, and this is why we all contain this legacy within our cells, including our membranes, proton gradients, and the Krebs cycle.

So what does this have to do with Assembly Theory?

I think that Metabolism-First Theory explains quite a bit of how the abiotic world could have transitioned to a biotic world. Despite this, Metabolism-First Theory is not widely believed****. Part of the reason is that most researchers do not understand Metabolism-First Theory, and few understand what problems it solves.

Ultimately, the belief that the transition from the abiotic to the biotic world could simply be ascribed to the RNA World hypothesis is rooted in the idea that life needs a molecular basis of information that can undergo evolution. Critics of this view have tried to point out that something had to precede this world. For the RNA World to come into existence requires a superabundance of the right molecules under the right conditions. Somehow this criticism is ignored, as the need for a God Molecule has trumped all other concerns.

What Assembly Theory can provide is a way to talk about the probability of these states, and a framework for how you transition from one state to the next. Going back to the chicken-egg scenario, where the information molecule (or “God Molecule”) is the chicken and metabolism is the egg, Assembly Theory gives us a way of quantifying how improbable the assembly of the chicken from the chaos of the pre-biotic world is. At the same time, Assembly Theory gives us a way of quantifying the steps required to make the egg, a self-sustaining metabolic system as described by Metabolism-First Theory. It may help us to understand what it would take for the molecules and the system to “evolve” to the point that it resembles the scenario described by Metabolism-First Theory and beyond, towards the RNA World.
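For readers who have not seen how Assembly Theory counts steps, here is a toy version of its central quantity, the assembly index. It uses strings of letters as stand-ins for molecules; the real theory works on molecular bond structures, so treat this as a sketch of the counting idea only. The function name is my own, and the brute-force search is only practical for short strings.

```python
from itertools import product


def assembly_index(target: str) -> int:
    """Smallest number of join steps needed to build `target` from its letters.

    Toy string version of the assembly index: start with the single letters,
    and at each step join any two objects already available (letters or
    previously built pieces). Anything built once can be reused for free.
    """
    if len(target) <= 1:
        return 0
    basics = frozenset(target)

    def reachable(pool, steps_left):
        # True if `target` can be completed from `pool` in at most `steps_left` joins.
        if target in pool:
            return True
        if steps_left == 0:
            return False
        for left, right in product(pool, repeat=2):
            joined = left + right
            # Only pieces that actually occur inside the target can be useful.
            if joined in pool or joined not in target:
                continue
            if reachable(pool | {joined}, steps_left - 1):
                return True
        return False

    # Iterative deepening: try 1 step, then 2, and so on.
    depth = 1
    while not reachable(basics, depth):
        depth += 1
    return depth


if __name__ == "__main__":
    for word in ["ABCD", "ABAB", "ABABABAB"]:
        print(word, assembly_index(word))
```

A string with no repeats, like ABCD, needs one join per added letter (three steps). A repetitive one, like ABABABAB, needs only three steps despite being twice as long, because AB and ABAB can be reused once they exist. That reuse is what lets the measure separate objects that are long but easy to make from objects that require many distinct steps.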

We’ll have to wait and see if Assembly Theory can help the scientific community overcome its obsession with information-carrying molecules. Although they are important, they are clearly not the whole story. As a community we need to start understanding information molecules (i.e. RNA and DNA) and stop worshiping them brainlessly*****.

 

* Some would also claim that another defining aspect of cells is their membrane, which separates the cytoplasm (i.e. life-containing fluid) from the extra-cellular environment. Conveniently, membranes are essential to most Metabolism-First Theories.

** For the record, the egg came first. Chickens are the product of recent evolution, eggs are much more ancient.

*** Thanks Larry Moran for pointing this out. For more on the Reverse Krebs Cycle and the Forked pathway, see this excellent post on his blog.

**** The authors of Assembly Theory should have spent more time addressing this in their paper instead of making grandiose claims.

***** This is also why scientists tend to be hyper-adaptationists, see Palazzo & Kejiou, Frontiers in Genetics