Note: Because some of the information in this article may be outdated, it has been archived.
1. Introducing LUCA
LUCA is short for Last Universal Common Ancestor, and it is from this organism that every living cell on the planet has descended. LUCA does not represent the earliest stage in the evolution of life — it is widely accepted that before the evolution of proteins and DNA (which are common to all cellular life) there was a period where RNA carried out the roles now performed by proteins and DNA [Jeffares & Poole 2000]. There are a lot of uncertainties when we go this far back in evolutionary time and perhaps all we can be certain of is that, at a point in Earth’s history (probably over 3 billion years ago), cells emerged which stored recipes for making both proteins and RNA on a third molecule, DNA.
Nevertheless, studying LUCA is not science fiction. In the same way as humans and chimpanzees shared a common history until less than 10 million years ago, all modern lifeforms shared a common history back as far as the split that gave rise to the three ‘domains’ of life we now know of as archaea, bacteria and eukaryotes; that is, back as far as LUCA. That there are three domains was first established by Carl Woese and colleagues, who found that the group called prokaryotes was actually two groups, the archaea and bacteria [Woese & Fox 1977; Woese et al. 1990]. Amazingly, this work has largely stood the test of time, and though it is argued that there has been extensive gene swapping between these two groups [Pennisi 1998, 1999; Doolittle 1999; Eisen 2000], recent analyses using complete genomes support Woese and colleagues’ decision to split prokaryotes into archaea and bacteria [Snel et al. 1999; Sicheritz-Pontén & Andersson 2001; Brown et al. 2001].
Woese and colleagues’ discovery rested on an even more astounding one — all life stores its genetic information on DNA, using a common code which we call the genetic code. The information is stored as packets, called genes — recipes for making RNA, and proteins [see Appendix A: Making Protein]. The languages of DNA and RNA are so similar they may as well be called dialects, but both are markedly different from the language of protein. For RNA and DNA, the information-carrying part of both molecules is made up of four bases (analogous to letters in an alphabet) read in a linear fashion, as with written human languages. In RNA, the four bases are A, G, C and U. In DNA, A, G and C are also used, while T is used instead of U. Establishing the evolutionary basis for this change from U to T is not a trivial exercise, and is an interesting problem in itself [Poole et al. 2001]; but in terms of the actual language, the difference is as minor as the variant spelling of English words, e.g., civilisation and civilization.
The unearthing of the genetic code, and the subsequent demonstration that it is common to all life (a gene from a human can be read by the translation machinery of a bacterium) is without a doubt a key piece of evidence in establishing that there was a LUCA. But what else can we find out about LUCA? Knowing that the genetic code had arisen tells us there was probably a LUCA, but gives us very little information on the nature of LUCA.
In a nutshell, the study of LUCA broadly revolves around two questions:
- What features are common to all cellular life?
- What sets the three domains — archaea, bacteria and eukaryotes — apart from one another?
At first sight, building a list of LUCA features might seem a fairly straightforward process, especially now that advances in technology allow all the genes possessed by an organism to be identified by sequencing its genome. A sensible approach would perhaps be to compare all the genes from representative genomes of archaea, bacteria and eukaryotes. Those genes that are common to all three domains were in the LUCA and those that aren’t must have been added later. Unfortunately, it’s not that straightforward, for two main reasons:
- Some genes appear to have moved from organism to organism like genetic nomads, confounding our ability to distinguish between features that are universal and date back to LUCA, and features that are universal because of genes moving about.
- Some genes which were found in LUCA may no longer be universal. That means it may be impossible to distinguish some LUCA features from genes that arose later, say in the evolution of eukaryotes.
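The naive comparison and its two pitfalls can be sketched in a few lines of Python. This is a toy illustration only: the gene names, the three genomes and the "true" LUCA gene content are all hypothetical, chosen so that one gene has been lost from two domains and one later-arising gene has spread to all three.

```python
# Toy illustration only: every gene name and genome below is hypothetical.

def universal_genes(genomes):
    """Return the genes present in every genome (a plain set intersection)."""
    return set.intersection(*genomes.values())

# Assumed ancestral gene content for this toy example.
luca = {"ribosomal_protein", "rna_polymerase", "dna_repair", "rna_processing"}

genomes = {
    # "rna_processing" has been lost in archaea and bacteria;
    # "nomad" arose after LUCA and has spread by gene transfer.
    "archaeon": {"ribosomal_protein", "rna_polymerase", "dna_repair", "nomad"},
    "bacterium": {"ribosomal_protein", "rna_polymerase", "dna_repair", "nomad"},
    "eukaryote": {"ribosomal_protein", "rna_polymerase", "dna_repair",
                  "rna_processing", "nomad"},
}

inferred = universal_genes(genomes)
missed = luca - inferred            # LUCA genes hidden by later gene loss
falsely_included = inferred - luca  # non-LUCA genes made universal by transfer

print(sorted(missed))            # ['rna_processing']
print(sorted(falsely_included))  # ['nomad']
```

In real data the ancestral set is of course unknown, which is exactly why these two errors cannot be read off directly.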
Ask any two researchers to give an overview of what they think the LUCA was like, and you will no doubt get different answers. With such a tricky scientific endeavour as this — working out what an organism that lived billions of years ago was like — this is hardly surprising. Some of my own views on one aspect of the LUCA question are to be found in an earlier piece posted on Actionbioscience.org which I co-wrote with Dan Jeffares [Jeffares & Poole 2000], but what follows is a broader overview of the fast-growing field of ‘LUCA biology.’
2. The minimal genome project
One hands-on approach to trying to uncover the biology of the LUCA has been to look for genes that are universal, that is, genes that all life forms possess. Once a list of these genes has been made, it also raises another possibility: perhaps this list encapsulates the essence of cellular life, the minimum number of genes required to make a cell. In 1996, with the sequences of the first two bacterial genomes (Mycoplasma genitalium & Haemophilus influenzae) in hand, Arcady Mushegian & Eugene Koonin [Mushegian & Koonin 1996] tried exactly this. The most striking features of their minimal genome were:
- A mere 256 genes
- No biosynthetic machinery for making the building blocks of DNA
From this they tentatively concluded that LUCA stored its genetic information in RNA, not DNA, and made suggestions on how to further reduce the number of genes in their minimal genome. The work heralded the arrival of comparative genome studies, and there is no doubt that a good number of the genes in their 256-strong list do date back to the LUCA. However, the work was squarely criticised because of the omission of DNA [Becerra et al. 1997]. Both these bacteria are human parasites and it seems most likely that they did away with parts of the machinery for making their own DNA because they can steal from the host (i.e., the organism infected with these pathogens). Indeed, why put in the effort to make your own when it’s there for the taking?
Regardless of whether or not DNA was a part of the LUCA (I think it was [Poole et al. 2000], but there are plenty of researchers that beg to differ [e.g., Leipe et al. 1999]), this omission highlighted a wider problem with the minimal genome. Namely, the genomes you begin with probably affect the final set of genes. This is a problem for the following reasons:
- How many genomes must be compared before we are confident we aren’t leaving anything out?
- ‘Lifestyle’ can affect the final list (in Mushegian & Koonin’s work, the minimal gene set may in fact be a generic set required for parasitism in humans, and has little in common with what was required for a free-living cell to go about its business billions of years ago).
- Gene losses: if a gene was in the LUCA, but now remains in only one of the three domains, this method would consistently leave it off the list of LUCA genes.
- Finally, if genes can move from organism to organism (so-called horizontal or lateral gene transfer), certain genes may have done such a good job of spreading that they sometimes appear to date back to the time of LUCA, whereas in actual fact, they arose more recently.
Despite its limitations, the minimal genome concept is probably the best attempt to put one’s money where one’s mouth is and come up with a hard list of genes that may have been a feature of the LUCA. It is also the only sensible framework we currently have. That said, if gene transfer is extreme, genes will have moved about so often that this and related methods are rendered futile [Doolittle 1999].
Koonin has recently published an updated minimal gene set, using 21 complete genomes [Koonin 2000]. Surprisingly, of the 256 genes in the original set, only 81 remain, and this list is clearly insufficient to describe either the minimum number of genes required for a cell to function, or the genetic makeup of LUCA.
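Why the list can only shrink as genomes are added follows from the method itself: a gene stays on the list only while every genome examined so far carries it. A minimal sketch, using hypothetical single-letter gene labels rather than any real genome data, makes the point:

```python
from itertools import accumulate

def shared_gene_counts(gene_sets):
    """Size of the shared gene set after each genome is added in turn."""
    running = accumulate(gene_sets, lambda shared, genome: shared & genome)
    return [len(shared) for shared in running]

# Hypothetical genomes; the labels do not correspond to real genes.
gene_sets = [
    {"a", "b", "c", "d", "e"},
    {"a", "b", "c", "d"},
    {"a", "b", "c", "f"},
    {"a", "b", "g"},
]

counts = shared_gene_counts(gene_sets)
print(counts)  # [5, 4, 3, 2]
```

Each added genome can remove genes from the intersection but never add any, so a single lineage that has lost or replaced a gene is enough to strike it from the list.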
While working out which genes were part of LUCA is no easy task, the various attempts [Mushegian & Koonin 1996; Kyrpides et al. 1999; Hutchison et al. 1999] are a good starting point, and have served to highlight important problems that must be dealt with in the field of ‘LUCA biology.’
3. LUCA genomics
As the minimal genome work demonstrates, the major issues are:
- How much does reliance on universal features underestimate the genetic makeup of LUCA?
- How much gene swapping has gone on during the evolution of life from LUCA to the present?
The magic number of universal features is likely to shift about, and there have been plenty of criticisms of all the attempts to reconstruct the LUCA. Nevertheless, universal features are important because they describe a lower limit on which to build, and importantly, all these attempts converge on the conclusion that LUCA was quite complex. Some of the gaps will be relatively simple to fill, but others may be close to impossible.
LUCA biologists are aware that universal features may underestimate the complexity of LUCA to some extent, but another concern is emerging that could cause even more headaches — the spectre of ‘horizontal gene transfer’ (also called lateral gene transfer):
- If there is lots of gene swapping between organisms, the tree of life becomes more like a web, and it may not be possible to disentangle the branches.
- If genes are extremely nomadic, truly universal features cannot be distinguished from genes that have successfully spread themselves by gene transfer.
The problem of gene transfer was made apparent in a landmark study of two bacterial genomes, Escherichia coli and Salmonella. Jeffrey Lawrence & Howard Ochman [Lawrence & Ochman 1998] concluded that, since diverging from a shared ancestor 100 million years ago, at least 10% of the E. coli genome has been acquired in more than 200 horizontal gene transfer events.
In an equally insightful commentary, William Martin [Martin 1999] has discussed the implications of this work for our ability to reconstruct phylogenetic trees:
- The further back in time an evolutionary divergence, the greater the likelihood that any given gene in a genome has been transferred.
- Indeed, it may be the case that all bacterial genes have been subject to horizontal gene transfer at some point in their evolutionary history.
- This could undermine the utility of phylogenetic tree reconstruction for deep divergences.
Currently, there is a lot of debate over whether gene transfer is so rampant that evolutionary trees cannot be built, or whether the levels of gene transfer are negligible. Both extremes are currently championed in the literature, and ironically, when it comes to the LUCA, Carl Woese’s work is central to both — many of those that view Woese’s three domains as correct have been arguing for little or insignificant levels of transfer, whereas Woese has recently suggested that very early in evolution, gene transfer between organisms was more important than inheritance from generation to generation [Woese 1998].
While it sounds like Woese is being inconsistent, his more recent claim is limited to the earliest periods in the evolution of life, and arose from concerns that LUCA was beginning to appear totipotent, a crazy notion that would have LUCA as the ultimate source for all life’s diversity:
- A fertilised egg is totipotent — from a single cell it will develop into all the different cells and tissues that make up an adult human being.
- If genes move between organisms, LUCA might mistakenly appear totipotent because many features would be incorrectly counted as universal.
Extrapolating Lawrence & Ochman’s result back billions of years may not be realistic. But what if horizontal transfer was the default state? This is the idea Woese has developed. His argument is that genes were so free to exchange that there were no distinct lineages — genes moved more through horizontal transfer than by vertical inheritance. As the genetic system becomes more accurate and as the complexity increases, more genes become interdependent, and transfer gives way to vertical inheritance. Woese argues that translation (and therefore the genetic code) was the first thing to be fixed or crystallised, with other cellular functions following later. From this horizontal transfer dominated system, the three domains (archaea, bacteria and eukaryotes) each emerged independently as lineages.
This is certainly food for thought, but there are several issues:
- Gathering evidence to support it is not exactly easy since there’s no real way to establish that horizontal transfer was the initial state.
- Koonin’s shrinking minimal gene set [Koonin 2000] shows that the minimal genome approach is not creating a totipotent LUCA. Instead, the number of genes ascribable to LUCA is becoming smaller as more genomes are added.
- Another problem has to do with the switch from horizontal transfer to vertical inheritance. How many genes would have been able to partake in global transfers before becoming crystallised and therefore unable to transfer? Would it really have been complex enough for the ancestors of the three domains of life to have emerged independently as distinct lineages?
Gene transfer is going to be a hotly debated topic for a while, and will continue to confound the reconstruction of the LUCA. The problem is a complex one:
- Horizontal gene transfer has been demonstrated (e.g., the spread of antibiotic resistance).
- Limitations of the methods for building evolutionary trees can give false evidence for gene transfer.
- Methods that don’t make use of evolutionary information are being used to examine genetic relationships and, in many cases, the data that have been used to argue for horizontal gene transfer are weak.
- There is little consensus on the reliability of methods for detecting horizontal gene transfer.
- What data are required to demonstrate ancient horizontal gene transfer events?
- If natural selection is considered, most horizontal gene transfers will probably result in the gene being lost; by analogy, the organism needs a new gene like a fish needs a bicycle! For instance, antibiotic resistance genes won’t spread and be maintained by selection unless the organisms with the genes are being assaulted with the antibiotic.
This last point has been too often ignored, and there has been little attempt to establish patterns (e.g., are all genes equally nomadic?). So how should LUCA biologists deal with horizontal gene transfer?
If we accept that there is or has been massive unbridled horizontal gene transfer between the three domains [e.g., Doolittle 1999], we must conclude that all our tools for looking into the evolutionary past are invalidated, which means we might as well give up on the question of the LUCA. We know that there are demonstrated cases of horizontal gene transfer, but this extreme position is like throwing the baby out with the bathwater.
If we take as our starting point the opposite extreme, that the effect of horizontal gene transfer has been negligible, we are in a much better position — we still have our tools in place, and any suggestions of horizontal gene transfer will need to be backed up with good evidence.
There is no doubt a middle ground can be found, but amidst the furore over horizontal gene transfer, a number of researchers, making use of whole genome sequences, have reported results suggesting gene transfers have minimal effect on the ability to recover evolutionary trees [e.g., Snel et al. 1999; Sicheritz-Pontén & Andersson 2001]. These results suggest that it is possible to reconstruct the tree of life, and moreover, conclude that the three-domain structure of the tree, as first reported by Woese and his colleagues, is supported by whole genomes.
In a timely article, Chuck Kurland has firmly criticised the eagerness of many to invoke horizontal gene transfer [Kurland 2000]. One particularly interesting aspect of his exposition is that he suggests a number of non-scientific factors that have contributed to the hype around horizontal gene transfer; the article is as much a comment on how science currently operates as it is about gene transfer.
The root of the tree of life is hard to pin down [Pennisi 1999], and unbridled horizontal transfers early in the evolution of life can’t easily be distinguished from the limits of the sensitivity of our phylogenetic tools — that researchers have failed to reach a consensus on the shape of the tree does not mean that there must therefore have been horizontal transfer.
Indeed, there is another issue here: the reliability of the methods used for building evolutionary trees. Many researchers are very confident of the reliability of these methods, yet it is well known that these are based on mathematical algorithms which are convenient, but which do not necessarily accurately model real biological sequence evolution. These methods are likely to be robust for recent evolutionary events, and are definitely the most robust of the methods for detecting gene transfers. The problem is that near the root of the tree of life, they may be just too inaccurate to be useful for scrutinizing the very earliest events in evolution. In a worst-case scenario, the situation might in fact be a bit like timing the 100 m sprint at the Olympics with a sundial!
David Penny, Bennet McComish and their coworkers have recently tried to address this question by investigating how far back in time the standard models used in evolutionary tree building can go before they start to go wrong. Their overall conclusion is that the models used do seem to do a little better than might be expected from theory, but that the models still do poorly for very early evolutionary events. Penny and colleagues also criticise the recent trend in reporting conflicting trees as evidence for horizontal gene transfer — given how hard it seems to accurately reconstruct the tree of life, it is hard to say whether conflicting answers are evidence for gene transfer, or just reflect the limitations of the methods for building the trees. Their testing of the models suggests that it is just not reasonable to say that there is horizontal gene transfer just because two trees made with two different genes don’t come back with the same relationships between organisms. They make the following comment, which sums up the problem very succinctly:
“… there are major difficulties between data sets for ancient divergences. It is difficult to see why researchers are so confident in their results when the relatively recent divergences within mammals, birds, or flowering plants are only now being resolved.”
This work of Penny et al. and the picture coming from evolutionary trees of whole genomes [Snel et al. 1999; Sicheritz-Pontén & Andersson 2001] seem to bolster Kurland’s provocative assertion that horizontal transfer is ‘an ideology that is begging for deconstruction.’
Nevertheless, horizontal gene transfer does occur to some extent — Lawrence & Ochman’s 1998 paper is but one of many demonstrating this. Moreover, many of the technologies biologists use for inserting genes are simply human exploitation of what has been described as natural genetic engineering. The following naturally occurring mechanisms of ‘genetic engineering’ are routinely used in molecular biology laboratories:
- Plasmids: small, usually circular, pieces of DNA that often carry genes that enable them to move from one bacterium to another.
- Viruses: many will naturally insert themselves into the DNA of the organism they are infecting, and can be engineered to carry extra pieces of DNA.
- Natural or assisted DNA uptake by bacterial cells.
- Restriction endonucleases: molecular scissors that allow precise ‘cutting’ of DNA.
We also know it is possible to identify ancient gene transfers that may have occurred as far back as 2 billion years ago. Biologists can readily identify genes in the eukaryote repertoire that have come in via the mitochondrion, a compartment in the eukaryote cell which is bacterial in origin. Indeed, the handful of genes remaining in this compartment have been shown to be bacterial in origin, as have some that have since taken up residence in the eukaryote nucleus [Lang et al. 1999].
Returning for a moment to the biology of nomadic genes, the consensus emerging from studies of bacteria is that we should indeed start thinking of bacterial (and perhaps archaeal) genomes as being an ever changing collection of genes, but only to a degree [Hacker & Carniel 2001]:
- Genes which are central to the running of any cell — these are often referred to as housekeeping genes — make up the ‘core’ genome.
- Genes which come and go make up the flexible genome.
The flexible genome might be a window into the nomadic gene pool of bacteria: Lan & Reeves point out that closely related strains of bacteria differ in the genes they carry by as much as 20%, and this requires that we reevaluate how we categorise species of bacteria. The hope is that the flexible genome can tell us how a particular bug is currently making its living. Losing genes isn’t in itself particularly surprising for organisms that are in competition to reproduce as fast as they can [Jeffares & Poole 2000]; genes that aren’t being used aren’t kept. ‘Use it or lose it’ is the maxim of natural selection, and there are plenty of examples wherever you look in biology (our appendix and tail bone both appear to have headed in that direction for instance).
If this picture of core and flexible genomes is correct, it is good news for LUCA research because many universal features can in theory be recovered. This goes too for the ancient horizontal transfers seen with the mitochondrion. We should be optimistic that some patterns of horizontal gene transfer can be analyzed, though we still need to exercise care when looking so far back in time.
When it comes to horizontal gene transfer, the hype about archaea and bacteria is arguably a case of squabbling over crumbs when compared to what seems to have happened in the early evolution of the eukaryotic cell. The now popular idea that eukaryotes emerged from a massive fusion (the ultimate gene transfer) event between a bacterium and an archaeon is also raising problems for LUCA biology.
While fusion is all about the origin of the eukaryotes, it is also about LUCA:
- Fusion scenarios challenge Woese’s division of the living world into three domains.
- Rather than a tree with three branches all tracing back to the LUCA, fusion has two lineages, with the eukaryotes emerging by fusion.
- Fusion is in conflict with the emerging picture of the direct link between eukaryote biology and the RNA world.
A number of researchers have argued that the genes in the average eukaryote look to be a mixture of bacterial-like and archaeal-like. That is to say, at the genetic level, eukaryotes look to be some sort of genetic fusion between archaea and bacteria [Ribeiro & Golding 1998; Rivera et al. 1998; Horiike et al. 2001].
In understanding this, it has been helpful to divide genes into two categories: informational or operational [Rivera et al. 1998]. Informational genes are those which are involved with the copying, storing and regulation of genetic information, while operational genes are the recipes for making proteins for synthesis and breakdown of molecules in the cell, and are largely involved in energy metabolism.
Consistent with earlier research [see Gupta & Golding 1996], Rivera and colleagues found that there was rhyme and reason to the mixture of bacterial and archaeal genes in eukaryotes:
- For informational genes — archaea and eukaryotes share more in common
- For operational genes — bacteria and eukaryotes share more in common
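The kind of tally behind this pattern can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of Rivera and colleagues: the records are hypothetical, and each one simply pairs a eukaryotic gene's category with the domain of its closest non-eukaryotic relative, however that relative was determined.

```python
from collections import Counter

# Hypothetical records: (gene category, domain of the closest relative).
eukaryote_genes = [
    ("informational", "archaea"),
    ("informational", "archaea"),
    ("informational", "bacteria"),
    ("operational", "bacteria"),
    ("operational", "bacteria"),
    ("operational", "archaea"),
]

def closest_domain_by_category(records):
    """For each gene category, return the domain that is most often closest."""
    tallies = {}
    for category, domain in records:
        tallies.setdefault(category, Counter())[domain] += 1
    return {category: counts.most_common(1)[0][0]
            for category, counts in tallies.items()}

summary = closest_domain_by_category(eukaryote_genes)
print(summary)  # {'informational': 'archaea', 'operational': 'bacteria'}
```

Note that a majority count like this hides the minority of genes that go the other way, so it describes a tendency rather than a clean split.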
Mark Ridley has suggested a good analogy for what many think has happened: a business merger. Instead of doubling up and having two departments for every aspect of the new company (Eukaryote Inc.), only one of each was kept, with the result being that the ‘informational department’ came from Archaea Inc. and the ‘operational department’ from Bacteria Inc.
With the ongoing debate on how much horizontal gene transfer there is between organisms, the most exciting contribution to this picture looks not at the genes, but at gene networks. Taking a page from the study of complex networks such as the Internet, Eörs Szathmáry and colleagues [Podani et al. 2001] have recently shown that, while eukaryotic operational genes appear bacterial in origin, the structure of the metabolic network that these genes make up is in fact much more like what is observed in archaea. In keeping with the business merger analogy, this is perhaps equivalent to keeping the management structures of Archaea Inc. in place.
This is an exciting picture, and there is no question that modern-day eukaryotes are the product of some sort of fusion [Ribeiro & Golding 1998; Horiike et al. 2001]. However, the tricky thing is working out what it all means for the origin of the eukaryotic cell. These are some of the outstanding issues:
- Why has such a merger apparently only happened once?
- No one has ever observed modern bacteria and archaea fusing.
- Why is it we don’t see ‘anti-eukaryotes’ (that is, organisms which have the operational genes of archaea and the informational genes of bacteria)?
- A number of features found exclusively in eukaryotes are tricky to explain by a fusion event.
Indeed, there are a number of ways of explaining the fusion data, and consequently, there are quite a few different opinions on how the eukaryotes came to be [Minkel 2001].
If eukaryotes are the result of a fusion between a bacterium and an archaeon, then the three-domain picture that Carl Woese’s work supports would be wrong. Fusion would imply that everything in eukaryote biology is either a recent innovation specific to this domain, or an offshoot of the biology of archaea and bacteria. In other words, if you want to know about LUCA, archaea and bacteria are the only two domains worth looking at. This is an assumption that is often made, regardless of fusion, and a point against which some researchers, myself included, have argued [see Jeffares & Poole 2000; also Forterre & Philippe 1999; Poole et al. 1999].
To make sense of the motivation behind the many emerging fusion scenarios for the origin of the eukaryote cell and how these might impact on LUCA biology, it helps to concentrate on the big picture, rather than wading through the details of the various scenarios. Laura Katz has written a good overview of the various fusion scenarios, though several new scenarios have been published since then [e.g., Margulis et al. 2000; Horiike et al. 2001; Bell 2001; Hartman & Fedorov 2002]. Fusion theories have developed out of the endosymbiotic theory for the origin of the mitochondrion:
- The endosymbiotic theory was first formulated by Mereschkowsky at the beginning of the 20th century, but reintroduced and updated by Lynn Margulis in the 1970s [Martin et al. 2001].
- This theory argues that the mitochondrion, sometimes called the powerhouse of the cell, was originally a bacterial cell that took up residence in the ancestor of modern eukaryotes.
- Both structural and genetic similarities have shown without a shadow of a doubt that the endosymbiotic theory is correct — the DNA in the mitochondrion is more closely related to bacteria than to the DNA stored in the eukaryotic cell nucleus.
- It is now widely accepted that this event happened once only.
Despite much agreement, there is ongoing debate surrounding the endosymbiotic theory:
- How was this partnership founded (e.g., oxygen-based or hydrogen-based metabolism)?
- Was the host that ultimately engulfed the bacterium a eukaryote or an archaeon?
The first question opens up a whole can of worms (which we’ll avoid here), and is a current source of intense debate [Andersson & Kurland 1999; Rotte et al. 2000]. The second question is what has the major impact on LUCA biology, but these two questions have been unnecessarily muddled. The bottom line is that the genomic & gene network data supporting fusion between an archaeon and a bacterium can as easily be made to fit a fusion between a eukaryote and a bacterium.
The state of the field is as follows:
- Everyone agrees that the mitochondrion evolved from a bacterial ancestor (though there is current debate as to what the bacterial ancestor was like, and how it interacted with its host).
- There is disagreement as to whether the host was a eukaryote, or an archaeon.
- Archaea-Bacteria fusion hypotheses require all genes found only in eukaryotes to have arisen post-LUCA, post-fusion — that is, they are indirectly descended from LUCA.
- This comes into conflict with the picture of LUCA from RNA [Jeffares & Poole 2000], and Woese’s tree of life [Woese et al. 1990] — both require that eukaryotes were directly descended from LUCA.
So how do we distinguish between an archaeal and a eukaryotic host? The key is in two parts — one is historical and the other requires careful thought about how archaea and eukaryotes might be related:
- The historical aspect centres around understanding the shift in thinking from the original picture of an ancient eukaryote playing host to the now largely agreed-upon picture of an archaeon playing host. This shift largely revolves around the changing branches in the eukaryote evolutionary tree [Dacks 2002].
- The relationship between archaea and eukaryotes cuts to the heart of how researchers view the evolution of cells.
Archaezoa - missing links lost
So why is it that fusion hypotheses have become so popular? Indeed, this goes against the classical interpretation, most thoroughly espoused by Tom Cavalier-Smith, who identified a disparate group of eukaryotes that appeared to him to be missing links: the so-called Archaezoa, which look like eukaryotes but lack mitochondria. His hypothesis, that the Archaezoa evolved before the introduction of mitochondria into the eukaryote lineage, held sway for many years, though it has recently been dropped in favour of fusion:
- There is growing evidence that all eukaryotes once harboured mitochondria
- Thus, the Archaezoa have probably all lost their mitochondria, rather than never having had them [Embley & Hirt 1998].
- For instance, one group of the Archaezoa called the microsporidia are now widely accepted to have been incorrectly placed very deep on the eukaryotic tree. Indeed, probably most of the Archaezoa, if not all, are incorrectly placed on the tree. Rather than being missing links leading back to the origin of eukaryotes, they probably arose more recently [see Dacks & Doolittle 2001; Keeling 1998; Dacks 2002].
- If the Archaezoa aren’t a series of missing links, the origin of eukaryotes may have been concurrent with the endosymbiosis that gave rise to mitochondria.
The conclusion from the above is that all eukaryotes probably had a mitochondrion, and without the Archaezoa, the only ancestor of eukaryotes is archaea. Voilà! We have fusion.
A case of throwing the baby out with the bathwater?
The important point to keep in mind about the picture for fusion is that it is a partial one, based largely on gene data. There are a large number of differences in the general structure of eukaryotic and prokaryotic (archaeal & bacterial) cells that aren’t explained by fusion [Poole & Penny 2001]. However, the major inconsistency is that the picture provided from trees is not the same for the relationship between archaea and eukaryotes, and that of bacteria and eukaryotes [Poole & Penny 2002, submitted]:
- Margulis’ hypothesis is evidenced from trees. There is now overwhelming agreement that the mitochondria branch within the bacterial tree, specifically within a subgroup called the alpha-proteobacteria [Lang et al. 1999], and a bacterial origin is also observed for chloroplasts (where photosynthesis takes place in plants and other photosynthetic eukaryotes).
- Comparisons of relevant genes from eukaryotes and archaea should give this picture also, yet the evidence points to archaea and eukaryotes being very distinct domains.
This has strong parallels to the way the Archaezoa case is being treated — if there are no modern groups of archaea that appear to have split from the trunk of the tree of life before the appearance of eukaryotes, should we accept fusion? Stronger evidence was certainly required in testing the origin of the mitochondrion!
Another issue has to do with missing links. If the disappearance of the missing links (the Archaezoa) is used to suggest fusion, it is surely just as reasonable to argue against fusion on exactly the same grounds — there are no intermediates between eukaryotes with mitochondria and the archaea. For example, we don’t see examples of archaea with mitochondria in them, or archaea with nucleus-like structures.
With perhaps a couple of billion years separating the divergence of archaea and eukaryotes, it would be incorrect to require that the archaeon in the fusion must have been just like modern archaea. This cuts right to the heart of the problem — there is no inherent requirement that evolution leaves behind a series of intermediates for us to use to piece together the different evolutionary trajectories of archaea and eukaryotes. As with Chinese whispers, the end point may be very different from the starting phrase, but with evolution, all we have to look at is a number of different endpoints, from which we can only guess at the starting phrase!
While the specifics of the Archaezoan hypothesis are most probably wrong, it should not be thrown out completely. Explanations of eukaryote origins by fusion or via Margulis’ original scenario each suffer from the disappearance of intermediate forms, but this is expected. As Stephen Jay Gould often said, evolution results in bushes, not ladders.
A number of researchers [Forterre & Philippe 1999; Andersson & Kurland 1999; Penny & Poole 1999] maintain that the data for fusion can be reconciled with Lynn Margulis’ endosymbiotic theory and Carl Woese’s three-domain tree. Indeed, David Penny and I have argued that fusion does the worst job of explaining the available data [Penny & Poole 1999; Poole & Penny 2001]. For instance, fusion doesn’t fit with the hypothesis that some eukaryote features, which have since been lost in archaea and bacteria, actually date back to the LUCA (see Jeffares & Poole 2000).
What the data very tentatively imply is that archaea and eukaryotes may have shared a more recent ancestor than either shares with bacteria, as is often shown in textbooks, but this too is not certain, since the relationships between these three groups are also a point of controversy [see Forterre & Philippe 1999; Pennisi 1999]!
We are now entering a very exciting period in uncovering the history of the LUCA — the field has been given a major boost from a broader range of ideas being applied to the problem:
- Acknowledging the technical challenges with building the tree of life is an important step in the right direction. So is the idea of using the RNA world period in the origin of life for establishing aspects of the nature of the LUCA [see Jeffares & Poole 2000].
- Despite the deluge of genome data available, it is hard to say whether we will ever actually manage to get a complete list of genes which LUCA possessed. We may pick out certain characteristics, but each needs to be evaluated with extreme care. The minimal genome study of Mushegian & Koonin demonstrates this and, in its failure, stands as a strong caveat. Most researchers have toyed with this idea, and many were probably disappointed (then later relieved) that Mushegian & Koonin beat them to it!
- Horizontal gene transfer is likely to be a factor in confounding such efforts, but it is better to err on the side of caution with respect to how pervasive this is in the history of life. The emerging picture from genome research suggests that not all genes transfer equally easily, and that there may be an ecological underpinning to the nature of gene transfer.
- The fusion hypothesis has important consequences for the LUCA: if correct, the LUCA must have been like bacteria and/or archaea, because those unique features of the biology of eukaryotes had not yet evolved.
- The three-domain tree that emerged from Woese’s original work permits features of all three domains to trace back to the LUCA, while the fusion hypothesis, in its strictest form, does not. Unless an argument for loss of a feature in all modern archaea can be made, the fusion hypothesis is diametrically opposed to the nature of the LUCA as suggested from RNA world fossils [Jeffares & Poole 2000].
Currently, many major assumptions are being questioned:
- Were there three domains or two, with the third arising by fusion?
- Was LUCA prokaryote-like or eukaryote-like or even a mixture?
- Is the genetic code the only one possible?
- Was early evolution more reliant on horizontal gene transfer than inheritance?
- Was there one or more LUCAs?
Each of these questions could easily fill a book, and it has become impossible to cover every aspect of LUCA biology in one article. To the casual observer, the field of LUCA biology looks to be in disarray, with everyone having their own pet theory. This can be exciting, frustrating, and, at times, bordering on the absurd, but above all it is a sign of healthy debate! Many views and varied approaches to the problem mean that some exciting answers to some fundamental questions about life’s origins are just around the corner…
© 2002, American Institute of Biological Sciences. Educators have permission to reprint articles for classroom use; other users, please contact firstname.lastname@example.org for reprint permission.
Appendix A: Making Protein
To make a protein, the information stored in DNA language must be translated into the language of proteins, where, instead of a four-base alphabet, the alphabet is made of 20 amino acids. Translation is predominantly an RNA affair, with a working RNA copy of the DNA gene (called messenger RNA, or mRNA) being read by the ribosome, the protein synthesis factory found in all cellular life forms. The actual translating from DNA/RNA language into protein language involves an RNA called transfer RNA, or tRNA for short.
tRNA is the key to understanding the nature of the genetic code. In making an RNA, each of the bases A, G, C and U is a word in itself, but in order to go from 4 bases to the 20 amino acids that make up proteins, the bases are grouped into ‘words’ three bases long (called codons), making a dictionary of 4 × 4 × 4 = 64 possible words. In this way, all 20 amino acids can be described in three-letter or triplet code, with room to spare. Of the 64 combinations, 61 code for amino acids (the remaining three are read as ‘STOP MAKING PROTEIN’), and an important feature of the code is redundancy: for instance, there are 6 codons that code for the amino acid leucine. The consequence of such redundancy is that the translation process is less susceptible to errors than a one-to-one code.
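The counting in the paragraph above can be checked with a few lines of Python. This is only an illustrative sketch: it assumes the standard genetic code, written in the usual compact form (codons enumerated in U, C, A, G order, one-letter amino acid symbols, `*` for stop). The names `BASES`, `AMINO_ACIDS` and `CODON_TABLE` are made up for this example and do not come from the article.

```python
from collections import Counter
from itertools import product

# Assumption: the standard genetic code, with codons enumerated in UCAG order
# (first base varies slowest) and '*' marking the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the 64-word dictionary: every three-base codon mapped to its meaning.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many codons spell each amino acid (or stop)?
codons_per_residue = Counter(CODON_TABLE.values())

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = len(CODON_TABLE) - len(stop_codons)
amino_acids = set(CODON_TABLE.values()) - {"*"}

print("possible codons:", len(CODON_TABLE))
print("codons used for amino acids:", sense_codons)
print("stop codons:", stop_codons)
print("distinct amino acids:", len(amino_acids))
print("codons for leucine (L):", codons_per_residue["L"])
```

The redundancy is visible in `codons_per_residue`: most amino acids are spelled by more than one codon, with leucine (L) among those that have six.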
Translation requires that each tRNA is charged at one end with an amino acid. At the other end, an anticodon matches up with one of the codons that match that amino acid. Translation involves docking charged tRNAs at the appropriate point on the mRNA as the ribosome moves along, reading the triplets off. When the ‘correct’ tRNA is in place, the ribosome joins the amino acid carried by that tRNA to the growing chain of amino acids, which together make a protein.
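The reading of triplets described above can be sketched as a short Python function. This is a deliberately simplified model, not a description of the real machinery: it assumes the standard genetic code, pairs every codon with a single perfectly complementary anticodon (real tRNAs can tolerate mismatches at the third position), and starts reading at the first base of the string rather than searching for a start signal. The names `anticodon` and `translate` are hypothetical helpers invented for this example.

```python
from itertools import product

# Assumption: the standard genetic code in compact UCAG order, '*' = stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# RNA base pairing used when a tRNA anticodon docks onto an mRNA codon.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon):
    """Return the perfectly complementary anticodon, written 5' to 3'.

    Simplification: one exact-match anticodon per codon.
    """
    return "".join(COMPLEMENT[base] for base in reversed(codon))


def translate(mrna):
    """Read an mRNA three bases at a time and return the amino acid chain.

    Reading starts at the first base and ends at the first stop codon;
    a trailing incomplete triplet is ignored.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":  # 'STOP MAKING PROTEIN'
            break
        protein.append(residue)  # join the amino acid to the growing chain
    return "".join(protein)


print(translate("AUGCUGUAA"))  # methionine, leucine, then stop
print(anticodon("CUG"))        # anticodon that would pair with the leucine codon CUG
```

In this toy model the lookup in `CODON_TABLE` stands in for the charged tRNA: the key plays the role of the codon that the anticodon recognises, and the value is the amino acid carried at the other end.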