The Darwin Tree of Life (DToL) project has been established to collect all eukaryote species in Britain and Ireland for genomic sequencing. New technological developments have made high-quality genomic data a feasible outcome even for some of Earth’s smallest inhabitants. This project will create a new, open data resource containing the blueprints of thousands of organisms, holding the key to the evolutionary histories of understudied single-celled protists alongside better-understood animals like the grey seal. This ambitious project is a collaboration of experts from different geographic and intellectual areas. It will provide templates for new ways of working and uncover new scientific ground. In a world struggling under the threat of ecological collapse, this project will provide new biotechnological and engineering information to aid our understanding and management of natural ecosystems and the creatures which create them. The Marine Biological Association UK, based in Plymouth, is currently collecting marine organisms for the project. The marine environment has not been as well studied as terrestrial environments, which offers a huge opportunity to expand our understanding of this underexplored realm and the creatures that live there, and to provide context and detail that will yield new insights for marine research.

The Earth BioGenome Project (EBP) is an international initiative that is working towards sequencing the genomes of all of Earth’s eukaryote species over the next 10 years. Eukaryotes are organisms whose cells contain a nucleus and other membrane-bound organelles; they include plants, animals, fungi and protists. ‘Protist’ is the group name for any eukaryote that is not a plant, animal or fungus; protists are mostly single-celled and are usually categorized as ‘plant-like’, ‘animal-like’, ‘fungi-like’ or ‘multi-cellular’ (this last category includes some quite large organisms, such as kelp). As part of this initiative, the Darwin Tree of Life (DToL) project was founded to deliver quality genomic data for all eukaryote species within Britain and Ireland, a biogeographical region home to 60% of all the major Eukaryota. The DToL project is built around multi-partner collaborations, involving experts across several fields at different institutes (Figure 1). The project is also developing a wider network of experts and expert groups that bring in additional taxonomic expertise, including Seasearch, the Conchological Society (Figure 2) and volunteers taking part in bioblitzes. Having an end-to-end process, which acknowledges the contributions of individuals and organizations and creates an open data set available to everyone, is a working example of how science can be both collaborative and ambitious. The DToL project is creating a digital library of eukaryotic genomes which will form the fundamental infrastructure for the future of biology, agriculture and medicine.

Figure 1

The logos of all contributing partners: The Wellcome Sanger Institute, EMBL-EBI, the Natural History Museum, The Marine Biological Association, the Earlham Institute, Royal Botanic Gardens Kew, Royal Botanic Garden Edinburgh, the University of Edinburgh, the University of Oxford and the University of Cambridge

Figure 2

Samples collected by Seasearch and the Conchological Society, respectively

Britain and Ireland are surrounded by water, and some of our richest diversity lies within our seas. The Marine Biological Association of the UK (MBA), based in Plymouth, has been tasked with collecting the marine species of our coastal seas. Marine organisms are often less well understood than their terrestrial counterparts; however, this creates new opportunities for marine science, both now and for the future.

The first species collected by the MBA to have its genome sequenced is a very common seaside species: Steromphala cineraria, commonly called the grey topshell (you can read its published genome note in further reading). There is still so much we don’t know about common species. For example, understanding how the grey topshell responds to climate change can tell us about the knock-on effects that changes in grazing intensity will have on the macroalgae that these snails eat. No living thing exists in a vacuum; everything relies on the existence of something else, so collecting only the species you think are important can mean missing vital information that changes the whole picture. The MBA has already collected 568 species, including larger animals like the cuckoo wrasse, very common seaside species like the dog whelk and tiny species you need a microscope to see, like the protist Alexandrium minutum (Figure 3).

Figure 3

Species collected by the MBA, clockwise from the top: the cuckoo wrasse, the dog whelk, the now-sequenced grey topshell and the protist Alexandrium minutum

DNA is present in every living organism and records how each species has evolved in its own way to deal with the environments it inhabits. DNA is an endless story with a million different endings. This is an unfathomable source of data, which could hold many new discoveries (Figure 4). As with many of the scientists that came before us, we may not be aware of the full potential of the information we collect, but it will lay the groundwork for the scientists that come after us to build upon. Approximately 90% of the genomes sequenced to date come from animals, plants and fungi, which collectively represent only two of the six major groups of eukaryotes. By aiming to sequence representatives from each family, including the extraordinarily genetically diverse and underrepresented protists, the DToL project will fill in these gaps and pave the way for novel discoveries in biology. On a less optimistic note, for those species or ecosystems we may lose in the future, this will also create a blueprint for rebuilding communities where that may be necessary, for understanding resilience in the species that are still able to survive, and for finding ways to support the adaptation of our ecosystems and the species within them, to secure the ecosystem functions we all need and enjoy.

Figure 4

The nudibranch Facelina auriculata, which was collected by the MBA and is currently at the Sanger Institute waiting for its genome to be sequenced

In 1735, Carl Linnaeus published Systema Naturae, a text which named 10,000 organisms, using a system called binomial nomenclature (the same two-term naming system we use in science today). Over the last 300 years, his legacy, the Linnean project, has named over 2 million species. This, in itself, is arguably a huge success for humanity. In modern science, having a universally accepted categorization system means that we can discuss the same species across different areas or even languages; binomial nomenclature is a common language for scientists across the globe. In Linnaeus’ time, it was thought that there were about 10,000 plants in the whole world, but we now know of 390,900 plant species, and more are being discovered all the time. Understanding what every living thing is, and giving it a name that shows its relationship to its closest relatives, allows us to gain clarity and understanding of evolutionary processes.

In evolutionary biology, living organisms are all placed onto ‘trees of life’ alongside their closest relatives. Most of our existing understanding of these relationships is based on morphological features (i.e., the way the organism appears), but looks can be deceiving, and many organisms look very similar to another species that is completely unrelated. This is called convergent evolution: two organisms have evolved from separate ancestors but look very similar. This is likely to be the case for many of the species we think we have a consistent name for, and genetic data can reveal these ‘cryptic taxa’, uncovering new species hidden in plain sight. The separation of species through evolution is not a singular event, but a process, and hybridization is common across many taxa. Having clear genomic data will therefore give us a richer understanding of how species are distinct from each other and allow us to discover the convergent evolution of complex traits.
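One simple way to see how sequence data can separate look-alike specimens is to count the proportion of aligned sites at which two sequences differ (often called the p-distance). The sketch below is purely illustrative: the function name and the short sequences are made up for this example, the sequences are assumed to be already aligned, and it is not a description of the analyses used by the DToL project.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps rather than count them as changes
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Hypothetical aligned fragments from two specimens that look alike.
specimen_1 = "ACGTACGTAC"
specimen_2 = "ACGTTCGTAA"
print(p_distance(specimen_1, specimen_2))
```

A larger distance between specimens that share a name would be a prompt for closer taxonomic study, not proof of a new species; how much divergence is meaningful depends on the marker and the group.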

Biodiversity loss is widely considered to be one of the greatest challenges we face today. We are in a sixth mass extinction which has the potential to impact ecosystems, and therefore humanity, in countless ways. Our activities and lives are driven by, and feed back into, a system which provides us with food, oxygen, water and a liveable climate, the stability of which depends on the array of organisms with which we share this world. Although we usually quantify biodiversity by the number of species in a geographic region, the complexity and functioning of ecosystems are driven by the interactions between these species. Monitoring biodiversity from the angle of ecosystem function is a complex concept which is still in its early days. Even in the field of economics, there is little uptake of models which place human society and economic activities within the natural world rather than the other way around. Genomic and ecological data can be integrated to reveal interaction networks which can be linked to historic ecological data and provide an avenue for bio-surveillance of past and present ecosystems. Ecosystem functioning and biodiversity are more than a warning story; linking genomics and biodiversity can allow prediction of resilience in relation to anthropogenic and climate impacts, and thus gives us an opportunity for mitigation. In addition, beyond what we already have at our fingertips, there is a vast untapped resource in our biota which we are only beginning to explore.

Quality genomic data will be the foundation of our future. Complete genome assemblies provide invaluable data, allowing a deeper and more detailed understanding of a species’ biology in relation to other organisms and other genomes. Publicly available DNA, RNA and protein data can be searched for novel metabolic pathways which could provide new medicines and pharmaceuticals, plastic-decomposing processes, biofuels and much more.

For example, marine species are an important source of biologically active compounds and sessile species (those that live permanently attached) such as sponges, bryozoans (moss animals) and sea-squirts may be a particularly rich source of novel compounds, sometimes involved in deterring predation and defending living space. Examples of compounds of marine origin include the anti-cancer drugs ecteinascidin-743, from a sea-squirt (Figure 5), and eribulin mesylate, from a sponge. The very detailed genomic information generated by DToL will facilitate the discovery of new marine natural products in the species included in the programme.

Figure 5

Photograph of a star ascidian (sea-squirt) Botryllus schlosseri (a sessile animal), which could hold useful compounds that are not yet discovered.

Sequencing the human genome took 13 years and a billion dollars to achieve 92% completion, and the genome was only declared complete in 2021. This project paved the way for new genome sequencing technology, and today DToL aims to do the same. New long-read technologies (such as HiFi, or high-fidelity, sequencing developed by Pacific Biosciences, and nanopore sequencing developed by Oxford Nanopore Technologies) are capable of generating single reads that are tens to hundreds of thousands of bases long, and at a reasonable cost. Additionally, new methods are being created and trialled by the DToL Sanger team which will allow the sequencing of very small organisms, such as meiofauna and plankton, from single specimens, pushing forward genome technology and creating new solutions.

Genome sequencing for the DToL project is carried out at the Wellcome Sanger Institute (Sanger) by the dedicated Tree of Life teams. Tissues are shipped to Sanger on dry ice or in liquid nitrogen. To extract DNA and RNA, the tissues are pulverized into a fine powder under liquid nitrogen. Aliquots of this powder are taken to prepare chromosome conformation capture sequencing (also known as Hi-C), a technology that uses proximity ligation in intact nuclei to identify sequences that likely derive from the same chromosome. The remaining powder is split between extraction of very long DNA, for sequencing using the PacBio Sequel II HiFi long-read method, and extraction of intact mRNA, which is sequenced using the Illumina RNA-seq or PacBio isoSEQ method. If you would like more detail on the sequencing strategy, the methodology is available in the Genome Note for the grey topshell. The DNA sequence data are processed and assembled to generate a best estimate of the genome sequence of the organism. The long PacBio HiFi reads are of very high accuracy and are used to generate a primary assembly by overlapping them. The Hi-C data are then used both to confirm the correctness of the HiFi assembly and to ‘scaffold’ the long sequences into putative chromosomes. All the genomes are curated by an expert team to check for any errors or contamination before they are submitted to the public sequence database (the European Nucleotide Archive). The RNA-seq and isoSEQ data are then used by the Ensembl genome annotation team at EMBL-EBI to predict the protein-coding genes in the genome. The result is a high-quality genome which can be used as a foundation for further research: to answer questions about how the organism functions, about its roles in the ecosystem it lives in, about its genetic diversity and about its place in the evolutionary tree of life.
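The idea of building an assembly by overlapping reads can be illustrated with a toy example. The sketch below greedily merges the pair of sequences that share the longest exact suffix-to-prefix overlap until nothing more can be joined. It is a teaching illustration only: the function names and the short reads are invented here, it assumes error-free reads from a single strand, and it is not the assembly software used at Sanger, which must cope with repeats, both DNA strands, heterozygosity and vastly larger data.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the two sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(a, b, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more: leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three made-up reads sampled from one short made-up sequence.
toy_reads = ["TTAGGACT", "ATGGCGTACC", "GTACCTTAGG"]
print(greedy_assemble(toy_reads))
```

When reads do not overlap, the function returns several separate contigs. In the workflow described above, ordering and orienting such pieces into putative chromosomes is the job of the Hi-C data, which this sketch does not attempt.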

Plymouth has long been an area marine biologists flock to. The MBA (Figure 6) was founded in 1884 and many scientists have explored the marine ecosystems around the area to undertake natural history surveys and experiments. Other marine science institutes are also in Plymouth, such as the Plymouth Marine Laboratory, Marine Research Plymouth and University of Plymouth. As well as ongoing sampling and study, there is a wealth of historic data starting from the late 1800s for a wide range of taxa from protists to fish, and the Continuous Plankton Recorder (CPR) survey hosted at the MBA is the longest running, most geographically extensive marine ecological survey in the world. Many of the samples have been collected locally, creating type specimens for the area.

Figure 6

The Marine Biological Association and Hoe seafront

Plymouth is also home to the largest naval base in Europe, along with several busy ports and marinas, which are key vectors for invasive species. As one of the more southerly points of England, it is also warmer than many other coastal locations. Because many invasive species arrive from warmer parts of the world, Plymouth is a place where some of them are able to settle and survive. Many invasive species have already been collected as part of the project, and this information could be vital to those trying to understand the mechanisms of colonization and the vectors by which invasive species spread.

Kes is a Research Assistant on the Darwin Tree of Life project at the Marine Biological Association UK. She is mainly involved with collection and processing of samples, as well as providing communications content. She has a keen interest in fish and in biodiversity and ecosystem conservation.

Joanna is a Research Technician on the Darwin Tree of Life project at the Marine Biological Association UK. They are involved with various aspects of the project, including barcoding, collection and processing, among other tasks. Joanna is a polymath marine biologist with an interest in all things science.

Patrick is a Research Assistant on the Darwin Tree of Life project at the Marine Biological Association UK. He is mainly involved with collection and processing of samples and collaborating with other organizations to ensure the project has expert input from professionals in the field. Patrick has a keen interest in worms and all other overlooked marine biota.

Published by Portland Press Limited under the Creative Commons Attribution License 4.0 (CC BY-NC-ND)