Proteins are the molecular machines of life, and engineering them to probe and tailor biological function long remained pure imagination. Cracking the genetic code, the blueprint of proteins, gave the spark to turn precise protein engineering into reality, and even to explore new designer genetic codes. Organisms in all kingdoms of life evolved a highly similar genetic code, which defines how genetic information is translated into proteins. Ribosomes use only a small set of 20 so-called canonical amino acids to build proteins. Is the genetic code in this form key to all life, no matter where in the universe it evolved? Already on Earth we find two powerful exceptions. Selenocysteine is often called the 21st proteinogenic amino acid and can even be found in some human proteins. Pyrrolysine is termed the 22nd proteinogenic amino acid and has been identified, e.g., in archaea that grow in anaerobic environments, such as the sludge of a lake or the rumen of a cow. So if one could expand the genetic code even further, proteins could be generated that consist partially or even entirely of artificial amino acids, giving rise to a large diversity of novel natural and unnatural functions. In fact, this future is already here: genetic code expansion is a powerful technique that makes it possible to equip proteins with hundreds of different non-canonical amino acids, including ones that do not exist in nature. This opens a treasure chest of entirely new protein structures and functionalities. Despite the great potential for biotechnology and medicine, installing a new genetic code, especially in human cells, still poses major challenges. Our laboratory, among others, strives to develop new strategies that expand the genetic code inside living cells while maintaining their native genetic code.
Come and join us for this brief journey through the milestones of protein engineering, up to the latest breakthroughs in encoding non-natural chemical properties, and you may get as excited as we are about where protein engineering is heading in the upcoming years.
We are living in exciting times, in the age of synthetic biology. Discoveries and technology developments in the 20th century laid the cornerstone for a biological revolution. At the heart of this revolution, the field of protein engineering is advancing faster than ever. Scientists can nowadays not only identify, quantify and functionally analyse nucleic acids and proteins in high throughput; with the help of gene cloning, editing and synthesis methods, they can also tinker with proteins on demand. Any protein? Well…
Shaping protein structure and function
Proteins serve as building blocks for almost every cellular structure and as bio-catalysts for the essential processes of life. They consist of amino acids sequentially joined by peptide bonds. The peptide chain forms a three-dimensional structure that depends on the chemical properties of its amino acids, such as their charge and their capability to form hydrogen bonds, hydrophobic interactions and van der Waals interactions. A protein’s structure or fold determines many of its functions, such as the ability to bind other molecules with high affinity or to form stiff cellular elements, including those shaping the cytoskeleton. Notably, recent discoveries showed that intrinsically disordered proteins, which are largely unstructured or unfolded, are equally important for defining protein functionality and beautifully complement the properties of folded ones. Such natively unfolded proteins or protein domains provide not only the flexibility needed for movement and dynamic processes but also multivalency, that is, the ability to form different kinds of low-affinity interactions with proteins, RNAs, DNA and a plethora of other molecules. As such, the proper function of intrinsically disordered proteins is highly relevant to healthy aging, while their dysfunction plays important roles in, e.g., many cancers and neurodegenerative diseases. We are just beginning to understand the multifaceted features of intrinsically disordered proteins, including their tendency to phase separate into so-called condensates upon a triggering stimulus. These condensates, also referred to as membrane-less organelles, enrich defined components and deplete others, creating microenvironments that can themselves be engineered for specialized roles. This concept is key for repurposing whole intracellular machineries to perform dedicated reactions, and it illustrates the impact of the emerging field of engineering intrinsically disordered proteins.
Mutations in a gene can change the encoded amino acid sequence and thereby render a cellular reaction more or less efficient. Common challenges in protein engineering are, e.g., to increase the conversion rate of an enzyme, broaden its substrate spectrum or change its reactivity entirely. Sometimes this can already be achieved by a single amino acid substitution at the catalytic centre, the place where a substrate is recognized and processed. However, if the aim is to enhance both the thermal stability and the activity of the same enzyme, the strategy will instead demand multiple mutations all over the protein sequence. Traditional engineering approaches are based on the random introduction of mutations, followed by screening for a desired outcome. In contrast, modern protein design makes use of three-dimensional structures, obtained through nuclear magnetic resonance, electron tomography, protein crystallography and X-ray scattering or predicted by computational modelling, to make informed mutations towards a particular protein fold and function.
Cracking the genetic code
After the discovery of DNA by Friedrich Miescher in 1869, the acidic polymer isolated from the nucleus received only minor attention from scientists compared to proteins. This changed in 1944, when DNA was identified as the molecule that carries hereditary information and thus the blueprint for making proteins. In 1961, Francis Crick and co-workers showed that the genetic code is read in triplets of bases, and within the following years the code was cracked in full: of the 64 possible triplets, 61 encode the canonical amino acids and three act as so-called stop codons that terminate a completed protein sequence (Figure 1). The precise editing of DNA, e.g., with the help of restriction enzymes or CRISPR/Cas nucleases, can be used to engineer the encoded protein, and the genetic code provides the roadmap for design. Indeed, scientists can nowadays engineer almost any protein based on the 20 canonical amino acids, adhering to the genetic code. However, equipping proteins with non-canonical amino acids remains a difficult quest, as it requires a new, expanded genetic code to be installed within living cells.
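The layout of the standard genetic code can be made concrete in a few lines. The following Python sketch builds the codon table from its conventional compact representation (bases ordered T, C, A, G; `*` marking a stop codon) and translates a short, made-up DNA sequence. The names `CODON_TABLE` and `translate` and the example sequence are illustrative assumptions, not something taken from a specific tool.

```python
from itertools import product

# Standard genetic code written in the conventional T, C, A, G order.
# Each character is the one-letter amino acid for one codon; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() varies the third base fastest, which matches the ordering of AMINO_ACIDS.
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate a coding DNA sequence codon by codon until the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break  # a stop codon terminates the protein
        protein.append(residue)
    return "".join(protein)

stop_codons = sorted(c for c, r in CODON_TABLE.items() if r == "*")
sense_codons = [c for c, r in CODON_TABLE.items() if r != "*"]
canonical_amino_acids = set(AMINO_ACIDS) - {"*"}

print(len(CODON_TABLE), "codons in total")
print(len(sense_codons), "sense codons for", len(canonical_amino_acids), "amino acids")
print("stop codons:", stop_codons)
print(translate("ATGGCCTAG"))  # hypothetical mini-gene: Met, Ala, stop
```

Because 61 sense codons are shared among only 20 amino acids, most amino acids are encoded by several codons; this redundancy is part of what makes recoding individual codons conceivable.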
The principles of expanding the genetic code
It is fascinating that only 20 generic building blocks give rise to the tremendous diversity of protein structures and functions. How would increasing this set by a handful, tens or even hundreds more amino acids, each bringing different physical and chemical properties, increase the diversity of protein architectures? At least for now, this is pure imagination, or is it? There are several hundred non-canonical amino acids that are naturally produced in diverse organisms but simply not used by ribosomes for building proteins. Notably, many organisms have specialized non-ribosomal synthetases that can form peptide bonds between non-canonical amino acids and are thus able to form highly complex peptides, including pigments, antibiotics and potent drugs. However, these synthetase enzymes generally consist of many subunits. Each of the subunits catalyses only a single, defined linkage of two particular amino acid residues, which makes it rather difficult to exploit them for making designer proteins. Numerous entirely artificial, unnatural amino acids can further be synthesized in a chemical laboratory. Using genetic code expansion, the ribosome can be programmed to execute an altered genetic code in living cells and to incorporate such non-canonical amino acids into a defined protein at a defined site, allowing for the generation of protein structures and functions yet unexplored by nature. The strategy was first proposed by Peter G. Schultz and typically relies on the deliberate misreading of a stop codon, a base triplet that usually terminates the production of a protein. This stop codon is reassigned to instead encode a non-canonical amino acid. To achieve this, three critical components have to be made in or given to living cells:
A messenger RNA (mRNA) with a chosen stop codon at the place where the non-canonical amino acid shall be encoded, and another, different stop codon to terminate translation
A ‘suppressor’ transfer RNA (tRNA) that recognizes this stop codon and thus reprograms it to a new sense codon by supplying the non-canonical amino acid
A tRNA synthetase enzyme that loads the aforementioned tRNA with the non-canonical amino acid that shall be incorporated in place of the chosen stop codon
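How the three components listed above work together can be sketched in Python. In this deliberately simplified model, the mRNA carries the amber stop codon UAG at the site of interest and a different stop codon (UAA) at its end, and the suppressor tRNA only decodes UAG when the synthetase has charged it with the non-canonical amino acid. The function name, its parameters, the placeholder symbol `X` and the example mRNA are assumptions made for illustration. Real suppression is never fully efficient, because the charged tRNA competes with the cell's release factors.

```python
from itertools import product

# Standard genetic code in RNA letters (U, C, A, G order); '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna, suppressed_codon=None, trna_is_charged=False, ncaa_symbol="X"):
    """Translate an mRNA; optionally read one stop codon as a non-canonical amino acid.

    suppressed_codon: stop codon recognized by the suppressor tRNA (component 2).
    trna_is_charged:  True if the synthetase (component 3) has loaded that tRNA.
    ncaa_symbol:      placeholder letter for the non-canonical amino acid.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        residue = CODON_TABLE[codon]
        if residue == "*":
            if codon == suppressed_codon and trna_is_charged:
                protein.append(ncaa_symbol)  # stop codon re-read as a sense codon
                continue
            break  # any other stop codon still terminates translation
        protein.append(residue)
    return "".join(protein)

# Hypothetical mRNA (component 1): AUG GCC UAG AAA UAA
MRNA = "AUGGCCUAGAAAUAA"

native = translate(MRNA)
expanded = translate(MRNA, suppressed_codon="UAG", trna_is_charged=True)
uncharged = translate(MRNA, suppressed_codon="UAG", trna_is_charged=False)

print("native code:       ", native)
print("expanded code:     ", expanded)
print("tRNA left uncharged:", uncharged)
```

With the native code, translation stops at UAG and yields a truncated product. With a charged suppressor tRNA, the ribosome reads through UAG and terminates only at the second, unsuppressed stop codon.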
Extending the genetic code: a multi-layered challenge
In the last two decades, more than 300 non-canonical amino acids have been installed within proteins of diverse cell types. Their great spectrum of physical and chemical properties enables a range of applications beyond the traditional fields of enhancing enzyme catalysis and rewiring protein properties, structures and interactions (Figure 2). Importantly, non-canonical amino acids can be made compatible with click chemistry, which allows the attachment of diverse molecules that have been modified with a complementary click handle. Click chemistry was recognized with the Nobel Prize in Chemistry in 2022: once a click handle is installed, an almost endless number of molecules carrying the compatible handle can be connected. Those who have played with LEGO™ get an idea of how much diversity can be created if two different building blocks are ‘clicked’ together. A powerful application is the conjugation of fluorescent dyes to specific protein sites. This approach has huge potential for super-resolution microscopy, which requires tiny and bright markers for resolving small structures. Another area is the coupling of potent anti-cancer drugs to antibodies, which combines the best of the worlds of chemistry and biology and enables the potent killing of tumour cells with the high specificity that monoclonal antibodies offer.
However, despite the promise that genetic code expansion holds, many applications today are still focused on research and tool development. Genetic code expansion systems face fundamental problems that need to be solved in order to unleash their power. Widespread use of genetic code expansion will require efficient machinery that is minimally invasive to the host and enables high yields of the desired synthetic protein. At the same time, cells will always depend on maintaining their native genetic code. Components of the expanded genetic code are prone to cross-react with those of the native code, as observed when naturally occurring stop codons in the transcriptome get reprogrammed or when a tRNA is charged with the wrong amino acid. This leads not only to a decreased efficiency of executing both codes but also frequently to the accumulation of toxic side products. Ideally, the expanded genetic code is orthogonal, meaning that it does not influence any other cellular process and is not influenced by one. The suppressor tRNA supplying the non-canonical amino acid should only act at the chosen stop codon in the target mRNA, without affecting stop codons in other transcripts. The synthetase enzyme should only recognize its cognate tRNA for charging it with the non-canonical amino acid, disregarding all other tRNAs in the cell. Only a few pairs of a tRNA synthetase plus cognate tRNA are known to be orthogonal in human cells, which restricts the number of non-canonical amino acids that can be encoded at the same time. Moreover, genetic code expansion typically relies on the re-interpretation of a stop codon. Since only three stop codons are available, the options for recoding are limited.
Pushing the limits of genetic code expansion using artificial designer organelles
Several new approaches have been developed to address these challenges. They range from large efforts in synthesizing entire genomes from scratch (to free stop codons for recoding) to the elegant evolution of artificial ribosomes with a quadruplet genetic code (to generate novel codons). Although they work beautifully in bacteria, these methods are very hard to realize in higher organisms, including human cells, due to their typically larger genome size and high cellular complexity. Our laboratory has recently developed orthogonally translating, membrane-less organelles (Figure 3). These synthetic organelles are inspired by concepts of phase separation and can be anchored to subcellular structures, such as the plasma membrane or the mitochondrion. As in salad dressing, where oil droplets form in the vinegar or vice versa, proteins and other biomolecules can also phase separate into highly concentrated droplets. The formation of many widely known organelles, such as the nucleolus and stress granules, is influenced by such mechanisms, and this ultimately results in a dense or concentrated ‘phase’ that can provide a different chemical environment than elsewhere in the cell. Our synthetic designer organelles enrich essential components of the translational machinery, including the tRNA synthetase and its cognate suppressor tRNA. Another key step involves the recruitment of a specific mRNA (with a stop codon at a defined site) to the organelle microenvironment, where it is selectively translated with an expanded genetic code, as only there is it exposed to the suppressor tRNA. Multiple organelles with different locations inside the cell can also be installed at the same time. A single stop codon can thus be reassigned more than once, encoding different non-canonical amino acid species in different microenvironments and installing them into different proteins.
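The idea that the meaning of a codon can depend on where an mRNA is translated can be illustrated with a toy model in Python. Here, two hypothetical organelles each reassign the same amber stop codon (UAG) to a different non-canonical amino acid, while translation elsewhere in the cell follows the native code. The organelle names, the placeholder symbols `1` and `2`, the function `translate_at` and the example mRNA are all assumptions for illustration; the sketch ignores how mRNAs are recruited and how efficiently the components are enriched.

```python
from itertools import product

# Standard genetic code in RNA letters (U, C, A, G order); '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical designer organelles. Each one enriches its own suppressor tRNA and
# synthetase, so the same stop codon is read as a different non-canonical amino
# acid ('1' or '2' are placeholder symbols) depending on the location.
ORGANELLE_CODES = {
    "plasma_membrane_organelle": {"UAG": "1"},
    "mitochondrial_organelle": {"UAG": "2"},
}

def translate_at(mrna, location):
    """Translate an mRNA with the genetic code that applies at the given location."""
    reassigned = ORGANELLE_CODES.get(location, {})  # empty outside the organelles
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        residue = reassigned.get(codon, CODON_TABLE[codon])
        if residue == "*":
            break  # stop codons that are not reassigned here end translation
        protein.append(residue)
    return "".join(protein)

# Hypothetical mRNA: AUG GCC UAG AAA UAA
MRNA = "AUGGCCUAGAAAUAA"

products = {
    location: translate_at(MRNA, location)
    for location in ("cytoplasm", "plasma_membrane_organelle", "mitochondrial_organelle")
}
for location, protein in products.items():
    print(f"{location}: {protein}")
```

In this model an mRNA that stays in the cytoplasm is simply terminated at UAG, which mirrors the requirement that naturally occurring stop codons outside the organelle remain untouched.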
Organelle architectures can be changed in order to vary their location or phase separation propensity. We are only at the beginning of understanding how they work and how far their potential can be pushed to engineer new functions into cells without interfering with the ‘normal’ functions of the cell.
Considering also the large research efforts of other groups, including progress in synthesizing entire genomes and the establishment of artificial nucleotide base pairs, we are excited about the developments the future holds for the engineering of synthetic protein polymers. Besides their numerous applications, such as novel therapeutic agents or improved processes in biotechnology, they may also open an opportunity for us to explore alternative paths of evolution and to learn about the principles of life on Earth and elsewhere.
We thank Nike Heinß for dedicating her time and talent to the illustrations. The Lemke Lab acknowledges funding by the ERC ADG MultiOrganelle Design, VW Life and the CRC1551 ‘Polymer concepts in cellular function’ of the Deutsche Forschungsgemeinschaft (DFG project number 464588647).
After graduating from the University of Kaiserslautern, Cosimo earned a PhD from the European Molecular Biology Laboratory (EMBL) in Heidelberg and ETH Zurich. For his work on establishing CRISPRa/i screens in baker’s yeast to probe gene functions and signalling pathways, Cosimo was awarded a fellowship for interdisciplinary research by the Joachim Herz Foundation. He recently joined the Institute of Molecular Biology Postdoc Programme and the Johannes Gutenberg University in Mainz, where he strives to develop artificial membrane-less organelles for executing defined cellular reactions and processes, including genetic code expansion. Email: email@example.com.
After several years as group leader of an Emmy Noether and ERC consolidator research group at the European Molecular Biology Laboratory (EMBL), the biophysical chemist Edward A. Lemke has taken up a professorship for synthetic biophysics of protein disorder at Johannes Gutenberg University Mainz, where he has also become Adjunct Director at the Institute of Molecular Biology (IMB). Combining new research methods and expertise in synthetic biology, synthetic genomics, chemistry, biophysics and cell biology, his group innovates new approaches to studying the plasticity of intrinsically disordered proteins and their role in cellular proteostasis. He is the spokesperson of networks that bring together scientists from physical, chemical and life sciences (www.crc1551.com and www.spp2191.com) and his current work is also funded by an ERC Advanced grant. Email: firstname.lastname@example.org. Twitter: https://twitter.com/lemkelab