The design of synthetic gene networks (SGNs) has advanced to the extent that novel genetic circuits are now being tested for their ability to recapitulate archetypal learning behaviours first defined in the fields of machine and animal learning. Here, we discuss the biological implementation of a perceptron algorithm for linear classification of input data. An expansion of this biological design that encompasses cellular ‘teachers’ and ‘students’ is also examined. We also discuss the implementation of Pavlovian associative learning using SGNs and present an example of such a scheme together with an in silico simulation of its performance. In addition to designed SGNs, we consider the option of establishing conditions in which a population of SGNs can evolve diversity in order to better contend with complex input data. Finally, we compare recent ethical concerns in the field of artificial intelligence (AI) with the future challenges raised by bio-artificial intelligence (BI).

Artificial intelligence (AI) can be defined as the decision-making capabilities of machines [1]. Machines are most commonly regarded as designed, multi-part objects that perform predetermined mechanical tasks. To date, machines capable of decision making are constructed using electronics directed by sets of instructions (algorithms) encoded within circuits patterned onto semiconductor materials such as silicon. Machine learning can occur when the algorithms that control a machine are written such that they can independently use prior data sets to inform future decisions.

Human learning, by contrast, is understood to be a phenomenon that emerges in part from the dynamic and adaptive exchange of information between neurons in the brain and within individual neuronal cells. Individual cells can adapt to and anticipate environmental signals, such as the onset of stress and the availability of nutrients. For example, cells of the mammalian immune system can acquire memory of previous pathogen invasions and prepare for future infections. In experiments, the slime mould Physarum polycephalum, a single-celled organism, found the shortest path between two points in a labyrinth [2,3], and anticipated future events that it had previously experienced on a periodic basis [4].

Networks of genes [5] and enzymes [6] have been described in terms of their ability to support adaptive behaviours. However, research is ongoing into the network topologies and behaviours that underlie single cell learning. A key question is whether learning in single cells occurs in a manner analogous to multicellular systems, or with an architecture that is predetermined by genetically encoded programs. The development of synthetic biology in recent years provides a novel avenue to address this question from a biological engineering perspective.

Association and classification of external stimuli are two fundamental concepts used to define learning in the field of AI. A number of theoretical studies indicate that single cells can exhibit these types of learning [7–9]. Cell-free biological systems have also been established which exploit DNA strand hybridisation and displacement to perform neural network computations [10]. To date, however, no artificial single-cell-based learning system has been realised experimentally. The rapid development of synthetic biology over the past decade has now made engineering of such a system feasible.

In this review, we discuss a selection of synthetic biologists’ efforts to design, model, build and test synthetic gene networks (SGNs) that enable living cells to associate and classify external stimuli. In doing so we hope to stimulate researchers to consider and debate how synthetic biology could be used to implement AI using biological material as an alternative to the silicon, metal and plastic materials used in conventional AI.

While mathematical models were applied in the development and analysis of the SGNs discussed here, this review focuses on the biological aspects of SGNs. As such, a complete description of the relevant models is not necessary to understand the concepts presented here. Readers who do wish to examine the mathematical models further should refer to the cited literature and to reviews by Bates et al. [11] and Borg et al. [12]. Specific technical details can be provided via the corresponding author.

Supervised learning in artificial intelligence: students, teachers and classification

A key goal of machine learning is the development of algorithms that can infer a set of rules from a predetermined ‘training’ data set. Once the training data have been analysed, the algorithm should ideally be able to sort previously unseen data sets into the correct categories [13], in what is termed ‘supervised learning’. One mode of this sorting, also known as ‘classification’, is to assign all data inputs to one of two states, for instance being above or below a given linear threshold. This type of supervised learning is known as linear classification and a number of algorithms have been developed to achieve this task. The perceptron is one of the earliest linear classification algorithms and has been used to identify translation initiation sites in Escherichia coli mRNA molecules [14]. In a perceptron algorithm, a given input signal is classed as being above or below a line (or threshold). The position of this threshold is altered as part of the learning process until all training data points have been successfully classified. Figure 1 sets out a scheme for biological implementation of a perceptron in which a toggle switch (Figure 3A) classifies the sum of two input signals as lying on one side or the other of a given threshold, resulting in expression of either RFP or GFP. The position of the threshold is determined by a central element, ‘node 0’. The nodes in this context represent one or more genes that function to repress or stimulate other nodes.
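For readers less familiar with the perceptron, the short sketch below shows the conventional in silico algorithm that the SGN in Figure 1 aims to mimic. The data set, weights and learning rate are illustrative assumptions; the two output classes stand in for the RFP and GFP states of the toggle switch, and the bias term plays the role of node 0.

```python
import numpy as np

# Minimal perceptron for linear classification of two inputs (illustrative data).
# The two classes (+1/-1) are analogous to the RFP/GFP outputs in Figure 1;
# the bias term sets the position of the classification threshold (node 0).

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(40, 2))           # 40 points, two input signals
y = np.where(X[:, 0] + X[:, 1] > 1.0, 1, -1)  # true class: above/below the line x1 + x2 = 1

w = np.zeros(2)   # input weights
b = 0.0           # bias (threshold position)
eta = 0.1         # learning rate

for epoch in range(50):
    errors = 0
    for xi, target in zip(X, y):
        prediction = 1 if xi @ w + b > 0 else -1
        if prediction != target:              # misclassified: shift the threshold
            w += eta * target * xi
            b += eta * target
            errors += 1
    if errors == 0:                           # all training points classified correctly
        break

print(f"learned weights {w}, bias {b:.2f}, epochs run {epoch + 1}, final errors {errors}")
```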

Figure 1
A synthetic gene network for linear classification

A linear classifier phenotype can be achieved with a SGN comprising five nodes, depicted in the diagram as circles labelled 0, 1, 2, 3 and 4. Arrowhead connectors indicate activation of one node by another, hammerhead connectors indicate inhibition. Nodes 3 and 4 represent a toggle switch, which can flip between the state of ‘3 ON, 4 OFF’ and the state of ‘3 OFF, 4 ON’. Nodes 3 and 4 repress each other. Node 0 favours the ‘4 ON’ state and inhibits the ‘3 ON’ state. Nodes 1 and 2 represent inputs that favour ‘3 ON’ and inhibit ‘4 ON’. The output position of the 3/4 toggle switch is tipped toward ‘3 ON’ or ‘4 ON’ depending on the net activity level of nodes 1 and 2. In effect the 3/4 toggle switch classifies inputs 1 and 2. Node 0 can be used to tip the equilibrium of the toggle switch toward ‘3 ON’. This impacts how the output position of the toggle switch is influenced by nodes 1 and 2. In this way, the weighting of the classification threshold can be set by the activity of node 0. This scheme is proposed here by A.Z.


Supervised learning in synthetic biology: student cells and teacher cells

Algorithms and mathematical models for perceptron-based supervised learning can encompass a ‘teacher’ element that provides data sets and determines responses to those data, and a ‘student’ element, whose learning is directed by the teacher [15]. The biological student–teacher (BST) network consists of sets of genes within teacher and student cells that interact via promoting or repressing outputs. Taken individually, each network can be considered as a switch, with either RFP or GFP output as an indirect response to levels of a small molecule that can traverse cell membranes (Figure 2A). The classification threshold of the teacher can be adjusted externally by designing the 0T node to be influenced by an inducer molecule such as isopropyl β-D-1-thiogalactopyranoside (IPTG). The classification threshold of the 0S node in student cells would be set by the level of a second small molecule inducer, not IPTG, the concentration of which is influenced most strongly by teacher cells. In this way, student cells are effectively ‘taught’ by the teachers which classification threshold to use.

Figure 2
Linear classification with a biological student–teacher network

(A) Teacher and student cells both contain SGNs encoding the five nodes described in Figure 1, but labelled here as 0, G1, G2, G3 and G4. Node 0 for a teacher cell is labelled 0T and node 0 for a student cell is labelled 0S. As in Figure 1, nodes G3 and G4 comprise a toggle switch. The output position of the toggle switch is tipped toward G3, resulting in RFP expression or G4, resulting in GFP expression, depending on the net activity level of nodes G1 and G2. In effect the G3/G4 toggle switch classifies the activities of the G1 and G2 nodes as inputs. As in Figure 1, node 0 (0T or 0S) pushes the equilibrium of the toggle switch toward G3. Unlike in Figure 1, in this BST network, activity of 0T can be controlled exogenously by addition of a small molecule inducer to the growth medium. Furthermore, in addition to RFP, node G3 also directs expression of a small molecule that can traverse cell membranes and activate node 0S. This has the effect that, when teacher cells are in excess, the activity of 0S in student cells is set (‘learned’) by the level of signal produced by teacher cells. Arrowhead connectors indicate activation of one node by another and hammerhead connectors indicate inhibition. Curled arrowhead connectors indicate auto-induction. (B) Mathematical simulation of the BST network learning dynamics. Outputs of the student cells: red for RFP from G3, green for GFP from G4, are constantly ‘learned’ from changes in the teacher cells which determine the activity (threshold) of node 0S in the student cells. This scheme is proposed here by A.Z. and D.N. and the simulation was performed by C.G. and Y.S.


Within industrial biotechnology this supervised learning could be used to optimise performance of a biotransformation step. For instance, in nature, material such as agricultural waste often consists of a diversity of substances that, collectively, are most commonly decomposed by consortia of different microbial species [16]. In conventional biotechnology, a lone species, typically E. coli, is engineered to express recombinant enzymes encoded by transgenes controlled by exogenous, strong, constitutive promoters, IPTG-inducible promoters or promoters present in a locus ported en bloc from another species. In future, synthetic consortia of different cell types could be designed in which a particular objective, or master instruction, would be set by controlling the classification threshold of teacher cells. Subsequent delivery of classification weighting instructions to different student cell types would be influenced by the biological status of teacher cells, providing more dynamic and sensitive signalling. This particularly comes into play when consortia grow as 3D structures such as biofilms [17].

Mathematical modelling of a biological student–teacher network

Suzuki et al. [18] proposed a mathematical model of a network, formulated using ordinary differential equations, that can be applied to the network proposed in Figure 2A. The model incorporated the ability to vary the level of gene transcription ‘noise’ (stochastic variation in expression) used in simulations of gene network behaviour. Solving the equations of the model numerically, using a range of biologically relevant parameter levels for factors such as transcriptional noise, demonstrated the sensitivity of the BST SGN to changes in the threshold of the switch within the teacher (Figure 2B). The simulation also showed that a change in the teacher is followed by a change in the student after a short delay. However, comparison with experimental observations is necessary to robustly assess the validity of this simulation.
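The model of Suzuki et al. is not reproduced here, but the minimal sketch below illustrates the qualitative behaviour described above using two coupled toggle switches: the student’s node 0S is driven by a signal produced from the teacher’s G3 node, so that a change in the teacher threshold (an IPTG-like bias term) propagates to the student output after a short delay, as in Figure 2B. The equations, parameter values and the assumption that node 0 adds drive to G3 are illustrative choices for this sketch, not the published model.

```python
def hill_rep(x, K=1.0, n=2):
    """Repressive Hill function: high x switches expression off."""
    return K**n / (K**n + x**n)

def toggle_step(g3, g4, input_drive, bias, dt, a4=2.0):
    """One Euler step for a mutually repressive G3/G4 pair (Figure 2A).
    'bias' is the activity of node 0 (0T or 0S), assumed here to add drive to G3."""
    dg3 = (input_drive + bias) * hill_rep(g4) - g3
    dg4 = a4 * hill_rep(g3) - g4
    return g3 + dt * dg3, g4 + dt * dg4

dt, T = 0.01, 80.0
I1 = I2 = 0.5                      # constant input signals seen by both cell types
g3T, g4T, g3S, g4S = 0.1, 0.1, 0.1, 0.1
S, bS = 0.0, 0.0                   # diffusible signal and student node 0S activity

for k in range(int(T / dt)):
    t = k * dt
    bT = 0.0 if t < 30 else 2.0    # 'IPTG' raises the teacher threshold node 0T at t = 30
    g3T, g4T = toggle_step(g3T, g4T, I1 + I2, bT, dt)
    g3S, g4S = toggle_step(g3S, g4S, I1 + I2, bS, dt)
    S += dt * (g3T - S)            # teacher G3 also drives the diffusible signal
    bS += dt * (0.7 * S - bS)      # the signal sets ('teaches') the student 0S activity
    if k % 1000 == 0:              # report every 10 time units
        teacher = "RFP" if g3T > g4T else "GFP"
        student = "RFP" if g3S > g4S else "GFP"
        print(f"t={t:5.1f}  teacher={teacher}  student={student}  0S activity={bS:.2f}")
```

With these assumed parameters, both cell types report GFP until the teacher bias is raised, after which the teacher flips to RFP and the student follows a few time units later as the diffusible signal accumulates.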

Association of two stimuli is perhaps most intuitively illustrated by the classic experiments of Pavlov [19], in which a dog learned to associate the ringing of a bell with feeding time. After simultaneous application of both stimuli, the dog learned to associate them, exhibiting the same response (salivation) to either of the two stimuli alone. Such classical associative learning is advantageous because it enables an organism to anticipate and adapt to environmental changes quickly and has been observed in all animals with bilateral symmetry so far studied [20].

Building an associative perceptron with synthetic gene networks

To perform learning tasks, cells must ‘remember’ past stimuli, and genetically encode the memory. Basic synthetic genetic memory circuits that achieve this task have been demonstrated previously, such as the genetic toggle switch depicted in Figure 3A [21] and the transcriptional positive feedback loop in Figure 3B [22]. Both of these circuits have two stable memory states dictated by the expression of the genes in the circuits.

Figure 3
Genetic memory circuits

(A) Genetic toggle switch. A sufficiently strong pulse of input 1 will overcome inhibition of expression of gene X caused by protein Y (Y in blue oval). Uninhibited expression of gene X will then continue as protein X (X in blue oval) also acts to inhibit expression of gene Y. Subsequently, the network can be flipped to the opposite position by a sufficiently strong pulse of input 2, which will overcome inhibition of expression of gene Y caused by protein X. Uninhibited expression of gene Y will then continue as protein Y also acts to inhibit expression of gene X. (B) Positive feedback loop circuit. Input 1 initiates expression of gene X. The resultant protein X then also induces expression of gene X, sustaining activity of the gene after the initial input 1 has ceased. Positive and negative regulations are indicated by arrows and hammerheads, respectively. These schemes have been proposed by several groups.


In the genetic toggle switch (Figure 3A), either gene X or gene Y is switched on due to their mutual repression. These two memory states can be flip-flopped by two different input signals. In the positive feedback circuit (Figure 3B), the expression of gene X is switched on by an input stimulus. Once activated, the ON state is self-sustaining due to positive feedback. SGNs for associative learning can be built based upon these memory circuits. Several groups have proposed such SGNs, including Lu et al. [23] who put forward an associative learning SGN based on a toggle switch.
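As a concrete illustration of the toggle switch memory in Figure 3A, the sketch below integrates a generic two-gene mutual-repression model and shows that a transient pulse of input 1 leaves the circuit latched in the ‘X ON’ state, while a later pulse of input 2 flips it back. The equations and parameter values are textbook-style assumptions rather than those of Gardner et al. [21].

```python
def simulate_toggle(pulses, T=60.0, dt=0.01, a=3.0, K=1.0, n=2):
    """Euler integration of a mutually repressive toggle switch (Figure 3A).
    pulses: list of (start, end, input1, input2) describing transient stimuli."""
    x, y = 0.2, 1.5                              # start in the 'Y ON' state
    trace = []
    for k in range(int(T / dt)):
        t = k * dt
        i1 = sum(p[2] for p in pulses if p[0] <= t < p[1])
        i2 = sum(p[3] for p in pulses if p[0] <= t < p[1])
        dx = a * K**n / (K**n + y**n) + i1 - x   # input 1 boosts X production
        dy = a * K**n / (K**n + x**n) + i2 - y   # input 2 boosts Y production
        x, y = x + dt * dx, y + dt * dy
        trace.append((t, x, y))
    return trace

# Pulse of input 1 at t = 10-15 flips the switch; pulse of input 2 at t = 35-40 flips it back.
trace = simulate_toggle([(10, 15, 5.0, 0.0), (35, 40, 0.0, 5.0)])
for t, x, y in trace[::500]:                     # report every 5 time units
    state = "X ON" if x > y else "Y ON"
    print(f"t={t:4.0f}  X={x:4.2f}  Y={y:4.2f}  state={state}")
```

The key point is that the state persists after each pulse ends, which is the memory property the associative learning networks below rely upon.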

Elegant systems in which memory states are defined within DNA sequences have also been demonstrated by Farzadfard and Lu [24] and Yang et al. [25], using recombinase-mediated flipping of segments of genomic DNA. These systems represent potentially powerful basic research tools for discovering the provenance of different cell types, for instance by determining the events experienced by a given cell type as it matures from stem cell to terminally differentiated cell.

For dynamic and rapid memory establishment and erasure, SGNs have been designed to be capable of associating two different environmental signals in a manner analogous to the animal learning behaviour revealed by Pavlov. One such SGN is based on the combination of a positive feedback loop memory circuit and a negative modifier (Figure 4). This ‘positive feedback/negative modified’ (PFNM) network has the important advantage that it requires only a transient signal to form a sustained memory.

Figure 4
A synthetic gene network for associative learning

(A) Schematic diagram of the PFNM associative learning network. Positive and negative regulations are indicated by arrows and hammerheads, respectively. Input 1 stimulates nodes u, v and y. Input 2 stimulates nodes w and y. (B) Simulation of the behaviour of the network. Either input 1 or 2 alone leads to a weak activation of the output y, at times t1 and t2. When both inputs 1 and 2 are applied simultaneously, a ‘memory’ is formed by a self-sustained expression of u due to its positive auto-regulation. Because of this memory a subsequent input 1 or input 2 alone can cause a strong induction of y. In this way the network has learned to associate inputs 1 and 2. This memory can be erased by a sufficiently large input 1 (due to the direct activation of v), bringing the system back to the default state. This scheme is proposed here by Y.S. and M.C.R. and the simulation was performed by Y.S.


Memory erasure in the PFNM circuit would be achieved post-translationally via inducible protein degradation, using a system such as auxin-inducible protein degradation [26,27]. Steps that are achieved post-translationally allow greater network responsiveness compared with steps that are mediated by transcriptional repression. The proposed network could be implemented experimentally using genetic tools that conform to the BioBrick™ synthetic biology standard, including a transcription activator, transcription repressor, fluorescent reporter protein and a small molecule regulator of protein degradation. A mathematical model, which applies four ordinary differential equations, activating and inhibiting Hill functions and the law of mass action, can be used to assess the capacity of the PFNM circuit for associative learning. Simulation using this model predicted an initial low level of network response to pulses of either input 1 or input 2 when experienced separately (Figure 4B). The network was then subjected to a pulse of both input 1 and input 2 at the same time. After this double-input pulse had been detected, the network was then predicted to give a boosted level of response to separate pulses of either input 1 or input 2 (Figure 4B). In this way, the double-input pulse establishes a memory. This memory informs an increased level of response to single inputs relative to the level of response prior to when the memory was established.
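To make the proposed behaviour concrete, the sketch below integrates a four-variable ODE model with the Figure 4A topology: u is activated by the coincidence of input 1 and the input-2-driven node w, sustains itself through positive autoregulation, and is silenced by the input-1-driven negative modifier v, while y reports the response. The Hill forms, parameters and pulse strengths are illustrative assumptions chosen so that the learn/recall/erase sequence appears qualitatively; they are not the parameterisation used to generate Figure 4B.

```python
def hact(x, K, n):
    """Activating Hill function."""
    return x**n / (K**n + x**n)

def step(state, I1, I2, dt):
    u, v, w, y = state
    # u: coincidence activation (input 1 AND w) plus positive autoregulation,
    #    with the autoregulation attenuated by the negative modifier v.
    du = hact(I1, 0.5, 2) * hact(w, 0.5, 2) + 3.0 * hact(u, 1.0, 4) / (1 + (v / 0.5)**2) - u
    dv = 2.0 * hact(I1, 2.5, 6) - v          # v responds only to a large input 1 (erasure)
    dw = hact(I2, 0.5, 2) - w                # w relays input 2
    dy = (hact(I1, 0.5, 2) + hact(I2, 0.5, 2)) * (1 + 4.0 * hact(u, 1.0, 2)) - y
    return [s + dt * d for s, d in zip(state, (du, dv, dw, dy))]

# Pulse schedule: lone inputs, simultaneous inputs (learning), lone inputs again
# (recall), a strong input 1 (memory erasure), then a lone input 1 (reset check).
pulses = [(5, 10, 1, 0), (15, 20, 0, 1), (25, 35, 1, 1),
          (45, 50, 1, 0), (55, 60, 0, 1), (70, 80, 4, 0), (90, 95, 1, 0)]

dt, T = 0.01, 100.0
state, y_peak, window = [0.0, 0.0, 0.0, 0.0], 0.0, None
for k in range(int(T / dt)):
    t = k * dt
    I1 = sum(p[2] for p in pulses if p[0] <= t < p[1])
    I2 = sum(p[3] for p in pulses if p[0] <= t < p[1])
    state = step(state, I1, I2, dt)
    active = next((p for p in pulses if p[0] <= t < p[1]), None)
    if active is not None:
        y_peak = max(y_peak, state[3])
        window = active
    elif window is not None:                 # a pulse just ended: report its peak response
        print(f"pulse {window}: peak y = {y_peak:.2f}, memory u = {state[0]:.2f}")
        y_peak, window = 0.0, None
```

Running the sketch reports a weak y response to the lone pulses, a high and self-sustaining u (memory) after the simultaneous pulse, boosted y responses to the subsequent lone pulses, and a return to the naive response after the large erasing pulse of input 1.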

Until now we have considered relatively simple classes of inputs of the type that can be separated by a single threshold and do not overlap. In these cases, the SGN merely classifies binary inputs that switch between simple states such as being absent or present, or above or below a line. Biological reality, however, inevitably poses more complex situations. Classifying a more complex input, such as a concentration of a biological solute or signalling protein that falls within an upper and lower threshold, can also be addressed with SGN design (Figure 5A). Classification of a given two-input signature, for instance 10–20 nM of solute X and 600–800 nM of solute Y, can be achieved with an ab initio designed SGN (Figure 5B) but begins to place a significant burden on the SGN designer (human or machine) to engineer or source sensor elements with the precise desired sensitivity to detect the two different solute concentration ranges. For example, a given SGN design may require multiple promoters, each sensitive to different concentrations of the same, or different, solutes. In this situation, it is essential for the overall function of the network that there is no ‘cross-talk’ between the different inputs and the different promoters intended to be activated or repressed in response to those inputs. For instance, if solute A induces promoter A, but also induces promoters B, C and D unintentionally, the conditionality of outputs is compromised. As such, ‘orthogonal’ partners of inducer and promoter must be identified, in which a given inducer influences only a specific promoter type and has no effect on any other promoter. This orthogonality is a non-trivial objective for synthetic biologists [28] because it is arguably a defining feature of natural biology that genes within a genome tend to influence each other's expression [29].
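The two-module design described in Figure 5A can be expressed compactly as a pair of composed transfer functions: a saturating sensor stage followed by a reporter stage that is activated at intermediate, and repressed at high, levels of the intermediate factor U. The sketch below uses made-up Hill parameters to show how this composition yields a bell-shaped (band-detecting) response, and how two such modules can be combined, as in Figure 5B, into a soft AND gate.

```python
import numpy as np

def hill_act(x, K, n):
    return x**n / (K**n + x**n)

def hill_rep(x, K, n):
    return K**n / (K**n + x**n)

def sensor(X, Umax=100.0, Kx=50.0):
    """Module 1 (Figure 5A): inducible promoter drives factor U; saturates at high input X."""
    return Umax * hill_act(X, Kx, 2)

def reporter(U, K_on=20.0, K_off=60.0):
    """Module 2: reporter promoter activated at intermediate U and inhibited at high U."""
    return hill_act(U, K_on, 4) * hill_rep(U, K_off, 4)

# Scan the input concentration (arbitrary nM units) and locate the response band.
X = np.linspace(0.0, 500.0, 1001)
gfp = reporter(sensor(X))
band = X[gfp > 0.5 * gfp.max()]
print(f"peak response at X ~ {X[gfp.argmax()]:.0f} nM; "
      f"half-maximal band ~ {band.min():.0f}-{band.max():.0f} nM")

# Figure 5B: two band detectors combined through an AND gate (modelled here as a product),
# so a strong output requires both inputs to sit within their target concentration ranges.
def two_input_gate(X1, X2):
    return reporter(sensor(X1)) * reporter(sensor(X2))

print(f"X1=40,  X2=40  -> output {two_input_gate(40.0, 40.0):.2f}")
print(f"X1=40,  X2=400 -> output {two_input_gate(40.0, 400.0):.2f}")
```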

Figure 5
SGNs to classify input data that are not linearly separable

(A) Sensing and response functionalities are split into separate modules. In the first module (sensor), an inducible promoter drives the expression of the transcription factor U in response to the concentration of a biological input X, such as a solute or signalling molecule. Above a certain level of X, the expression of U reaches a maximum and plateaus. In the second module (reporter), another inducible promoter drives the expression of a reporter (GFP) in response to induction by U. The promoter is activated by intermediate concentrations of U and inhibited by high concentrations of U. Thus, the resulting response function of the entire two-promoter circuit to the concentration of signalling molecule is bell shaped for the relevant values of the input signal. (B) In the case of two input ranges, X1 and X2, the sensor/output modules feed into an AND gate, the output of which is reported as the presence or absence of GFP expression [30,33]. Adapted with permission from Didovyk et al. [30] and Kanakov et al. [33].


Ensembles of SGNs for classification of complex inputs

To meet these challenges, so-called ‘ensemble’ classifiers have been proposed [30,31]. The ensemble concept requires establishment of a heterogeneous population of simple classifier SGNs that encompasses a random distribution of sensitivities to input signals, each responding to only a narrow range of input levels. The overall output signal is the sum of the outputs of each SGN of the population and so can be considered as a tuneable collective response.

The SGN set out in Figure 5B features distinct ribosome binding site (RBS) elements, RBSU1 and RBSU2, which respond to a distinct concentration of their cognate input molecules, X1 and X2, respectively. High throughput (HTP) mutation approaches could be readily applied to generate a diverse library of RBS variants from RBSU1 to RBSUn. Once each variant has been introduced into cells, a population is generated harbouring an ensemble of SGNs with different input sensitivities. Across the ensemble population, expression of the reporter would produce a bell-shaped response curve. The randomised sensitivity of the sensor RBS within each SGN of the ensemble is key. This distribution of sensitivities controls the position of the maximal signal output produced in response to the concentration of a chemical input signal.

The ensemble of SGNs could be trained by selective deletion of the cells hosting SGNs that produce an incorrect response to positive or negative control signals. Total ensemble size, in terms of cell numbers, can be maintained by addition of new cells or by proliferation of the remaining non-deleted cells. Furthermore, probabilistic deletion, whereby incorrectly responding cells would have a finite probability of persisting within the ensemble population, would enable the ‘soft learning’ required for classification of input signals that have regions of overlap.
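A toy simulation of this training-by-deletion procedure is sketched below. Each ‘cell’ carries a band detector whose sensitivity is randomly positioned in the two-input space (standing in for an RBS library variant), cells whose reporter fires strongly on negative training examples are removed probabilistically, and the ensemble output is the summed response of the survivors. The Gaussian response shape, deletion rule and thresholds are illustrative assumptions rather than the published schemes of Didovyk et al. [30] or Kanakov et al. [33].

```python
import numpy as np

rng = np.random.default_rng(1)

# Each cell hosts a simple classifier SGN whose reporter responds only in a narrow
# band of the two inputs (X1, X2); band centres are randomised, as for an RBS library.
N = 2000
centres = rng.uniform(0, 1, size=(N, 2))
width = 0.08

def responses(inputs):
    """Per-cell reporter output for a single (X1, X2) input pair."""
    d2 = ((centres - inputs)**2).sum(axis=1)
    return np.exp(-d2 / (2 * width**2))

# Training data: class '+' occupies a disc in input space, class '-' is everything else.
train = rng.uniform(0, 1, size=(200, 2))
labels = ((train - 0.5)**2).sum(axis=1) < 0.1

alive = np.ones(N, dtype=bool)
for point, positive in zip(train, labels):
    if not positive:
        # Cells that fire strongly on a negative example are deleted with high probability
        # (probabilistic rather than certain deletion gives the 'soft learning' behaviour).
        r = responses(point)
        kill = (r > 0.5) & (rng.uniform(size=N) < 0.9)
        alive &= ~kill

def ensemble_output(inputs):
    """Collective output of the surviving population (sum of single-cell reporters)."""
    return responses(inputs)[alive].sum()

# After training, inputs inside the target region give a much larger collective signal.
inside, outside = np.array([0.5, 0.5]), np.array([0.9, 0.1])
print(f"cells remaining: {alive.sum()} / {N}")
print(f"ensemble output inside target region : {ensemble_output(inside):.1f}")
print(f"ensemble output outside target region: {ensemble_output(outside):.1f}")
```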

The sharp bell-shaped output of single synthetic circuits makes it possible to meet the challenge of distinguishing input classes that have a complex structure in the signal space. Effectively, training reshapes the distribution of individual sensitivities in the population, allowing them to cover the signal subspace corresponding to one of the classes by a union of ‘pixel’ responses. As a result, the SGN ensemble can be trained to classify inputs that are not linearly separable (Figure 6).

Figure 6
Simulation of an ensemble SGN soft learning how to classify overlapping input signals

(A) The signals from two inputs, X1 and X2, overlap and therefore result in production of an overlapping output (red region) from an untrained ensemble SGN population. (B) After such a population has undergone loss of certain cells (indicated by white dots) due to selection pressure, mathematical modelling by Kanakov et al. [33] predicts that a classification border (within the black and white dashed line) will emerge with respect to the output signal from the remaining cells (black dots). These remaining cells, and the ensemble of SGNs they harbour, can be considered as a ‘trained classifier’, which has undergone ‘soft learning’. The colour code of the heat map indicates the relative change in response of the ensemble classifier, in arbitrary units. Adapted with permission from Kanakov et al. [33].


As SGN size and complexity increase, so challenges in biological implementation, such as the availability of orthogonal genetic constructs, also tend to increase. Excellent work by Nielsen et al. [32] demonstrated a robust system, Cello, for the design and assembly of SGNs, with 45 circuits shown to function as intended. For ensemble SGNs (Figures 5 and 6), selection and deletion of variants can be performed to ensure cross-reaction of inputs does not dampen the collective response. As such, in future the design space for large-scale SGNs may be limited only by the functional diversity of possible SGN component sequences and the metabolic capacity of a chosen cell type to replicate and express hosted SGNs.

Further sophistication in ensemble SGN design is likely to be achieved by integration with engineered intercellular communication. A study by Kanakov et al. [31] demonstrated that quorum sensing could be used to coordinate the function of designed genetic elements that have been distributed across different sub-groups of cells. They showed that toggle switch and oscillator functions could arise from these distributed, coordinated SGNs in a predictable and controllable manner. These distributed, coordinated SGNs were sensitive to modulation by external chemical signalling and the growth dynamics of the host cell population. This opens exciting possibilities for implementing dynamical decision making using distributed SGNs. Terrell et al. [34] also took a major step toward experimental implementation of distributed SGNs capable of classification. They demonstrated a system in which the presence of input was reported by a nanoparticle binding event that could occur only when two different cell types detected the input signal.

Synthetic biology has the potential to disruptively reconfigure goods and services that are today bio-based, such as vaccines, and those that are today mainly non-biological, such as sensor devices and computation [35]. The ‘designed’ biology envisioned by this approach will remain only a vision until basic research enables engineers to build and test sophisticated biological devices that perform predictably within parameters accurately described by mathematical modelling [35]. A major challenge for this vision is to build well-characterised SGNs that go beyond discrete functions (sensing, oscillating) to incorporate the learning SGNs discussed here and networks of networks that provide new functions in cells and consortia of multiple cell types.

Steady advances in DNA synthesis and assembly make combinatorial assembly of large SGNs accessible and practical. Generating large libraries of DNA fragments of differing sequence allows selection of variants that function well and deletion of variants that perform poorly, in effect an evolutionary approach. Furthermore, modular assembly of large DNA molecules allows specific subsections to be removed and replaced with different variants, while the rest of the molecule is unchanged. Together these approaches mean evolutionary strategies can be used to find optimal solutions to circuit design, while modular approaches are used for debugging and error correction.

For example, multiple fragments composing a biosynthetic or signalling pathway can be assembled using a variety of methods for parallel ligation of multiple DNA fragments. Readers interested in detailed discussion of these DNA assembly methods should consult reports by Engler et al. [36] of the ‘Golden Gate’ assembly method and by Weber et al. [37] of the ‘MoClo’ method. Several methods have also been developed specifically for manipulation of very large (100 kilobase pairs and larger) DNA fragments [38–40]. Methods such as these have ultimately enabled assembly of entire bacterial and eukaryotic chromosomes [41,42].

Possible industrial applications of SGNs that can learn include designing cells that can respond to large, small, intended and unintended perturbations in bioprocess environments while maintaining optimal productivity, whether in biotherapeutic production, resource utilisation or biosynthesis of high value chemicals. Smart cells that can respond to the physiological status of the patient in a sophisticated manner could also expand the application and robustness of whole cell therapeutic approaches.

Advances in conventional AI have raised concerns around the use of AI technologies in ways that would not be acceptable to wider society. Examples include the use of voice recognition in public spaces for surveillance purposes or deploying autonomous robots to work as counsellors, soldiers, carers or judges [43]. Bio-artificial intelligence (BI) could enable pheromone recognition or detection of a person's unique signature of volatile biological molecules. Of course these are purely long-term considerations, but we suggest it is prudent to monitor development in the field of AI as an indicator of the possible challenges BI might pose in future. A recent example of such precautionary oversight is the appointment of an ethics board at AI company Lucid (Austin, Texas, USA).

To date no reports exist of the application of SGNs in a commercial biomanufacturing process. As such, the current boundaries of synthetic biology must be pushed in order to deliver enhanced capabilities and a new era of ‘intelligent bio-manufacturing’. This might include deployment of ‘smart’ cells that can adapt to dynamic changes in their production (e.g. bioreactor) or application (e.g. organ, tissue) environments. As the global synthetic biology market grows, developing such capabilities will become a key challenge that will require the development of techniques across an increasingly broad palette of SGN architectures.

  • The classic perceptron algorithm for linear classification of inputs could, in theory, be implemented using ‘teacher’ and ‘student’ SGNs.

  • SGNs have been designed to perform Pavlovian associative learning.

  • Simulations in silico have provided preliminary confirmation that Pavlovian associative learning and Perceptron-based linear classification could be encoded in SGNs.

  • SGNs and experimental schemes have been proposed that could be capable of evolving increased levels of diversity, enabling classification of complex input data.

  • In future, ‘bio-artificial intelligence’ may pose ethical concerns that parallel those raised by recent developments in conventional artificial intelligence.

Abbreviations: AI, artificial intelligence; BI, bio-artificial intelligence; BST, biological student–teacher; IPTG, isopropyl β-D-1-thiogalactopyranoside; PFNM, positive feedback/negative modified; RBS, ribosome binding site; SGN, synthetic gene network.

T.L. and O.K. acknowledge Russian Science Foundation Project No. 14-12-00811: ‘Theory and numerics for ensembles of gene networks’. A.Z. acknowledges the Russian Science Foundation (grant number 16-12-00077). D.N.N. acknowledges funding from BBSRC grant BB/M004880/1.

The authors declare that there are no competing interests associated with the manuscript.

1. Ghahramani, Z. (2015) Probabilistic machine learning and artificial intelligence. Nature 521, 452–459
2. Nakagaki, T., Yamada, H. and Toth, A. (2000) Maze-solving by an amoeboid organism. Nature 407, 470
3. Tero, A., Takagi, S., Saigusa, T., Ito, K., Bebber, D.P., Fricker, M.D. et al. (2010) Rules for biologically inspired adaptive network design. Science 327, 439–442
4. Saigusa, T., Tero, A., Nakagaki, T. and Kuramoto, Y. (2008) Amoebae anticipate periodic events. Phys. Rev. Lett. 100, 018101
5. Mangan, S. and Alon, U. (2003) Structure and function of the feed-forward loop network motif. Proc. Natl. Acad. Sci. 100, 11980–11985
6. Ma, W., Trusina, A., El-Samad, H., Lim, W.A. and Tang, C. (2009) Defining network topologies that can achieve biochemical adaptation. Cell 138, 760–773
7. Jones, B., Stekel, D., Rowe, J. and Fernando, C. (2007) Is there a liquid state machine in the bacterium Escherichia coli? Proceedings of IEEE Symposium on Artificial Life, Honolulu, Hawaii, pp. 187–191
8. Gandhi, N., Ashkenasy, G. and Tannenbaum, E. (2007) Associative learning in biochemical networks. J. Theor. Biol. 249, 58–66
9. Fernando, C.T., Liekens, A.M., Bingle, L.E., Beck, C., Lenser, T., Stekel, D.J. et al. (2009) Molecular circuits for associative learning in single-celled organisms. J. R. Soc. Interface 6, 463–469
10. Qian, L., Winfree, E. and Bruck, J. (2011) Neural network computation with DNA strand displacement cascades. Nature 475, 368–372
11. Bates, R., Blyuss, O., Alsaedi, A. and Zaikin, A. (2015) Effect of noise in intelligent cellular decision making. PLoS One 10, e0125079
12. Borg, Y., Ullner, E., Alagha, A., Alsaedi, A., Nesbeth, D.N. and Zaikin, A. (2014) Complex and unexpected dynamics in simple genetic regulatory networks. Int. J. Mod. Phys. B 28, 1430006
13. Baştanlar, Y. and Ozuysal, M. (2014) Introduction to machine learning. Methods Mol. Biol. 1107, 105–128
14. Stormo, G.D., Schneider, T.D., Gold, L. and Ehrenfeucht, A. (1982) Use of the ‘Perceptron’ algorithm to distinguish translational initiation sites in E. coli. Nucleic Acids Res. 10, 2997–3011
15. Rosen-Zvi, M. (2000) On-line learning in the Ising perceptron. J. Phys. A: Math. Gen. 33, 7277–7287
16. Bernstein, H.C. and Carlson, R.P. (2012) Microbial consortia engineering for cellular factories: in vitro to in silico systems. Comput. Struct. Biotechnol. J. 3, e201210017
17. Perry, N., Nelson, E.M. and Timp, G. (2016) Wiring together synthetic bacterial consortia to create a biological integrated circuit. ACS Synth. Biol.
18. Suzuki, N., Furusawa, C. and Kaneko, K. (2011) Oscillatory protein expression dynamics endows stem cells with robust differentiation potential. PLoS One 6, e27232
19. Pavlov, I.P. (1927) Conditioned Reflexes: An Investigation of the Physiological Activity of the Cerebral Cortex (translated by G.V. Anrep). Oxford University Press, London
20. Ginsburg, S. and Jablonka, E. (2010) The evolution of associative learning: a factor in the Cambrian explosion. J. Theor. Biol. 266, 11–20
21. Gardner, T.S., Cantor, C.R. and Collins, J.J. (2000) Construction of a genetic toggle switch in Escherichia coli. Nature 403, 339–342
22. Ajo-Franklin, C.M., Drubin, D.A., Eskin, J.A., Gee, E.P.S., Landgraf, D., Phillips, I. et al. (2007) Rational design of memory in eukaryotic cells. Genes Dev. 21, 2271–2276
23. Lu, T.K., Khalil, A.S. and Collins, J.J. (2009) Next-generation synthetic gene networks. Nat. Biotechnol. 27, 1139–1150
24. Farzadfard, F. and Lu, T.K. (2014) Genomically encoded analog memory with precise in vivo DNA writing in living cell populations. Science 346, 1256272
25. Yang, L., Nielsen, A.A., Fernandez-Rodriguez, J., McClune, C.J., Laub, M.T., Lu, T.K. et al. (2014) Permanent genetic memory with >1-byte capacity. Nat. Methods 11, 1261–1266
26. Nishimura, K., Fukagawa, T., Takisawa, H., Kakimoto, T. and Kanemaki, M. (2009) An auxin-based degron system for the rapid depletion of proteins in nonplant cells. Nat. Methods 6, 917–922
27. Giuraniuc, C.V., MacPherson, M. and Saka, Y. (2013) Gateway vectors for efficient artificial gene assembly in vitro and expression in yeast Saccharomyces cerevisiae. PLoS One 8, e64419
28. Rusk, N. (2014) Orthogonal logic gates. Nat. Methods 11, 132
29. Phillips, P.C. (2008) Epistasis—the essential role of gene interactions in the structure and evolution of genetic systems. Nat. Rev. Genet. 9, 855–867
30. Didovyk, A., Kanakov, O.I., Ivanchenko, M.V., Hasty, J., Huerta, R. and Tsimring, L. (2015) Distributed classifier based on genetically engineered bacterial cell cultures. ACS Synth. Biol. 4, 72–82
31. Kanakov, O., Laptyeva, T., Tsimring, L. and Ivanchenko, M. (2016) Spatiotemporal dynamics of distributed synthetic genetic circuits. Phys. D 318–319, 116–123
32. Nielsen, A.A., Der, B.S., Shin, J., Vaidyanathan, P., Paralanov, V., Strychalski, E.A. et al. (2016) Genetic circuit design automation. Science 352, aac7341
33. Kanakov, O., Kotelnikov, R., Alsaedi, A., Tsimring, L., Huerta, R. and Zaikin, A. (2015) Multi-input distributed classifiers for synthetic genetic circuits. PLoS One 10, e0125144
34. Terrell, J.T., Wu, H.-C., Tsao, C.-Y., Barber, N.B., Servinsky, M.D., Payne, G.F. et al. (2015) Nano-guided cell networks as conveyors of molecular communication. Nat. Commun. 6, 8500
35. Pais-Vieira, M., Chiuffa, G., Lebedev, M., Yadav, A. and Nicolelis, M.A. (2015) Building an organic computing device with multiple interconnected brains. Sci. Rep. 5, 11869
36. Engler, C., Gruetzner, R., Kandzia, R. and Marillonnet, S. (2009) Golden gate shuffling: a one-pot DNA shuffling method based on type IIs restriction enzymes. PLoS One 4, e5553
37. Weber, E., Engler, C., Gruetzner, R., Werner, S. and Marillonnet, S. (2011) A modular cloning system for standardized assembly of multigene constructs. PLoS One 6, e16765
38. Gibson, D.G., Young, L., Chuang, R.Y., Venter, J.C., Hutchison, C.A., III and Smith, H.O. (2009) Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345
39. Noskov, V.N., Karas, B.J., Young, L., Chuang, R.Y., Gibson, D.G., Lin, Y.C. et al. (2012) Assembly of large, high G+C bacterial DNA fragments in yeast. ACS Synth. Biol. 1, 267–273
40. de Kok, S., Stanton, L.H., Slaby, T., Durot, M., Holmes, V.F., Patel, K.G. et al. (2014) Rapid and reliable DNA assembly via ligase cycling reaction. ACS Synth. Biol. 3, 97–106
41. Annaluru, N., Muller, H., Mitchell, L.A., Ramalingam, S., Stracquadanio, G., Richardson, S.M. et al. (2014) Total synthesis of a functional designer eukaryotic chromosome. Science 344, 55–58
42. Hutchison, C.A., Chuang, R.Y., Noskov, V.N., Assad-Garcia, N., Deerinck, T.J., Ellisman, M.H. et al. (2016) Design and synthesis of a minimal bacterial genome. Science 351, aad6253
43. Russell, S., Hauert, S., Altman, R. and Veloso, M. (2015) Robotics: ethics of artificial intelligence. Nature 521, 415–418
This is an open access article published by Portland Press Limited on behalf of the Biochemical Society and distributed under the Creative Commons Attribution Licence 4.0 (CC BY).