Why proteins are fuzzy? Constant adaptation to the cellular environment requires a wide range of changes in protein structure and interactions. Conformational ensembles of disordered proteins in particular exhibit large shifts to activate or inhibit alternative pathways. Fuzziness is critical for liquid–liquid phase separation and conversion of biomolecular condensates into fibrils. Interpretation of these phenomena presents a challenge for the classical structure-function paradigm. Here I discuss a multi-valued formalism, based on fuzzy logic, which can be applied to describe complex cellular behavior of proteins.
Introduction
Traditionally, we consider a statement either true or false (0 or 1, Boolean logic). In our everyday life, however, we use more complex descriptions. Somebody is not simply tall or short, but tall to a certain degree (e.g. relatively tall, tall for the age, tallest in the team) usually defined on a given scale. These statements conform to a many-valued or fuzzy logic [1], as we assign a degree to the truth value. This means that more values can be true, but to different degrees (e.g. somebody can be tall and short, depending on the person of comparison). Similarly, the organization of a protein chain is along the continuum between an ordered state dominated by one conformation, and a disordered state generated by multiple conformations [2]. Different protein regions may have different degrees of preference for ordered or disordered states [3].
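The notion of a graded truth value can be made concrete with a minimal sketch. The Python snippet below is illustrative only: the linear ramp, the 160 cm and 190 cm thresholds and the function names are assumptions chosen for this example and are not taken from the cited literature. The min and max operators are the standard Zadeh definitions of fuzzy AND and OR.

```python
def tall_membership(height_cm: float,
                    short_cm: float = 160.0,
                    tall_cm: float = 190.0) -> float:
    """Degree (0..1) to which a height counts as 'tall'.

    Uses a linear ramp between two hypothetical reference heights.
    """
    if height_cm <= short_cm:
        return 0.0
    if height_cm >= tall_cm:
        return 1.0
    return (height_cm - short_cm) / (tall_cm - short_cm)


def short_membership(height_cm: float) -> float:
    """Degree to which a height counts as 'short' (fuzzy complement of 'tall')."""
    return 1.0 - tall_membership(height_cm)


def fuzzy_and(a: float, b: float) -> float:
    """Zadeh conjunction: the weaker of the two degrees."""
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    """Zadeh disjunction: the stronger of the two degrees."""
    return max(a, b)


# A 175 cm person is 'tall' and 'short' at the same time, each to degree 0.5.
degree_tall = tall_membership(175.0)
degree_short = short_membership(175.0)
```

In Boolean logic the two statements would exclude each other; here both hold to a degree, in the same way that a bound state can be partly ordered and partly disordered.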
Most proteins are capable of performing multiple activities under physiological conditions [4–9]. In the stochastic cellular environment, the different protein functions are realized to different extents depending on the intra- or extracellular signals, localization or milieu. The classical one gene → one structure → one function paradigm, which is based on binary logic, is challenged to interpret this phenomenon. The simultaneous occurrence of multiple activities, however, can be described based on fuzzy logic. In artificially intelligent devices, a fuzzy logic-based control system is used to adapt to imprecisely defined or unforeseen conditions, similar to those that proteins face in the cellular world. Here I propose a structure-function framework based on multi-valued, fuzzy logic and illustrate this approach using interactions of disordered proteins [10].
A binding mode continuum between order and disorder
The wide range of changes in structure and dynamics upon interactions of disordered proteins results in a continuum of bound states between order and disorder [11,12]. Based on the change in conformational entropy upon binding, we can classify different ‘binding modes’, which represent different degrees of conformational heterogeneity of the bound state ensemble. Ordered binding modes are achieved via disorder-to-order transitions, resulting in well-defined contact patterns between specific residues [13,14]. Disordered binding modes involve many binding configurations, which result from disorder-to-disorder transitions [15–17]. In these cases, interactions in the bound state ensemble are mediated by redundant motifs.
Ordered and disordered binding modes have different sequence codes [18]. Binding motifs mediating ordered interactions exhibit a distinct composition as compared with their flanking regions. This is exemplified by short hydrophobic motifs within disordered regions [19]. In contrast, the interacting elements mediating disordered binding modes have a similar composition to their flanking regions (e.g. tandem motifs or low-complexity sequences) [20,21]. Based on these local compositional biases, the probabilities of transitions towards ordered or disordered bound states (pDO and pDD) can be predicted from the sequence. The method has been published [18]; therefore, only the key points are summarized here. Predictions were validated using ∼2000 protein complexes, by comparing the unbound and bound states of the same protein regions [18]. Using this approach, ∼40% of disordered proteins in yeast were predicted to fold upon partner interactions, in good agreement with proteomic results in vivo [22].
Recognizing that the bound state of a protein can be to some extent ordered and to some extent disordered, and quantifying these two properties simultaneously, conforms to fuzzy logic.
Fuzzy binding of disordered proteins
Fuzzy binding means that the interaction behavior of a protein region changes according to the cellular context. In particular, binding to different partners or changes in cellular conditions (localization, posttranslational modification) may result in different bound state ensembles with different degrees of conformational heterogeneity, even if the same region is involved in the interaction. For example, binding to E-cadherin induces the ordering of residues 134–161 in β-catenin (PDB: 1i7w [23]), while this region remains disordered in the complex with the β-catenin interacting protein ICAT (PDB: 1m1e [24]). Ordered binding modes can be induced by posttranslational modifications, exemplified by the phosphorylation of the N-terminal autoinhibitory region of glycogen-synthase kinase 3 in the insulin pathway (GSK-3, PDB: 4nm3 [25]); the same region remains disordered when bound to the Wnt receptor LRP6 motif in the Wnt pathway (PDB: 4nm5 [25]). Phosphorylation in the bound complex may also induce a conversion between disordered and ordered binding modes, which may give rise to oligomerisation or higher-order assembly [26,27].
Increasing experimental evidence shows that context-dependent variation of binding modes can activate different cellular pathways. Thus, fuzzy binding introduces uncertainty into predicting function, as the same sequence encodes different bound state ensembles, with different biological outcomes. Indeed, fuzzy binding is exploited to regulate signaling specificity [7,28–30].
Fuzzy binding is represented by binding mode landscapes
Context-dependent interaction behavior of proteins requires novel computational approaches to predict fuzzy binding from sequence.
The degree of ‘fuzziness’ can be defined by binding mode entropy (Sbind) [31], which quantifies the change in bound-state conformational heterogeneity with different partners and conditions. This is derived from the distributions of binding entropies with different interaction partners and plausible binding sites, as illustrated in Figure 1A for four interaction sites of the p53 tumor suppressor.
Figure 1. Representation of the binding mode landscape concept.
(A) The interaction behavior of four different regions of the p53 tumor suppressor: top left is the mdm2 binding region (residues 19–26, PDB: 1ycr, blue); top right is the DNA recognition helix (residues 278–285, PDB: 2ady, lime); bottom left is the oligomerisation domain (residues 325–345, PDB: 1c26, dark blue); bottom right is the C-terminal peptide bound to sirtuin (residues 378–386, PDB: 4zzj, magenta). The partners are shown by grey surfaces. The interaction sites that bind to mdm2 and DNA (top panels) have broad distributions, indicating a wide range of conformational heterogeneity in the bound state. In contrast, the p53 C-terminal peptide exhibits a narrow distribution, indicating that it mostly remains disordered (bottom right) when bound to different partners. The oligomerisation domain (bottom left) shows high-frequency ordered binding modes, but can also visit more disordered bound configurations. (B) Representation of the four p53 regions on a binding mode landscape. The C-terminal peptide of p53 (magenta), which is predicted to remain conformationally heterogeneous in its complexes (pDD = 0.9), has the lowest variation in binding modes (Sbind = 0.7). In contrast, the mdm2 binding helix (lime) may exhibit both ordered and disordered binding configurations (pDD = 0.4), indicating a wide range of conformational heterogeneity in the bound state ensemble with different partners (Sbind = 2.7). The DNA recognition helix (lime) may also exhibit high binding mode entropy, indicating context-dependence (Sbind = 2.5). The oligomerisation domain (blue) has a considerably larger probability for ordering upon binding (pDD = 0.3, Sbind = 2.0). The blue area on the bottom left of the figure represents disorder-to-order binding of disordered regions by conformational selection and induced fit mechanisms. These are weakly context-dependent binding modes and have low Sbind values.
The grey diamond (pDD = 0.5, Sbind = 0) represents a hypothetical ‘lock-and-key’ mechanism for disordered regions, in which the ensemble does not change upon binding (no change in conformational entropy) and is not influenced by the context. This scenario, however, is not realized in nature.
Fuzzy binding can be described by binding mode landscapes, which are derived from the distributions of binding entropy associated with distinct binding events [31]. The x-axis quantifies the conformational heterogeneity in the bound state, reflecting the most likely mode of binding. The y-axis quantifies the binding mode entropy (Sbind), reflecting changes in conformational heterogeneity with different partners. The methods to derive these quantities have been published [31]; here only the key points are reviewed.
The binding mode landscape concept is illustrated in Figure 1B for the p53 interaction sites. The C-terminal region of p53 (residues 378–386), for example, has a low Sbind value, as it tends to bind via short motifs to sirtuins or Cdk2/cyclin-A with a large degree of conformational heterogeneity [32,33]. In contrast, the N-terminal transactivation region of p53, which serves as a network hub, has a high Sbind value, reflecting selective interactions with Mdm2 (residues 19–26) [34], HMGB [35], CBP/p300 [36] and many other partners via both ordered and disordered binding modes. The recognition helix (residues 278–285) has a similarly high Sbind value, reflecting conditional folding upon DNA interactions. The tetramerisation domain (residues 325–345) of p53 is biased towards ordered binding modes [37] and accordingly has a lower Sbind value (Figure 1B).
Taken together, the binding mode entropy is derived from different, hypothetical binding events and is computed over bound-state ensembles with different partners. In contrast, the binding entropy is related to the conformational heterogeneity of the bound-state ensemble in a single binding event. Representing the binding entropy and binding mode entropy on the binding mode landscape characterizes fuzzy binding.
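To make the distinction between the two axes tangible, the following Python sketch places one protein region on a binding mode landscape. It is not the published procedure [31]: it assumes that binding mode entropy can be approximated by the Shannon entropy of a histogram of pDD values collected over hypothetical binding events, and that the x-coordinate can be summarized by the median pDD. The bin count, the base-2 logarithm, the use of the median and all function names are assumptions made for illustration, so the numbers it yields are not expected to reproduce those in Figure 1B.

```python
import math
import statistics
from typing import Sequence, Tuple


def binding_mode_entropy(p_dd_values: Sequence[float], n_bins: int = 10) -> float:
    """Shannon entropy (in bits) of a histogram of pDD values in [0, 1].

    Assumption: each value is the predicted probability of a disordered
    bound state for one hypothetical binding event of the same region.
    """
    if len(p_dd_values) == 0:
        raise ValueError("at least one pDD value is required")
    counts = [0] * n_bins
    for p in p_dd_values:
        if not 0.0 <= p <= 1.0:
            raise ValueError("pDD values must lie in [0, 1]")
        # Clamp p == 1.0 into the last bin.
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    total = len(p_dd_values)
    entropy = 0.0
    for count in counts:
        if count > 0:
            freq = count / total
            entropy -= freq * math.log2(freq)
    return entropy


def landscape_point(p_dd_values: Sequence[float]) -> Tuple[float, float]:
    """Return (typical binding mode, binding mode entropy) for one region."""
    return statistics.median(p_dd_values), binding_mode_entropy(p_dd_values)


# A region that stays disordered with every partner: low entropy.
narrow = landscape_point([0.91, 0.93, 0.95])
# A region whose binding mode varies widely with the partner: high entropy.
broad = landscape_point([0.05, 0.35, 0.65, 0.95])
```

Under these assumptions, a narrow distribution of binding modes maps to the bottom of the landscape, whereas a region that samples both ordered and disordered bound states with different partners moves upwards.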
Fuzzy binding in biomolecular condensates: from liquids to fibrils
Biomolecular condensates formed by liquid–liquid phase separation open new frontiers in our understanding of cellular processes [38,39]. The molecular driving forces that organize these assemblies, however, have not been fully elucidated, owing to the complex stoichiometry and the wide variety of sequence patterns in these membraneless organelles [40,41]. In general, weak interactions amongst redundant binding motifs are considered the organizing principle of these higher-order assemblies [42].
Within the framework of fuzzy binding, biomolecular condensates exhibit disordered binding modes with many different bound configurations mediated by low-complexity motifs [43]. Such conformational heterogeneity (generated by disorder-to-disorder transitions) is required for driving droplet formation. Thus, protein regions capable of forming disordered binding modes can spontaneously phase separate and serve as scaffolds for droplets [44]. Other components of condensates, termed clients, require a partner-induced change in their binding modes. These proteins exhibit fuzzy binding and shift from ordered to disordered bound states in a context-dependent manner [27].
Aging or familial mutations can induce the conversion of liquid droplets to fibrils, which are implicated in neurodegenerative diseases [45,46]. Phase separation may facilitate aggregation [47,48], but their relationship has not been fully elucidated. Fibrillization of liquid droplets takes place via a gradual change in binding modes from disordered to ordered states [45,46]. Some mutations, such as glycine mutations in FUS, reduce fluidity and thus shift the system towards more ordered binding modes [49]. Other mutations affect interactions with other biomolecules, in particular with RNA, and thereby induce changes from disordered to ordered bound states. According to fuzzy logic, disordered and ordered binding modes co-exist, and their gradual shift drives the system towards the amyloid state.
Why fuzzy?
The concept of fuzziness reflects a lack of precision, which we aim to avoid in scientific studies. On the other hand, fuzzy set theory provides a quantitative approach to handle uncertainty. The cellular behavior of proteins is stochastic, with a high level of noise, and is driven by many, often conflicting signals. Proteins have evolved to perform multiple activities [4–6,50] in order to adapt to the cellular context of a network of proteins [51–53]. It is beyond the binary logic of the classical structure-function paradigm to describe how proteins can simultaneously carry out multiple functions.
Let us consider the problem of biomolecular recognition by dynamic conformational ensembles [54]. The classical lock-and-key mechanism, in which binding does not influence the conformational ensemble, would not operate here. Instead, the bound state of dynamic conformational ensembles can be achieved via conformational selection or induced fit mechanisms, depending on the protein concentration [55]. Both scenarios are coupled to disorder-to-order transitions and result in a well-defined bound state, which depends only weakly on the context (low pDD and Sbind, Figure 1B). Such a well-defined bound state can be described by a deterministic one structure → one function model, which conforms to binary logic. In contrast, the fuzzy framework can describe dynamic conformational ensembles with multiple activities by establishing one structure → multiple functions relationships [56]. In addition, the fuzzy model can explain considerable conformational shifts in the ensemble with a minor impact on the original function, leading to multiple structures → one function relationships [57,58]. These functional redundancies are in accord with the messiness of evolutionary innovations [59].
Outlook: describing complex cellular behavior
The cellular world is very complex, and proteins need to perceive their environment to maximize their chance of fulfilling their biological roles. To solve similar adaptation problems, artificially intelligent devices often use a fuzzy control system, which exploits multiple, simultaneous activities.
Structural studies demonstrate the conformational heterogeneity in cellular systems. Describing these ensembles [60,61] under a variety of conditions is key to establishing structure-function relationships using a multi-valued (fuzzy) formalism. Another essential component is to characterize the activities performed by the different ensembles [62] and the extent to which these activities are modulated by shifting the populations of the different conformational sub-states. As this information is not available in most cases, it can be a bottleneck for developing many-valued structure-function models.
The interaction partners of disordered proteins, for example, are often not known, in particular under cellular conditions. However, as also illustrated in this paper, context-dependent interaction behavior can be predicted from the sequence. Binding mode landscapes describe fuzzy binding using (i) the degree of conformational heterogeneity of the bound state ensembles (‘binding mode’), and (ii) the binding mode entropy. Entropy is usually a measure of uncertainty, and binding mode entropy is accordingly a measure of binding mode uncertainty. We need to accept that the protein sequence also encodes multiple interaction behaviors. Recognizing such uncertainty will contribute to understanding the cellular activities of proteins, in particular how the same sequence contributes to different signaling pathways, even with opposite biological outcomes.
Application of the fuzzy structure-function model thus opens perspectives to identify context-dependent regulatory motifs in protein networks. This approach can also be used to describe the regulated assembly/disassembly of membraneless organelles [44,63]. Fuzzy structure-function relationships can be integrated into a fuzzy inference system [56], which can model cellular networks with different biological outcomes, similarly to the control system of artificially intelligent devices. Overall, we do not need to abandon the classical structure-function paradigm to understand cellular behavior, just expand it to a multi-valued formalism.
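As a toy illustration of what a fuzzy inference step might look like, the Python sketch below evaluates two rules with the standard Zadeh operators (min for AND, max for OR, 1 − x for NOT). It is a minimal sketch under stated assumptions and not the inference system of [56]: the two inputs, the two rules and the outcome labels `outcome_A` and `outcome_B` are hypothetical placeholders rather than actual pathways.

```python
from typing import Dict


def fuzzy_and(a: float, b: float) -> float:
    """Zadeh conjunction."""
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    """Zadeh disjunction."""
    return max(a, b)


def fuzzy_not(a: float) -> float:
    """Standard fuzzy complement."""
    return 1.0 - a


def infer_outcomes(ordered_binding: float, signal_present: float) -> Dict[str, float]:
    """Evaluate two hypothetical rules and return the degree of each outcome.

    Rule 1: IF binding is ordered AND the signal is present THEN outcome_A.
    Rule 2: IF binding is disordered OR the signal is absent THEN outcome_B.
    Both inputs are degrees of truth in [0, 1].
    """
    for name, value in (("ordered_binding", ordered_binding),
                        ("signal_present", signal_present)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return {
        "outcome_A": fuzzy_and(ordered_binding, signal_present),
        "outcome_B": fuzzy_or(fuzzy_not(ordered_binding),
                              fuzzy_not(signal_present)),
    }


# A mostly ordered bound state (0.7) under a moderate signal (0.4).
result = infer_outcomes(ordered_binding=0.7, signal_present=0.4)
```

Both outcomes are realized at once, each to a different degree, rather than one being switched on and the other off as a binary model would require.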
Perspectives
Predicting complex interaction behavior of disordered proteins
Modeling cellular networks including context-dependence of individual proteins
Deriving sequence codes for biomolecular condensates based on fuzzy binding
Modeling the conversion of droplets to amyloids, and identifying pathological mutations
Predicting cellular consequences of changing interaction modes
Developing a general structure-function paradigm for proteins
Competing Interests
The author declares that there are no competing interests associated with this manuscript.
Acknowledgements
M.F. thanks HAS-11015, GINOP-2.3.2-15-2016-00044 and INFN Sezione di Padova for financial support.