Abstract
In the past 15 years, cell-based microscopy has shifted its focus from observing cell function to aiming to predict it. In particular, powered by breakthroughs in computer vision, large-scale image analysis and machine learning, high-throughput and high-content microscopy imaging have made it possible to harness single-cell information to systematically discover and annotate genes and regulatory pathways, uncover systems-level interactions and causal links between cellular processes, and begin to clarify and predict causal cellular behaviour and decision making. Here we review these developments, discuss emerging trends in the field, and describe how single-cell ‘omics and single-cell microscopy are on converging trajectories. The marriage of these two fields will make possible an unprecedented understanding of cell and tissue behaviour and function.
The unique role of microscopy in illuminating single cell biology
Single cell biology will fundamentally revolutionize biology and healthcare. Building on the arsenal of ‘omics technologies developed in the past 20 years (genomics, transcriptomics, metabolomics, epigenomics, proteomics, etc.) and applying them to comprehensively profile individual cells at the molecular level in unparalleled detail, this young field aims to: systematically elucidate what brings about cell-to-cell heterogeneity within cell populations and tissues; clarify the link between genotype and phenotype; characterize the broad diversity of cellular types, states, transitions and histories observed in living systems; provide quantitative and predictive insights into how the pheno-genotypic landscape of cells changes upon physiological or pathological triggers; and ultimately leverage that knowledge to reveal new biology and to help develop improved cell-based diagnostics and tools for bio/healthcare applications [1–4]. For instance, the Human Cell Atlas consortium aims to exploit single-cell technologies to create the most comprehensive reference maps to date of all human cells [5]. Similarly, the use of single-cell ‘omics technologies to track embryonic development cell by cell was highlighted as part of Science Magazine’s 2018 Breakthrough of the Year [6]. Yet although single-cell ‘omics can give a deep molecular snapshot of cells and even of their lineage relationships and history (for instance using molecular barcoding approaches [7–9]), those approaches fail to fully capture the true organization and spatiotemporal dynamics of cells and tissues as they grow, divide, die, migrate or change their structure/function through time, and hence can infer only indirectly how cells make decisions in space and time in a cause–effect manner. This is where microscopy, particularly ‘live cell’ fluorescence light microscopy, is unchallenged and unparalleled [10–14].
This essay describes how high-throughput and high-content microscopy imaging make it possible to exploit spatial and dynamical information at the single-cell level to discover genes and pathways and to observe and predict cell structure and function, and how they could shed unique light on causal cell function and evolution in ways that complement, and are inaccessible to, other single-cell approaches.
From high-throughput screening to high-throughput and high-content cell biology
The combined use of large-scale (‘high-throughput’), cell-based fluorescence light microscopy with computational image analysis and feature extraction (‘high-content’) was originally introduced as a way to quantitatively profile, at scale, the effect of drugs on cells in pharmacological ‘fixed cell’ end-point assays for drug screening and development, where it has since been used extensively [15–18]. However, it quickly became obvious that this could provide a new way to perform cell biological experiments [19] at scale in an unbiased, systematic and quantitative manner (Figure 1). One of the first published studies to show this used 4718 gene deletion (knock-out, KO) mutant cell lines of budding yeast (S. cerevisiae) to look for genes whose mutation could yield defective cell wall, actin and nuclear DNA morphological phenotypes. That fixed-cell, end-point assay study identified 2378 gene candidates potentially involved in controlling different aspects of cell morphology and found that similar phenotypes were caused by deletions of functionally related genes [20], providing the first illustration of the potential of this approach for comprehensive, potentially genome-wide, reverse genetics investigations. In the 10 years that followed, gradually empowered by the GFP revolution as well as by increasingly refined techniques for systematic gene KO or knock-down (KD; RNAi/siRNA and now CRISPR), a plethora of similar studies used cell types ranging from yeasts to human cell lines to carry out genome-wide gene discovery and functional assignment [19,21–29]. For instance, the landmark project MitoCheck used RNA interference to KD ∼21 000 human protein-coding genes in live HeLa cells and systematically looked for candidate human cell division genes across the genome [30].
By quantitatively and automatically analysing more than 180 000 time-lapse epifluorescence microscopy movies of cells stably expressing histone H2B-GFP (>17 000 000 images), MitoCheck identified and validated 572 genes involved in cell division, over half of them never before linked to the regulation of mitosis, providing the largest genomic catalogue of mitotic genes at that time. High-throughput microscopy phenotyping has been used with success to obtain genomic catalogues of, and discover many new genes involved in, numerous basic cellular processes including cell division [30], cell migration [31], endocytosis [32,33], Golgi organization [34,35], mitochondrial quality control [36] and structure [37], the microtubule cytoskeleton [38,39] and viral infection [40]. Notably, despite all those seminal works and others, functional discovery and annotation across genomes is still very much a work in progress [41]. In the case of high-throughput microscopy screening, the reasons for this are numerous and include the fact that translating quantitative measurements into well-defined annotations in agreed ontologies is particularly challenging [42]. In addition, microscopy-based screens have traditionally suffered from poor reproducibility [43] (i.e. a low overlap between gene lists obtained from independent screens looking at the same process). It has also been shown that the identification of candidate genes associated with a given cell function by microscopy-based screening is probabilistic, so that no single screen can identify all genes associated with that function [39]. This coverage problem may only be solved gradually, through the incremental integration of complementary screens in the future.
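To make the reproducibility and coverage issues concrete, the following is a minimal sketch of how the overlap between hit lists from independent screens, and the cumulative coverage gained by merging them, might be quantified. The gene names, the hit lists and the choice of the Jaccard index as the overlap measure are illustrative assumptions and are not taken from the studies cited above.

```python
def jaccard_overlap(hits_a, hits_b):
    """Jaccard index between two hit lists: shared hits / all distinct hits."""
    a, b = set(hits_a), set(hits_b)
    union = a | b
    if not union:
        return 0.0  # two empty lists: define the overlap as zero
    return len(a & b) / len(union)


def cumulative_coverage(screens):
    """Number of distinct hits accumulated as screens are merged in order."""
    seen = set()
    coverage = []
    for hits in screens:
        seen |= set(hits)
        coverage.append(len(seen))
    return coverage


# Hypothetical hit lists from three independent screens of the same process.
screen_1 = ["geneA", "geneB", "geneC", "geneD"]
screen_2 = ["geneC", "geneD", "geneE"]
screen_3 = ["geneA", "geneF"]

print(jaccard_overlap(screen_1, screen_2))                  # 2 shared of 5 distinct -> 0.4
print(cumulative_coverage([screen_1, screen_2, screen_3]))  # [4, 5, 6]
```

In this toy case each additional screen contributes hits that the previous ones missed, which is the behaviour expected if detection in any single screen is probabilistic.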
Figure 1. The principle of high-throughput/high-content microscopy screens to identify and study genes’ functions at scale
Using computational image analysis and machine learning for deep single-cell phenotyping
Of note, the introduction of high-content microscopy phenotyping was key to enabling richer and deeper phenotypic characterization at the single-cell level, thanks to advances in numerical feature extraction, comparison and analysis [11,44–46]. Computed single-cell morphological, intensity and texture features can be simple and biologically interpretable, e.g. the number, size, intensity and positions of endocytic vesicles [33] or the length, number and curvature of microtubules [39], or they can be more complex mathematical constructions, like the CHARM feature set [47]. In addition, cell population-level features like cellular position and context [48,49] can be obtained. The data science pipelines needed to extract information from such descriptors are then more involved [4,50] and include the comparison of conditions [51,52] and the use of machine learning methods [53] (e.g. unsupervised clustering or supervised classification), both to explore cell-to-cell variability and to group cells or conditions according to phenotypic similarity for functional annotation or mechanistic characterization [54]. The publication of open source libraries and software packages that perform and automate those tasks has been fundamental to advances in the field; they are described in more detail elsewhere [45,50].
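As an illustration of the comparison-of-conditions step, the sketch below standardises single-cell features against a control population and collapses them into one mean profile per condition. It is a deliberately simplified example under stated assumptions: the two features (area and intensity), their values and the function names are hypothetical, and published pipelines typically use more robust statistics and many more features.

```python
from statistics import mean, stdev


def zscore_against_control(cells, control_cells):
    """Standardise each feature using the mean and SD of the control cells.

    Each cell is a list of feature values. Assumes every feature varies
    among the control cells (non-zero standard deviation).
    """
    n_features = len(control_cells[0])
    mu = [mean([c[j] for c in control_cells]) for j in range(n_features)]
    sd = [stdev([c[j] for c in control_cells]) for j in range(n_features)]
    return [[(c[j] - mu[j]) / sd[j] for j in range(n_features)] for c in cells]


def condition_profile(z_cells):
    """Collapse single-cell z-scores into one mean profile for a condition."""
    n_features = len(z_cells[0])
    return [mean([c[j] for c in z_cells]) for j in range(n_features)]


# Hypothetical per-cell features: [area, intensity].
control = [[100.0, 10.0], [110.0, 12.0], [90.0, 8.0]]
knockdown = [[130.0, 10.0], [120.0, 14.0]]

z_cells = zscore_against_control(knockdown, control)
profile = condition_profile(z_cells)
print(z_cells)   # per-cell deviations from control, in units of control SD
print(profile)   # one phenotypic profile for the knock-down condition
```

Keeping the per-cell z-scores, rather than only the averaged profile, preserves the cell-to-cell variability that later clustering or classification steps can exploit.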
An important recent development is Deep Learning, a class of machine learning/artificial intelligence (ML/AI) algorithms that can be easier to use and far more efficient than their predecessors, and that is particularly well suited to images through Convolutional Neural Networks (CNNs) [55]. These algorithms have been applied to high-throughput/high-content microscopy and are sure to become a key part of analysis pipelines from now on. They can be used as a feature learning step within a classical pipeline [56] (for example to better segment/detect cells or subcellular structures that are then analysed with more classical methods), more thoroughly as an end-to-end trainable phenotyping strategy [57], and increasingly for predicting fluorescent labels from label-free samples [58–60]. While the power and usefulness of Deep Learning approaches are clear, their downside is that they need large amounts of annotated data and computing power, and are more often of use in a supervised than in an unsupervised setting. They are also ‘black boxes’, notoriously hard to interpret or understand, with sometimes unexpected failure modes [61,62]. To mitigate those issues, in addition to a good understanding of those methods, special care has to be taken in evaluating the algorithms used and their results. This includes establishing standard internal controls within the algorithmic pipelines (essentially following good practices for machine learning) and, at the level of whole studies, checking the consistency of biological end results and experimentally testing predictions.
Reconstructing regulatory networks and systems-level cellular wiring
We have so far seen how high-throughput/high-content microscopy can be used to extract rich information at the single-cell level to identify and characterise the function of genes, for example by using gene KD/KO strategies. However, genes do not operate independently but as regulatory networks (interaction networks, transcription networks, metabolic or signalling networks), and uncovering those networks is a key goal of functional genomics. One common simplifying assumption used to estimate network-level information from single-condition experiments is to link conditions that lead to a similar phenotype [63]. However, such phenotypic similarity networks, while shown to be enriched in physical or genetic interactions and to work with some success [64], are typically noisy and hard to interpret.
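The idea of linking conditions that lead to a similar phenotype can be sketched as follows. This is only one possible construction, assuming that each condition is summarised by a numerical phenotypic profile and that similarity is measured by Pearson correlation with a fixed cut-off; the profiles, gene names and threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def similarity_network(profiles, threshold=0.9):
    """Undirected edges between conditions whose profiles correlate strongly."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical phenotypic profiles (e.g. z-scored features) per knock-down.
profiles = {
    "geneA": [2.0, 1.0, -1.0],
    "geneB": [4.0, 2.0, -2.0],   # same direction as geneA, stronger effect
    "geneC": [-1.0, 0.5, 2.0],   # different phenotype
}

edges = similarity_network(profiles)
print([(g1, g2) for g1, g2, _ in edges])  # only geneA and geneB are linked
```

Note that such edges are undirected and unsigned, and that the choice of similarity measure and threshold strongly affects the resulting network, which is one reason these networks tend to be noisy.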
Revealing regulatory networks by double gene KD/KO
To go further and investigate the existence of a network link between two genes, one has to change the experiment so that each condition probed is the combination of two interventions (Figure 2). For instance, Synthetic Genetic Array (SGA) approaches have achieved that by combinatorially crossing large collections of budding yeast KO cell lines, each knocked out for one gene in the genome, among themselves, and by imaging the resulting colonies on plates and analysing their size and appearance to look for double KOs whose combination has a positive or negative (synthetic) epistatic effect on cell fitness [65,66]. The resulting ∼1 million gene–gene interactions (edges) scored in this way can then be used to generate a global genetic interaction network map or ‘wiring diagram’ of a typical S. cerevisiae cell. Similar approaches have been applied to mammalian cells using finer epifluorescence microscopy-based readouts [67–69]. Building on those methods, the combination of high-content microscopy phenotyping with double KO/KD interventions makes it possible to look for more complex epistatic relationships between genes, either by considering each single phenotype independently under a multiplicative assumption [70,71] or by combining them to infer directed interaction networks [67]. Monitoring how the synthetic double KO/KD phenotypes change over time makes it possible to map how regulatory networks rewire, giving a much more complex picture of regulatory network dynamics [72].
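The multiplicative assumption mentioned above can be illustrated with a minimal sketch. Here a phenotype such as fitness is expressed relative to wild type (wild type = 1), the expected double-perturbation value is the product of the two single-perturbation values, and the interaction score is the deviation from that expectation. The numerical values, the tolerance and the function names are assumptions chosen for illustration, not parameters from the cited studies.

```python
def interaction_score(w_a, w_b, w_ab):
    """Deviation of the observed double-perturbation phenotype from the
    multiplicative expectation w_a * w_b (all values relative to wild type)."""
    return w_ab - w_a * w_b


def classify_interaction(eps, tolerance=0.05):
    """Label an interaction score; the tolerance is an arbitrary example value."""
    if eps > tolerance:
        return "positive"
    if eps < -tolerance:
        return "negative"
    return "neutral"


# Hypothetical relative fitness values for single perturbations of genes a and b.
w_a, w_b = 0.8, 0.5   # expected double-perturbation fitness: 0.8 * 0.5 = 0.4

for w_ab in (0.1, 0.4, 0.7):
    eps = interaction_score(w_a, w_b, w_ab)
    print(w_ab, classify_interaction(eps))
```

With these numbers, an observed double-perturbation value of 0.1 falls well below the expectation (a negative, synthetic interaction), 0.4 matches it (no interaction) and 0.7 exceeds it (a positive interaction). In a high-content setting the same score can be computed separately for each measured phenotype.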
Figure 2. Reconstructing gene/protein networks and systems-level interactions between cellular processes
Revealing regulatory networks by combining gene KD/KO and protein localization
Another way to combinatorially probe and reveal edges in regulatory networks is to combine gene KD/KO strategies with the monitoring of fluorescently tagged protein (re)localization, to build a so-called Localisation Interdependency Network (LIN) [73]. According to this approach, if the protein produced by gene B becomes de-localized in cells upon KD/KO of gene A, then the localization (and function) of B depends on A, thereby directly revealing a directed edge going from gene/protein A to gene/protein B. When done combinatorially across many genes by high-throughput epifluorescence microscopy imaging, this procedure allows the generation of a signed, directed and weighted network connecting those genes without the need for directionality inference, thereby overcoming an intrinsic limitation of double gene KD/KO approaches. Technical challenges with the LIN approach include the fact that tagging proteins with genetically encoded fluorescent tags (like GFP) often compromises their function, so that careful quality control and validation are required, as well as the difficulty of quantifying changes in intracellular protein localisation and the associated phenotypes. This technique was used with success to investigate interactions between the core ∼40 cell polarity regulators of fission yeast (S. pombe), revealing the most complete picture of the cell polarity network for that cell type to date and discovering 554 pairwise interactions among the polarity regulators (98% of them novel) as well as ‘modular’ interactions between subgroups of regulators in the network. A similar, much larger-scale approach was used in S. cerevisiae, combining SGA and high-throughput/high-content microscopy phenotyping to identify how the entire budding yeast proteome changes over time in response to drugs like rapamycin and hydroxyurea [74].
These approaches, as well as emerging perturbation-free approaches exploiting inherent cellular fluctuations in fluorescently labelled proteins [75,76], are making it possible to map information flow in regulatory networks at unprecedented spatial and temporal resolution.
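The logic of a localisation interdependency network, in which an edge from A to B is drawn when perturbing A changes the localisation of B, can be sketched as below. This is a schematic illustration and not the published method: the scalar ‘localisation score’, the threshold, the sign convention (positive when A is needed for B to localise) and all values are hypothetical.

```python
def build_lin(control_loc, perturbed_loc, threshold=0.2):
    """Build signed, directed, weighted edges from localisation changes.

    control_loc:   {protein: localisation score in unperturbed cells}
    perturbed_loc: {(perturbed_gene, protein): score after KD/KO of the gene}
    An edge (A, B, effect) is kept when |effect| >= threshold, where
    effect = control - perturbed, so effect > 0 means A promotes B's
    localisation and effect < 0 means A restrains it (assumed convention).
    """
    edges = []
    for (gene_a, protein_b), score in sorted(perturbed_loc.items()):
        if gene_a == protein_b:
            continue  # a gene's effect on its own product is not an edge
        effect = control_loc[protein_b] - score
        if abs(effect) >= threshold:
            edges.append((gene_a, protein_b, round(effect, 3)))
    return edges


# Hypothetical localisation scores (e.g. fraction of signal at the cell tips).
control_loc = {"B": 1.0, "C": 0.5}
perturbed_loc = {
    ("A", "B"): 0.3,   # B is largely lost from its site without A
    ("A", "C"): 0.55,  # C is essentially unchanged without A
    ("B", "C"): 0.9,   # C accumulates more strongly without B
}

lin_edges = build_lin(control_loc, perturbed_loc)
print(lin_edges)
```

Because the perturbed gene and the observed protein play distinct roles in each measurement, the direction of every edge follows from the experimental design rather than from statistical inference.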
Inferring systems-level interactions and causal links between cellular processes
Another means of deriving biologically meaningful networks from multivariate single-cell data is Bayesian network inference, which fits a Bayesian graphical model to the probability distribution of the measurements. By computing conditional independencies, Bayesian network inference allows the investigation of possible causal relationships between variables. This approach was proposed early on for use in flow cytometry [77], where single-cell fluorescence measurements of phosphoproteins can be linked to activity and a signalling network can be inferred. In high-content screening, it was introduced to look at causal relationships between cellular/subcellular features, in order to build a high-level, system-wide description of the processes under study. Using Bayesian network inference, the projects HepatoSys and Endotrack were able, for example, to identify and predict key differences in the design principles of the endocytosis of Transferrin versus that of Epidermal Growth Factor in human cell lines [33]. Similarly, in the ‘multi-process’ phenomics project SYSGRO, which monitored how fission yeast cell shape, microtubule organization and cell cycle progression co-vary across a genome-wide collection of mutant cell lines, Bayesian network inference was used to predict directional systems-level functional links between cell shape and microtubule control that could be successfully validated experimentally [39]. It is important to point out that, although potentially very powerful, such network inference methods are not infallible, and the computational predictions derived from them (the topology and directionality of the network) must be experimentally validated, a step unfortunately too often missing in such studies.
In the future, methods taking full advantage of high-dimensional, multi-process, multi-parametric single-cell information measured jointly in a cell or cell population [78,79] promise to increasingly provide a goldmine of discovery into how cells work as integrated systems.
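The role that conditional independencies play in Bayesian network inference can be illustrated with a small worked example. The sketch below is not a network inference algorithm; it only shows the basic test, here approximated by partial correlation under an assumed linear relationship, on synthetic data in which two features x and y are both driven by a third feature z. All variable names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic single-cell measurements: z drives both x and y, and each of
# x and y also carries its own independent variation.
z = [-1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, 1.0]
x = [-1.5, -0.5, 0.5, 1.5, -1.5, -0.5, 0.5, 1.5]   # z plus independent variation
y = [-1.5, -1.5, 0.5, 0.5, -0.5, -0.5, 1.5, 1.5]   # z plus independent variation

print(round(pearson(x, y), 3))                  # x and y are strongly correlated
print(round(partial_correlation(x, y, z), 3))   # but not once z is accounted for
```

In this example x and y appear linked when considered alone, yet the link vanishes once z is conditioned on, which argues against a direct edge between x and y. Structure-learning methods combine many such tests or scores across all variables, and their output still requires experimental validation.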
Pushing the limits of single-cell high-content imaging: multi-scale, dynamical, functional
High-throughput/high-content microscopy is naturally evolving, as is microscopy as a whole, away from purely cell-level assays and questions towards the two nearest scales, tissues and organs above and single molecules below. In both cases technical obstacles abound but recent works are promising. At the larger scale, beyond the more classical methods extending the study of organoids at higher throughput [80], methods based on microfluidics for the generation of microencapsulated organoids on matrigel beads have been proposed [81]. At the smaller scale, an automated workflow for single-molecule localization microscopy on 96-well plates has been proposed [82], with the main issues being optimising acquisition, processing, analysis and storage to manage the data deluge that those methods generate. Equivalent techniques have also been used to study the nanoscale organization of bacterial cell division [83].
Another important milestone has been achieved by moving high-throughput/high-content microscopy into the realm of measuring 3D protein dynamics in live cells, at scale. Building on its predecessor MitoCheck, the follow-on MitoSys project recently used Fluorescence Correlation Spectroscopy (FCS) to build the most highly resolved ‘mitotic cell atlas’ of the dynamics and potential interactions in 4D (3D + time) of proteins controlling cell division [84]. Focusing on 28 mitotic proteins tagged with eGFP and using supervised machine learning, the authors were furthermore able to assign protein quantities to six organelles (chromosomes, nuclear envelope, kinetochores, spindle, centrosomes and midbody) and to determine the timing, stoichiometry and dissociation rates in those organelles for multiple mitotic proteins. They demonstrated, for example, that AURKB kinase and its regulator CDCA8/borealin only partially colocalize at the midbody and that AURKB most likely has an additional localization at the contractile cytokinetic ring, consistent with a known cytokinetic function for that kinase. This provides a template for the elucidation of dynamical protein atlases in cells.
Finally, an area where the limits of high-content imaging are also being pushed is the integration of single-cell ‘omics with imaging, enabled by techniques such as mass spectrometry imaging (MSI) combined with microscopy [85–87] and in situ single-cell RNA profiling [88–92]. In the future these approaches are bound to give new and substantial functional depth to the microscopic study of single cell biology.
Sharing and integrating large microscopy phenomics datasets to boost discovery
One characteristic of high-content studies, particularly single-cell based ones, is the size and complexity of the data and metadata generated, from raw images to segmented single cells to the numerical features computed from those cells and the annotations eventually associated with them. Trying to organise that complexity has been the focus of much work in the community, from the identification of specific/suitable file formats [93,94], to the generation of databases [95,96] across modalities [97] and of interactive data visualisation tools [98]. Not surprisingly, given the size and very high cost of large high-throughput/high-content microscopy projects, initiatives to promote data sharing for re-use and cross-analysis are becoming more common [99]. An example worth noting is the Image Data Resource (IDR, https://idr.openmicroscopy.org), the largest community-driven microscopy phenomics dataset resource in the world. IDR aims to bring together, organise and share with the community reference large-scale microscopy phenomics datasets (original image data, image-derived feature data, annotations and metadata) generated the world over using multiple imaging modalities, in common ontologies and databases, so that the community is able to continue the discovery process by mining and integrating the datasets among themselves and with other existing pheno-genomic resources (STRING, GO, etc.), and to promote integration across organisms, biological processes and scales [100]. Although many of the >40 studies currently contained in IDR did not include single-cell data when originally published (particularly the older high-throughput microscopy datasets), an increasing number of the newer studies do. Likewise, many IDR image datasets can be reanalysed at the single-cell level even though the original publications describing them did not do so.
In the future, by enabling the re-analysis and integration of an increasing number of datasets this unique resource promises to provide a precious trove of microscopy phenomics data for single-cell exploration and to help catalyze future biological discovery.
Exploiting single-cell data to predict cell structure-dynamics with artificial intelligence
An important avenue of research brought back to the forefront by recent work is generative models. Generative models aim to generate (i.e. simulate) new typical examples of the data under study in order to investigate statistical variability, to describe a particular localisation more accurately and intuitively, or to enable better and more realistic cellular models [101]. For instance, in an earlier study the authors were able to learn conditional generative models of punctate patterns given microtubule localisation, enabling the study of the relative positions of organelles with single-cell resolution [102]. Strikingly, very recent work shows that Deep Neural Networks, on top of their use in discrimination and classification seen in earlier Deep Learning applications, can also be powerful generative models [103] (Figure 3). For example, so-called Generative Adversarial Networks (GANs) have been proposed by several groups as a means to do ‘impossible experiments’, i.e. to complete the data that is available with data that cannot be obtained but can be ‘guessed’ (i.e. predicted) [104,105]. Other uses of GANs include generating data to help in the training of data-hungry Deep Neural Networks [106,107], or uses in correlative microscopy by learning one imaging modality from another (for example to generate super-resolution microscopy images from wide-field microscopy images [108] or to predict labels from label-free images). We anticipate that in the future GANs and related methods will provide an invaluable tool to enrich available datasets with computationally generated but biologically meaningful additions (including e.g. by data augmentation [109,110]) and will enable previously inaccessible studies. Importantly, going forward and as with Deep Learning in general, thorough, unbiased and study-specific evaluation and controls will be crucial to ensure trust and reproducibility.
This might become particularly important in the future to prevent and avoid abuse of such powerful methods to generate synthetic ‘fake data’ [111], a recognized threat in the realms of social media, public discourse and politics that has not been contemplated until now in the scientific arena and might gravely corrupt scientific publications and databases if neglected.
Figure 3. Predicting spatio-temporal cellular structure using Deep Learning
Towards predicting causal cell behaviour and decision making
Lastly, an area where single-cell microscopy phenomics can give unique insights into cell function is in clarifying cellular decision-making and how the behaviour of a heterogeneous population of cells evolves through time, such that some cells take one fate and other cells another. While single-cell ‘omics information can enable the inference of some aspects of cell lineage relationships and history, that information is ‘time implicit’ (i.e. the true timing and duration of the events monitored are not known) and space agnostic. How cells grow, divide, die or migrate [112,113] or change their structure/function/location/fate through time within a population or tissue, the cell-to-cell temporal noise in those single-cell decisions, the true timing and time scale of those decisions on a cell-by-cell basis, and how all of this affects the way each cell, and the heterogeneous cell population and its progeny as a whole, evolve [114] are all forms of dynamical information that are simply inaccessible to single-cell ‘omics approaches [24,115]. The time and location of cell death events in a tissue clearly exemplify this blind spot of single-cell ‘omics approaches: that information, though crucial for cell and tissue patterning and homeostasis [116,117], cannot be derived from inferred cell lineages; only real-time measurement of those events can provide it. Time-resolved multi-process microscopy phenotyping and cellular lineaging hence play a unique and powerful role in making such dynamical information available [118,119]. As such (although not yet very common), they will continue to play an indispensable role in making it possible to accurately measure and, powered by predictive analytics approaches from e.g. ML/AI, to precisely and causally predict the behaviour, function and fate [120] of single cells, cell populations and tissues (Figure 4).
For example, continuous multi-generational, time-lapse, multi-colour single-cell tracking and lineaging by epifluorescence microscopy of mouse Embryonic Stem Cells (mESCs) expressing the fluorescently labelled pluripotency transcription factor NanogVENUS, combined with exact Bayesian inference, revealed that in mESCs Nanog autoregulates by weak negative feedback [121]. Similarly, Deep Neural Networks were able, using brightfield-derived information, to predict mouse hematopoietic stem and progenitor cell (HSPC) lineage choice during differentiation up to three generations before it could be detected by immunofluorescence using conventional markers [122]. These examples illustrate the enormous future potential of combining large amounts of biological information (‘biological Big Data’) derived from continuous time-resolved microscopy with predictive analytics, not just for the discovery of basic biological mechanisms but also for improving technologies for tissue bioengineering and for enabling highly precise and predictive diagnostics of clinical samples [123].
Figure 4. Measuring and predicting causal cell behaviour by time-resolved, continuous single-cell microscopy
Conclusions
Since its birth 15 years ago, high-throughput/high-content microscopy has enabled the systematic identification of genes, pathways and cell biological mechanisms by exploiting rich, single-cell derived structural and dynamical information about cells, their context and their evolution within cell populations. The recent marriage of single-cell microscopy phenotyping with predictive data analytics powered by ML/AI is dramatically shifting the focus from observing and exploiting cell behaviour to learning how to accurately predict it, with promising basic and biomedical applications. In many ways, single-cell microscopy phenotyping remains for the most part an orthogonal source of biological Big Data, separate from ‘omics-derived single-cell data, with a different trade-off between spatiotemporal resolution and the number of readouts being followed. We foresee that in the future substantial effort will be invested in producing a similar marriage between single-cell ‘omics strategies and high-throughput/high-content microscopy phenomics, which is currently sparse and mostly challenging [124–126], and that this will bring about a fundamental step change in our capacity to predict and potentially control cell function. With microscopy the future is bright.
Summary
In the past 15 years, single-cell based microscopy has evolved its focus from observing cell function to aiming to predict it.
This has been possible thanks to breakthroughs in computer vision, large-scale image analysis and machine learning.
In this way, high-throughput/high-content microscopy has made it possible to discover and annotate genes, pathways and links between processes, and to begin clarifying cell decision making.
In the future, the marriage of single-cell ‘omics and single-cell microscopy will make possible an unprecedented understanding of cell and tissue behaviour and function.
Competing Interests
The authors declare that there are no competing interests associated with the manuscript.