Peng Lab
He uses these techniques to mine and fuse knowledge from three-dimensional animal brain images, at both micrometer and nanometer scales. His group is building 3D neuronal atlases of brains, incorporating neuron distribution, projection, and connection statistics and mapping functional data of neurons.
Such an atlas, in its digital form, can dramatically improve our ability to tackle the secrets of a brain, from the level of a single neuron or synapse to that of the entire brain. Recent advances in multicolor labeling and high-resolution imaging have made it possible to reconstruct computationally the 3D digital neuronal atlas of an animal at the single-neuron, whole-brain level.
In collaboration with several laboratories at Janelia Farm, we are developing novel image informatics tools to build a multiscale 3D digital atlas for the brain of a fruit fly (Drosophila melanogaster), a widely used model system in biology. A fly brain is estimated to have 100,000-150,000 neurons. In a simplified view, the neuronal network in a brain can be described as a forest, with each tree representing a neuron. It is crucial to develop data-mining techniques to identify, describe, compare, categorize, and search these trees and branches, and to model computationally the distribution and interaction of objects in the context of the entire forest. By targeting the fly brain, our goal, which is shared with other Janelia labs, is to build a high-resolution 3D digital atlas of a fly brain that includes the statistics of neuronal distributions, projections, and connections. To do this, we will use bioimage informatics and mining tools developed through collaboration.
One aspect of our research aims at bridging the gap between the micrometer- and nanometer-resolution image information collected through both light and electron microscopy. In previous studies, nanometer-scale images were often used to study connections of neurons. Micrometer-scale images were usually considered when comparing gene expression patterns and investigating neuronal functions. Appropriate integration of these two sources of data can significantly deepen our insights into how neurons distribute, connect to each other, and function in a brain.
Despite many accessible techniques for images acquired at a single-resolution scale, we still lack a set of high-performance techniques and software tools for multiscale bioimage mining and informatics. Our goal is to fuse and transform the enormous volume of heterogeneous information acquired at different imaging scales into meaningful knowledge. To do this, we are building a suite of tools for image analysis, mining, and visualization.
As a prerequisite to integrating information across different bioimaging scales and modalities, we are working on several techniques for analyzing brain images and neuronal patterns.
- We have developed and continue to improve 3D image registration software to align fly brain and thorax images, which could differ significantly in their morphology, intensity, orientation, and resolution (scale), and often correspond to different brain regions or have various artifacts.
- We are studying how to automate the extraction, or tracing, of neurons from 3D microscopic images and how to characterize their morphological and topological features.
- For registered brain images and reconstructed neurons, we will develop large-scale, high-throughput data-mining techniques to search, cluster, and classify patterns and detect the associations between these patterns and functional variables, such as animal behaviors. We are also developing computational tools to help biologists compare, annotate, and measure brain images.
Projects (2)
In a recent collaboration with Eugene Myers and Stuart Kim (Stanford University), we have developed a series of computational approaches to produce a 3D digital cell atlas for the animal Caenorhabditis elegans, and thus have collected the statistics of gene expression patterns at the single-cell resolution.
From an image-informatics perspective, building the 3D statistical neuronal atlas in a fly brain is conceptually similar to, although technically more challenging than, building the digital cell atlas of C. elegans. It is generally believed that many neurons in a fly brain are stereotyped. Collaborating with other Janelia labs, we are developing algorithms and computer programs to collect the statistics from many fly brains at different resolutions. This will result in a 3D statistical atlas, containing the statistics of neuron spatial distributions, projections, and connections.
We are also interested in using these statistics to (1) model the wiring of neurons in a fly brain, (2) detect the associations between neuronal distribution/connections and the animal behaviors, and (3) map other neuronal activity data (e.g., firing) to this atlas and analyze the relationships among these physiological neuronal activities and animal behaviors.
My lab is also developing new machine-learning and signal/image-processing methods for bioimaging data. We are interested in designing and implementing computational methods to promote various aspects of high-resolution imaging and improve our ability to understand microscopy images, including those related to super-resolution in space and time and to deep-tissue imaging. These research projects are natural extensions of my previous studies in areas such as feature/model learning for biomedical and multimedia signals and images. We are also interested in using these new developments to tackle problems in related domains of computational biology, e.g., to understand gene expression patterns and underlying genetic regulatory networks.
Janelia Publications
Few technologies are more widespread in modern biological laboratories than imaging. Recent advances in optical technologies and instrumentation are providing hitherto unimagined capabilities. Almost all these advances have required the development of software to enable the acquisition, management, analysis and visualization of the imaging data. We review each computational step that biologists encounter when dealing with digital images, the inherent challenges and the overall status of available software for bioimage informatics, focusing on open-source options.
Intercepting a moving object requires prediction of its future location. This complex task has been solved by dragonflies, who intercept their prey in midair with a 95% success rate. In this study, we show that a group of 16 neurons, called target-selective descending neurons (TSDNs), code a population vector that reflects the direction of the target with high accuracy and reliability across 360°. The TSDN spatial (receptive field) and temporal (latency) properties matched the area of the retina where the prey is focused and the reaction time, respectively, during predatory flights. The directional tuning curves and morphological traits (3D tracings) for each TSDN type were consistent among animals, but spike rates were not. Our results emphasize that a successful neural circuit for target tracking and interception can be achieved with few neurons and that in dragonflies this information is relayed from the brain to the wing motor centers in population vector form.
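As a rough illustration of what "population vector" coding means in the abstract above, the sketch below decodes a direction as the rate-weighted vector sum of each neuron's preferred direction. Everything specific here is an assumption made for illustration: the eight evenly spaced preferred directions, the rectified-cosine tuning curve, and the names `population_vector` and `cosine_rates` are hypothetical and are not taken from the study, which recorded 16 TSDNs with their own measured tuning curves.

```python
import math


def population_vector(preferred_deg, rates):
    """Decode a direction (degrees, 0-360) as the rate-weighted sum of
    unit vectors pointing along each neuron's preferred direction."""
    x = sum(r * math.cos(math.radians(p)) for p, r in zip(preferred_deg, rates))
    y = sum(r * math.sin(math.radians(p)) for p, r in zip(preferred_deg, rates))
    return math.degrees(math.atan2(y, x)) % 360.0


def cosine_rates(target_deg, preferred_deg, peak=50.0):
    """Hypothetical tuning model (an assumption, not measured data):
    rectified cosine of the angle between target and preferred direction."""
    return [max(0.0, peak * math.cos(math.radians(target_deg - p)))
            for p in preferred_deg]


# Hypothetical population: 8 neurons with evenly spaced preferred directions.
preferred = [i * 45.0 for i in range(8)]
rates = cosine_rates(100.0, preferred)
decoded = population_vector(preferred, rates)
print(round(decoded, 3))  # close to the simulated target direction of 100 degrees
```

With evenly spaced preferred directions and symmetric tuning, the decoded angle tracks the target even though no single neuron prefers exactly that direction, which is the intuition behind reading out direction from a small population.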
In a wide range of biological studies, it is highly desirable to visualize and analyze three-dimensional (3D) microscopic images. In this primer, we first introduce several major methods for visualizing typical 3D images and related multi-scale, multi-time-point, multi-color data sets. Then, we discuss three key categories of image analysis tasks, namely segmentation, registration, and annotation. We demonstrate how to pipeline these visualization and analysis modules using examples of profiling the single-cell gene-expression of C. elegans and constructing a map of stereotyped neurite tracts in a fruit fly brain.
The GFP reconstitution across synaptic partners (GRASP) technique, based on functional complementation between two nonfluorescent GFP fragments, can be used to detect the location of synapses quickly, accurately and with high spatial resolution. The method has been previously applied in the nematode and the fruit fly but requires substantial modification for use in the mammalian brain. We developed mammalian GRASP (mGRASP) by optimizing transmembrane split-GFP carriers for mammalian synapses. Using in silico protein design, we engineered chimeric synaptic mGRASP fragments that were efficiently delivered to synaptic locations and reconstituted GFP fluorescence in vivo. Furthermore, by integrating molecular and cellular approaches with a computational strategy for the three-dimensional reconstruction of neurons, we applied mGRASP to both long-range circuits and local microcircuits in the mouse hippocampus and thalamocortical regions, analyzing synaptic distribution in single neurons and in dendritic compartments.
MOTIVATION: Automatic recognition of cell identities is critical for quantitative measurement, targeting, and manipulation of cells of model animals at single-cell resolution. It has been shown to be a powerful tool for studying gene expression and regulation, cell lineages, and cell fates. Existing methods first segment cells, before applying a recognition algorithm in the second step. As a result, the segmentation errors in the first step directly affect and complicate the subsequent cell recognition step. Moreover, in new experimental settings, some of the image features that have been previously relied upon to recognize cells may not be easy to reproduce, due to limitations on the number of color channels available for fluorescent imaging or to the cost of building transgenic animals. An approach that is more accurate and relies on only a single signal channel is clearly desirable. RESULTS: We have developed a new method, called SRS (for Simultaneous Recognition and Segmentation of cells), and applied it to 3D image stacks of the model organism C. elegans. Given a 3D image stack of the animal and a 3D atlas of target cells, SRS is effectively an atlas-guided voxel classification process: cell recognition is realized by smoothly deforming the atlas to best fit the image, where the segmentation is obtained naturally via classification of all image voxels. The method achieved a 97.7% overall recognition accuracy in recognizing a key class of marker cells, the body wall muscle (BWM) cells, on a data set of 175 C. elegans image stacks containing 14,118 manually curated BWM cells providing the "ground-truth" for accuracy. This result was achieved without any additional fiducial image features. SRS also automatically identified 14 of the image stacks as involving ±90-degree rotations. With these stacks excluded from the data set, the recognition accuracy rose to 99.1%. We also show SRS is generally applicable to other cell-types, e.g. intestinal cells. 
AVAILABILITY: The supplementary movies can be downloaded from our website http://penglab.janelia.org/proj/celegans_seganno. The method has been implemented as a plug-in program within the V3D system (http://penglab.janelia.org/proj/v3d) and will be released in the V3D plugin source code repository.
Analyzing Drosophila melanogaster neural expression patterns in thousands of three-dimensional image stacks of individual brains requires registering them into a canonical framework based on a fiducial reference of neuropil morphology. Given a target brain labeled with predefined landmarks, the BrainAligner program automatically finds the corresponding landmarks in a subject brain and maps it to the coordinate system of the target brain via a deformable warp. Using a neuropil marker (the antibody nc82) as a reference of the brain morphology and a target brain that is itself a statistical average of data for 295 brains, we achieved a registration accuracy of 2 μm on average, permitting assessment of stereotypy, potential connectivity and functional mapping of the adult fruit fly brain. We used BrainAligner to generate an image pattern atlas of 2954 registered brains containing 470 different expression patterns that cover all the major compartments of the fly brain.
Full reconstruction of neuron morphology is of fundamental interest for the analysis and understanding of their functioning. We have developed a novel method capable of automatically tracing neurons in three-dimensional microscopy data. In contrast to template-based methods, the proposed approach makes no assumptions about the shape or appearance of neurite structure. Instead, an efficient seeding approach is applied to capture complex neuronal structures and the tracing problem is solved by computing the optimal reconstruction with a weighted graph. The optimality is determined by the cost function designed for the path between each pair of seeds and by topological constraints defining the component interrelations and completeness. In addition, an automated neuron comparison method is introduced for performance evaluation and structure analysis. The proposed algorithm is computationally efficient and has been validated using different types of microscopy data sets including Drosophila's projection neurons and fly neurons with presynaptic sites. In all cases, the approach yielded promising results.
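The idea of linking seeds through a weighted graph can be sketched with a plain shortest-path search. The example below is only a minimal 2D stand-in under stated assumptions: the step cost `1 / (1 + intensity)`, the 4-connected grid, and the function name `min_cost_path` are hypothetical choices for illustration, and the published method additionally uses its own cost function and topological constraints that are not reproduced here.

```python
import heapq


def min_cost_path(image, start, goal):
    """Dijkstra search over a 4-connected pixel grid.

    Stepping onto a bright pixel is cheap, so the cheapest path between two
    seeds tends to follow bright (neurite-like) structure. The cost
    1 / (1 + intensity) is an assumption made for this sketch.
    """
    rows, cols = len(image), len(image[0])
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + 1.0 / (1.0 + image[nr][nc])
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = node
                    heapq.heappush(heap, (nd, (nr, nc)))
    # Walk back from the goal seed to the start seed.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


# Toy image: a bright, bent "neurite" on a dark background.
image = [
    [9, 9, 0, 0],
    [0, 9, 0, 0],
    [0, 9, 9, 9],
    [0, 0, 0, 9],
]
path = min_cost_path(image, (0, 0), (3, 3))
print(path)
```

In a full tracer, such pairwise paths between many seeds would become candidate edges of a larger graph from which the final tree is selected; that selection step is outside this sketch.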
The fruit fly (Drosophila melanogaster) is a commonly used model organism in biology. We are currently building a 3D digital atlas of the fruit fly larval nervous system (LNS) based on a large collection of fly larva GAL4 lines, each of which targets a subset of neurons. To achieve such a goal, we need to automatically align a number of high-resolution confocal image stacks of these GAL4 lines. One commonly employed strategy in image pattern registration is to first globally align images using an affine transform, followed by local non-linear warping. Unfortunately, the spatially articulated and often twisted LNS makes it difficult to globally align the images directly using the affine method. In a parallel project to build a 3D digital map of the adult fly ventral nerve cord (VNC), we are confronted with a similar problem.
Digital reconstruction of 3D neuron structures is an important step toward reverse engineering the wiring and functions of a brain. However, despite a number of existing studies, this task is still challenging, especially when a 3D microscopic image has a low signal-to-noise ratio and discontinuous segments of neurite patterns.
Automatic alignment (registration) of 3D images of adult fruit fly brains is often influenced by the significant displacement of the relative locations of the two optic lobes (OLs) and the center brain (CB). In one of our ongoing efforts to produce a better image alignment pipeline for adult fruit fly brains, we consider separating the CB and OLs and aligning them independently. This paper reports our automatic method to segregate the CB and OLs, in particular under conditions where the signal-to-noise ratio (SNR) is low, the variation of the image intensity is large, and the relative displacement of the OLs and CB is substantial. We design an algorithm to find a minimum-cost 3D surface in a 3D image stack to best separate an OL (of one side, either left or right) from the CB. This surface is defined as an aggregation of the respective minimum-cost curves detected in each individual 2D image slice. Each curve is defined by a list of control points that best segregate the OL and CB. To obtain the locations of these control points, we derive an energy function that includes an image energy term defined by local pixel intensities and two internal energy terms that constrain the curve's smoothness and length. A gradient descent method is used to optimize this energy function. To improve both the speed and robustness of the method, for each stack, the locations of optimized control points in a slice are taken as the initialization prior for the next slice. We have tested this approach on simulated and real 3D fly brain image stacks and demonstrated that this method can reasonably segregate OLs from CBs despite the aforementioned difficulties.
The V3D system provides three-dimensional (3D) visualization of gigabyte-sized microscopy image stacks in real time on current laptops and desktops. V3D streamlines the online analysis, measurement and proofreading of complicated image patterns by combining ergonomic functions for selecting a location in an image directly in 3D space and for displaying biological measurements, such as from fluorescent probes, using the overlaid surface objects. V3D runs on all major computer platforms and can be enhanced by software plug-ins to address specific biological problems. To demonstrate this extensibility, we built a V3D-based application, V3D-Neuron, to reconstruct complex 3D neuronal structures from high-resolution brain images. V3D-Neuron can precisely digitize the morphology of a single neuron in a fruitfly brain in minutes, with about a 17-fold improvement in reliability and tenfold savings in time compared with other neuron reconstruction tools. Using V3D-Neuron, we demonstrate the feasibility of building a 3D digital atlas of neurite tracts in the fruitfly brain.
We built a digital nuclear atlas of the newly hatched, first larval stage (L1) of the wild-type hermaphrodite of Caenorhabditis elegans at single-cell resolution from confocal image stacks of 15 individual worms. The atlas quantifies the stereotypy of nuclear locations and provides other statistics on the spatial patterns of the 357 nuclei that could be faithfully segmented and annotated out of the 558 present at this developmental stage. We then developed an automated approach to assign cell names to each nucleus in a three-dimensional image of an L1 worm. We achieved 86% accuracy in identifying the 357 nuclei automatically. This computational method will allow high-throughput single-cell analyses of the post-embryonic worm, such as gene expression analysis, or ablation or stimulation of cells under computer control in a high-throughput functional screen.
The C. elegans cell lineage provides a unique opportunity to look at how cell lineage affects patterns of gene expression. We developed an automatic cell lineage analyzer that converts high-resolution images of worms into a data table showing fluorescence expression with single-cell resolution. We generated expression profiles of 93 genes in 363 specific cells from L1 stage larvae and found that cells with identical fates can be formed by different gene regulatory pathways. Molecular signatures identified repeating cell fate modules within the cell lineage and enabled the generation of a molecular differentiation map that reveals points in the cell lineage when developmental fates of daughter cells begin to diverge. These results demonstrate insights that become possible using computational approaches to analyze quantitative expression from many genes in parallel using a digital gene expression atlas.
Imaging informatics has emerged as a major research theme in biomedicine in the last few decades. Currently, personalised, predictive and preventive patient care is believed to be one of the top priorities in biomedical research and practice. Imaging informatics plays a major role in biomedicine studies. This paper reviews main applications and challenges of imaging informatics in biomedicine.
Volume-object annotation system (VANO) is a cross-platform image annotation system that enables one to conveniently visualize and annotate 3D volume objects including nuclei and cells. An application of VANO typically starts with an initial collection of objects produced by a segmentation computation. The objects can then be labeled, categorized, deleted, added, split, merged and redefined. VANO has been used to build high-resolution digital atlases of the nuclei of Caenorhabditis elegans at the L1 stage and the nuclei of Drosophila melanogaster's ventral nerve cord at the late embryonic stage. AVAILABILITY: Platform independent executables of VANO, a sample dataset, and a detailed description of both its design and usage are available at research.janelia.org/peng/proj/vano. VANO is open-source for co-development.
In recent years, the deluge of complicated molecular and cellular microscopic images creates compelling challenges for the image computing community. There has been an increasing focus on developing novel image processing, data mining, database and visualization techniques to extract, compare, search and manage the biological knowledge in these data-intensive problems. This emerging new area of bioinformatics can be called 'bioimage informatics'. This article reviews the advances of this field from several aspects, including applications, key techniques, available tools and resources. Application examples such as high-throughput/high-content phenotyping and atlas building for model organisms demonstrate the importance of bioimage informatics. The essential techniques to the success of these applications, such as bioimage feature identification, segmentation and tracking, registration, annotation, mining, image data management and visualization, are further summarized, along with a brief overview of the available bioimage databases, analysis tools and other resources.
MOTIVATION: Caenorhabditis elegans, a roundworm found in soil, is a widely studied model organism with about 1000 cells in the adult. Producing high-resolution fluorescence images of C. elegans to reveal biological insights is becoming routine, motivating the development of advanced computational tools for analyzing the resulting image stacks. For example, worm bodies usually curve significantly in images. Thus one must 'straighten' the worms if they are to be compared under a canonical coordinate system. RESULTS: We develop a worm straightening algorithm (WSA) that restacks cutting planes orthogonal to a 'backbone' that models the anterior-posterior axis of the worm. We formulate the backbone as a parametric cubic spline defined by a series of control points. We develop two methods for automatically determining the locations of the control points. Our experiments show that our approaches effectively straighten both 2D and 3D worm images.
Staining the mRNA of a gene via in situ hybridization (ISH) during the development of a D. melanogaster embryo delivers the detailed spatio-temporal pattern of expression of the gene. Many biological problems such as the detection of co-expressed genes, co-regulated genes, and transcription factor binding motifs rely heavily on the analyses of these image patterns. The increasing availability of ISH image data motivates the development of automated computational approaches to the analysis of gene expression patterns.
Prior Publications (8)
The distribution of chromatin-associated proteins plays a key role in directing nuclear function. Previously, we developed an image-based method to quantify the nuclear distributions of proteins and showed that these distributions depended on the phenotype of human mammary epithelial cells. Here we describe a method that creates a hierarchical tree of the given cell phenotypes and calculates the statistical significance between them, based on the clustering analysis of nuclear protein distributions.
Gene expression patterns obtained by in situ mRNA hybridization provide important information about different genes during Drosophila embryogenesis. So far, annotations of these images are done by manually assigning a subset of anatomy ontology terms to an image. This time-consuming process depends heavily on the consistency of experts.
Staining the mRNA of a gene via in situ hybridization (ISH) during the development of a D. melanogaster embryo delivers the detailed spatio-temporal pattern of expression of the gene. Many biological problems such as the detection of co-expressed genes, co-regulated genes, and transcription factor binding motifs rely heavily on the analyses of these image patterns. The increasing availability of ISH image data motivates the development of automated computational approaches to the analysis of gene expression patterns.
We study transitivity properties of edge weights in complex networks. We show that enforcing transitivity leads to a transitivity inequality which is equivalent to ultra-metric inequality. This can be used to define transitive closure on weighted undirected graphs, which can be computed using a modified Floyd-Warshall algorithm. These new concepts are extended to dissimilarity graphs and triangle inequalities. From this, we extend the clique concept from unweighted graph to weighted graph. We outline several applications and present results of detecting protein functional modules in a protein interaction network.
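One way to read the transitive closure described above is as a bottleneck (max-min) variant of Floyd-Warshall. The sketch below assumes edge weights are similarities, so the transitivity inequality takes the form w(i, j) >= min(w(i, k), w(k, j)), and each entry is raised to the best weakest link over all paths. This reading, the function name `transitive_closure`, and the toy matrix are assumptions for illustration; the paper's exact formulation may differ in detail.

```python
def transitive_closure(weights):
    """Max-min closure of a symmetric similarity matrix.

    Floyd-Warshall with (+, min) replaced by (min, max): after the loops,
    closed[i][j] is the largest value of the weakest edge over all paths
    from i to j, so closed[i][j] >= min(closed[i][k], closed[k][j]) holds
    for every triple (i, j, k).
    """
    n = len(weights)
    closed = [row[:] for row in weights]  # do not modify the input
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = min(closed[i][k], closed[k][j])
                if via_k > closed[i][j]:
                    closed[i][j] = via_k
    return closed


# Toy undirected graph; the diagonal is set to the maximum similarity 1.0.
weights = [
    [1.0, 0.9, 0.2, 0.0],
    [0.9, 1.0, 0.8, 0.0],
    [0.2, 0.8, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]
closed = transitive_closure(weights)
print(closed[0][2], closed[0][3])  # 0.8 0.5
```

For dissimilarities, the same loop with `max` and `min` swapped (and "smaller is better") yields the ultrametric form d(i, j) <= max(d(i, k), d(k, j)).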
Selecting a small subset out of the thousands of genes in microarray data is important for accurate classification of phenotypes. Widely used methods typically rank genes according to their differential expressions among phenotypes and pick the top-ranked genes. We observe that feature sets so obtained have certain redundancy and study methods to minimize it. We propose a minimum redundancy - maximum relevance (MRMR) feature selection framework. Genes selected via MRMR provide a more balanced coverage of the space and capture broader characteristics of phenotypes. They lead to significantly improved class predictions in extensive experiments on 6 gene expression data sets: NCI, Lymphoma, Lung, Child Leukemia, Leukemia, and Colon. Improvements are observed consistently among 4 classification methods: Naive Bayes, Linear discriminant analysis, Logistic regression, and Support vector machines. SUPPLEMENTARY: The top 60 MRMR genes for each of the datasets are listed in http://crd.lbl.gov/~cding/MRMR/. More information related to MRMR methods can be found at http://www.hpeng.net/.
Feature selection is an important problem for pattern classification systems. We study how to select good features according to the maximal statistical dependency criterion based on mutual information. Because of the difficulty in directly implementing the maximal dependency condition, we first derive an equivalent form, called minimal-redundancy-maximal-relevance criterion (mRMR), for first-order incremental feature selection. Then, we present a two-stage feature selection algorithm by combining mRMR and other more sophisticated feature selectors (e.g., wrappers). This allows us to select a compact set of superior features at very low cost. We perform extensive experimental comparison of our algorithm and other methods using three different classifiers (naive Bayes, support vector machine, and linear discriminant analysis) and four different data sets (handwritten digits, arrhythmia, NCI cancer cell lines, and lymphoma tissues). The results confirm that mRMR leads to promising improvement on feature selection and classification accuracy.
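The incremental selection step can be sketched in a few lines for discrete features. The code below is a minimal illustration, not the released mRMR software: it uses the difference form of the criterion (relevance to the class minus mean redundancy with already-selected features), estimates mutual information from raw counts, and runs on a made-up toy data set. The names `mutual_information` and `mrmr` and the feature names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences,
    estimated from empirical counts."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def mrmr(features, labels, k):
    """Greedy first-order mRMR: at each step pick the feature maximizing
    I(feature; labels) - mean I(feature; already selected feature)."""
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected, remaining = [], list(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s])
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (made up): "a_noisy" is nearly a copy of "a"; "b" is weakly
# relevant to the labels but independent of "a".
labels = [0, 0, 0, 1, 1, 1, 1, 1]
features = {
    "a":       [0, 0, 0, 0, 1, 1, 1, 1],
    "a_noisy": [0, 0, 0, 0, 1, 1, 1, 0],
    "b":       [0, 1, 0, 1, 0, 1, 0, 1],
}
print(mrmr(features, labels, 2))  # ['a', 'b']
```

Ranking by relevance alone would pick `a` and then `a_noisy`, which adds little new information; the redundancy term is what steers the second choice to `b`.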
We present a novel computer algorithm for mapping biological pathways from one prokaryotic genome to another. The algorithm maps genes in a known pathway to their homologous genes (if any) in a target genome that is most consistent with (a) predicted orthologous gene relationship, (b) predicted operon structures, and (c) predicted co-regulation relationship of operons. Mathematically, we have formulated this problem as a constrained minimum spanning tree problem (called a Steiner network problem), and demonstrated that this formulation has the desired property through applications. We have solved this mapping problem using a combinatorial optimization algorithm, with guaranteed global optimality. We have implemented this algorithm as a computer program, called P-MAP. Our test results on pathway mapping are highly encouraging: we have mapped a number of pathways of H. influenzae, B. subtilis, H. pylori, and M. tuberculosis to E. coli using P-MAP, whose homologous pathways in E. coli are known and hence the mapping accuracy could be checked. We have then mapped known E. coli pathways in the EcoCyc database to the newly sequenced organism Synechococcus sp WH8102, and predicted 158 Synechococcus pathways. Detailed analyses on the predicted pathways indicate that P-MAP's mapping results are consistent with our general knowledge about (local) pathways. We believe that P-MAP will be a useful tool for microbial genome annotation projects and inference of individual microbial pathways.
Most methods for structure-function analysis of the brain in medical images are usually based on voxel-wise statistical tests performed on registered magnetic resonance (MR) images across subjects. A major drawback of such methods is the inability to accurately locate regions that manifest nonlinear associations with clinical variables. In this paper, we propose Bayesian morphological analysis methods, based on a Bayesian-network representation, for the analysis of MR brain images. First, we describe how Bayesian networks (BNs) can represent probabilistic associations among voxels and clinical (function) variables. Second, we present a model-selection framework, which generates a BN that captures structure-function relationships from MR brain images and function variables. We demonstrate our methods in the context of determining associations between regional brain atrophy (as demonstrated on MR images of the brain), and functional deficits. We employ two data sets for this evaluation: the first contains MR images of 11 subjects, where associations between regional atrophy and a functional deficit are almost linear; the second data set contains MR images of the ventricles of 84 subjects, where the structure-function association is nonlinear. Our methods successfully identify voxel-wise morphological changes that are associated with functional deficits in both data sets, whereas standard statistical analysis (i.e., t-test and paired t-test) fails in the nonlinear-association case.








