My recent research focuses on two areas. First, I have been developing effective, high-throughput biomedical image analysis, computer vision, and machine learning techniques for life science studies. Second, I have used these techniques to attack challenging problems in molecular biology, neuroscience, and biomedical engineering.
I earned my Ph.D. in computer vision and machine learning. In my Ph.D. thesis, I developed a novel visual computing model based on the selective attention mechanism of human vision and used it to solve pattern recognition problems. Through this work, I became deeply interested in biologically motivated visual computing. I then undertook postdoctoral training at Duke University Medical Center under the supervision of Dr. Dale Purves, studying human visual perception and its neural underpinnings. There, through large-scale statistical analysis of natural scene images, I found that many human color perception phenomena can be predicted from natural scene statistics.
With training in both computer science and neuroscience, I then chose to cross the boundary between the two fields and apply computational approaches to biological studies. I undertook a second postdoctoral appointment in the Life Sciences Division at Lawrence Berkeley National Lab, where I developed an effective bioimage data mining technique to predict the malignancy of mammalian breast epithelial cells from the spatial distribution of nuclear proteins in microscopy images. While at Berkeley Lab, I became very excited about advanced fluorescent tissue labeling and super-resolution bioimaging techniques. I realized that with these techniques, biologists can now acquire 3D high-resolution images of an organism from the macroscale to the microscale, both in vivo and in vitro, and begin to ask questions that were previously impossible to address. However, the complicated structures of organelles, cells, and tissues, the rapid dynamics of cellular processes, the huge size of individual images (hundreds of megabytes to multiple gigabytes, and even terabytes), and the enormous number of images all make it very challenging to extract useful biological knowledge from these images manually.
I thus chose to use my expertise to bridge this gap. In late 2005, I joined the Janelia Farm Research Campus (JFRC) of the Howard Hughes Medical Institute (HHMI) as a research scientist, where I have been developing high-throughput bioimage analysis and data mining techniques to tackle several significant yet challenging problems in neuroscience and molecular biology.
Digital reconstruction of neurons from microscope images is an important and challenging problem in neuroscience. In this paper, we propose a model-based method to tackle this problem. We first formulate a model structure, then develop an algorithm for computing it by carefully taking into account morphological characteristics of neurons, as well as the image properties under typical imaging protocols. The method has been tested on the data sets used in the DIADEM competition and produced promising results for four out of the five data sets.
Automatic alignment (registration) of 3D images of adult fruit fly brains is often affected by significant displacement of the two optic lobes (OLs) relative to the central brain (CB). As part of our ongoing effort to build a better image alignment pipeline for adult fruit fly brains, we consider separating the CB and OLs and aligning them independently. This paper reports our automatic method for segregating the CB and OLs, in particular when the signal-to-noise ratio (SNR) is low, image intensity varies greatly, and the relative displacement of the OLs and CB is substantial. We design an algorithm that finds a minimum-cost 3D surface in a 3D image stack that best separates an OL (on one side, either left or right) from the CB. This surface is defined as the aggregation of the minimum-cost curves detected in each individual 2D image slice. Each curve is defined by a list of control points that best segregate the OL and CB. To obtain the locations of these control points, we derive an energy function that includes an image energy term defined by local pixel intensities and two internal energy terms that constrain the curve's smoothness and length. This energy function is optimized by gradient descent. To improve both the speed and the robustness of the method, the optimized control point locations in each slice of a stack are used to initialize the optimization in the next slice. We have tested this approach on simulated and real 3D fly brain image stacks and demonstrated that it can reasonably segregate OLs from the CB despite the aforementioned difficulties.
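To make the slice-wise energy minimization concrete, here is a minimal Python sketch. It assumes a simplified setup that the abstract does not state: each curve has one control point per sampled row of the slice, only each control point's x coordinate moves, darker pixels mark the OL/CB boundary (so raw intensity serves as the image energy), and the smoothness and length terms are the usual snake-style squared second and first differences. The names `optimize_curve`, `segment_stack`, `alpha`, `beta`, and `lr` are hypothetical, and this is not the authors' implementation.

```python
# Sketch (assumed simplified model): slice-wise minimum-cost curve by gradient descent.
# Each curve has one control point per sampled row; only x is optimized.


def interp_row(row, x):
    """Linear interpolation of intensity along one row; returns (value, d(value)/dx)."""
    x = min(max(x, 0.0), len(row) - 1.0)
    x0 = min(int(x), len(row) - 2)
    t = x - x0
    return (1 - t) * row[x0] + t * row[x0 + 1], row[x0 + 1] - row[x0]


def energy_gradient(image, rows, xs, alpha, beta):
    """Gradient of E(xs) = sum_i I(rows[i], xs[i])
    + alpha * sum_i (xs[i-1] - 2*xs[i] + xs[i+1])**2   (smoothness)
    + beta  * sum_i (xs[i+1] - xs[i])**2               (length)."""
    n = len(xs)
    grad = [interp_row(image[y], x)[1] for y, x in zip(rows, xs)]  # image energy
    for i in range(n - 1):  # length term
        d = xs[i + 1] - xs[i]
        grad[i] -= 2 * beta * d
        grad[i + 1] += 2 * beta * d
    for i in range(1, n - 1):  # smoothness term
        c = xs[i - 1] - 2 * xs[i] + xs[i + 1]
        grad[i - 1] += 2 * alpha * c
        grad[i] -= 4 * alpha * c
        grad[i + 1] += 2 * alpha * c
    return grad


def optimize_curve(image, rows, init_xs, alpha=0.5, beta=0.1, lr=0.1, iters=200):
    """Plain fixed-step gradient descent on the control-point x positions."""
    width = len(image[0])
    xs = list(init_xs)
    for _ in range(iters):
        grad = energy_gradient(image, rows, xs, alpha, beta)
        xs = [min(max(x - lr * g, 0.0), width - 1.0) for x, g in zip(xs, grad)]
    return xs


def segment_stack(stack, rows, init_xs, **params):
    """Optimize one curve per slice, initializing each slice with the previous result."""
    curves, xs = [], list(init_xs)
    for image in stack:
        xs = optimize_curve(image, rows, xs, **params)
        curves.append(xs)
    return curves


# Demo: synthetic slices whose dark boundary (intensity |x - c|) drifts from x=5 to x=7.
def valley_slice(center, width=11, height=8):
    return [[abs(x - center) for x in range(width)] for _ in range(height)]


stack = [valley_slice(c) for c in (5, 6, 7)]
curves = segment_stack(stack, rows=[0, 2, 4, 6], init_xs=[3.0] * 4)
```

Seeding each slice with the previous result means the optimizer only has to follow the small shift of the boundary between neighboring slices instead of searching from scratch. With a fixed step size, the control points settle within about `lr` of the boundary in this demo.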
The V3D system provides real-time three-dimensional (3D) visualization of gigabyte-sized microscopy image stacks on current laptops and desktops. V3D streamlines the online analysis, measurement and proofreading of complicated image patterns by combining ergonomic functions for selecting a location in an image directly in 3D space and for displaying biological measurements, such as from fluorescent probes, using the overlaid surface objects. V3D runs on all major computer platforms and can be enhanced by software plug-ins to address specific biological problems. To demonstrate this extensibility, we built a V3D-based application, V3D-Neuron, to reconstruct complex 3D neuronal structures from high-resolution brain images. V3D-Neuron can precisely digitize the morphology of a single neuron in a fruit fly brain in minutes, with about a 17-fold improvement in reliability and a tenfold saving in time compared with other neuron reconstruction tools. Using V3D-Neuron, we demonstrate the feasibility of building a 3D digital atlas of neurite tracts in the fruit fly brain.
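The idea of selecting a location directly in 3D space can be illustrated with the simplest possible heuristic: cast the viewing ray through the clicked screen point into the volume and return the brightest voxel along it. The sketch below assumes the ray has already been computed from the click, that the volume is indexed as `volume[z][y][x]`, and that the ray starts on or near the volume. The function name `pick_along_ray` is hypothetical, and this is not V3D's actual pinpointing algorithm.

```python
import math


def pick_along_ray(volume, origin, direction, step=0.5):
    """Return the (x, y, z) voxel with the highest intensity sampled along a ray.

    volume is indexed volume[z][y][x]; origin and direction are (x, y, z) tuples.
    Samples use nearest-neighbor lookup; samples outside the volume are skipped.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    norm = math.sqrt(sum(d * d for d in direction))
    ux, uy, uz = (d / norm for d in direction)
    max_t = 2 * math.sqrt(nx * nx + ny * ny + nz * nz)  # long enough to cross the volume
    best, best_val = None, -math.inf
    for k in range(int(max_t / step) + 1):
        t = k * step
        x = round(origin[0] + t * ux)
        y = round(origin[1] + t * uy)
        z = round(origin[2] + t * uz)
        if 0 <= x < nx and 0 <= y < ny and 0 <= z < nz and volume[z][y][x] > best_val:
            best, best_val = (x, y, z), volume[z][y][x]
    return best


# Demo: a 5x5x5 volume with a dim voxel and a bright voxel on the same viewing ray.
volume = [[[0] * 5 for _ in range(5)] for _ in range(5)]
volume[1][3][2] = 5   # dim voxel at (x=2, y=3, z=1)
volume[4][3][2] = 10  # bright voxel at (x=2, y=3, z=4)
picked = pick_along_ray(volume, origin=(2, 3, 0), direction=(0, 0, 1))
```

Picking the maximum along the ray works well for sparse, bright structures such as labeled neurites, but it is ambiguous when several bright objects overlap along the line of sight.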
We built a digital nuclear atlas of the newly hatched, first larval stage (L1) of the wild-type hermaphrodite of Caenorhabditis elegans at single-cell resolution from confocal image stacks of 15 individual worms. The atlas quantifies the stereotypy of nuclear locations and provides other statistics on the spatial patterns of the 357 nuclei that could be faithfully segmented and annotated out of the 558 present at this developmental stage. We then developed an automated approach to assign cell names to each nucleus in a three-dimensional image of an L1 worm. We achieved 86% accuracy in identifying the 357 nuclei automatically. This computational method will allow high-throughput single-cell analyses of the post-embryonic worm, such as gene expression analysis, or ablation or stimulation of cells under computer control in a high-throughput functional screen.
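A toy version of the two steps described above (building an atlas of mean nuclear positions with a per-nucleus spread as a measure of stereotypy, then naming the nuclei in a new worm) might look like the following. It assumes the worms are already aligned to a common coordinate frame and that names are assigned by minimizing the total squared distance to the atlas means; this is an assumed formulation, not necessarily the published method. The brute-force search over permutations only works for a handful of nuclei; at the scale of 357 nuclei one would use an assignment solver for the Hungarian algorithm, such as `scipy.optimize.linear_sum_assignment`. The cell names in the demo are hypothetical.

```python
import itertools
import math


def build_atlas(worms):
    """Atlas from aligned worms: name -> (mean position, RMS distance to the mean).

    worms is a list of {cell_name: (x, y, z)} dicts, assumed already registered
    to a common coordinate frame. Only names present in every worm are kept.
    """
    names = set.intersection(*(set(w) for w in worms))
    atlas = {}
    for name in sorted(names):
        pts = [w[name] for w in worms]
        mean = tuple(sum(p[k] for p in pts) / len(pts) for k in range(3))
        spread = math.sqrt(sum(math.dist(p, mean) ** 2 for p in pts) / len(pts))
        atlas[name] = (mean, spread)
    return atlas


def assign_names(nuclei, atlas):
    """Name each nucleus by minimizing total squared distance to atlas means.

    Exhaustive search over permutations: only feasible for a few nuclei.
    """
    best_cost, best_names = math.inf, None
    for perm in itertools.permutations(sorted(atlas), len(nuclei)):
        cost = sum(math.dist(p, atlas[n][0]) ** 2 for p, n in zip(nuclei, perm))
        if cost < best_cost:
            best_cost, best_names = cost, list(perm)
    return best_names


# Demo with hypothetical cell names and two aligned worms.
worms = [
    {"cellA": (0, 0, 0), "cellB": (10, 0, 0), "cellC": (0, 10, 0)},
    {"cellA": (1, 0, 0), "cellB": (11, 0, 0), "cellC": (0, 11, 0)},
]
atlas = build_atlas(worms)
names = assign_names([(0, 10, 1), (10, 1, 0), (1, 1, 0)], atlas)
```

A one-to-one assignment, rather than naming each nucleus independently after its nearest atlas entry, prevents two detected nuclei from receiving the same name.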
Volume-object annotation system (VANO) is a cross-platform image annotation system that enables one to conveniently visualize and annotate 3D volume objects, including nuclei and cells. An application of VANO typically starts with an initial collection of objects produced by a segmentation computation. The objects can then be labeled, categorized, deleted, added, split, merged and redefined. VANO has been used to build high-resolution digital atlases of the nuclei of Caenorhabditis elegans at the L1 stage and the nuclei of the Drosophila melanogaster ventral nerve cord at the late embryonic stage. AVAILABILITY: Platform-independent executables of VANO, a sample dataset, and a detailed description of both its design and usage are available at research.janelia.org/peng/proj/vano. VANO is open source and open to co-development.
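The object-editing operations listed above map naturally onto a small data structure. The following is a toy sketch, assuming each object is stored as a set of voxel coordinates under an integer label; the class name `ObjectAnnotations` and its methods are hypothetical and do not describe VANO's internal design.

```python
class ObjectAnnotations:
    """Toy store of segmented 3D objects: integer label -> set of (x, y, z) voxels."""

    def __init__(self, objects):
        self.objects = {label: set(voxels) for label, voxels in objects.items()}
        self.annotations = {}  # label -> (name, category)
        self._next_label = max(self.objects, default=0) + 1

    def annotate(self, label, name, category=None):
        """Attach a name and an optional category to an object."""
        self.annotations[label] = (name, category)

    def add(self, voxels):
        """Add a new object and return its label."""
        label = self._next_label
        self._next_label += 1
        self.objects[label] = set(voxels)
        return label

    def delete(self, label):
        """Remove an object and any annotation on it."""
        del self.objects[label]
        self.annotations.pop(label, None)

    def merge(self, keep, absorb):
        """Merge object `absorb` into object `keep`; `keep` retains its annotation."""
        self.objects[keep] |= self.objects.pop(absorb)
        self.annotations.pop(absorb, None)

    def split(self, label, predicate):
        """Move voxels satisfying `predicate` into a new object; return the new label."""
        moved = {v for v in self.objects[label] if predicate(v)}
        self.objects[label] -= moved
        return self.add(moved)


# Demo: start from a segmentation result, then correct it with manual-style edits.
store = ObjectAnnotations({1: {(0, 0, 0), (1, 0, 0)}, 2: {(5, 5, 5)}, 3: {(9, 9, 9)}})
store.merge(1, 2)                                # object 1 now holds three voxels
new_label = store.split(1, lambda v: v[0] >= 5)  # split (5, 5, 5) back out as label 4
store.delete(3)
store.annotate(1, "nucleus_1", category="hypothetical category")
```

Keeping voxels per label, rather than a single dense label image, makes merge and split cheap set operations, at the cost of more memory for large objects.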
The C. elegans cell lineage provides a unique opportunity to examine how cell lineage affects patterns of gene expression. We developed an automatic cell lineage analyzer that converts high-resolution images of worms into a data table of fluorescence expression at single-cell resolution. We generated expression profiles of 93 genes in 363 specific cells from L1 stage larvae and found that cells with identical fates can be formed by different gene regulatory pathways. Molecular signatures identified repeating cell fate modules within the cell lineage and enabled the generation of a molecular differentiation map that reveals the points in the cell lineage at which the developmental fates of daughter cells begin to diverge. These results illustrate the insights made possible by computational analysis of quantitative expression data from many genes in parallel using a digital gene expression atlas.
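As an illustration of how a differentiation map could flag the points where daughter fates diverge, the sketch below compares, at each division, the mean expression profiles of the L1 cells descended from each daughter. The lineage, gene names, binary expression values, distance measure, and threshold are all assumptions for illustration, not the analysis used in the study.

```python
def terminal_cells(cell, children):
    """All L1 (leaf) cells descended from `cell` in the lineage tree."""
    leaves, stack = [], [cell]
    while stack:
        c = stack.pop()
        if c in children:
            stack.extend(children[c])
        else:
            leaves.append(c)
    return leaves


def mean_profile(cell, children, expression, genes):
    """Mean expression of each gene over the L1 descendants of `cell`."""
    leaves = terminal_cells(cell, children)
    return [sum(expression[leaf][g] for leaf in leaves) / len(leaves) for g in genes]


def divergence_points(root, children, expression, genes, threshold):
    """Map each division to the mean absolute difference between its daughters'
    profiles, keeping only divisions at or above `threshold`."""
    points, stack = {}, [root]
    while stack:
        cell = stack.pop()
        if cell not in children:
            continue
        a, b = children[cell]
        pa = mean_profile(a, children, expression, genes)
        pb = mean_profile(b, children, expression, genes)
        d = sum(abs(x - y) for x, y in zip(pa, pb)) / len(genes)
        if d >= threshold:
            points[cell] = d
        stack.extend(children[cell])
    return points


# Demo: a hypothetical two-round lineage scored for four hypothetical genes (1 = on).
children = {"P": ("Pa", "Pp"), "Pa": ("Paa", "Pap"), "Pp": ("Ppa", "Ppp")}
genes = ["g1", "g2", "g3", "g4"]
expression = {
    "Paa": {"g1": 1, "g2": 1, "g3": 0, "g4": 0},
    "Pap": {"g1": 1, "g2": 1, "g3": 0, "g4": 0},
    "Ppa": {"g1": 0, "g2": 0, "g3": 1, "g4": 1},
    "Ppp": {"g1": 0, "g2": 0, "g3": 1, "g4": 0},
}
points = divergence_points("P", children, expression, genes, threshold=0.2)
```

In this toy lineage the first division separates two clearly different expression programs, the `Pa` daughters stay identical, and the `Pp` daughters differ in a single gene.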
MOTIVATION: Caenorhabditis elegans, a roundworm found in soil, is a widely studied model organism with about 1000 cells in the adult. Producing high-resolution fluorescence images of C. elegans to reveal biological insights is becoming routine, motivating the development of advanced computational tools for analyzing the resulting image stacks. For example, worm bodies usually curve significantly in images, so the worms must be 'straightened' before they can be compared under a canonical coordinate system. RESULTS: We develop a worm straightening algorithm (WSA) that restacks cutting planes orthogonal to a 'backbone' that models the anterior-posterior axis of the worm. We formulate the backbone as a parametric cubic spline defined by a series of control points, and we develop two methods for automatically determining the locations of the control points. Our experimental results show that our approaches effectively straighten both 2D and 3D worm images.
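The straightening step can be sketched in 2D as follows. The abstract specifies a parametric cubic spline through control points; this sketch assumes a uniform Catmull-Rom spline as one such spline, samples the backbone at regular parameter steps, and for each sample reads image intensities along the normal to the backbone with bilinear interpolation, so each output row is one cutting line. The function names and parameters (`straighten`, `half_width`, `samples_per_segment`) are hypothetical, and the control points are supplied by hand rather than detected automatically.

```python
import math


def catmull_rom(p0, p1, p2, p3, t):
    """Point at t in [0, 1] on the uniform Catmull-Rom segment from p1 to p2."""
    t2, t3 = t * t, t * t * t
    return tuple(
        0.5 * (2 * b + (c - a) * t + (2 * a - 5 * b + 4 * c - d) * t2
               + (3 * b - a - 3 * c + d) * t3)
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def sample_backbone(ctrl, samples_per_segment=10):
    """Sample the spline through all control points (end points duplicated as phantoms)."""
    pts = [ctrl[0]] + list(ctrl) + [ctrl[-1]]
    out = []
    for i in range(1, len(pts) - 2):
        for k in range(samples_per_segment):
            out.append(catmull_rom(pts[i - 1], pts[i], pts[i + 1], pts[i + 2],
                                   k / samples_per_segment))
    out.append(tuple(float(v) for v in ctrl[-1]))
    return out


def bilinear(image, x, y):
    """Bilinear intensity at (x, y); 0.0 outside the image."""
    h, w = len(image), len(image[0])
    if not (0 <= x <= w - 1 and 0 <= y <= h - 1):
        return 0.0
    x0, y0 = min(int(x), w - 2), min(int(y), h - 2)
    fx, fy = x - x0, y - y0
    return ((1 - fx) * (1 - fy) * image[y0][x0] + fx * (1 - fy) * image[y0][x0 + 1]
            + (1 - fx) * fy * image[y0 + 1][x0] + fx * fy * image[y0 + 1][x0 + 1])


def straighten(image, ctrl, half_width=3, samples_per_segment=10):
    """Each output row samples the image along the normal at one backbone point."""
    backbone = sample_backbone(ctrl, samples_per_segment)
    rows = []
    for i, (x, y) in enumerate(backbone):
        ax, ay = backbone[max(i - 1, 0)]
        bx, by = backbone[min(i + 1, len(backbone) - 1)]
        tx, ty = bx - ax, by - ay  # finite-difference tangent
        norm = math.hypot(tx, ty) or 1.0
        nx, ny = -ty / norm, tx / norm  # unit normal to the backbone
        rows.append([bilinear(image, x + s * nx, y + s * ny)
                     for s in range(-half_width, half_width + 1)])
    return rows


# Demo: a 12x20 image whose only bright row is y = 5, traced by a hand-placed backbone.
image = [[1.0 if y == 5 else 0.0 for _ in range(20)] for y in range(12)]
straight = straighten(image, ctrl=[(2, 5), (8, 5), (14, 5)])
```

A bent worm is handled the same way: control points placed along the curved body make each output row sample perpendicular to the local body axis, so the worm's axis becomes the center column. In 3D, one would instead sample a 2D cutting plane spanned by two unit vectors orthogonal to the tangent.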