Although recent innovations have made imaging more efficient, the subsequent step of image analysis is a persistent bottleneck. Many years of human effort solved the image analysis problem for C. elegans, but this approach is too costly to enable the routine and rapid production of wiring diagrams, especially for circuits with tens or hundreds of thousands of neurons. Automation of image analysis is therefore a critical problem for neurobiology, and it forms the core of my research program.
Machine Learning of Image Analysis
To infer a connectivity matrix from images of dense neuropil, two image analysis tasks must be performed: (1) identifying synapses, and (2) tracing neurites to their parent cell bodies. The first task can be posed as a visual object recognition problem, in which the goal is to determine whether an image contains an example of an object category.
The second task can be posed as an image segmentation problem, in which the goal is to group pixels into distinct partitions that correspond to physical objects. In order to produce useful wiring diagrams, image segmentation must be performed with extraordinary accuracy. A mistake in synapse identification will cause at most a single error in a connectivity matrix, but a single error in neurite tracing could misassign thousands of synapses.
In previous work, we have introduced new image segmentation algorithms based on a machine learning approach to image analysis. By developing novel classifiers and learning algorithms specialized for the problem of segmentation, our approach has yielded new levels of performance in automated EM reconstruction. However, these methods have largely focused on a relatively local analysis of the image volume; further improvements in accuracy are likely to require automated methods that can reason about much larger amounts of image context. We are pursuing such improvements through research in several directions:
- Segmentation is an example of a structured prediction problem. Probabilistic graphical models and non-probabilistic energy-based models are powerful representations for dealing with structured prediction problems, but generally define very difficult learning and inference problems. For EM reconstruction we will eventually encounter models with millions of objects and a practically infinite combinatorial space of configurations of those objects. What learning and inference algorithms are effective for dealing with such problems? There is a large and growing literature on efficient (approximate) methods for inference and learning in such models, but such research has been done on different domains and problems. EM reconstruction may require a fresh approach to these issues.
- Neurites are very specialized 3D structures, and therefore there should exist effective low-dimensional features that describe them. What is the best approach for devising such descriptors (e.g., unsupervised learning, human specification, or end-to-end discriminative learning), and how can they best be used to improve EM reconstruction accuracy?
Semi-automated reconstruction by integrating machine learning, computer vision, and human-computer interaction
In addition to advancing the basic computer vision and machine learning research involved in automated reconstruction, we are solving the science and engineering challenges involved in capitalizing on existing automated techniques to reconstruct wiring diagrams. In particular, using state-of-the-art but imperfect automated techniques to produce accurate descriptions of connectivity requires sophisticated software that enables humans to ‘proofread’ computer reconstructions by identifying and correcting errors. Our approach integrates machine learning algorithms directly into the proofreading process, which will enable progress on several important problems:
- The true metric of interest in a reconstruction project is the amount of human effort required to proofread and correct errors in an automated segmentation (the ‘nuisance metric’). However, all machine learning approaches to image segmentation are designed to optimize some other metric (such as pixel error). Can a machine learning strategy be devised that directly optimizes the nuisance metric?
- Machine learning requires ground truth, which in EM reconstruction is typically provided by humans and is thus expensive to acquire. How can human input be selectively acquired such that it is most informative for a particular learning algorithm? The general form of this problem has been well-studied as ‘active learning,’ but has yet to be rigorously pursued within the context of image segmentation. A successful solution to this problem will increase performance of automated reconstruction by maximizing the information content provided by human labeling.
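As a minimal sketch of the active learning idea in the last item, the snippet below implements uncertainty sampling, one standard heuristic for choosing which examples to send to a human annotator. It is an illustration only, not the lab's method: the function name `most_uncertain`, the per-example boundary probabilities, and the labeling budget are all hypothetical.

```python
def most_uncertain(probabilities, budget):
    """Return indices of the `budget` examples whose predicted probability
    is closest to 0.5, i.e. where a binary classifier is least confident.

    `probabilities` is a hypothetical list of classifier outputs in [0, 1],
    one per candidate example (for instance, one per supervoxel boundary).
    """
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:budget]


# Hypothetical classifier outputs for five candidate locations.
predicted = [0.98, 0.52, 0.10, 0.45, 0.70]
to_label = most_uncertain(predicted, budget=2)
print(to_label)
```

Uncertainty sampling ignores how much proofreading effort each label would save, so it is only a baseline for the question posed above.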
Reconstructing Sensory Circuits
In collaboration with labs at Janelia and elsewhere, we are applying our reconstruction algorithms and tools to answer fundamental biological questions about how local circuitry in sensory systems processes information. We are excited about combining measurements of neuronal activity and connectivity to understand the computational principles that govern the nervous system.
An agglomerative clustering algorithm merges the most similar pair of clusters at every iteration. The function that evaluates similarity is traditionally hand-designed, but there has been recent interest in supervised or semi-supervised settings in which ground-truth clustered data is available for training. Here we show how to train a similarity function by regarding it as the action-value function of a reinforcement learning problem. We apply this general method to segment images by clustering superpixels, an application that we call Learning to Agglomerate Superpixel Hierarchies (LASH). When applied to a challenging dataset of brain images from serial electron microscopy, LASH dramatically improved segmentation accuracy when clustering supervoxels generated by state-of-the-art boundary detection algorithms. The naive strategy of directly training only supervoxel similarities and applying single linkage clustering produced less improvement.
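The greedy merge loop described in this abstract can be sketched as follows. This is a generic agglomerative procedure with a pluggable similarity callable, not the LASH implementation: the hand-designed `neg_mean_gap` function stands in for a learned similarity, and the stopping `threshold` and the toy 1D data are assumptions made for illustration.

```python
from itertools import combinations


def agglomerate(items, similarity, threshold):
    """Repeatedly merge the most similar pair of clusters.

    `similarity` takes two clusters (tuples of items) and returns a float.
    Merging stops when the best available score falls below `threshold`
    (an assumed stopping rule for this sketch).
    """
    clusters = [(x,) for x in items]
    while len(clusters) > 1:
        # Score every pair of current clusters and keep the best one.
        best_score, (i, j) = max(
            ((similarity(a, b), (i, j))
             for (i, a), (j, b) in combinations(enumerate(clusters), 2)),
            key=lambda scored: scored[0],
        )
        if best_score < threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


def neg_mean_gap(a, b):
    """Hand-designed stand-in: negative distance between cluster means."""
    return -abs(sum(a) / len(a) - sum(b) / len(b))


result = agglomerate([1.0, 1.1, 5.0, 5.2, 9.0], neg_mean_gap, threshold=-1.0)
print(sorted(result))
```

In a learned setting, `similarity` would be replaced by a trained function of features of the two clusters; the merge loop itself would stay the same.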
Prior Publications (10)
Comprehensive high-resolution structural maps are central to functional exploration and understanding in biology. For the nervous system, in which high resolution and large spatial extent are both needed, such maps are scarce as they challenge data acquisition and analysis capabilities. Here we present for the mouse inner plexiform layer--the main computational neuropil region in the mammalian retina--the dense reconstruction of 950 neurons and their mutual contacts. This was achieved by applying a combination of crowd-sourced manual annotation and machine-learning-based volume segmentation to serial block-face electron microscopy data. We characterize a new type of retinal bipolar interneuron and show that we can subdivide a known type based on connectivity. Circuit motifs that emerge from our data indicate a functional mechanism for a known cellular response in a ganglion cell that detects localized motion, and predict that another ganglion cell is motion sensitive.
Many image segmentation algorithms first generate an affinity graph and then partition it. We present a machine learning approach to computing an affinity graph using a convolutional network (CN) trained using ground truth provided by human experts. The CN affinity graph can be paired with any standard partitioning algorithm and improves segmentation accuracy significantly compared to standard hand-designed affinity functions. We apply our algorithm to the challenging 3D segmentation problem of reconstructing neuronal processes from volumetric electron microscopy (EM) and show that we are able to learn a good affinity graph directly from the raw EM images. Further, we show that our affinity graph improves the segmentation accuracy of both simple and sophisticated graph partitioning algorithms. In contrast to previous work, we do not rely on prior knowledge in the form of hand-designed image features or image preprocessing. Thus, we expect our algorithm to generalize effectively to arbitrary image types.
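To make the "affinity graph, then partition" pipeline concrete, here is a minimal sketch of one very simple partitioning step: threshold the edge affinities and take connected components with a union-find structure. The edge list, the affinity values, and the threshold are invented for illustration; in the abstract's setting the affinities would come from the trained convolutional network, and other partitioning algorithms could be substituted.

```python
def connected_components(num_nodes, edges, threshold):
    """Label nodes by connected component after dropping weak edges.

    `edges` is a list of (u, v, affinity) triples; only edges with
    affinity >= threshold link their endpoints into the same segment.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v, affinity in edges:
        if affinity >= threshold:
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_v] = root_u

    # Relabel roots as consecutive segment ids in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(n), len(relabel)) for n in range(num_nodes)]


# Hypothetical affinities between five neighboring voxels in a chain.
edges = [(0, 1, 0.9), (1, 2, 0.8), (2, 3, 0.1), (3, 4, 0.95)]
print(connected_components(5, edges, threshold=0.5))
```

A single weak edge (here between voxels 2 and 3) is what separates the two segments, which is why the quality of the affinities matters so much for the final segmentation.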
Boundary learning by optimization with topological constraints. IEEE Conference on Computer Vision and Pattern Recognition 2010
V. Jain, B. Bollmann, M. Richardson, D. R. Berger, M. N. Helmstaedter, K. L. Briggman, W. Denk, J. B. Bowden, J. M. Mendenhall, W. C. Abraham, K. M. Harris, N. Kasthuri, K. J. Hayworth, R. Schalek, J. Tapia, J. W. Lichtman, and H. Seung. IEEE Conference on Computer Vision and Pattern Recognition (2010)
Recent studies have shown that machine learning can improve the accuracy of detecting object boundaries in images. In the standard approach, a boundary detector is trained by minimizing its pixel-level disagreement with human boundary tracings. This naive metric is problematic because it is overly sensitive to boundary locations. This problem is solved by metrics provided with the Berkeley Segmentation Dataset, but these can be insensitive to topological differences, such as gaps in boundaries. Furthermore, the Berkeley metrics have not been useful as cost functions for supervised learning. Using concepts from digital topology, we propose a new metric called the warping error that tolerates disagreements over boundary location, penalizes topological disagreements, and can be used directly as a cost function for learning boundary detection, in a method that we call Boundary Learning by Optimization with Topological Constraints (BLOTC). We trained boundary detectors on electron microscopic images of neurons, using both BLOTC and standard training. BLOTC produced substantially better performance on a 1.2 million pixel test set, as measured by both the warping error and the Rand index evaluated on segmentations generated from the boundary labelings. We also find our approach yields significantly better segmentation performance than either gPb-OWT-UCM or multiscale normalized cut, as well as Boosted Edge Learning trained directly on our data.
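The Rand index mentioned above measures how often two segmentations agree on whether a pair of pixels belongs to the same segment. The sketch below computes it from two flat lists of per-pixel segment labels using pair counts; representing a segmentation as a flat label list is an assumption made for brevity, and the warping error itself is not reproduced here.

```python
from collections import Counter


def rand_index(seg_a, seg_b):
    """Fraction of pixel pairs on which two segmentations agree.

    A pair agrees if both segmentations put the two pixels in the same
    segment, or both put them in different segments.
    """
    if len(seg_a) != len(seg_b):
        raise ValueError("segmentations must label the same pixels")
    n = len(seg_a)
    if n < 2:
        raise ValueError("need at least two pixels to form a pair")

    def pairs(k):
        return k * (k - 1) // 2

    same_in_both = sum(pairs(c) for c in Counter(zip(seg_a, seg_b)).values())
    same_in_a = sum(pairs(c) for c in Counter(seg_a).values())
    same_in_b = sum(pairs(c) for c in Counter(seg_b).values())
    total = pairs(n)
    # Agreements = pairs joined in both + pairs separated in both.
    agreements = total + 2 * same_in_both - same_in_a - same_in_b
    return agreements / total


truth = [0, 0, 0, 1, 1, 1]
split = [0, 0, 0, 1, 1, 2]   # one pixel split off from the second object
merged = [0, 0, 0, 0, 0, 0]  # the two objects merged into one
print(rand_index(truth, split), rand_index(truth, merged))
```

In this toy case the merge is penalized far more heavily than the small split, because a merge changes the relationship between many more pixel pairs.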
Connections between neurons can be found by checking whether synapses exist at points of contact, which in turn are determined by neural shapes. Finding these shapes is a special case of image segmentation, which is laborious for humans and would ideally be performed by computers. New metrics properly quantify the performance of a computer algorithm using its disagreement with 'true' segmentations of example images. New machine learning methods search for segmentation algorithms that minimize such metrics. These advances have reduced computer errors dramatically. It should now be faster for a human to correct the remaining errors than to segment an image manually. Further reductions in human effort are expected, and crucial for finding connectomes more complex than that of Caenorhabditis elegans.
We present an approach to solving computer vision problems in which the goal is to produce a high-dimensional, pixel-based interpretation of some aspect of the underlying structure of an image. Such tasks have traditionally been categorized as “low-level vision” problems, and examples include image denoising, boundary detection, and motion estimation. Our approach is characterized by two main elements, both of which represent a departure from previous work. The first is a focus on convolutional networks, a machine learning strategy that operates directly on an input image with no use of hand-designed features and employs many thousands of free parameters that are learned from data. Previous work in low-level vision has been largely focused on completely hand-designed algorithms or learning methods with a hand-designed feature space. We demonstrate that a learning approach with high model complexity, but zero prior knowledge about any specific image domain, can outperform existing techniques even in the challenging area of natural image processing. We also present results that establish how convolutional networks are closely related to Markov random fields (MRFs), a popular probabilistic approach to image analysis, but can in practice achieve significantly greater model complexity. The second aspect of our approach is the use of domain-specific cost functions and learning algorithms that reflect the structured nature of certain prediction problems in image analysis. In particular, we show how concepts from digital topology can be used in the context of boundary detection to both evaluate and optimize the high-order property of topological accuracy. We demonstrate that these techniques can significantly improve the machine learning approach and outperform state-of-the-art boundary detection and segmentation methods.
Throughout our work we maintain a special interest and focus on application of our methods to connectomics, an emerging scientific discipline that seeks high-throughput methods for recovering neural connectivity data from brains. This application requires solving low-level image analysis problems on a tera-voxel or peta-voxel scale, and therefore represents an extremely challenging and exciting arena for the development of computer vision methods.
We present an approach to low-level vision that combines two main ideas: the use of convolutional networks as an image processing architecture and an unsupervised learning procedure that synthesizes training samples from specific noise models. We demonstrate this approach on the challenging problem of natural image denoising. Using a test set with a hundred natural images, we find that convolutional networks provide comparable and in some cases superior performance to state of the art wavelet and Markov random field (MRF) methods. Moreover, we find that a convolutional network offers similar performance in the blind denoising setting as compared to other techniques in the non-blind setting. We also show how convolutional networks are mathematically related to MRF approaches by presenting a mean field theory for an MRF specially designed for image denoising. Although these approaches are related, convolutional networks avoid computational difficulties in MRF approaches that arise from probabilistic learning and inference. This makes it possible to learn image processing architectures that have a high degree of representational power (we train models with over 15,000 parameters), but whose computational expense is significantly less than that associated with inference in MRF approaches with even hundreds of parameters.
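The idea of synthesizing training samples from a noise model can be sketched as follows: take clean images, corrupt them with a known noise process, and use each (noisy, clean) pair as a training example. The additive Gaussian noise, the clipping to [0, 1], and the nested-list image representation are assumptions chosen for this illustration; the abstract does not specify these details.

```python
import random


def synthesize_pairs(clean_images, sigma, seed=0):
    """Build (noisy, clean) training pairs by adding Gaussian noise.

    Each image is a list of rows of pixel intensities in [0, 1].
    Noisy values are clipped back into [0, 1] (an assumed convention).
    """
    rng = random.Random(seed)
    pairs = []
    for image in clean_images:
        noisy = [
            [min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in image
        ]
        pairs.append((noisy, image))
    return pairs


# A tiny hypothetical 2x3 "image".
clean = [[0.0, 0.5, 1.0],
         [0.2, 0.4, 0.6]]
training_pairs = synthesize_pairs([clean], sigma=0.1)
noisy, target = training_pairs[0]
print(noisy)
```

Because the noise is generated rather than collected, an arbitrarily large training set can be produced from a fixed pool of clean images.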
Supervised learning of image restoration with convolutional networks. IEEE 11th International Conference on Computer Vision 2007
V. Jain, J. F. Murray, F. Roth, S. Turaga, V. Zhigulin, K. L. Briggman, M. N. Helmstaedter, W. Denk, and H. Seung. IEEE 11th International Conference on Computer Vision, 2:1-8 (2007)
Convolutional networks have achieved a great deal of success in high-level vision problems such as object recognition. Here we show that they can also be used as a general method for low-level image processing. As an example of our approach, convolutional networks are trained using gradient learning to solve the problem of restoring noisy or degraded images. For our training data, we have used electron microscopic images of neural circuitry with ground truth restorations provided by human experts. On this dataset, Markov random field (MRF), conditional random field (CRF), and anisotropic diffusion algorithms perform about the same as simple thresholding, but superior performance is obtained with a convolutional network containing over 34,000 adjustable parameters. When restored by this convolutional network, the images are clean enough to be used for segmentation, whereas the other approaches fail in this respect. We do not believe that convolutional networks are fundamentally superior to MRFs as a representation for image processing algorithms. On the contrary, the two approaches are closely related. But in practice, it is possible to train complex convolutional networks, while even simple MRF models are hindered by problems with Bayesian learning and inference procedures. Our results suggest that high model complexity is the single most important factor for good performance, and this is possible with convolutional networks.
There is little consensus about the computational function of top-down synaptic connections in the visual system. Here we explore the hypothesis that top-down connections, like bottom-up connections, reflect part-whole relationships. We analyze a recurrent network with bidirectional synaptic interactions between a layer of neurons representing parts and a layer of neurons representing wholes. Within each layer, there is lateral inhibition. When the network detects a whole, it can rigorously enforce part-whole relationships by ignoring parts that do not belong. The network can complete the whole by filling in missing parts. The network can refuse to recognize a whole if the activated parts do not conform to a stored part-whole relationship. Parameter regimes in which these behaviors happen are identified using the theory of permitted and forbidden sets [3, 4]. The network behaviors are illustrated by recreating Rumelhart and McClelland’s “interactive activation” model.
Many problems in voice recognition and audio processing involve feature extraction from raw waveforms. The goal of feature extraction is to reduce the dimensionality of the audio signal while preserving the informative signatures that, for example, distinguish different phonemes in speech or identify particular instruments in music. If the acoustic variability of a data set is described by a small number of continuous features, then we can imagine the data as lying on a low dimensional manifold in the high dimensional space of all possible waveforms. Locally linear embedding (LLE) is an unsupervised learning algorithm for feature extraction in this setting. In this paper, we present results from the exploratory analysis and visualization of speech and music by LLE.
A smorgasbord of features for statistical machine translation. Human Language Technology and 5th Meeting of the NAACL 2004
F. J. Och, D. Gildea, S. Khudanpur, A. Sarkar, K. Yamada, A. Fraser, S. Kumar, L. Shen, D. Smith, K. Eng, V. Jain, Z. Jin, and D. Radev. Human Language Technology and 5th Meeting of the NAACL (2004)
We describe a methodology for rapid experimentation in statistical machine translation which we use to add a large number of features to a baseline system exploiting features from a wide range of levels of syntactic representation. Feature values were combined in a log-linear model to select the highest scoring candidate translation from an n-best list. Feature weights were optimized directly against the BLEU evaluation metric on held-out data. We present results for a small selection of features at each level of syntactic representation.
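The reranking step described above, a log-linear combination of feature values used to pick the best candidate from an n-best list, can be sketched in a few lines. The feature names (`lm`, `tm`), their values, and the weights below are hypothetical; tuning the weights against BLEU is not shown.

```python
def rerank(nbest, weights):
    """Return the candidate with the highest log-linear score.

    `nbest` is a list of (translation, features) pairs, where `features`
    maps feature names to values (typically log-domain scores). The score
    is the weighted sum of feature values; missing weights count as zero.
    """
    def score(features):
        return sum(weights.get(name, 0.0) * value
                   for name, value in features.items())

    best_translation, _ = max(nbest, key=lambda cand: score(cand[1]))
    return best_translation


# Hypothetical two-entry n-best list with language-model and
# translation-model log scores.
nbest = [
    ("the house is small", {"lm": -4.0, "tm": -2.0}),
    ("small is the house", {"lm": -6.5, "tm": -1.5}),
]
print(rerank(nbest, {"lm": 1.0, "tm": 0.5}))
print(rerank(nbest, {"lm": 0.1, "tm": 2.0}))
```

Changing the weights changes which candidate wins, which is why optimizing them directly against the evaluation metric matters.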
The Jain lab has open positions for researchers with experience and interests in machine learning, computer vision, or quantitative analysis of brain wiring diagrams.
Example projects being pursued in the lab include:
* developing large, deep machine learning networks for biological image analysis using Janelia’s 4000-core CPU/GPU cluster
* developing novel structured prediction methods for parsing image data
* developing tools for crowdsourcing image analysis
* developing novel analytical techniques for understanding network structure in wiring diagrams
Positions could be at the level of a postdoctoral fellow, research scientist, or software engineer, depending on the individual.
Please contact Dr. Viren Jain with questions or inquiries: email@example.com
If you have specific salary requirements, please include them in your e-mail; all information is confidential. HHMI is an equal opportunity employer.