One of the fundamental challenges in neuroscience is linking behaviors with the dynamics of specific groups of neurons in an animal's nervous system. Consider the richness of the cues perceived by an animal and the range of motor actions it can choose to respond with. Under constantly changing conditions, the animal has to extract relevant features from the high-dimensional stimuli, choose among many alternatives and coordinate appropriate motor actions rather quickly. Remarkably, even simple living systems seem to be able to do such high-dimensional computations very robustly under nonstationary and noisy conditions. Yet, even in simple animal model systems, the mechanisms for coordinating robust population-level information processing through the animal's nervous system are not fully understood.
I am interested in understanding how nervous systems coordinate the population dynamics of neural pathways that mediate robust sensorimotor communication and control. Recent advances in genetics and electrophysiological technologies are opening up avenues to study how Drosophila (fruit fly) accomplishes this impressive engineering feat. I am drawn to Drosophila because it is a tractable model system that displays a wide array of interesting behaviors. Using Drosophila as the experimental model system, I work on developing tools for simultaneous multi-unit in vivo recordings of neuronal activity, for extracting relevant information from such high-dimensional recordings and for relating that information to an animal's behavior. On the analysis side, I strive to develop computational tools that are general and useful in other model systems as well.
In the context of Drosophila systems neuroscience, I work closely with the Jayaraman Lab, Harris Lab (APIG), Card Lab and Simpson Lab at Janelia Farm. I also have active computational collaborations with labs working on the rodent model system (Pastalkova Lab, Dudman Lab and A. Lee Lab).
Digital reconstruction of neurons from microscope images is an important and challenging problem in neuroscience. In this paper, we propose a model-based method to tackle this problem. We first formulate a model structure, then develop an algorithm for computing it by carefully taking into account morphological characteristics of neurons, as well as the image properties under typical imaging protocols. The method has been tested on the data sets used in the DIADEM competition and produced promising results for four out of the five data sets.
Establishing visual correspondences is a critical step in many computer vision tasks involving multiple views of a scene. In a dynamic environment and when cameras are mobile, visual correspondences need to be updated on a recurring basis. At the same time, the use of wireless links between camera motes imposes tight rate constraints. This combination of issues motivates us to consider the problem of establishing visual correspondences in a distributed fashion between cameras operating under rate constraints. We propose a solution based on constructing distance-preserving hashes using binarized random projections. By exploiting the fact that descriptors of regions in correspondence are highly correlated, we propose a novel use of distributed source coding via linear codes on the binary hashes to more efficiently exchange feature descriptors for establishing correspondences across multiple camera views. A systematic approach is used to evaluate the trade-off between rate and visual-correspondence retrieval performance; under a stringent matching criterion, our proposed methods demonstrate superior performance to a baseline scheme employing transform coding of descriptors.
Prior Publications (12)
We investigate a practical approach to solving one instantiation of a distributed hypothesis testing problem under severe rate constraints that shows up in a wide variety of applications such as camera calibration, biometric authentication and video hashing: given two distributed continuous-valued random sources, determine if they satisfy a certain Euclidean distance criterion. We show a way to convert the problem from continuous-valued to binary-valued using binarized random projections and obtain rate savings by applying a linear syndrome code. In finding visual correspondences, our approach uses just 49% of the rate of scalar quantization to achieve the same level of retrieval performance. To perform video hashing, our approach requires only a hash rate of 0.0142 bpp to identify corresponding groups of pictures correctly.
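The rate saving from a linear syndrome code can be illustrated with a toy example. The abstract does not name the code that was used, so the sketch below assumes the (7,4) Hamming code purely for illustration: the encoder sends only the 3-bit syndrome of a 7-bit hash block, and the decoder recovers the block from its own correlated hash, provided the two hashes differ in at most one bit. The function names are hypothetical.

```python
# Parity-check matrix of the (7,4) Hamming code (an illustrative choice):
# column j (1-based) holds the binary digits of j, least significant bit first.
H = [[(j >> k) & 1 for j in range(1, 8)] for k in range(3)]

def syndrome(bits):
    """Compute H * bits over GF(2); returns a 3-bit tuple."""
    return tuple(sum(h * b for h, b in zip(row, bits)) % 2 for row in H)

def encode(x):
    """The encoder transmits only the syndrome of its 7-bit hash block."""
    return syndrome(x)

def decode(s_x, y):
    """Recover x from its syndrome and the decoder's own hash block y.

    Assumes x and y differ in at most one position.
    """
    s_y = syndrome(y)
    s_e = tuple(a ^ b for a, b in zip(s_x, s_y))       # syndrome of x XOR y
    pos = sum(bit << k for k, bit in enumerate(s_e))   # 0 means x == y
    x_hat = list(y)
    if pos:
        x_hat[pos - 1] ^= 1                            # flip the differing bit
    return x_hat

x = [1, 0, 1, 1, 0, 0, 1]          # hash block at camera A
y = list(x)
y[4] ^= 1                          # camera B's correlated hash, one bit off
recovered = decode(encode(x), y)
print(recovered == x)
```

In this toy setting 3 bits are sent instead of 7. A practical system would use a longer code matched to the expected number of differing bits.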
Camera networks are widely used for tasks such as surveillance, monitoring and tracking. In order to accomplish these tasks, knowledge of localization information such as camera locations and other geometric constraints about the environment (e.g. walls, rooms, and building layout) is typically considered to be essential. However, this information is not always required for many tasks such as estimating the topology of camera network coverage, or coordinate-free object tracking and navigation. In this paper, we propose a simplicial representation (called the CN-complex) that can be constructed from discrete local observations from cameras, and utilize this novel representation to recover the topological information of the network coverage. We prove that our representation captures the correct topological information from network coverage for 2.5-D layouts, and demonstrate its utility in simulations as well as a real-world experimental set-up. Our proposed approach is particularly useful in the context of ad-hoc camera networks in indoor/outdoor urban environments with distributed but limited computational power and energy.
Multi-modal target tracking using heterogeneous sensor networks. 17th International Conference on Computer Communications and Networks 2008
M. Kushwaha, I. Amundson, P. Volgyesi, P. Ahammad, G. Simon, X. Koutsoukos, A. Ledeczi, and S. Sastry 17th International Conference on Computer Communications and Networks, (2008)
The paper describes a target tracking system running on a Heterogeneous Sensor Network (HSN) and presents results gathered from a realistic deployment. The system fuses audio direction of arrival data from mote class devices and object detection measurements from embedded PCs equipped with cameras. The acoustic sensor nodes perform beamforming and measure the energy as a function of the angle. The camera nodes detect moving objects and estimate their angle. The sensor detections are sent to a centralized sensor fusion node via a combination of two wireless networks. The novelty of our system is the unique combination of target tracking methods customized for the application at hand and their implementation on an actual HSN platform.
High speed action recognition and localization in compressed domain videos. IEEE Transactions on Circuits and Systems for Video Technology: Special issue on Video Surveillance 2008
C. Yeo, P. Ahammad, K. Ramchandran, and S. Sastry IEEE Transactions on Circuits and Systems for Video Technology: Special issue on Video Surveillance, (2008)
We present a compressed domain scheme that is able to recognize and localize actions at high speeds. The recognition problem is posed as performing an action video query on a test video sequence. Our method is based on computing motion similarity using compressed domain features which can be extracted with low complexity. We introduce a novel motion correlation measure that takes into account differences in motion directions and magnitudes. Our method is appearance invariant, requires no prior segmentation, alignment or stabilization, and is able to localize actions in both space and time. We evaluated our method on a benchmark action video database consisting of 6 actions performed by 25 people under 3 different scenarios. Our proposed method achieved a classification accuracy of 90%, comparing favorably with existing methods in action classification accuracy, and is able to localize a template video of 80 x 64 pixels with 23 frames in a test video of 368 x 184 pixels with 835 frames in just 11 seconds, easily outperforming other methods in localization speed. We also perform a systematic investigation of the effects of various encoding options on our proposed approach. In particular, we present results on the compression-classification trade-off, which would provide valuable insight into jointly designing a system that performs video encoding at the camera front-end and action classification at the processing backend.
We consider the problem of communicating compact descriptors for the purpose of establishing visual correspondences between two cameras operating under rate constraints. Establishing visual correspondences is a critical step before other tasks such as camera calibration or object recognition can be performed in a network of cameras. We verify that descriptors of regions which are in correspondence are highly correlated, and propose the use of distributed source coding to reduce the bandwidth needed for transmitting descriptors required to establish correspondence. Our experiments demonstrate that the proposed scheme is able to provide compression gains of 57% with minimal loss in the number of correctly established correspondences compared to a scheme that communicates the entire image of the scene losslessly in compressed form. Over a wide range of rates, the proposed scheme also provides superior performance when compared to simply transmitting all the feature descriptors.
In this paper, we propose and demonstrate a novel wireless camera network system, called CITRIC. The core component of this system is a new hardware platform that integrates a camera, a frequency-scalable (up to 624 MHz) CPU, 16MB FLASH, and 64MB RAM onto a single device. The device then connects with a standard sensor network mote to form a camera mote. The design enables in-network processing of images to reduce communication requirements, which have traditionally been high in existing camera networks with centralized processing. We also propose a back-end client/server architecture to provide a user interface to the system and support further centralized processing for higher-level applications. Our camera mote enables a wider variety of distributed pattern recognition applications than traditional platforms because it provides more computing power and tighter integration of physical components while still consuming relatively little power. Furthermore, the mote easily integrates with existing low-bandwidth sensor networks because it can communicate over the IEEE 802.15.4 protocol with other sensor network platforms. We demonstrate our system on three applications: image compression, target tracking, and camera localization.
We consider the problem of establishing visual correspondences in a distributed and rate-efficient fashion by broadcasting compact descriptors. Establishing visual correspondences is a critical task before other vision tasks can be performed in a camera network. We use coarsely quantized random projections of descriptors to build binary hashes, and use the Hamming distance between binary hashes as a matching criterion. In this work, we show that the Hamming distance between the binary hashes has a binomial distribution, with parameters that are a function of the number of random projections and the Euclidean distance between the original descriptors. We present experimental results that verify our result, and show that for the task of finding visual correspondences, sending binary hashes is more rate-efficient than prior approaches.
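The binomial behaviour of the hash distance can be checked numerically. The sketch below assumes the simplest form of coarse quantization, namely keeping one sign bit per Gaussian random projection, and it assumes unit-norm descriptors; the quantizer in the paper may differ. Under these assumptions each bit differs independently with probability θ/π, where θ = 2·arcsin(d/2) is the angle between two descriptors at Euclidean distance d, so the Hamming distance is binomial with parameters (number of projections, θ/π). All names, dimensions and noise levels are illustrative.

```python
import math
import random

def binary_hash(descriptor, projections):
    """One bit per random projection: 1 if the dot product is non-negative."""
    return [1 if sum(p * x for p, x in zip(row, descriptor)) >= 0 else 0
            for row in projections]

def hamming(a, b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def bit_flip_probability(u, v):
    """Per-bit mismatch probability for unit vectors u, v under sign projections."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return 2.0 * math.asin(min(1.0, d / 2.0)) / math.pi

rng = random.Random(0)
dim, num_bits = 32, 2048  # hypothetical descriptor size and hash length
projections = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

base = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
near = normalize([x + 0.05 * rng.gauss(0.0, 1.0) for x in base])  # corresponding region
far = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])        # unrelated region

h_base = binary_hash(base, projections)
d_near = hamming(h_base, binary_hash(near, projections))
d_far = hamming(h_base, binary_hash(far, projections))
expected_near = num_bits * bit_flip_probability(base, near)  # binomial mean

print(d_near, round(expected_near), d_far)
```

The observed distance for the nearby descriptor should sit close to the binomial mean, while the unrelated descriptor lands near half the hash length, which is what makes a Hamming-distance threshold usable as a matching criterion.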
Given a large collection of videos containing activities, we investigate the problem of organizing it in an unsupervised fashion into a hierarchy based on the similarity of actions embedded in the videos. We use spatio-temporal volumes of filtered motion vectors to compute appearance-invariant action similarity measures efficiently, and use these similarity measures in hierarchical agglomerative clustering to organize videos into a hierarchy such that neighboring nodes contain similar actions. This naturally leads to a simple automatic scheme for selecting videos of representative actions (exemplars) from the database and for efficiently indexing the whole database. We compute a performance metric on the hierarchical structure to evaluate goodness of the estimated hierarchy, and show that this metric has potential for predicting the clustering performance of various joining criteria used in building hierarchies. Our results show that perceptually meaningful hierarchies can be constructed based on action similarities with minimal user supervision, while providing favorable clustering performance and retrieval performance.
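As a rough illustration of hierarchical agglomerative clustering with interchangeable joining criteria, the sketch below merges items bottom-up from a precomputed pairwise dissimilarity matrix. The matrix values, the function name `agglomerate` and the three linkage options are assumptions made for this example; the action similarity measure itself is not reproduced here.

```python
def agglomerate(dist, linkage="average"):
    """Bottom-up clustering over a symmetric dissimilarity matrix.

    Returns the merge sequence as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of item indices.
    """
    def cluster_distance(a, b):
        pairwise = [dist[i][j] for i in a for j in b]
        if linkage == "single":
            return min(pairwise)
        if linkage == "complete":
            return max(pairwise)
        return sum(pairwise) / len(pairwise)  # average linkage

    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters under the chosen criterion.
        d, ia, ib = min(
            (cluster_distance(a, b), ia, ib)
            for ia, a in enumerate(clusters)
            for ib, b in enumerate(clusters)
            if ia < ib
        )
        a, b = clusters[ia], clusters[ib]
        merges.append((a, b, d))
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)]
        clusters.append(a + b)
    return merges

# Hypothetical dissimilarities between four videos: 0 and 1 show one action,
# 2 and 3 show another.
dist = [
    [0, 1, 8, 9],
    [1, 0, 7, 8],
    [8, 7, 0, 2],
    [9, 8, 2, 0],
]
merges = agglomerate(dist, linkage="average")
for a, b, d in merges:
    print(a, b, d)
```

The merge sequence defines the hierarchy: early merges join videos with similar actions, and changing `linkage` changes the joining criterion without touching the rest of the procedure.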
Comparative analysis of spatial patterns of gene expression in Drosophila melanogaster imaginal discs. International Conference on Research in Computational Molecular Biology 2007
C. Harmon, P. Ahammad, A. S. Hammonds, R. Weiszmann, S. E. Celniker, S. Sastry, and G. M. Rubin International Conference on Research in Computational Molecular Biology, (2007)
Determining the precise spatial extent of expression of genes across different tissues, along with knowledge of the biochemical function of the genes, is critical for understanding the roles of various genes in the development of metazoan organisms. To address this problem, we have developed high-throughput methods for generating images of gene expression in Drosophila melanogaster imaginal discs and for the automated analysis of these images. Our method automatically learns tissue shapes from a small number of manually segmented training examples and automatically aligns, extracts and scores new images, which are analyzed to generate gene expression maps for each gene. We have developed a reverse lookup procedure that enables us to identify genes that have spatial expression patterns most similar to a given gene of interest. Our methods enable us to cluster both the genes and the pixels of the maps, thereby identifying sets of genes that have similar patterns, and regions of the tissues of interest that have similar gene expression profiles across a large number of genes.
We present a compressed domain scheme that is able to recognize and localize actions in real-time. The recognition problem is posed as performing a video query on a test video sequence. Our method is based on computing motion similarity using compressed domain features which can be extracted with low complexity. We introduce a novel motion correlation measure that takes into account differences in motion magnitudes. Our method is appearance invariant, requires no prior segmentation, alignment or stabilization, and is able to localize actions in both space and time. We evaluated our method on a large action video database consisting of 6 actions performed by 25 people under 3 different scenarios. Our classification results compare favorably with existing methods at only a fraction of their computational cost.
Joint nonparametric alignment for analyzing spatial gene expression patterns of Drosophila imaginal discs. IEEE Conference on Computer Vision and Pattern Recognition 2005
P. Ahammad, C. Harmon, A. S. Hammonds, S. Sastry, and G. M. Rubin IEEE Conference on Computer Vision and Pattern Recognition, (2005)
To compare spatial patterns of gene expression, one must analyze a large number of images as current methods are only able to measure a small number of genes at a time. Bringing images of corresponding tissues into alignment is a critical first step in making a meaningful comparative analysis of these spatial patterns. Significant image noise and variability in the shapes make it hard to pick a canonical shape model. In this paper, we address these problems by combining segmentation and unsupervised shape learning algorithms. We first segment images to acquire structures of interest, then jointly align the shapes of these acquired structures using an unsupervised nonparametric maximum likelihood algorithm along the lines of congealing, while simultaneously learning the underlying shape model and associated transformations. The learned transformations are applied to corresponding images to bring them into alignment in one step. We demonstrate the results for images of various classes of Drosophila imaginal discs and discuss the methodology used for a quantitative analysis of spatial gene expression patterns.
The correction of bias in magnetic resonance images is an important problem in medical image processing. Most previous approaches have used a maximum likelihood method to increase the likelihood of the pixels in a single image by adaptively estimating a correction to the unknown image bias field. The pixel likelihoods are defined either in terms of a pre-existing tissue model, or non-parametrically in terms of the image's own pixel values. In both cases, the specific location of a pixel in the image is not used to calculate the likelihoods. We suggest a new approach in which we simultaneously eliminate the bias from a set of images of the same anatomy, but from different patients. We use the statistics from the same location across different images, rather than within an image, to eliminate bias fields from all of the images simultaneously. The method builds a multi-resolution non-parametric tissue model conditioned on image location while eliminating the bias fields associated with the original image set. We present experiments on both synthetic and real MR data sets, and present comparisons with other methods.