30 Publications

Showing 21-30 of 30 results
    Eddy/Rivas Lab
    10/01/09 | A new generation of homology search tools based on probabilistic inference.
    Eddy SR
    Genome Informatics. International Conference on Genome Informatics. 2009 Oct;23(1):205-11

    Many theoretical advances have been made in applying probabilistic inference methods to improve the power of sequence homology searches, yet the BLAST suite of programs is still the workhorse for most of the field. The main reason for this is practical: BLAST’s programs are about 100-fold faster than the fastest competing implementations of probabilistic inference methods. I describe recent work on the HMMER software suite for protein sequence analysis, which implements probabilistic inference using profile hidden Markov models. Our aim in HMMER3 is to achieve BLAST’s speed while further improving the power of probabilistic inference based methods. HMMER3 implements a new probabilistic model of local sequence alignment and a new heuristic acceleration algorithm. Combined with efficient vector-parallel implementations on modern processors, these improvements synergize. HMMER3 uses more powerful log-odds likelihood scores (scores summed over alignment uncertainty, rather than scoring a single optimal alignment); it calculates accurate expectation values (E-values) for those scores without simulation using a generalization of Karlin/Altschul theory; it computes posterior distributions over the ensemble of possible alignments and returns posterior probabilities (confidences) in each aligned residue; and it does all this at an overall speed comparable to BLAST. The HMMER project aims to usher in a new generation of more powerful homology search tools based on probabilistic inference methods.
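
The distinction the abstract draws between scores "summed over alignment uncertainty" and the score of "a single optimal alignment" can be illustrated on a toy hidden Markov model. The sketch below is not HMMER's model or code: the two states, all probabilities, and the function names are invented for illustration. It only shows that a Forward-style score (sum over all state paths) is never lower than a Viterbi-style score (best single path) for the same sequence.

```python
import math

# Hypothetical two-state HMM over DNA; every number here is invented for
# illustration and none of them is a HMMER parameter.
STATES = ("match", "background")
START = {"match": 0.5, "background": 0.5}
TRANS = {
    "match": {"match": 0.9, "background": 0.1},
    "background": {"match": 0.1, "background": 0.9},
}
EMIT = {
    "match": {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    "background": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}
NULL = 0.25  # assumed null model: independent, uniformly distributed residues


def _fill(seq, combine):
    """Run the HMM recursion, merging incoming paths with `combine`."""
    prev = {s: START[s] * EMIT[s][seq[0]] for s in STATES}
    for x in seq[1:]:
        prev = {
            s: EMIT[s][x] * combine(prev[r] * TRANS[r][s] for r in STATES)
            for s in STATES
        }
    return combine(prev.values())


def forward_bits(seq):
    """Log-odds score in bits, summing over every state path."""
    return math.log2(_fill(seq, sum) / NULL ** len(seq))


def viterbi_bits(seq):
    """Log-odds score in bits of the single most probable state path."""
    return math.log2(_fill(seq, max) / NULL ** len(seq))


if __name__ == "__main__":
    for seq in ("AAAA", "ACGT", "AACA"):
        print(seq, round(forward_bits(seq), 3), round(viterbi_bits(seq), 3))
```

Swapping `sum` for `max` is the only difference between the two recursions, which is why the Forward score always bounds the Viterbi score from above.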

    Eddy/Rivas Lab
    07/01/09 | A tool for identification of genes expressed in patterns of interest using the Allen Brain Atlas.
    Davis FP, Eddy SR
    Bioinformatics. 2009 Jul 1;25(13):1647-54. doi: 10.1093/bioinformatics/btp288

    Gene expression patterns can be useful in understanding the structural organization of the brain and the regulatory logic that governs its myriad cell types. A particularly rich source of spatial expression data is the Allen Brain Atlas (ABA), a comprehensive genome-wide in situ hybridization study of the adult mouse brain. Here, we present an open-source program, ALLENMINER, that searches the ABA for genes that are expressed, enriched, patterned or graded in a user-specified region of interest.

    Eddy/Rivas Lab
    05/15/09 | Infernal 1.0: inference of RNA alignments.
    Nawrocki EP, Kolbe DL, Eddy SR
    Bioinformatics. 2009 May 15;25(10):1335-7. doi: 10.1093/bioinformatics/btp157

    SUMMARY: INFERNAL builds consensus RNA secondary structure profiles called covariance models (CMs), and uses them to search nucleic acid sequence databases for homologous RNAs, or to create new sequence- and structure-based multiple sequence alignments. AVAILABILITY: Source code, documentation and benchmark are available for download. INFERNAL is freely licensed under the GNU GPLv3 and should be portable to any POSIX-compliant operating system, including Linux and Mac OS/X.

    Eddy/Rivas Lab
    05/15/09 | Local RNA structure alignment with incomplete sequence.
    Kolbe DL, Eddy SR
    Bioinformatics. 2009 May 15;25(10):1236-43. doi: 10.1093/bioinformatics/btp154

    Accuracy of automated structural RNA alignment is improved by using models that consider not only primary sequence but also secondary structure information. However, current RNA structural alignment approaches tend to perform poorly on incomplete sequence fragments, such as single reads from metagenomic environmental surveys, because nucleotides that are expected to be base paired are missing.

    Eddy/Rivas Lab
    01/01/09 | A survey of nematode SmY RNAs.
    Jones TA, Otto W, Marz M, Eddy SR, Stadler PF
    RNA Biology. 2009 Jan-Mar;6(1):5-8

    SmY RNAs are a family of approximately 70-90 nt small nuclear RNAs found in nematodes. In C. elegans, SmY RNAs copurify in a small ribonucleoprotein (snRNP) complex related to the SL1 and SL2 snRNPs that are involved in nematode mRNA trans-splicing. Here we describe a comprehensive computational analysis of SmY RNA homologs found in the currently available genome sequences. We identify homologs in all sequenced nematode genomes in class Chromadorea. We are unable to identify homologs in a more distantly related nematode species, Trichinella spiralis (class: Dorylaimia), and in representatives of non-nematode phyla that use trans-splicing. Using comparative RNA sequence analysis, we infer a conserved consensus SmY RNA secondary structure consisting of two stems flanking a consensus Sm protein binding site. A representative seed alignment of the SmY RNA family, annotated with the inferred consensus secondary structure, has been deposited with the Rfam RNA families database.

    Eddy/Rivas Lab
    01/01/09 | Rfam: updates to the RNA families database.
    Gardner PP, Daub J, Tate JG, Nawrocki EP, Kolbe DL, Lindgreen S, Wilkinson AC, Finn RD, Griffiths-Jones S, Eddy SR, Bateman A
    Nucleic Acids Research. 2009 Jan;37(Database issue):D136-40. doi: 10.1093/nar/gkn766

    Rfam is a collection of RNA sequence families, represented by multiple sequence alignments and covariance models (CMs). The primary aim of Rfam is to annotate new members of known RNA families on nucleotide sequences, particularly complete genomes, using sensitive BLAST filters in combination with CMs. A minority of families with a very broad taxonomic range (e.g. tRNA and rRNA) provide the majority of the sequence annotations, whilst the majority of Rfam families (e.g. snoRNAs and miRNAs) have a limited taxonomic range and provide a limited number of annotations. Recent improvements to the website, methodologies and data used by Rfam are discussed. Rfam is freely available on the Web.

    Eddy/Rivas Lab
    09/19/08 | Probabilistic phylogenetic inference with insertions and deletions.
    Rivas E, Eddy SR
    PLoS Computational Biology. 2008 Sep 19;4(9):e1000172. doi: 10.1371/journal.pcbi.1000172

    A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We describe a probabilistic model of a multiple sequence alignment that accounts for insertion and deletion events in addition to substitutions, given a phylogenetic tree, using a rate matrix augmented by the gap character. Starting from a continuous Markov process, we construct a non-reversible generative (birth-death) evolutionary model for insertions and deletions. The model assumes that insertion and deletion events occur one residue at a time. We apply this model to phylogenetic tree inference by extending the program dnaml in phylip. Using standard benchmarking methods on simulated data and a new "concordance test" benchmark on real ribosomal RNA alignments, we show that the extended program dnamlepsilon improves accuracy relative to the usual approach of ignoring gaps, while retaining the computational efficiency of the Felsenstein peeling algorithm.
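
The phrase "a rate matrix augmented by the gap character" can be made concrete with a small continuous-time Markov chain over the five symbols A, C, G, T and "-". The sketch below is not the paper's parameterization: the substitution, deletion and insertion rates and all function names are hypothetical, and the matrix exponential is computed with a simple scaled Taylor series rather than any method the authors describe. It only shows how such a matrix yields transition probabilities P(t) = exp(Qt) whose rows sum to one.

```python
ALPHABET = "ACGT-"  # four nucleotides plus the gap character


def make_rate_matrix(sub_rate=1.0, del_rate=0.1, ins_rate=0.05):
    """Build a 5x5 rate matrix; all rates are hypothetical illustration values."""
    n = len(ALPHABET)
    q = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(ALPHABET):
        for j, b in enumerate(ALPHABET):
            if i == j:
                continue
            if a == "-":
                q[i][j] = ins_rate / 4   # gap -> residue (insertion)
            elif b == "-":
                q[i][j] = del_rate       # residue -> gap (deletion)
            else:
                q[i][j] = sub_rate / 3   # residue -> residue (substitution)
        q[i][i] = -sum(q[i])             # each row of a rate matrix sums to zero
    return q


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_probs(q, t, squarings=10, terms=12):
    """Approximate exp(Q * t) by a Taylor series with scaling and squaring."""
    n = len(q)
    scale = t / 2 ** squarings
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in matmul(term, a)]  # A^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


if __name__ == "__main__":
    p = transition_probs(make_rate_matrix(), t=0.5)
    for symbol, row in zip(ALPHABET, p):
        print(symbol, [round(v, 4) for v in row])
```

Treating the gap as a fifth state means a single matrix covers substitutions, insertions and deletions, at the cost of assuming one-residue-at-a-time events, as the abstract notes.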

    Eddy/Rivas Lab
    05/30/08 | A probabilistic model of local sequence alignment that simplifies statistical significance estimation.
    Eddy SR
    PLoS Computational Biology. 2008 May 30;4(5):e1000069. doi: 10.1371/journal.pcbi.1000069

    Sequence database searches require accurate estimation of the statistical significance of scores. Optimal local sequence alignment scores follow Gumbel distributions, but determining an important parameter of the distribution (lambda) requires time-consuming computational simulation. Moreover, optimal alignment scores are less powerful than probabilistic scores that integrate over alignment uncertainty ("Forward" scores), but the expected distribution of Forward scores remains unknown. Here, I conjecture that both expected score distributions have simple, predictable forms when full probabilistic modeling methods are used. For a probabilistic model of local sequence alignment, optimal alignment bit scores ("Viterbi" scores) are Gumbel-distributed with constant lambda = log 2, and the high scoring tail of Forward scores is exponential with the same constant lambda. Simulation studies support these conjectures over a wide range of profile/sequence comparisons, using 9,318 profile-hidden Markov models from the Pfam database. This enables efficient and accurate determination of expectation values (E-values) for both Viterbi and Forward scores for probabilistic local alignments.
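
As a rough illustration of what a constant lambda = log 2 buys, the sketch below turns a bit score into a P-value and an E-value using the two distribution shapes named in the abstract. It is a minimal sketch under stated assumptions, not the paper's or HMMER's implementation: the location parameters `mu` and `tau`, the function names, and the normalization of the exponential tail (P = 1 at `tau`) are assumptions made here, and in practice the location parameters would still have to be estimated.

```python
import math

LAMBDA = math.log(2)  # constant lambda from the abstract, for scores in bits


def viterbi_pvalue(score, mu):
    """Gumbel tail P(S >= score); mu is an assumed location parameter."""
    return -math.expm1(-math.exp(-LAMBDA * (score - mu)))


def forward_pvalue(score, tau):
    """Exponential tail P(S >= score), assumed normalized to 1 at tau."""
    return min(1.0, math.exp(-LAMBDA * (score - tau)))


def evalue(pvalue, n_comparisons):
    """Expected number of hits at least this good in n_comparisons trials."""
    return pvalue * n_comparisons


if __name__ == "__main__":
    # With lambda = log 2, each extra bit of score halves the Forward tail.
    for bits in (0, 1, 10, 20):
        p = forward_pvalue(bits, tau=0.0)
        print(bits, p, evalue(p, n_comparisons=1_000_000))
```

Because lambda is fixed, only the location parameter differs between models, which is what removes the need to simulate the slope of the distribution.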

    Eddy/Rivas Lab
    12/01/07 | Identification of differentially expressed small non-coding RNAs in the legume endosymbiont Sinorhizobium meliloti by comparative genomics.
    del Val C, Rivas E, Torres-Quesada O, Toro N, Jiménez-Zurdo JI
    Molecular Microbiology. 2007 Dec;66(5):1080-91. doi: 10.1111/j.1365-2958.2007.05978.x

    Bacterial small non-coding RNAs (sRNAs) are being recognized as novel widespread regulators of gene expression in response to environmental signals. Here, we present the first search for sRNA-encoding genes in the nitrogen-fixing endosymbiont Sinorhizobium meliloti, performed by a genome-wide computational analysis of its intergenic regions. Comparative sequence data from eight related alpha-proteobacteria were obtained, and the interspecies pairwise alignments were scored with the programs eQRNA and RNAz as complementary predictive tools to identify conserved and stable secondary structures corresponding to putative non-coding RNAs. Northern experiments confirmed that eight of the predicted loci, selected among the original 32 candidates as most probable sRNA genes, expressed small transcripts. This result supports the combined use of eQRNA and RNAz as a robust strategy to identify novel sRNAs in bacteria. Furthermore, seven of the transcripts accumulated differentially in free-living and symbiotic conditions. Experimental mapping of the 5’-ends of the detected transcripts revealed that their encoding genes are organized in autonomous transcription units with recognizable promoter and, in most cases, termination signatures. These findings suggest novel regulatory functions for sRNAs related to the interactions of alpha-proteobacteria with their eukaryotic hosts.

    Eddy/Rivas Lab
    03/30/07 | Query-dependent banding (QDB) for faster RNA similarity searches.
    Nawrocki EP, Eddy SR
    PLoS Computational Biology. 2007 Mar 30;3(3):e56. doi: 10.1371/journal.pcbi.0030056

    When searching sequence databases for RNAs, it is desirable to score both primary sequence and RNA secondary structure similarity. Covariance models (CMs) are probabilistic models well-suited for RNA similarity search applications. However, the computational complexity of CM dynamic programming alignment algorithms has limited their practical application. Here we describe an acceleration method called query-dependent banding (QDB), which uses the probabilistic query CM to precalculate regions of the dynamic programming lattice that have negligible probability, independently of the target database. We have implemented QDB in the freely available Infernal software package. QDB reduces the average case time complexity of CM alignment from LN^2.4 to LN^1.3 for a query RNA of N residues and a target database of L residues, resulting in a 4-fold speedup for typical RNA queries. Combined with other improvements to Infernal, including informative mixture Dirichlet priors on model parameters, benchmarks also show increased sensitivity and specificity resulting from improved parameterization.
