Note: Research in this publication was not performed at Janelia.
Abstract
A new and conceptually simple data structure, called a suffix array, for on-line string searches is introduced in this paper. Constructing and querying suffix arrays is reduced to a sort and search paradigm that employs novel algorithms. The main advantage of suffix arrays over suffix trees is that, in practice, they use three to five times less space. From a complexity standpoint, suffix arrays permit on-line string searches of the type, “Is W a substring of A?” to be answered in time O(P + log N), where P is the length of W and N is the length of A, which is competitive with (and in some cases slightly better than) suffix trees. The only drawback is that in those instances where the underlying alphabet is finite and small, suffix trees can be constructed in O(N) time in the worst case, versus O(N log N) time for suffix arrays.
However, we give an augmented algorithm that, regardless of the alphabet size, constructs suffix arrays in O(N) expected time, albeit with lesser space efficiency. We believe that suffix arrays will prove to be better in practice than suffix trees for many applications.
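To make the sort and search paradigm concrete, here is a minimal Python sketch of a suffix array and a substring query. It is an illustration only, not the paper's algorithms: the function names `build_suffix_array` and `contains` are hypothetical, construction uses a plain comparison sort of the suffixes (far slower than the O(N log N) bound quoted above), and the query is an ordinary binary search that costs O(P log N) rather than O(P + log N), because it does not use the extra longest-common-prefix information that the faster search relies on.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text`, sorted lexicographically.

    Naive construction for illustration: sorts the suffixes directly.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def contains(text: str, suffix_array: list[int], pattern: str) -> bool:
    """Answer "Is `pattern` a substring of `text`?" by binary search.

    A pattern occurs in the text exactly when it is a prefix of some suffix,
    and the suffixes are sorted, so the first suffix whose length-P prefix is
    >= pattern is the only candidate that needs checking.
    """
    if not pattern:
        return True  # the empty string is a substring of every string
    p = len(pattern)
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        # Compare only the first P characters of the suffix with the pattern.
        if text[start:start + p] < pattern:
            lo = mid + 1
        else:
            hi = mid
    if lo == len(suffix_array):
        return False
    start = suffix_array[lo]
    return text[start:start + p] == pattern


if __name__ == "__main__":
    a = "banana"
    sa = build_suffix_array(a)
    print(sa)                       # start positions of the sorted suffixes
    print(contains(a, sa, "nan"))   # substring query
    print(contains(a, sa, "nab"))
```

The array stores one integer per text position and nothing else, which is the source of the space advantage over suffix trees described in the abstract.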