PALMapper: Fast and Accurate Spliced Alignments of Sequence Reads

News

Jul. 20, 2013:
PALMapper was presented at HiTSeq 2013 in Berlin, Germany.
Jul. 09, 2013:
New features: bug fixes, variant-aware alignments
Improvements: junction-remapping, indexing
Jul. 22, 2011:
PALMapper was presented at the poster session of ISMB/ECCB 2011 in Vienna, Austria.
Jul. 08, 2011:
New features: BAM output, intron junction remapping,
spliced alignments with non-consensus sequences as splice sites, protocol information
Bug fixes including SAM output
Feb. 02, 2011:
PALMapper version 0.4 released (release candidate 4)
This release fixes a few bugs and introduces new options:
spliced alignments with non-consensus splice sequences (experimental),
multithreading, fast mapping with bwt-based index (experimental)
Dec. 20, 2010:
The PALMapper tutorial paper has been published and is available
May 09, 2010:
PALMapper version 0.4 released (release candidate 3)
This release saves space by using a smaller index
rtrim and polytrim strategies have been improved
PALMapper output has been reorganized: unspliced and spliced alignments
are merged in the same file (SAM as default)
Apr. 16, 2010:
PALMapper version 0.4 released (release candidate 2)
Improvement of BED output for the rtrim strategy.
genomemapper and mkindex were renamed to palmapper and pmindex
Apr. 14, 2010:
PALMapper version 0.4 released (release candidate 1)

The fusion of GenomeMapper and QPALMA, called PALMapper, is designed to accurately align RNA-seq reads against genomes.

The benefits of PALMapper include:

  1. Alignments with mismatches and indels.
  2. Accurate spliced alignments using computational splice site predictions, if available.
  3. Fast alignments (about 10 million reads/hour).

The source code of PALMapper is available from

https://public.bmi.inf.ethz.ch/software/palmapper.

A tutorial paper [1] explaining how to use PALMapper on the command line or through a Web service (a customized version of the Galaxy framework) has been published in Current Protocols in Bioinformatics. A related webpage containing up-to-date links to data and software is maintained at supplements/palmapper/tutorial.

Summary:

Genome and transcriptome sequencing are undergoing a challenging renewal with the advent of Next Generation Sequencing (NGS) technologies. Notably, short mRNA sequences produced by RNA-Seq enhance transcriptome analysis and promise great opportunities for discovering new genes and identifying alternative transcripts. One way to analyze these data is to align the reads against a reference genome. However, the sheer amount of NGS data requires highly efficient methods for accurate spliced alignment, a task further complicated by the length and quality of the sequence reads.

We propose combining the spliced alignment method QPALMA [2] with the short read alignment tool GenomeMapper [3]. The resulting method, called PALMapper, efficiently computes both spliced and unspliced alignments at high accuracy while taking advantage of base quality information and splice site predictions. QPALMA, which relies on a machine learning strategy, is highly sensitive but slow in the alignment step, which can make it impractical for large genomes or extremely long introns. To speed it up, we combined it with GenomeMapper, which quickly carries out an initial read mapping; this mapping then guides a banded semi-global spliced alignment algorithm that allows long gaps corresponding to introns. PALMapper considerably reduces running time compared to QPALMA without decreasing accuracy: it runs around 50 times faster and can align around 7 million reads per hour on a single AMD CPU core (a speed similar to TopHat [4]). Our study on C. elegans furthermore shows that PALMapper predicts introns with high sensitivity (72%) and specificity (82%) when the annotation is used as ground truth, making it considerably more accurate than TopHat (47% and 81%, respectively).
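To illustrate what "banded semi-global" alignment means, here is a minimal sketch, not PALMapper's implementation. It assumes linear gap costs and a seed diagonal `diag` supplied by an initial mapping step (the role GenomeMapper plays above). It also omits QPALMA's splice-site and base-quality scoring and the long intron gaps. The function name `banded_semiglobal` and all scoring parameters are hypothetical.

```python
from math import inf


def banded_semiglobal(read, ref, diag, band, match=1, mismatch=-1, gap=-2):
    """Score a banded semi-global alignment of `read` against `ref`.

    Hypothetical sketch: `diag` is the reference offset where a seed places
    read position 0, and only cells within `band` of that diagonal are filled.
    The whole read must be aligned; reference bases before and after it are free.
    Returns (best score, end position in ref).
    """
    m = len(ref)
    # Row 0: aligning the empty read prefix costs nothing (free leading ref bases),
    # but only cells inside the band are considered reachable.
    prev = [0 if abs(j - diag) <= band else -inf for j in range(m + 1)]
    for i in range(1, len(read) + 1):
        cur = [-inf] * (m + 1)
        # Restrict row i to columns within `band` of the seed diagonal.
        lo, hi = max(0, i + diag - band), min(m, i + diag + band)
        for j in range(lo, hi + 1):
            best = prev[j] + gap  # read base aligned to a gap
            if j > 0:
                sub = match if read[i - 1] == ref[j - 1] else mismatch
                best = max(best,
                           prev[j - 1] + sub,  # match or mismatch
                           cur[j - 1] + gap)   # reference base aligned to a gap
            cur[j] = best
        prev = cur
    # Free trailing reference bases: take the best cell in the last row.
    end = max(range(m + 1), key=lambda j: prev[j])
    return prev[end], end


print(banded_semiglobal("ACGT", "TTACGTTT", diag=2, band=1))  # (4, 6)
```

Because only about `2 * band + 1` cells are filled per read position, the work grows with read length times band width rather than with the length of the reference. This is why an initial seed placement makes the subsequent alignment cheap.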
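The intron sensitivity and specificity figures can be read as simple set comparisons against the annotation. The sketch below assumes the convention common in gene-finding evaluation: specificity means the fraction of predicted introns that match the annotation (TP / (TP + FP)), and an intron counts as correct only if its (chromosome, start, end) coordinates match exactly. The source does not define these metrics. The function name `intron_accuracy` and the example coordinates are hypothetical.

```python
def intron_accuracy(predicted, annotated):
    """Hypothetical helper: intron-level sensitivity and specificity.

    Introns are (chromosome, start, end) tuples; a predicted intron is a true
    positive only if it exactly matches an annotated one (assumed convention).
    """
    tp = len(set(predicted) & set(annotated))
    sensitivity = tp / len(annotated) if annotated else 0.0   # TP / (TP + FN)
    specificity = tp / len(predicted) if predicted else 0.0   # TP / (TP + FP)
    return sensitivity, specificity


annotated = {("chrI", 100, 200), ("chrI", 300, 400),
             ("chrII", 50, 80), ("chrII", 90, 120)}
predicted = {("chrI", 100, 200), ("chrII", 50, 80), ("chrII", 10, 20)}
print(intron_accuracy(predicted, annotated))  # (0.5, 0.6666666666666666)
```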

Contact:

In case of comments, problems, questions etc. feel free to contact

References

[1] Jean, G., Kahles, A., Sreedharan, V.T., De Bona, F., Raetsch, G., RNA-Seq Read Alignments with PALMapper, Curr. Protoc. Bioinform., 32:11.6.1-11.6.38, 2010.
[2] De Bona, F. et al., Optimal Spliced Alignments of Short Sequence Reads, ECCB08/Bioinformatics, 24(16):i174, 2008.
[3] Schneeberger, K. et al., Simultaneous alignment of short reads against multiple genomes, Genome Biol., 10(9):R98, 2009.
[4] Trapnell, C. et al., TopHat: discovering splice junctions with RNA-Seq, Bioinformatics, 25(9):1105-11, 2009.